Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 電機工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53743
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor雷欽隆
dc.contributor.authorLi-Chung Changen
dc.contributor.author張立中zh_TW
dc.date.accessioned2021-06-16T02:28:44Z-
dc.date.available2015-08-04
dc.date.copyright2015-08-04
dc.date.issued2015
dc.date.submitted2015-08-03
dc.identifier.citation[1] T. Herzog, F. Scheuren, and W. Winkler. Data quality and record linkage techniques.
Springer Verlag, 2007.
[2] HIPAA privacy rule. U.S. Department of Health & Human Services., December
2011.
[3] V.S. Verykios, A. Karakasidis, and V. Mitrogiannis. Privacy preserving record linkage
approaches. International Journal of Data Mining, pages 206–221, 2009.
[4] T. Churches and P. Christen. Some methods for blind folded record linkage. BMC
Medical Informatics and Decision Making, 4(1), 2004.
[5] T. Dalenius. Finding a needle in a haystack-or identifying anonymous census record.
Journal of Official Statistics, 2(3):329–336, 1986.
[6] Oded Goldreich. Foundations of cryptography, volume 2. Cambridge university
press, 2004.
[7] Y. Lindell and B. Pinkas. Secure multi party computation for privacy-preserving data
mining. Journal of Privacy and Confidentiality, 1, 2009.
[8] E. Durham, Y. Xue, M. Kantarcioglu, and B. Malin. Quantifying the correctness,
computational complexity, and security of privacy-preserving string comparators for
record linkage. Inform. Fusion, 13(4):245–259, October 2012.
[9] D. Vatsalan, P. Christen, and V. Verykios. A taxonomy of privacy preserving record
linkage techniques. Inform. Syst, 38(6):946–969, September 2013.
[10] C. Quantin. How to ensure data security of an epidemiological follow-up: Quality
assessment of an anonymous record linkage procedure. Int. J. Med. Inform, 49(1):
117–122, 1998.
[11] J. J. Berman. Zero-check: A zero-knowledge protocol for reconciling patient identities
across institutions. Arch. Pathol. Lab. Med, 128(3):344–346, 2004.
[12] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the privacy preserving properties
of random data perturbation techniques. IEEE ICDM, page 99¡V106, 2003.
[13] M. Kantarcioglu, W. Jiang, and B. Malin. A privacy-preserving framework for integrating
person-specific databases. Privacy in Statistical Databases, page 298¡V314,
2008.
[14] N. Mohammed, B. Fung, and M. Debbabi. Anonymity meets game theory: secure
data integration with malicious participants. International Journalon Very Large
Databases, 20(4):567¡V588, 2011.
[15] Karakasidis, Alexandros, and Vassilios S. Verykios. Reference table based kanonymous
private blocking. 27th Annual ACM Symposium on Applied Computing,
2012.
[16] P. Christen. A comparison of personal name matching: techniques and practical
issues. IEEE ICDM Work shop on Mining Complex Data, 2006.
[17] A. Karakasidis, V.S. Verykios, and P. Christen. Fake injection strategies for private
phonetic matching. International Work shop on Data Privacy Management, 2011.
[18] R. Schnell, T. Bachteler, and J. Reiher. Privacy-preserving record linkage using
bloom filters. BMC Medical Informatics and Decision Making, 9(1), 2009.
[19] P. Lai, S. Yiu, K. Chow, and L.Hui. An efficient bloom filter based solution for
multiparty private matching. SAM, page 7, 2006.
[20] E. Durham, Y. Xue, M. Kantarcioglu, and B. Malin. Private medical record linkage
with approximate matching. AMIA Annual Symposium, page 182, 2010.
[21] A. Karakasidis and V.S. Verykios. Secure blocking + secure matching = secure record
linkage. Journal of Computing Scienceand Engineering, 5:223–235, 2011.
[22] S. M. Randall, A. M. Ferrante, J. H. Boyd, J. K. Bauer, and J. B. Semmens. Privacypreserving
record linkage on large real world datasets. Journal of biomedical informatics,
pages 205–212, 2014.
[23] M. Kuzu, M. Kantarcioglu, E. Durham, and B. Malin. A constraint satisfaction cryptanalysis
of bloom filters in private record linkage. Privacy Enhancing Technologies
Symposium, July 2011.
[24] E. Durham, C. Toth, and B. Malin. Composite bloom filters for secure record linkage.
IEEE Transactions on Knowledge and Data Engineering, 1(99), 2013.
[25] R. Schnell, T. Bachteler, and J. Reiher. A novel error-tolerant anonymous linking
code. German Record Linkage Center, Working Paper Series No. WP-GRLC-2011-
02, 2011.
[26] Adam Kirsch and Michael Mitzenmacher. Less hashing, same performance: Building
a better Bloom filter. Springer, 2006.
[27] Freely extensible biomedical record linkage (febrl) software. http://sourceforge.net/
projects/febrl/.
[28] C. E. Shannon. A mathematical theory of communication. Bell System Technical
Journal, 27(3):379–423, July 1948.
[29] A. Al-Lawati, D. Lee, and P. McDaniel. Blocking-aware private record linkage.
pages 59–68, 2005.
[30] I. Fellegi and A. Sunter. A theory for record linkage. Journal of the American
Statistical Society, 64(328):1183–1210, 1969.
[31] E. Durham. A framework for accurate, efficient private record linkage. PhD thesis,
University of Texas at Dallas, 2012.
[32] J. Roozenburg. A literature survey on bloom filters. Research Assignment, 2005.
[33] Gu Lifang and Rohan Baxter. Decision models for record linkage. Springer Berlin
Heidelberg, 2006.
[34] P. Christen. Febrl–an open source data cleaning, deduplication and record linkage
system with a graphical user interface. Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 1065–1068,
2008.
[35] P. Christen. Data Matching. Data-Centric Systems and Applications. Springer, 2012.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53743-
dc.description.abstract由於個人紀錄常常散布於不同的地方,進行資料合併時,我們必須找出描述同一人的紀錄,進而提升資料的完整性,並刪除重複的部分,雖然透過個人獨立的ID能夠輕易達成這件事情,但在大部分的情況下,我們無法確保兩筆資料同時擁有這個資訊,取得代之的是,我們必須透過多個屬性一起判斷,比如說姓名、性別及街道,若是這些屬性都吻合,我們就將兩筆紀錄視為描述同一人。此外,在現今資訊的時代,已經有諸多法律明文規範了個資保護的條款,尤其在醫學界,病患的隱私是相當受到重視的。因此,綜合上述的問題,「具隱私保護的資料比對」的研究議題近年來一直不斷地被研究與討論。
雖然已經有研究顯示利用「Bloom filter」的技術進行資料鏈結的效果,在正確性及效率方面都比其他方法來的好,但是透過頻率分析的攻擊,第三方卻能夠輕易地破解其編碼,故為了解決這個問題,RBF和CLK的演算法分別被發展了出來,他們各自都有其優缺點。在這篇論文中,我們將兩個方法做結合,擷取他們的長處,並針對缺點的部分進行改進,提出了一個創新的想法:WCLK,另外,我們也使用entropy來計算每個屬性的權重比率,透過給予每個屬性不同的權重,讓最後比對的正確率能夠有效的提升。最後,在此資料鏈結的方法中,使用者必須訂定一個「域值」來區分不同的相似值,由於在實際的狀況下,我們無法透過合併結果的準確度來計算此域值,因此,透過分群的概念,我們能夠準確地估算適當的域值。實驗的部分,我們利用Febel的工具產生資料來源,驗證我們的技術能夠有效地取代現有的方法,甚至達到更好的效果。
zh_TW
dc.description.abstractRecord linkage is the task of identifying records from multiple datasets that refer to the same individual. However, it is not an easy work because unique identities cannot be available most of the time so a set of attributes, such as extit{Forename, Gender} and extit{Street}, can be used in light of quasi-identifiers. In addition, as for privacy, various regulations and policies have been made to prohibit people from disclose of identifies, especially in the medical domain. Therefore, lots of methods of privacy-preserving record linkage (PPRL) have been developed to integrate datasets without revealing identifies associated with the records.
A recent evaluation has shown that a transformation based on Bloom filter is superior to other approaches, but the encoding may be compromised through frequency-based cryptanalysis. Thus, two methods, RBF and CLK, have been proposed to solve this problem. However, both of them have their own strengths and weaknesses. In this dissertation, we merge these two methods and propose an advance one which we call WCLK. Besides, entropy is used to determine field weights. By giving different weighting to each field, we can improve the accuracy of the linkage results. Finally, without being able to access linkage quality and completeness in practice, threshold determination is a big challenge. Thus, we propose a clustering-based method to find a suitable threshold which can also lead to accurate results of record linkage. Using datasets generated by Febrl, we conduct several empirical experiments to show that our work can perform better than previous ones.
en
dc.description.provenanceMade available in DSpace on 2021-06-16T02:28:44Z (GMT). No. of bitstreams: 1
ntu-104-R02921035-1.pdf: 3073574 bytes, checksum: 7a6b1a70614ae5a6027366944398372c (MD5)
Previous issue date: 2015
en
dc.description.tableofcontents口試委員會審定書i
致謝ii
中文摘要iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
1.1 Privacy-Preserving Record Linkage . . . . . . . . . . . . . . . . . . . . 1
2 Background 3
2.1 Bloom Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Field-Level Bloom Filter Encoding . . . . . . . . . . . . . . . . . . . . . 5
2.4 Security of Field-Level Bloom Filter Encoding . . . . . . . . . . . . . . 7
2.5 Record-Level Bloom Filter . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6 Cryptographic Longterm key . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Proposed Method 10
3.1 Outline of the Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Field Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.1 Field Weighting Based on Entropy . . . . . . . . . . . . . . . . . 12
3.2.2 Treatment of Missing Data . . . . . . . . . . . . . . . . . . . . . 13
3.3 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.1 The Method of WCLK . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.2 Treatment of Missing Data . . . . . . . . . . . . . . . . . . . . . 17
3.4 Blocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4.1 Hamming LSH . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4.2 The Iteration of HLSH . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Record Pair Classification . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 Experimental Evaluation 24
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Experimental Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.1 Evaluation Measure . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.2 An Overview of the Experiments . . . . . . . . . . . . . . . . . 27
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4.1 Results for Encoding Methods . . . . . . . . . . . . . . . . . . . 29
4.4.2 Results for Field Weighting . . . . . . . . . . . . . . . . . . . . 29
4.4.3 Results for Overall Test . . . . . . . . . . . . . . . . . . . . . . 32
4.4.4 Results for Threshold Determination . . . . . . . . . . . . . . . . 33
5 Conclusion 35
Bibliography 36
dc.language.isoen
dc.subject布隆過濾器zh_TW
dc.subject資料比對zh_TW
dc.subject隱私保護zh_TW
dc.subject權重zh_TW
dc.subjectrecord linkageen
dc.subjectbloom filteren
dc.subjectprivacy-preservingen
dc.subjectfield weightingen
dc.title利用布隆過濾器建構具隱私保護資料比對之框架zh_TW
dc.titleA Framework for Privacy-preserving Record Linkage Using Bloom Filteren
dc.typeThesis
dc.date.schoolyear103-2
dc.description.degree碩士
dc.contributor.oralexamcommittee顏嗣鈞,黃秋煌,莊仁輝
dc.subject.keyword資料比對,隱私保護,布隆過濾器,權重,zh_TW
dc.subject.keywordrecord linkage,privacy-preserving,bloom filter,field weighting,en
dc.relation.page39
dc.rights.note有償授權
dc.date.accepted2015-08-03
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept電機工程學研究所zh_TW
顯示於系所單位:電機工程學系

文件中的檔案:
檔案 大小格式 
ntu-104-1.pdf
  未授權公開取用
3 MBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved