Position-Weighted Measures for the Company Name-Matching Problem

Ching-Kuo Li; 李清國

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729

Full metadata record

DC field	Value	Language
dc.contributor.advisor	宋玉生(Yusen Sung)
dc.contributor.author	Ching-Kuo Li	en
dc.contributor.author	李清國	zh_TW
dc.date.accessioned	2021-06-15T12:54:59Z	-
dc.date.available	2019-07-26
dc.date.copyright	2016-07-26
dc.date.issued	2016
dc.date.submitted	2016-07-16
dc.identifier.citation	Bilenko, Mikhail, Raymond Mooney, William Cohen, Pradeep Ravikumar, and Stephen Fienberg. 2003. Adaptive name matching in information integration. IEEE Intelligent Systems, 18(5):16–23.
Chakrabarti, Kaushik, Surajit Chaudhuri, Tao Cheng, and Dong Xin. 2012. A framework for robust discovery of entity synonyms. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 1384–1392.
Chin, Wei-Sheng, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, and Chih-Jen Lin. 2014. Effective string processing and matching for author disambiguation. Journal of Machine Learning Research, 15(1):3037–3064.
Cohen, William W., Pradeep Ravikumar, and Stephen E. Fienberg. 2003. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web, Acapulco, Mexico, pp. 73–78.
Damerau, Frederick J. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176.
Doan, AnHai, Alon Halevy, and Zachary Ives. 2012. Principles of Data Integration. Morgan Kaufmann, San Francisco.
Jimenez, Sergio, Claudia Becerra, Alexander Gelbukh, and Fabio Gonzalez. 2009. Generalized Monge-Elkan method for approximate text string comparison. In Computational Linguistics and Intelligent Text Processing, Mexico City, pp. 559–570.
Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710.
Medvedev, Timofey, and Alexander Ulanov. 2011. Company names matching in the large patents dataset. HP Laboratories, Hewlett-Packard Development Company.
Mitton, Roger. 1996. English Spelling and the Computer. Longman, London.
Monge, Alvaro E., and Charles Elkan. 1996. The field matching problem: Algorithms and applications. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, San Diego, pp. 267–270.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729	-
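dc.description.abstract	This study addresses the company name-matching problem. We analyze some common errors that customers make when entering company names, errors that make company names harder to match. Although company name matching is a kind of name-matching problem, company names have distinctive features, so general name-matching methods are often not the best choice. Therefore, based on the structure of company names, we propose a position-weighted method for the company name-matching problem. We compare the position-weighted method with the soft TF/IDF and Monge-Elkan methods on several data sets. The results show that the position-weighted method performs best overall in terms of maximum F1 and our proposed rating measure. Besides company names, the position-weighted method can also be applied to name-matching problems involving names with a similar structure.	zh_TW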
dc.description.abstract	This thesis addresses the company name-matching problem. We analyze common errors that users make when entering company names, as well as other complications that make company names difficult to match. Although company name matching is a type of name-matching problem, company names have special features that make common name-matching methods rarely the best choice. Therefore, based on how company names are constructed, we propose a novel position-weighted measure for the company name-matching problem. We then compare the position-weighted measure with the Monge-Elkan measure and soft TF/IDF on a popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall in terms of both maximum F1 and our proposed rating measure. Beyond company names, the position-weighted measure can also be applied to other name-matching problems in which names are constructed similarly to company names.	en
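The abstract names the Monge-Elkan baseline and the maximum-F1 criterion but does not state the thesis's position-weighting formula. The Python sketch below is illustrative only, under stated assumptions: it implements the standard Monge-Elkan measure (average, over the tokens of one name, of the best-matching token similarity in the other name) with a normalized Levenshtein inner similarity, and computes maximum F1 as the best F1 over all similarity thresholds. The function `position_weighted_me` and its geometric `decay` parameter are hypothetical stand-ins that only convey the general idea of weighting tokens by position; they are not the author's measure.

```python
"""Illustrative sketch only (not the thesis's code): Monge-Elkan, a
hypothetical position-weighted variant, and maximum F1 over thresholds."""


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def token_sim(x: str, y: str) -> float:
    """Edit distance normalized into a similarity in [0, 1]."""
    longest = max(len(x), len(y))
    return 1.0 if longest == 0 else 1.0 - levenshtein(x, y) / longest


def best_scores(s: str, t: str) -> list[float]:
    """For each token of s, the similarity of its best-matching token in t."""
    a, b = s.lower().split(), t.lower().split()
    if not b:
        return [0.0] * len(a)
    return [max(token_sim(x, y) for y in b) for x in a]


def monge_elkan(s: str, t: str) -> float:
    """Monge-Elkan: unweighted mean of the best-match scores."""
    scores = best_scores(s, t)
    return sum(scores) / len(scores) if scores else 0.0


def position_weighted_me(s: str, t: str, decay: float = 0.5) -> float:
    """HYPOTHETICAL position weighting: token i of s gets weight decay**i,
    so leading tokens count more than trailing generic ones such as "inc".
    This is an assumption for illustration, not the thesis's scheme."""
    scores = best_scores(s, t)
    if not scores:
        return 0.0
    weights = [decay ** i for i in range(len(scores))]
    return sum(w * sc for w, sc in zip(weights, scores)) / sum(weights)


def max_f1(scored_pairs: list[tuple[float, bool]]) -> float:
    """Best F1 over all thresholds; a pair counts as a predicted match
    when its score is >= the threshold."""
    total_pos = sum(1 for _, is_match in scored_pairs if is_match)
    if total_pos == 0:
        return 0.0
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    best, tp, fp = 0.0, 0, 0
    for k, (score, is_match) in enumerate(ranked):
        if is_match:
            tp += 1
        else:
            fp += 1
        # Tied scores fall on the same side of any threshold, so only
        # evaluate once the whole group of equal scores has been counted.
        if k + 1 < len(ranked) and ranked[k + 1][0] == score:
            continue
        if tp:
            precision, recall = tp / (tp + fp), tp / total_pos
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Made-up company names, chosen only to illustrate the effect.
    s, t = "Acme Technology Inc", "Apex Technology Inc"
    print(f"{monge_elkan(s, t):.3f}")           # 0.750
    print(f"{position_weighted_me(s, t):.3f}")  # 0.571
```

On these two made-up names, plain Monge-Elkan gives 0.75 because the shared generic tokens "technology" and "inc" match perfectly, whereas the hypothetical decay weighting lowers the score to about 0.571 by emphasizing the differing first token. Whether the thesis weights leading tokens, trailing tokens, or specific name components differently is not stated in this record.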
dc.description.provenance	Made available in DSpace on 2021-06-15T12:54:59Z (GMT). No. of bitstreams: 1 ntu-105-R03323037-1.pdf: 3986497 bytes, checksum: 2f680b670d61760155596b6698e4f913 (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Contents
Thesis Approval by the Oral Examination Committee, p. i
Abstract (Chinese), p. ii
Abstract, p. iii
1. Introduction, p. 1
2. Background, p. 8
  2.1. Errors, p. 8
  2.2. Complications, p. 8
  2.3. Similarity Score, p. 15
  2.4. Data Description, p. 18
3. Data Preprocessing, p. 21
4. Performance Evaluation, p. 22
  4.1. Experimental Setup, p. 22
  4.2. Performance Metrics, p. 24
  4.3. Results, p. 26
5. Conclusion, p. 31
References, p. 32
dc.language.iso	en
dc.subject	Name matching	zh_TW
dc.subject	Position weight	zh_TW
dc.subject	String matching	zh_TW
dc.subject	Data integration	zh_TW
dc.subject	Company name	zh_TW
dc.subject	Position weight	en
dc.subject	Data integration	en
dc.subject	String similarity	en
dc.subject	Name-matching problem	en
dc.subject	Company name	en
dc.title	Application of Position Weighting to Company Name Matching	zh_TW
dc.title	Position-Weighted Measures for the Company Name-Matching Problem	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.coadvisor	呂育道(Yuh-Dauh Lyuu)
dc.contributor.oralexamcommittee	張經略(Ching-Lueh Chang),戴天時(Tian-Shyr Dai)
dc.subject.keyword	Company name, String matching, Position weight, Name matching, Data integration	zh_TW
dc.subject.keyword	Company name, Name-matching problem, String similarity, Position weight, Data integration	en
dc.relation.page	32
dc.identifier.doi	10.6342/NTU201600955
dc.rights.note	Licensed for a fee
dc.date.accepted	2016-07-18
dc.contributor.author-college	College of Social Sciences	zh_TW
dc.contributor.author-dept	Graduate Institute of Economics	zh_TW
Appears in Collections:	Department of Economics

Files in This Item:

File	Size	Format
ntu-105-1.pdf (not authorized for public access)	3.89 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.