Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 宋玉生(Yusen Sung) | |
dc.contributor.author | Ching-Kuo Li | en |
dc.contributor.author | 李清國 | zh_TW |
dc.date.accessioned | 2021-06-15T12:54:59Z | - |
dc.date.available | 2019-07-26 | |
dc.date.copyright | 2016-07-26 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-07-16 | |
dc.identifier.citation | Bilenko, Mikhail, Raymond Mooney, William Cohen, Pradeep Ravikumar, and Stephen Fienberg. 2003. Adaptive name matching in information integration. IEEE Intelligent Systems, 18(5):16–23.
Chakrabarti, Kaushik, Surajit Chaudhuri, Tao Cheng, and Dong Xin. 2012. A framework for robust discovery of entity synonyms. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 1384–1392.
Chin, Wei-Sheng, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, and Chih-Jen Lin. 2014. Effective string processing and matching for author disambiguation. Journal of Machine Learning Research, 15(1):3037–3064.
Cohen, William W., Pradeep Ravikumar, and Stephen E. Fienberg. 2003. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration on the Web, Acapulco, Mexico, pp. 73–78.
Damerau, Frederick J. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176.
Doan, AnHai, Alon Halevy, and Zachary Ives. 2012. Principles of Data Integration. Morgan Kaufmann, San Francisco.
Jimenez, Sergio, Claudia Becerra, Alexander Gelbukh, and Fabio Gonzalez. 2009. Generalized Mongue-Elkan method for approximate text string comparison. In Computational Linguistics and Intelligent Text Processing, Mexico City, pp. 559–570.
Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics-Doklady, 10(8):707–710.
Medvedev, Timofey and Alexander Ulanov. 2011. Company names matching in the large patents dataset. HP Laboratories, Hewlett-Packard Development Company.
Mitton, Roger. 1996. English Spelling and the Computer. Longman, London.
Monge, Alvaro E. and Charles Elkan. 1996. The field matching problem: Algorithms and applications. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, San Diego, pp. 267–270. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729 | - |
dc.description.abstract | 本研究將針對公司名匹配的問題,我們分析了一些客戶在輸入公司名常犯的錯誤,這些錯誤會使公司名在匹配上更加困難。雖然公司名匹配的問題是一種名稱匹配的問題,但由於公司名擁有特別的特徵,使得一般名稱匹配的方法往往不是最佳的選擇。因此,根據公司名的組成結構,我們提出位置權重法來處理公司名匹配的問題。我們將位置權重法和Soft TF/IDF 法及 Monge-Elkan法在不同的資料上做比較。其結果顯示,在最大F1值及我們定義的評價方式,位置權重法的整體表現最佳。除了公司名稱之外,位置權重法也可以使用在擁有類似結構的名稱匹配問題。 | zh_TW |
dc.description.abstract | This thesis addresses the company name-matching problem. We analyze the common errors and complications that users introduce when entering company names and that make matching difficult. Although company name matching is a type of name-matching problem, company names have special features that make general name-matching methods rarely the best choice. Therefore, based on the structure of company names, we propose a position-weighted measure for the company name-matching problem. We compare the proposed position-weighted measure with the Monge-Elkan measure and soft TF/IDF on a popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall in terms of maximum F1 and our proposed rating measure. Beyond company names, the position-weighted measure can also be applied to name-matching problems whose names have a similar structure. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T12:54:59Z (GMT). No. of bitstreams: 1 ntu-105-R03323037-1.pdf: 3986497 bytes, checksum: 2f680b670d61760155596b6698e4f913 (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Contents
Oral Examination Committee Certification (p. i)
Abstract in Chinese (p. ii)
Abstract (p. iii)
1. Introduction (p. 1)
2. Background (p. 8)
2.1. Errors (p. 8)
2.2. Complications (p. 8)
2.3. Similarity Score (p. 15)
2.4. Data Description (p. 18)
3. Data Preprocessing (p. 21)
4. Performance Evaluation (p. 22)
4.1. Experimental Setup (p. 22)
4.2. Performance Metrics (p. 24)
4.3. Results (p. 26)
5. Conclusion (p. 31)
References (p. 32) | |
dc.language.iso | en | |
dc.title | 位置權重法在公司名匹配上的應用 | zh_TW |
dc.title | Position-Weighted Measures for the Company Name-Matching Problem | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | 碩士(Master's) | |
dc.contributor.coadvisor | 呂育道(Yuh-Dauh Lyuu) | |
dc.contributor.oralexamcommittee | 張經略(Ching-Lueh Chang),戴天時(Tian-Shyr Dai) | |
dc.subject.keyword | 公司名,字串比對,位置權重,名稱匹配,資料整合, | zh_TW |
dc.subject.keyword | Company name,Name-matching problem,String similarity,Position weight,Data integration, | en |
dc.relation.page | 32 | |
dc.identifier.doi | 10.6342/NTU201600955 | |
dc.rights.note | 有償授權(paid license) | |
dc.date.accepted | 2016-07-18 | |
dc.contributor.author-college | 社會科學院(College of Social Sciences) | zh_TW |
dc.contributor.author-dept | 經濟學研究所(Graduate Institute of Economics) | zh_TW |
Appears in collections: | 經濟學系(Department of Economics) |
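
The abstract compares a position-weighted measure with the Monge-Elkan measure. As a rough illustration only, the sketch below implements the standard Monge-Elkan score (the average, over tokens of the first name, of the best-matching token in the second name) with a normalized Levenshtein similarity as the inner measure. The `position_weighted` function is a hypothetical variant: this record does not give the thesis's formula, so the geometric `decay` weighting by token index, and all function names, are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_sim(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 means identical tokens."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def monge_elkan(x: str, y: str) -> float:
    """Average, over tokens of x, of the best similarity to any token of y."""
    xs, ys = x.lower().split(), y.lower().split()
    if not xs or not ys:
        return 0.0
    return sum(max(token_sim(a, b) for b in ys) for a in xs) / len(xs)


def position_weighted(x: str, y: str, decay: float = 0.5) -> float:
    """Hypothetical variant (not the thesis's formula): token i of x is
    weighted by decay**i, so earlier tokens count more than later ones."""
    xs, ys = x.lower().split(), y.lower().split()
    if not xs or not ys:
        return 0.0
    weights = [decay ** i for i in range(len(xs))]
    best = [max(token_sim(a, b) for b in ys) for a in xs]
    return sum(w * s for w, s in zip(weights, best)) / sum(weights)
```

Under this assumed weighting, `position_weighted("Acme Corp", "Acme Inc")` scores higher than `monge_elkan("Acme Corp", "Acme Inc")`, because the mismatch falls on the trailing legal-form token rather than on the leading distinctive token. Note that both functions are asymmetric in their arguments.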
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (not currently authorized for public access) | 3.89 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
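
The abstract reports results in terms of maximum F1. A minimal sketch of how such a figure is commonly computed follows: sweep a decision threshold over the similarity scores, compute F1 at each threshold, and keep the best. The function name `max_f1`, the choice of candidate thresholds (the distinct scores themselves), and the example data are assumptions for illustration; the record does not describe the thesis's exact procedure.

```python
def max_f1(scores, labels):
    """Return (best_f1, threshold), trying each distinct score as a threshold.

    scores: similarity scores for candidate name pairs.
    labels: 1 (or True) if the pair is a true match, else 0 (or False).
    A pair is predicted as a match when its score is >= the threshold.
    """
    total_pos = sum(1 for y in labels if y)
    best_f1, best_t = 0.0, None
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / total_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


# Made-up example: four candidate pairs, three of which are true matches.
f1, threshold = max_f1([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])
```

Taking the maximum over thresholds lets measures with different score scales be compared without tuning a cutoff for each one separately.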