  1. NTU Theses and Dissertations Repository
  2. College of Social Sciences
  3. Department of Economics
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 宋玉生 (Yusen Sung)
dc.contributor.author | Ching-Kuo Lien
dc.contributor.author | 李清國 | zh_TW
dc.date.accessioned | 2021-06-15T12:54:59Z
dc.date.available | 2019-07-26
dc.date.copyright | 2016-07-26
dc.date.issued | 2016
dc.date.submitted | 2016-07-16
dc.identifier.citation | Bilenko, Mikhail, Raymond Mooney, William Cohen, Pradeep Ravikumar, and Stephen Fienberg. 2003. Adaptive name matching in information integration. IEEE Intelligent Systems, 18(5):16–23.
Chakrabarti, Kaushik, Surajit Chaudhuri, Tao Cheng, Dong Xin. 2012. A framework for robust discovery of entity synonyms. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 1384–1392.
Chin, Wei-Sheng, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, Chih-Jen Lin. 2014. Effective string processing and matching for author disambiguation. Journal of Machine Learning Research, 15(1):3037–3064.
Cohen, William W., Pradeep Ravikumar and Stephen E. Fienberg. 2003. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration on the Web, Acapulco, Mexico, pp. 73–78.
Damerau, Frederick J. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176.
Doan, AnHai, Alon Halevy, and Zachary Ives. 2012. Principles of Data Integration. Morgan Kaufmann, San Francisco.
Jimenez, Sergio, Claudia Becerra, Alexander Gelbukh, and Fabio Gonzalez. 2009. Generalized Mongue-Elkan method for approximate text string comparison. In Computational Linguistics and Intelligent Text Processing, Mexico City, pp. 559–570.
Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics-Doklady, 10(8):707–710.
Medvedev, Timofey and Alexander Ulanov. 2011. Company names matching in the large patents dataset. HP Laboratories, Hewlett-Packard Development Company.
Mitton, Roger. 1996. English Spelling and the Computer. Longman, London.
Monge, Alvaro E. and Charles Elkan. 1996. The field matching problem: Algorithms and applications. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, San Diego, pp. 267–270.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729
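dc.description.abstract | This study addresses the company name-matching problem. We analyze the errors that customers commonly make when entering company names, errors that make company names harder to match. Although company name matching is a kind of name-matching problem, company names have special characteristics, so general name-matching methods are often not the best choice. Therefore, based on the compositional structure of company names, we propose a position-weighted method for company name matching. We compare the position-weighted method with the soft TF/IDF method and the Monge-Elkan method on different data sets. The results show that the position-weighted method performs best overall in terms of maximum F1 and the rating measure we define. Beyond company names, the position-weighted method can also be applied to name-matching problems with a similar structure. | zh_TW (translated)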
dc.description.abstract | This thesis addresses the company name-matching problem. We analyze the common errors and complications that users introduce into company names and that make matching difficult. Although company name matching is a type of name-matching problem, company names have special features that make common name-matching methods rarely the best choice. Therefore, based on the structure of company names, we propose a position-weighted measure for the company name-matching problem. We then compare the proposed position-weighted measure with the Monge-Elkan measure and the soft TF/IDF measure on a popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall in terms of maximum F1 and our proposed rating measure. Beyond company names, the position-weighted measure can also be applied to name-matching problems with a similar structure. | en
dc.description.provenance | Made available in DSpace on 2021-06-15T12:54:59Z (GMT). No. of bitstreams: 1
ntu-105-R03323037-1.pdf: 3986497 bytes, checksum: 2f680b670d61760155596b6698e4f913 (MD5)
Previous issue date: 2016 | en
dc.description.tableofcontents | Contents
Oral Examination Committee Certification (p. i)
Abstract in Chinese (p. ii)
Abstract (p. iii)
1. Introduction (p. 1)
2. Background (p. 8)
2.1. Errors (p. 8)
2.2. Complications (p. 8)
2.3. Similarity Score (p. 15)
2.4. Data Description (p. 18)
3. Data Preprocessing (p. 21)
4. Performance Evaluation (p. 22)
4.1. Experimental Setup (p. 22)
4.2. Performance Metrics (p. 24)
4.3. Results (p. 26)
5. Conclusion (p. 31)
References (p. 32)
dc.language.iso | en
dc.title | 位置權重法在公司名匹配上的應用 [Application of the Position-Weighted Method to Company Name Matching] | zh_TW
dc.title | Position-Weighted Measures for the Company Name-Matching Problem | en
dc.type | Thesis
dc.date.schoolyear | 104-2
dc.description.degree | Master's
dc.contributor.coadvisor | 呂育道 (Yuh-Dauh Lyuu)
dc.contributor.oralexamcommittee | 張經略 (Ching-Lueh Chang), 戴天時 (Tian-Shyr Dai)
dc.subject.keyword | Company name, string matching, position weight, name matching, data integration | zh_TW (translated)
dc.subject.keyword | Company name, Name-matching problem, String similarity, Position weight, Data integration | en
dc.relation.page | 32
dc.identifier.doi | 10.6342/NTU201600955
dc.rights.note | Paid authorization
dc.date.accepted | 2016-07-18
dc.contributor.author-college | College of Social Sciences | zh_TW (translated)
dc.contributor.author-dept | Graduate Institute of Economics | zh_TW (translated)
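
The English abstract in this record names the Monge-Elkan measure as one of the baselines and maximum F1 as one of the evaluation criteria, but the record does not define either. The Python sketch below shows the textbook forms of both, purely as a reading aid. It is not the thesis's implementation, and it does not attempt the position-weighted measure, whose definition is not given in this record. The inner token similarity (normalized Levenshtein distance), the whitespace tokenization, the lowercasing, and all function names are assumptions made for illustration.

```python
from typing import Callable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1] (assumed inner measure)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def monge_elkan(s: str, t: str,
                inner: Callable[[str, str], float] = token_similarity) -> float:
    """Mean, over tokens of s, of the best inner similarity against tokens of t."""
    s_tokens = s.lower().split()
    t_tokens = t.lower().split()
    if not s_tokens or not t_tokens:
        return 0.0
    best_per_token = [max(inner(x, y) for y in t_tokens) for x in s_tokens]
    return sum(best_per_token) / len(s_tokens)


def max_f1(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Highest F1 over all decision thresholds taken from the observed scores."""
    best = 0.0
    for threshold in sorted(set(scores)):
        pairs = list(zip(scores, labels))
        tp = sum(1 for sc, lab in pairs if sc >= threshold and lab)
        fp = sum(1 for sc, lab in pairs if sc >= threshold and not lab)
        fn = sum(1 for sc, lab in pairs if sc < threshold and lab)
        if tp:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Under these assumptions, `monge_elkan("acme", "acme corp")` is 1.0 while `monge_elkan("acme corp", "acme")` is lower, because the measure averages over the tokens of its first argument only and is therefore asymmetric. `max_f1` expects one similarity score and one true-match label per candidate pair.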
Appears in Collections: Department of Economics

Files in This Item:
File | Size | Format
ntu-105-1.pdf (not currently authorized for public access) | 3.89 MB | Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
