Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729
Title: | 位置權重法在公司名匹配上的應用 Position-Weighted Measures for the Company Name-Matching Problem |
Authors: | Ching-Kuo Li 李清國 |
Advisor: | 宋玉生(Yusen Sung) |
Co-Advisor: | 呂育道(Yuh-Dauh Lyuu) |
Keyword: | Company name, Name-matching problem, String similarity, Position weight, Data integration |
Publication Year: | 2016 |
Degree: | Master's |
Abstract: | This thesis addresses the company name-matching problem. We analyze common errors that users make when entering company names, errors that make matching more difficult. Although company name matching is a kind of name-matching problem, company names have distinctive features, so general-purpose name-matching methods are often not the best choice. Based on the compositional structure of company names, we therefore propose a position-weighted measure for the company name-matching problem. We compare it with the Monge-Elkan measure and soft TF/IDF on a popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall in terms of maximum F1 and a rating measure that we define. Beyond company names, the position-weighted measure can also be applied to name-matching problems in which the names have a similar structure. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50729 |
DOI: | 10.6342/NTU201600955 |
Fulltext Rights: | Paid license |
Appears in Collections: | Department of Economics |
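
The abstract names the Monge-Elkan measure as one of the baselines. For readers unfamiliar with it, the following is a minimal sketch of the general Monge-Elkan scheme, not the thesis's implementation: each token of the first name is matched to its most similar token in the second name, and the best scores are averaged. The inner token similarity used here (`difflib.SequenceMatcher.ratio`), the whitespace tokenization, the lowercasing, and the function names `token_sim` and `monge_elkan` are all assumptions made for illustration. The position-weighted measure itself is not described in this record and is not reproduced here.

```python
from difflib import SequenceMatcher


def token_sim(a: str, b: str) -> float:
    """Inner token similarity in [0, 1]; an assumed stand-in, not the thesis's choice."""
    return SequenceMatcher(None, a, b).ratio()


def monge_elkan(name_a: str, name_b: str) -> float:
    """Average, over tokens of name_a, of the best match among tokens of name_b."""
    tokens_a = name_a.lower().split()
    tokens_b = name_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    best_scores = [max(token_sim(a, b) for b in tokens_b) for a in tokens_a]
    return sum(best_scores) / len(tokens_a)


if __name__ == "__main__":
    # Hypothetical company names, used only to exercise the function.
    print(monge_elkan("Acme Corp", "Acme Corporation"))
    print(monge_elkan("Acme Corp", "Zenith Ltd"))
```

Note that this score is asymmetric: `monge_elkan(a, b)` and `monge_elkan(b, a)` can differ, so a symmetric variant would need to combine both directions.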
Files in This Item:
File | Size | Format |
---|---|---|
ntu-105-1.pdf (Restricted Access) | 3.89 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.