Web-based Named Entity Translation Method for Korean-Chinese and Japanese-Chinese Cross-language Information Retrieval

Yu-Chun Wang; 王昱鈞

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40955

Title:	基於網路語料之專有名詞翻譯方法於中日韓跨語言資訊檢索之應用 Web-based Named Entity Translation Method for Korean-Chinese and Japanese-Chinese Cross-language Information Retrieval
Authors:	Yu-Chun Wang 王昱鈞
Advisor:	顏嗣鈞(Hsu-Chun Yen)
Co-Advisor:	許聞廉(Wen-Lian Hsu)
Keyword:	Named Entity Translation, pattern, Chinese-Japanese-Korean (CJK), Korean-Chinese, Japanese-Chinese, Cross-language Information Retrieval
Publication Year:	2008
Degree:	Master's
Abstract:	Named entity (NE) translation plays an important role in many natural language processing applications, such as information retrieval and machine translation. In this thesis, we focus on translating NEs from Korean and Japanese into Chinese in order to improve Korean-Chinese and Japanese-Chinese cross-language information retrieval (CLIR). Chinese characters are ideographic, and one syllable may map to several different characters, which makes NE translation difficult. We propose a hybrid NE translation system. First, we integrate two online databases to extend the coverage of our bilingual dictionaries: we use Wikipedia as a translation tool, following the inter-language links between its Korean or Japanese edition and its Chinese or English edition, and we use Naver.com's people search engine to find the Chinese or English translation of a queried person name. The second component automatically learns Korean-Chinese (K-C), Korean-English (K-E), Japanese-Chinese (J-C), Japanese-English (J-E), and English-Chinese (E-C) translation patterns from the web. These patterns are then used to extract the corresponding translations from the snippets returned by Google. In our experiments, CLIR with this hybrid configuration achieved a mean average precision (MAP) more than five times that of a configuration using only the bilingual dictionary; MAP reached 0.3385 and recall reached 0.7578. Our method can handle the translation of both CJK (Chinese, Japanese, Korean) and non-CJK NEs and substantially improves CLIR performance.
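The pattern-based component described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the three bracket patterns, the `{src}`/`{tgt}` placeholder syntax, the function names, and the sample snippets are hypothetical, and the sketch neither learns patterns nor queries a search engine. It only shows how fixed patterns might pull Chinese candidates out of already-retrieved snippet text and rank them by frequency.

```python
import re
from collections import Counter

# Hypothetical surface patterns (not taken from the thesis).
# {src} marks the source-language name, {tgt} the Chinese translation slot.
PATTERNS = ["{src}({tgt})", "{src} ({tgt})", "{src}（{tgt}）"]

# A run of characters from the basic CJK Unified Ideographs block.
HAN_RUN = "[\u4e00-\u9fff]+"


def compile_pattern(pattern, source_term):
    """Turn one template into a regex whose group 1 is the target slot."""
    pieces = []
    for part in re.split(r"(\{src\}|\{tgt\})", pattern):
        if part == "{src}":
            pieces.append(re.escape(source_term))  # match the name literally
        elif part == "{tgt}":
            pieces.append("(" + HAN_RUN + ")")     # capture the candidate
        else:
            pieces.append(re.escape(part))         # literal context text
    return re.compile("".join(pieces))


def extract_candidates(source_term, snippets, patterns=PATTERNS):
    """Return (candidate, count) pairs, most frequent first."""
    counts = Counter()
    for pattern in patterns:
        regex = compile_pattern(pattern, source_term)
        for snippet in snippets:
            for match in regex.finditer(snippet):
                counts[match.group(1)] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up snippet strings standing in for search-engine results.
    sample = [
        "배용준(裴勇俊) 主演的電視劇",
        "韓國演員 배용준（裴勇俊）",
        "배용준(裴永俊)",
    ]
    for candidate, count in extract_candidates("배용준", sample):
        print(candidate, count)
```

Ranking by raw frequency is the simplest possible choice here; the abstract does not say how candidates are scored, so any weighting scheme would be a further assumption.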
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40955
Fulltext Rights:	Paid licensing
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-97-1.pdf Restricted Access	1.04 MB	Adobe PDF
