Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40955
Title: Web-based Named Entity Translation Method for Korean-Chinese and Japanese-Chinese Cross-language Information Retrieval (基於網路語料之專有名詞翻譯方法於中日韓跨語言資訊檢索之應用)
Authors: Yu-Chun Wang (王昱鈞)
Advisor: Hsu-Chun Yen (顏嗣鈞)
Co-Advisor: Wen-Lian Hsu (許聞廉)
Keyword: Named Entity Translation, pattern, Korean-Chinese, Japanese-Chinese, Cross-language Information Retrieval
Publication Year: 2008
Degree: Master's
Abstract: Named entity (NE) translation plays an important role in many natural language processing applications, such as information retrieval and machine translation. In this work, we focus on translating NEs from Korean and Japanese into Chinese in order to improve Korean-Chinese and Japanese-Chinese cross-language information retrieval (CLIR). Chinese is written with ideographic characters, and one syllable may map to several different characters, which makes NE translation difficult. We propose a hybrid NE translation system. The first component integrates two online databases to extend the coverage of our bilingual dictionaries: we use Wikipedia as a translation tool, following the inter-language links between the Korean or Japanese edition and the Chinese or English editions, and we use Naver.com's people search engine to find the Chinese or English translation of a queried name. The second component automatically learns Korean-Chinese (K-C), Korean-English (K-E), Japanese-Chinese (J-C), Japanese-English (J-E), and English-Chinese (E-C) translation patterns from the web. These patterns are then used to extract translation pairs from the snippets returned by the Google search engine. In our experiments, CLIR performance with this hybrid configuration was over five times better, in terms of mean average precision (MAP), than that of a configuration using only the bilingual dictionary. MAP was as high as 0.3385 and recall reached 0.7578. Our method can handle the translation of both CJK (Chinese, Japanese, Korean) and non-CJK NEs and substantially improves CLIR performance.
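The hybrid lookup described in the abstract (dictionary first, then pattern-based extraction from search snippets) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The names DICTIONARY, TEMPLATES, build_regex and translate are hypothetical, the templates are hand-written here rather than learned from the web, and the snippets are passed in as plain strings instead of being fetched from a search engine.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical dictionary entries, standing in for translations harvested
# from Wikipedia inter-language links or a people search engine.
DICTIONARY = {"서울": "首爾"}

# Hypothetical surface templates. <SRC> marks the source-language NE and
# <TGT> the slot where a Chinese translation is expected. The abstract says
# such patterns are learned automatically; here they are hard-coded.
TEMPLATES = ["<SRC>(<TGT>)", "<SRC>（<TGT>）", "<TGT>（<SRC>）"]

# Assumption: a candidate translation is a run of 1 to 8 CJK ideographs.
HAN_RUN = "[\u4e00-\u9fff]{1,8}"


def build_regex(template: str, source: str) -> "re.Pattern[str]":
    """Turn a template into a regex with the source NE filled in."""
    pieces = []
    for part in re.split(r"(<SRC>|<TGT>)", template):
        if part == "<SRC>":
            pieces.append(re.escape(source))
        elif part == "<TGT>":
            pieces.append(f"(?P<tgt>{HAN_RUN})")
        else:
            pieces.append(re.escape(part))  # literal context around the slots
    return re.compile("".join(pieces))


def translate(source: str, snippets: List[str]) -> Optional[str]:
    """Dictionary lookup first; otherwise vote over pattern matches."""
    if source in DICTIONARY:
        return DICTIONARY[source]
    votes: Counter = Counter()
    for template in TEMPLATES:
        regex = build_regex(template, source)
        for snippet in snippets:
            for match in regex.finditer(snippet):
                votes[match.group("tgt")] += 1
    if not votes:
        return None  # no dictionary entry and no pattern matched
    return votes.most_common(1)[0][0]
```

Taking the most frequent candidate is only one simple way to rank extracted translations. The abstract does not say how candidates are scored, so the voting step is an assumption of this sketch.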
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40955
Fulltext Rights: Paid license (有償授權)
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: ntu-97-1.pdf (Restricted Access)
Size: 1.04 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
