Using Chinese-English Parallel Corpora for Compiling Bilingual Legal Glossaries (從平行語料庫編纂漢英法律雙語詞彙)

Hoi-Lam Lee; 李海琳

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49106

Title:	從平行語料庫編纂漢英法律雙語詞彙 (Using Chinese-English Parallel Corpora for Compiling Bilingual Legal Glossaries)
Author:	Hoi-Lam Lee (李海琳)
Advisor:	Zhao-Ming Gao (高照明)
Keywords:	noun phrase extraction, parallel corpora, legal expressions, contrastive analysis in Cross-Strait terminology
Year of publication:	2016
Degree:	Master's
Abstract:	Bilingual glossaries help translators produce accurate, consistent translations in specialized domains. This study aims to provide an effective semi-automatic method for extracting bilingual pairs and a systematic procedure for compiling specialized bilingual glossaries. First, two natural language processing tools, Anymalign and Pialign, were used to extract Chinese-English candidate words and phrases with higher translation probabilities from parallel corpora of the criminal codes of Taiwan and Mainland China. Because the output of these tools is imperfect, the study then focuses on improving the preliminary extraction results, starting with the English side: the English data were tagged automatically with the Stanford POS Tagger to identify nouns and noun phrases (NPs), and linguistic rules were then used to filter out candidates that did not match the formation patterns of Chinese and English noun phrases. This yielded 1,852 valid Chinese-English pairs for Taiwan and 3,782 for Mainland China, including noun phrases made up of longer units.
Of these candidates, 694 bilingual pairs for Taiwan and 852 for Mainland China were judged to be correctly aligned. An American reference corpus was then used to separate terms from general noun phrases by keyness score, with the threshold set at a log-likelihood ratio (LLR) of ≥ 3.84. Of the 487 qualified English nouns and noun phrases from Taiwan, 394 were identified as terms; of the 517 from Mainland China, 418 were. The qualified nouns, noun phrases, and terms were further processed into a comparable list showing the similarities and differences across the Strait in Chinese legal expressions and their English translations. English NPs proved much easier to extract than Chinese NPs, because the part of speech of a Chinese word becomes clear only within a sentence, whereas English words carry distinct suffixes that mark their part of speech. Complete extraction of bilingual pairs proved difficult for two reasons. First, no regular patterns could be identified in the two main groups of Chinese NPs: those with or without 之 (zhi) in Taiwan’s data, and those containing 的 (de) or 之 (zhi) in Mainland China’s data. Second, the English translations corresponding to these Chinese NPs did not follow predictable patterns either. Filtering terms was more difficult still: as NPs grow longer, they are more likely to mix terms with common words. In addition, the criminal laws of Taiwan and Mainland China do not carry entirely the same legal connotations, and when the units of a phrase are not extracted completely, pairs that falsely appear similar or different are easily produced.
The methodology and findings of this study may be applicable to bilingual noun phrase extraction in other language pairs and to other specialized domains, and may help promote the compilation of bilingual term glossaries and localization in translation.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49106
DOI:	10.6342/NTU201603217
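Full-text authorization:	Paid authorization
Appears in collections:	Graduate Program in Translation and Interpretation (翻譯碩士學位學程)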
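
The keyness filter mentioned in the abstract (LLR ≥ 3.84 against a reference corpus) can be illustrated with a minimal sketch. It assumes the common two-corpus log-likelihood formulation based on observed and expected frequencies; the record does not state which formulation or tool the thesis used. The function names and the extra "overused in the study corpus" check are hypothetical choices made for illustration only. The value 3.84 is the chi-squared critical value for one degree of freedom at p = 0.05.

```python
import math

# Threshold quoted in the abstract (chi-squared critical value, 1 d.f., p = 0.05).
LLR_THRESHOLD = 3.84


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood ratio for one word or phrase.

    freq_* are occurrence counts; size_* are total corpus sizes in tokens.
    This formulation is an assumption, not necessarily the one used in the thesis.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected counts if the item were equally frequent in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    # A zero count contributes nothing (the limit of x * ln(x) as x -> 0 is 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll


def is_term_candidate(freq_study, size_study, freq_ref, size_ref,
                      threshold=LLR_THRESHOLD):
    """Hypothetical filter: keep items that are significantly MORE frequent
    in the legal (study) corpus than in the reference corpus."""
    overused = freq_study / size_study > freq_ref / size_ref
    score = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    return overused and score >= threshold
```

For example, `is_term_candidate(50, 10_000, 5, 1_000_000)` compares a phrase seen 50 times in a 10,000-token legal corpus with 5 occurrences in a 1,000,000-token reference corpus; these counts are invented for illustration and do not come from the thesis.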

Files in this item:

File	Size	Format
ntu-105-1.pdf (not authorized for public access)	10.55 MB	Adobe PDF


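Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.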
