A Semi-Automatic Method for Correcting Errors in Chinese-English Machine Translations of Patent Abstracts
Chinese-English machine translation, patent translation, language divergence, error analysis, MT evaluation, post-editing
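|Abstract:||Translation errors in Chinese-to-English machine-translated documents reflect, to some degree, the marked differences between Chinese and English in vocabulary, structure, and other features. For this reason, understanding how language divergence affects the quality of machine-translated documents has long been an important research topic. Although existing studies are numerous, few discuss the Chinese-to-English language pair. This study analyzes errors in the Chinese-to-English output of a statistical machine translation system and, based on the results, proposes a semi-automatic post-editing method that can be used to reduce the word-sense errors common in machine translation.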
The grammatical structures of Chinese and English differ considerably, and these divergences are reflected in the translation errors that machine translation (MT) systems produce between the two languages. It is therefore necessary to understand the effect that such language divergences have on MT output. Because there has been a noticeable absence of such research on MT systems for the Chinese-English language pair, this study presents an analysis of errors found in machine-translated patent abstracts and, based on the results, proposes a semi-automatic post-editing method that targets a specific group of errors.
In the first part of the study, 115 English machine translations of Chinese patent abstracts were selected. All of them were produced by Google Translate, a statistical machine translation (SMT) system developed for general use. Errors in the translations were identified and then categorized using a hierarchical classification scheme with five first-level categories: orthographic, morphological, lexical, semantic, and syntactic errors.
The distribution of these errors yields important insights into the difficulties the SMT system encountered during translation. First, tokenization of the source-language (SL) texts proved problematic, because Chinese has no delimiters between words. Second, assigning parts of speech to words in the source sentences was also challenging: Chinese has no inflections from which an MT system can identify parts of speech, and a Chinese word may belong to more than one part-of-speech (POS) category depending on context. Third, because sentence components are ordered differently in Chinese and English, word-order errors occurred frequently, and their occurrence was related to the length of the source-language sentences.
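The tokenization difficulty can be illustrated with a small sketch. It is not the segmenter used by the SMT system in the study; the function `segmentations`, the toy lexicon, and the example string are assumptions chosen only to show how undelimited Chinese text can admit more than one word segmentation.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into words that all appear in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon, not taken from the study's data.
LEXICON = {"發展", "中", "國家", "中國", "家"}

# "發展中國家" is usually read as 發展/中/國家 ("developing countries"),
# but the lexicon also licenses 發展/中國/家.
candidates = segmentations("發展中國家", LEXICON)
for words in candidates:
    print(" / ".join(words))
```

Both candidate segmentations are valid with respect to the lexicon, so a system has to rely on context or statistics to choose between them, and a wrong choice propagates into the translation.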
In the second part of the study, a semi-automatic post-editing method was proposed that targets semantically incorrect terms in MT output. It consists of three steps (lexical alignment, noun phrase extraction, and term substitution) and was shown to increase the Bilingual Evaluation Understudy (BLEU) score of the machine-translated texts. This study provides a descriptive basis for further research and has implications for MT developers and post-editors seeking to improve the quality of Chinese-English machine translation output.
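The third step, term substitution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the outputs of lexical alignment and noun phrase extraction are assumed to be given already as (source term, token span) pairs, the glossary and example sentence are hypothetical, and `clipped_precision` computes only the clipped n-gram precision that BLEU builds on, not the full BLEU score.

```python
from collections import Counter
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices in the MT output, end exclusive


def substitute_terms(mt_tokens: List[str],
                     aligned_phrases: List[Tuple[str, Span]],
                     glossary: Dict[str, str]) -> List[str]:
    """Replace aligned noun phrases with the glossary translation of their source term."""
    out = list(mt_tokens)
    # Work from right to left so that earlier spans keep valid indices.
    for src_term, (start, end) in sorted(aligned_phrases,
                                         key=lambda p: p[1][0], reverse=True):
        if src_term in glossary:
            out[start:end] = glossary[src_term].split()
    return out


def clipped_precision(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """Clipped n-gram precision of `candidate` against a single reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical example data, not taken from the study.
glossary = {"基板": "substrate"}
mt_output = "the base plate is coated with a film".split()
reference = "the substrate is coated with a film".split()
aligned = [("基板", (1, 3))]  # "base plate" was aligned to the source term 基板

post_edited = substitute_terms(mt_output, aligned, glossary)
before = clipped_precision(mt_output, reference)
after = clipped_precision(post_edited, reference)
print(" ".join(post_edited), before, after)
```

In this toy case the unigram precision rises from 0.75 to 1.0 after substitution. The method is semi-automatic in that a human still has to supply or confirm the correct term for each flagged noun phrase.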
|Appears in Collections:||翻譯碩士學位學程 (Master's Degree Program in Translation)|