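A Semi-Automatic Method for Correcting Errors in Chinese-English Machine Translations of Patent Abstracts (以半自動化譯後編輯修正專利摘要中譯英機器翻譯之錯誤)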

Jiuan-an Hsu; 許隽安

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51836

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	蔡毓芬(Yvonne Tsai)
dc.contributor.author	Jiuan-an Hsu	en
dc.contributor.author	許隽安	zh_TW
dc.date.accessioned	2021-06-15T13:52:28Z	-
dc.date.available	2017-12-01
dc.date.copyright	2015-12-01
dc.date.issued	2015
dc.date.submitted	2015-09-16
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51836	-
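dc.description.abstract	Translation errors in Chinese-to-English machine-translated documents reflect, to some extent, the marked differences between Chinese and English in vocabulary, structure, and other features. For this reason, understanding how language divergence affects the quality of machine-translated documents has long been an important research topic. Although existing studies are numerous, few address the Chinese-to-English language pair. This study analyzes errors in the Chinese-to-English output of a statistical machine translation system and, based on the results, proposes a semi-automatic post-editing method that can be used to reduce the word-sense errors common in machine translation. The first part of the study is an error analysis: Chinese-to-English translation errors in patent abstracts are classified, and the analysis is presented according to the distribution of error types. The classification scheme is hierarchical, with five main categories: orthographic errors, morphological errors, lexical errors, semantic errors, and syntactic errors. The distribution of machine translation errors across these categories gives a clearer picture of the difficulties the statistical machine translation system encounters during translation and of the shortcomings in its design. First, because Chinese has no delimiters (that is, spaces) between words, the system's tokenization of Chinese text is often wrong. Second, the part of speech of a Chinese word can change with context, so the system also frequently misidentifies parts of speech. Finally, differences between Chinese and English sentence structure cause incorrect syntactic order to recur in the machine translation output. The second part of the study proposes a semi-automatic post-editing method for the proper-noun translation errors common in machine translation. The method has three steps: (1) lexical alignment, (2) proper noun extraction, and (3) replacement of incorrect wording with the correct proper nouns. The method was tested on a set of corresponding English machine translations and human translations and was found to help raise the BLEU automatic evaluation score of the machine translation.	zh_TW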
dc.description.abstract	The grammatical structures of Chinese and English differ greatly, and these divergences are reflected in the translation errors that machine translation (MT) systems produce between the two languages. It is therefore necessary to understand the effect such language divergences have on MT output. Because research of this kind on the Chinese-English language pair is noticeably lacking, this study analyzes the errors found in machine-translated patent abstracts and, on that basis, designs a semi-automatic post-editing method targeting a specific group of errors. In the first part of the study, 115 English machine translations of Chinese patent abstracts were selected, all produced by Google Translate, a statistical machine translation (SMT) system developed for general use. The errors identified in the translations were categorized using a hierarchical classification scheme with five first-level categories: orthographic errors, morphological errors, lexical errors, semantic errors, and syntactic errors. The distribution of these errors yields important insights into the difficulties the SMT system encountered during translation. First, tokenization of the source-language (SL) texts was found to be problematic, as Chinese has no delimiters between words. Second, assigning parts of speech to words in the source sentences was also found to be challenging, given that Chinese has no inflections by which MT systems can identify parts of speech, and that a Chinese word may belong to more than one part-of-speech (POS) category depending on context. Third, because Chinese and English order sentence components differently, erroneous syntactic orders appeared at high frequency and were related to the length of the source-language sentences. In the second part of the study, a semi-automatic post-editing method was proposed. The method targets semantically incorrect terms in MT output. It consists of three steps (lexical alignment, noun phrase extraction, and term substitution) and was shown to increase the Bilingual Evaluation Understudy (BLEU) score of machine-translated texts. This study provides a descriptive basis for further research and has implications for MT developers and post-editors seeking to refine and improve the quality of Chinese-English machine translation output.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T13:52:28Z (GMT). No. of bitstreams: 1 ntu-104-R01147001-1.pdf: 1039666 bytes, checksum: 7e71648b0c8530eb4ec3e8308e9c0d2e (MD5) Previous issue date: 2015	en
dc.description.tableofcontents	List of Tables and Figures; Chapter One: Introduction; 1.1. History of Machine Translation; 1.2. Types of Machine Translation; 1.3. Applications of Machine Translation; 1.4. Research Questions; 1.4.1. Basic Terminology; 1.5. Research Overview; Chapter Two: Literature Review; 2.1. Types of MT Evaluation; 2.1.1. Quality Measures; 2.2. MT Error Typologies; 2.3. Preliminary Results from Pilot Study and Discussion; 2.3.1. Classification Scheme; 2.3.2. Pilot Results; 2.4. Research Gaps; 2.4.1. Language Divergences; 2.4.2. Characteristics of Patent Translation; 2.5. Summary; Chapter Three: Methodology; 3.1. Material; 3.2. Human Editing; 3.3. Data Analysis; 3.4. Towards an Automatic Method for Post-editing; 3.4.1. Terminology Errors in MT Outputs; 3.4.2. Procedure and Tools; 3.4.3. Term Alignment; 3.4.4. Noun Phrase Extraction; 3.4.5. Removal of Incorrect Terms; 3.5. Summary; Chapter Four: Results and Discussion; 4.1. Extra and Missing Word Errors; 4.2. Semantic Errors; 4.3. Article Errors; 4.4. Syntactic Order Errors; 4.5. Evaluation of the Automatic Method for Post-editing; 4.6. Summary; Chapter Five: Conclusions; 5.1. Review of Study; 5.2. Limitations; 5.3. Recommendations for Future Research; References; Appendices
dc.language.iso	en
dc.subject	譯後編輯	zh_TW
dc.subject	中英機器翻譯	zh_TW
dc.subject	專利翻譯	zh_TW
dc.subject	語言歧異	zh_TW
dc.subject	錯誤分析	zh_TW
dc.subject	機器翻譯評鑑	zh_TW
dc.subject	language divergence	en
dc.subject	post-editing	en
dc.subject	MT evaluation	en
dc.subject	error analysis	en
dc.subject	Chinese-English machine translation	en
dc.subject	patent translation	en
dc.title	以半自動化譯後編輯修正專利摘要中譯英機器翻譯之錯誤	zh_TW
dc.title	A Semi-Automatic Method for Correcting Errors in Chinese-English Machine Translations of Patent Abstracts	en
dc.type	Thesis
dc.date.schoolyear	104-1
dc.description.degree	Master's (碩士)
dc.contributor.oralexamcommittee	高照明(Zhao-Ming Gao),蔡佩舒(Pei-Shu Tsai)
dc.subject.keyword	中英機器翻譯, 專利翻譯, 語言歧異, 錯誤分析, 機器翻譯評鑑, 譯後編輯	zh_TW
dc.subject.keyword	Chinese-English machine translation, patent translation, language divergence, error analysis, MT evaluation, post-editing	en
dc.relation.page	105
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2015-09-17
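dc.contributor.author-college	College of Liberal Arts (文學院)	zh_TW
dc.contributor.author-dept	Graduate Program in Translation and Interpretation (翻譯碩士學位學程)	zh_TW
Appears in Collections:	Graduate Program in Translation and Interpretation (翻譯碩士學位學程)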

Files in This Item:

File	Size	Format
ntu-104-1.pdf Restricted Access	1.02 MB	Adobe PDF
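
Illustrative note on the abstract: the last of the three post-editing steps it names (lexical alignment, noun phrase extraction, term substitution) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the example sentences, and the `term_map` glossary are all hypothetical, and the glossary is assumed to have already been produced by the alignment and extraction steps. Clipped unigram precision is used only as a simple stand-in for one ingredient of BLEU, not as the full BLEU score.

```python
import re
from collections import Counter


def substitute_terms(mt_text: str, term_map: dict[str, str]) -> str:
    """Replace incorrect MT terms with reference terms in a single pass.

    term_map is a hypothetical glossary {incorrect term: correct term},
    assumed to come from earlier alignment and noun phrase extraction.
    """
    if not term_map:
        return mt_text
    lookup = {wrong.lower(): right for wrong, right in term_map.items()}
    # Longest terms first, so multi-word terms win over their substrings.
    alternation = "|".join(
        re.escape(term) for term in sorted(lookup, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternation + r")\b", flags=re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], mt_text)


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: a rough proxy, not the full BLEU metric."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[token]) for token, count in cand.items())
    return matched / total


# Hypothetical example data, invented for illustration only.
mt_output = "the radiator is fixed on the circuit plate"
reference = "the heat sink is fixed on the circuit board"
term_map = {"radiator": "heat sink", "circuit plate": "circuit board"}

post_edited = substitute_terms(mt_output, term_map)
print(post_edited)
print(unigram_precision(mt_output, reference))
print(unigram_precision(post_edited, reference))
```

Substituting in one pass over a combined pattern avoids re-editing text that an earlier replacement has just inserted, which could happen if the terms were replaced one after another.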
