Automatically Leveraging Translation Resources Using Computational Techniques (以電腦技術自動採用翻譯資源)

Pi-Chien Yang; 楊璧謙

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69939

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	高照明
dc.contributor.author	Pi-Chien Yang	en
dc.contributor.author	楊璧謙	zh_TW
dc.date.accessioned	2021-06-17T03:35:05Z	-
dc.date.available	2023-03-02
dc.date.copyright	2018-03-02
dc.date.issued	2018
dc.date.submitted	2018-02-12
dc.identifier.citation	Academia Sinica. (2011). E-HowNet. Retrieved from http://ehownet.iis.sinica.edu.tw/index.php
Anthony, L. (2014). AntConc. Tokyo, Japan: Waseda University. Retrieved from http://www.laurenceanthony.net/
Bai, M.-H., Hsieh, Y.-M., Chen, K.-J., & Chang, J. S. (2012). DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation. In 50th Annual Meeting of the Association for Computational Linguistics. Jeju, Republic of Korea.
Baker, M. (1992). In Other Words. London: Routledge.
Baroni, M., & Bernardini, S. (2004). BootCaT: Bootstrapping corpora and terms from the web. In Proceedings of LREC 2004. Centro Cultural de Belem, Lisbon, Portugal.
Bowker, L. (2002). Computer-aided Translation Technology: A Practical Introduction. University of Ottawa Press.
Bowker, L., & Barlow, M. (2008). A comparative evaluation of bilingual concordancers and translation memory systems. In E. Y. Rodrigo (Ed.), Topics in Language Resources for Translation and Localisation (pp. 1–22). Amsterdam; Philadelphia: John Benjamins Publishing Company.
Bowker, L., & Pearson, J. (2002). Working with specialized language: a practical guide to using corpora. London: Routledge.
Chan, S.-W. (2016). The Future of Translation Technology: Towards a World Without Babel. London: Routledge.
Chen, P. (2011). 利用專門可比語料庫結合機器翻譯自動提取雙語對譯N連詞：以合約文類為例 Using comparable specialized corpora with machine translation for extracting N-gram translation equivalents: A case study of Chinese and English contracts. National Taiwan Normal University, Taipei.
Delpech, E. M. (2014). Comparable Corpora and Computer-assisted Translation. Hoboken, NJ: John Wiley & Sons Inc.
Dong, Z. (1999). HowNet. Retrieved from http://www.keenage.com/
Fung, P., & Church, K. W. (1994). K-vec: a new approach for aligning parallel texts. In 15th conference on Computational linguistics-Volume 2. Kyoto, Japan: Association for Computational Linguistics, Stroudsburg, PA, USA. Retrieved from https://dl.acm.org/citation.cfm?id=991328
Gamallo Otero, P., & González López, I. (2010). Wikipedia as Multilingual Source of Comparable Corpora. In 3rd Workshop on Building and Using Comparable Corpora (p. 21). Valletta, Malta.
Gao, Z.-M. (高照明). (2002). 中英雙語近義句翻譯檢索系統 An Online Chinese-English Translation Retrieval System for Near Synonymous Sentences. 翻譯學研究集刊, (7), 75–107.
Genette, M. (2016). How reliable are online bilingual concordancers? An investigation of Linguee, TradooIT, WeBiText and ReversoContext and their reliability through a contrastive analysis of complex prepositions from French to English. Université Catholique de Louvain & Universitetet i Oslo.
Lee, K. J. (2017). Applications of Comparable Corpora in Promotional Translations. National Taiwan University, Taipei.
Linguee. (n.d.). Retrieved October 29, 2017, from https://www.linguee.com/englishchinese/page/imprint.php
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge, UK: Cambridge University Press.
Mikhailov, M., & Cooper, R. (2016). Corpus Linguistics for Translation and Contrastive Studies: A guide for research. London: Routledge.
Olohan, M. (2004). Introducing Corpora in Translation Studies. London and New York: Routledge.
Princeton University. (2010). WordNet. Retrieved January 27, 2018, from http://wordnet.princeton.edu
Quah, C. K. (2006). Translation and Technology. Palgrave Macmillan UK.
Rus, V., Lintean, M., Banjade, R., Niraula, N., & Stefanescu, D. (2013). SEMILAR: The Semantic Similarity Toolkit. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria.
Securities and Exchange Act. Republic of China. Retrieved from http://law.moj.gov.tw/LawClass/LawAll.aspx?PCode=G0400001
Somers, H. (2003). Translation memory systems. In H. Somers (Ed.), Computers and Translation: A translator’s guide (pp. 31–47).
Sowmya, V., Vardhan, B. V., & Raju, M. S. V. S. B. (2016). Influence of Token Similarity Measures for Semantic Textual Similarity. In 2016 IEEE 6th International Conference on Advanced Computing (IACC) (pp. 41–44). Bhimavaram, Andhra Pradesh, India. Retrieved from http://ieeexplore.ieee.org/document/7544807/authors?reload=true
Termsoup. (n.d.). Retrieved January 4, 2018, from https://termsoup.com/#
Wołk, K., & Marasek, K. (2014). Building subject-aligned comparable corpora and mining it for truly parallel sentence pairs. Procedia Technology, 18, 126–132.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. http://doi.org/arXiv:1609.08144
Yeh, J.-F., Wu, C.-H., Chen, M.-J., & Yu, L.-C. (2005). Automated Alignment and Extraction of a Bilingual Ontology for Cross-Language Domain-Specific Applications. Computational Linguistics and Chinese Language Processing, 10(1), 33–52.
Zanettin, F. (2012). Translation-Driven Corpora: Corpus Resources for Descriptive and Applied Translation Studies. Manchester, UK; Kinderhook, NY: St. Jerome.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69939	-
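dc.description.abstract	Computer-assisted translation tools are now widely used in translation practice. Among them, translation memory systems store existing source texts and their translations as a parallel corpus and retrieve similar content through similarity matching, so that repeated content from earlier translations can be reused; they are therefore widely applied to highly repetitive technical documents. However, the data sources for translation memory are limited and depend on existing parallel texts. To broaden these sources, this thesis builds on previous research to discuss the feasibility of building translation memory data from comparable corpora, and it explores other data sources as well. Previous studies such as Chen (陳碧珠, 2011) and Lee (李佳陵, 2017) indicate that comparable corpora appear to be underused in translation tools, with a utilization rate out of proportion to the size of the databases built. This thesis therefore also examines the similarity matching methods used by translation memory, explores how to improve the efficiency with which translation memory systems retrieve similar content, and on that basis offers suggestions for improving translation tools. Keywords: computer-assisted translation tools, translation memory, comparable corpora, similarity matching, machine translation	zh_TW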
dc.description.abstract	Computer-assisted translation (CAT) tools are widely used in today’s translation work, and translation memory (TM) systems help translators deal with repetitive expressions or content in similar contexts, maintaining consistency and saving time and effort. The data in translation memory systems are stored as parallel corpora, but when there are no previously aligned language pairs for professional or technical texts of a specific field, the TM is empty and translators need to start from scratch. Chen (2011) suggests that comparable corpora of naturally produced texts in the working languages may help. With the help of machine translation, comparable corpora can be turned into parallel corpora for use in TM systems. However, current TM systems are limited in matching similar content. The study suggests that term weighting techniques used in information retrieval (Gao, 2002) be adopted in order to identify truly important and relevant content while matching. In the proposed method, content words that carry important information are given more weight, while function words or expressions that are less specific to a text are deemed less important. The study tests the use of comparable corpora and the adoption of term weighting by comparing texts of a prospectus and its translation. Keywords: computer assisted translation tool (CAT), translation memory (TM), comparable corpora, similarity measure, machine translation	en
dc.description.provenance	Made available in DSpace on 2021-06-17T03:35:05Z (GMT). No. of bitstreams: 1 ntu-107-R04147006-1.pdf: 1848201 bytes, checksum: be3c52f7d500d8e51d743912f0ae493c (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Table of Contents
List of Tables
List of Figures
1. Introduction
  1.1 Research Background
  1.2 Research Question and Significance of Research
  1.3 Outline of Research
2. Literature Review
  2.1 Translation Technology
    2.1.1 Corpora and Translation
    2.1.2 Computer-Assisted Translation
    2.1.3 Machine Translation
    2.1.4 Web as a Corpus
  2.2 Information Retrieval and Similarity Measures
    2.2.1 Similarity Measures
    2.2.2 Statistical Methods and Term Weighting
    2.2.3 Partial Matching
  2.3 Application of Comparable Corpora for Translation Use
3. Methodology
  3.1 Test 1
    3.1.1 Text Selection
    3.1.2 Alignment and Machine Translation
    3.1.3 Translation Memory Creation and Project Settings
    3.1.4 Semantic Similarity
  3.2 Test 2
    3.2.1 Keyword Analysis and Comparable Corpora Construction
    3.2.2 Text Similarity and Semantic Similarity
  3.3 Test 3
    3.3.1 N-gram Extraction
    3.3.2 OBC Search
4. Results and Discussions
  4.1 Test 1 results
    4.1.1 Chinese to English Translation Task
    4.1.2 English to Chinese Translation Task
  4.2 Test 2 results
  4.3 Comparison with the Machine Translations
  4.4 Review of the Pre-translations
    4.4.1 Inconsistencies in Segmentation
    4.4.2 Discrepancies between the Machine Translations and the Original Texts
  4.5 Semantic Similarity
  4.6 Web Translation Memory: Test 3 results
    4.6.1 Translations Suggested by Linguee
    4.6.2 Discussions on Pattern Search
5. Conclusion
  5.1 Summary and Discussion
  5.2 Limitations of the Study
References
Appendix
  Appendix A. Comparison with Different Similarity Method Options in SEMILAR
  Appendix B. The N-gram List in Test 3
dc.language.iso	en
dc.subject	翻譯記憶	zh_TW
dc.subject	機器翻譯	zh_TW
dc.subject	相似度比對	zh_TW
dc.subject	可比語料庫	zh_TW
dc.subject	電腦輔助翻譯工具	zh_TW
dc.subject	comparable corpora	en
dc.subject	machine translation	en
dc.subject	translation memory (TM)	en
dc.subject	similarity measure	en
dc.subject	computer assisted translation tool (CAT)	en
dc.title	以電腦技術自動採用翻譯資源	zh_TW
dc.title	Automatically Leveraging Translation Resources Using Computational Techniques	en
dc.type	Thesis
dc.date.schoolyear	106-1
dc.description.degree	Master's (碩士)
dc.contributor.oralexamcommittee	謝舒凱,劉昭麟,蔡毓芬
dc.subject.keyword	電腦輔助翻譯工具, 翻譯記憶, 可比語料庫, 相似度比對, 機器翻譯	zh_TW
dc.subject.keyword	computer assisted translation tool (CAT), translation memory (TM), comparable corpora, similarity measure, machine translation	en
dc.relation.page	104
dc.identifier.doi	10.6342/NTU201800510
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2018-02-13
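dc.contributor.author-college	College of Liberal Arts (文學院)	zh_TW
dc.contributor.author-dept	Graduate Program in Translation and Interpretation (翻譯碩士學位學程)	zh_TW
Appears in Collections:	Graduate Program in Translation and Interpretation (翻譯碩士學位學程)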
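
The term-weighting idea summarized in the abstract (content words weigh more, function words weigh less when matching segments) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the sample segments are all assumptions chosen for illustration, and a real Chinese-English setup would need proper word segmentation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def idf_weights(segments):
    # Smoothed inverse document frequency: a term that occurs in fewer
    # segments (typically a content word) gets a larger weight than a term
    # that occurs in most segments (typically a function word).
    n = len(segments)
    df = Counter()
    for seg in segments:
        df.update(set(tokenize(seg)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def weighted_vector(text, idf, default_idf):
    # Term frequency multiplied by IDF; unseen terms use default_idf.
    tf = Counter(tokenize(text))
    return {term: count * idf.get(term, default_idf)
            for term, count in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query, segments):
    # Return (score, segment) for the stored segment most similar to query.
    idf = idf_weights(segments)
    default_idf = math.log(1 + len(segments)) + 1.0  # treat unseen terms as rare
    q = weighted_vector(query, idf, default_idf)
    scored = [(cosine(q, weighted_vector(seg, idf, default_idf)), seg)
              for seg in segments]
    return max(scored)


if __name__ == "__main__":
    memory = [
        "the company shall issue the shares",
        "the auditor shall review the report",
        "the board of the company",
    ]
    score, segment = best_match("the company will issue new shares", memory)
    print(round(score, 3), segment)
```

In this sketch, a shared function word such as "the" contributes little to the score because it appears in every stored segment, while rarer words such as "issue" and "shares" dominate the match.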

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	1.8 MB	Adobe PDF


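Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.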