Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64838
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 高照明
dc.contributor.author: Jonathan Siu Wai Lee, Jr. [en]
dc.contributor.author: 李小慧 [zh_TW]
dc.date.accessioned: 2021-06-16T23:01:12Z
dc.date.available: 2020-03-03
dc.date.copyright: 2020-03-03
dc.date.issued: 2020
dc.date.submitted: 2020-02-25
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64838
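dc.description.abstract: Chinese-to-English translation is complicated by the large differences between Chinese and English sentence structure and grammar. One difficulty is the compound verb formed from a verb and a resultative complement, known as the resultative verb compound (RVC). Most RVCs are two-character combinations in which the first character is a verb expressing an action or manner and the second expresses a result, direction, or degree (e.g., 吵醒, 跌下, 讀熟). Much research has addressed the historical formation of Chinese RVCs and their function in modern Chinese grammar, but there is little related research in corpus linguistics or translation studies.
This study focuses on the identification and translation of RVCs. We chose the novel《狼圖騰》by the mainland Chinese writer Jiang Rong and its English translation, Wolf Totem, by the sinologist Howard Goldblatt as our data. The Chinese original and the English translation were manually aligned at the paragraph level to form a parallel corpus, and the RVC identification experiments were run on the first 18 chapters of the original using a semi-supervised machine learning method and the CRF++ toolkit. We first manually extracted and tagged all RVCs in the first chapter of the original to serve as seeds for CRF++, and then attached certain key features (including part of speech and the position of each character within a word) to every character in the first 18 chapters to create the training file. We found that the NLPIR “sub-category” (次類) tags gave higher accuracy than the “main-category” (主類) tags of the NLPIR part-of-speech tagging system. After identifying the RVCs, we built a program interface and used the multilingual word alignment program Anymalign to find their English translations automatically. Although the program could not find many of the RVCs because of the literary nature of the corpus, it lets translation researchers and translators find the various translations of RVCs in different contexts in the paragraph-aligned Chinese-English parallel corpus.
[translated from zh_TW]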
dc.description.abstract: Drastic differences in sentence structure and grammar complicate Chinese-to-English translation. One inconspicuous grammatical feature of Mandarin Chinese is a particular obstacle to accurate English translation: the resultative verb compound (RVC). An RVC is a combination of characters (often, but not always, a pair) in which the first character is an action or manner verb and the second expresses a result, direction, or extent (e.g., 打斷, 坐下, 讀熟). A vast amount of research has examined the historical formation of RVCs and their function in modern Chinese grammar, but little to no serious research on RVCs has been carried out in corpus linguistics or translation studies.
This study therefore focuses on the identification and subsequent translation of RVCs in the Chinese novel《狼圖騰》by Jiang Rong and its English translation, Wolf Totem, by Howard Goldblatt. We manually aligned the two texts at the paragraph level to form a parallel corpus. To identify RVCs in the first 18 chapters of the novel, we adopted a semi-supervised machine learning method using the CRF++ toolkit. We first manually tagged the RVCs in the first chapter to act as seeds. We then created a training file by affixing key features to each character in the first 18 chapters: NLPIR and NAER part-of-speech tags; B, I, and E tags marking whether the character falls at the beginning, in the middle, or at the end of a word; and the RVC seeds. This produced thousands of predicted RVCs in two separate experiments (Experiment 1 used NLPIR “parent” tags, and Experiment 2 used NLPIR “child” tags). The NLPIR “child” tags produced more accurate results than the NLPIR “parent” tags.
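The character-level training file described above can be illustrated with a minimal sketch. It assumes the usual CRF++ input layout (one token per line, whitespace-separated feature columns, the label in the last column). The function names, the "S" tag for single-character words, the "Y"/"N" seed flag, the "RVC"/"O" labels, and the sample part-of-speech tags are all hypothetical; this record does not state how the thesis encoded them.

```python
def bie_tags(word):
    """Return a position tag for each character of a segmented word."""
    if len(word) == 1:
        return ["S"]  # assumption: the abstract names only B, I, and E
    return ["B"] + ["I"] * (len(word) - 2) + ["E"]


def crf_rows(tagged_words, seeds):
    """Build one tab-separated row per character:
    character, POS tag, position tag, seed flag, label."""
    rows = []
    for word, pos in tagged_words:
        is_seed = word in seeds
        for char, position in zip(word, bie_tags(word)):
            seed_flag = "Y" if is_seed else "N"   # hypothetical encoding
            label = "RVC" if is_seed else "O"     # hypothetical label names
            rows.append("\t".join([char, pos, position, seed_flag, label]))
    return rows


# Hypothetical segmented, POS-tagged sentence. The tags are placeholders,
# not actual NLPIR or NAER output.
sentence = [("他", "r"), ("坐下", "v"), ("了", "u")]
seeds = {"坐下"}  # stands in for RVCs tagged by hand in the first chapter

rows = crf_rows(sentence, seeds)
print("\n".join(rows))
```

In this sketch the label simply mirrors the seed flag; a real training file would take its labels from the manually tagged chapter and leave the rest for the model to predict.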
Having identified the RVCs, we created an interface for finding their English translations using Anymalign, a multilingual word aligner. Although the program was unable to find many of the RVCs because of their low frequency and the literary nature of the text, the interface allows translation researchers and working translators to manually identify translation equivalents of Mandarin Chinese RVCs and to study their different translations in the paragraph-aligned parallel corpus.
[en]
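The manual lookup that the abstract describes, retrieving every aligned paragraph pair in which a given RVC occurs, might look roughly like the sketch below. It is not the thesis's interface and does not use Anymalign. The function name is hypothetical, and the two sentence pairs are invented placeholders, not text from the novel or its translation.

```python
def find_aligned_paragraphs(rvc, aligned_pairs):
    """Return the (Chinese, English) paragraph pairs whose Chinese side contains the RVC."""
    return [(zh, en) for zh, en in aligned_pairs if rvc in zh]


# Invented placeholder pairs standing in for a paragraph-aligned parallel corpus.
aligned_pairs = [
    ("他坐下了。", "He sat down."),
    ("她讀熟了課文。", "She learned the text by heart."),
]

hits = find_aligned_paragraphs("坐下", aligned_pairs)
for zh, en in hits:
    print(zh, "|", en)
```

A plain substring match like this returns the whole English paragraph rather than the translation equivalent itself, so a reader still has to pick out the equivalent by hand, which matches the manual identification the abstract mentions.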
dc.description.provenance: Made available in DSpace on 2021-06-16T23:01:12Z (GMT). No. of bitstreams: 1
ntu-109-R06147014-1.pdf: 1888150 bytes, checksum: 430b02f840f2e3fbab019521f3a718a5 (MD5)
Previous issue date: 2020
[en]
dc.description.tableofcontents:
Acknowledgments
Abstract
Table of Contents
1 Introduction
1.1 Motivation and Research Gaps
1.2 Overview of Research Methods
2 Literature Review
2.1 Defining an RVC
2.1.1 Result-state RVCs
2.1.2 Directional RVCs
2.1.3 Extent RVCs
2.1.4 Characteristics of an RVC
2.1.5 Translation of RVCs
2.2 Corpus-based Translation Studies
2.2.1 Using AntPConc to Construct Parallel Corpus
2.2.2 Parallel Corpora: Comparing Linguee, NAER, and Gao
2.2.3 Machine Learning and CRFs
2.2.4 The Data Source
3 Methodology
3.1 Stage 1: RVC Identification
3.1.1 Stage 1, Step 1: Data Collection and POS
3.1.2 Stage 1, Step 2: Build Training File
3.1.2.1 Feature Set: Individual Characters
3.1.2.2 Feature Set: POS Tagging
3.1.2.3 Feature Set: “BIE” Character Positioning in a Word
3.1.2.4 Feature Set: “Seeds”
3.1.3 Stage 1, Step 3: Run CRF++ Program
3.1.4 Stage 1, Step 4: Output and Evaluation
3.2 Stage 2: RVC Translation
3.2.1 Stage 2, Step 1: Build Parallel Corpus
3.2.2 Stage 2, Step 2: Find Translations
4 Results and Discussion
4.1 Word Lists
4.2 Predicted RVC Label Count (Characters)
4.3 Total Predicted RVCs (Words)
4.4 Accuracy, Precision, Recall, and F-measure
4.5 Evaluation
4.6 Translation Interface Evaluation
4.6.1 Examples from the Novel
5 Conclusion
5.1 Findings
5.2 Future Research
5.3 Contributions
6 References
Appendix
dc.language.iso: en
dc.title: 半自動擷取中文動補式複合動詞及其英文翻譯 [zh_TW]
dc.title: Semi-Automatic Identification of Chinese Resultative Verb Compounds and Their English Translation Equivalents [en]
dc.type: Thesis
dc.date.schoolyear: 108-1
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 謝舒凱, 白明弘, 吳鑑城
dc.subject.keyword: 動補式複合動詞, 語料庫翻譯研究, 半監督機器學習, 條件隨機域, 機器翻譯 [zh_TW]
dc.subject.keyword: RVC, resultative verb compound, corpus-based translation studies, semi-supervised machine learning, conditional random fields, machine translation [en]
dc.relation.page: 85
dc.identifier.doi: 10.6342/NTU202000527
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2020-02-25
dc.contributor.author-college: College of Liberal Arts (文學院) [zh_TW]
dc.contributor.author-dept: Graduate Program in Translation and Interpretation (翻譯碩士學位學程) [zh_TW]
Appears in collections: Graduate Program in Translation and Interpretation (翻譯碩士學位學程)

Files in this item:
File: ntu-109-1.pdf (not currently authorized for public access); Size: 1.84 MB; Format: Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
