Retrieving and Identifying Interlanguage Signatures and Sentences (中間語言特徵及語句之偵測與擷取)

Ming-Han Yang; 楊明翰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22906

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林守德
dc.contributor.author	Ming-Han Yang	en
dc.contributor.author	楊明翰	zh_TW
dc.date.accessioned	2021-06-08T04:32:59Z	-
dc.date.copyright	2009-08-21
dc.date.issued	2009
dc.date.submitted	2009-08-20
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22906	-
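dc.description.abstract	This study aims to develop a statistical method that automatically retrieves and identifies erroneous interlanguage. Interlanguage, as defined by linguists, is an intermediate language that anyone learning a second language produces: it takes the form of the second language being learned but is influenced by properties of the learner's native language, and learners cannot tell by themselves whether what they are writing is correct second language or interlanguage. The proposed framework is highly flexible. Training requires no human-labeled sentences, so the framework can easily be transferred to any pair of native and second languages. The system first uses machine translation to simulate the characteristics of interlanguage and uses the output as training data for a classifier that judges whether a sentence resembles the machine-simulated interlanguage. That classifier is then used to label the training data, and the sentences it rates as most interlanguage-like serve as interlanguage training data for retraining a better interlanguage classifier. The system is evaluated with Chinese as the native language and English as the second language, on sentences from the abstracts of theses written by master's and doctoral students in Taiwan (R.O.C.) and collected on the national master's and doctoral thesis website (中華民國碩博士論文網). The experiments achieve 64.58% precision and 56.67% recall (detection rate).	zh_TW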
dc.description.abstract	This paper describes a statistical method for automatically retrieving and identifying interlanguage sentences. Interlanguage is the kind of language produced by a second language learner who is not yet fully proficient but is trying to approximate the target language. The framework requires no human-annotated data and is language-independent, so it can be applied to retrieve interlanguage between any two given languages. The framework has three stages: the first approximates interlanguage with an order-preserving phrasal machine translator, the second trains a classifier to identify interlanguage sentences, and the last refines the classifier by retraining a new classifier on the interlanguage identified by the classifier in the second stage. The framework is evaluated on the task of identifying Chinese-English interlanguage sentences among normal English sentences in the English abstracts of theses written by graduate students in Taiwan, where it achieves 64.58% precision and 56.67% recall.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T04:32:59Z (GMT). No. of bitstreams: 1 ntu-98-R94922134-1.pdf: 1805263 bytes, checksum: 1a9669500825ea1e2a1373bca26e0869 (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	Acknowledgements; Abstract; Abstract (Chinese); Table of Contents; List of Figures; Chapter 1 Introduction (1.1 Background and Motivation; 1.2 Interlanguage; 1.3 Research Objectives and The Proposed Solution; 1.4 Evaluation; 1.5 Thesis Organization); Chapter 2 Related Works (2.1 Rule-based; 2.2 Statistic methods and Machine learning; 2.3 Stylistic analysis); Chapter 3 Methodology (3.1 The property of Interlanguage; 3.2 Framework; 3.2.1 Interlanguage Approximation; 3.2.2 Classifier Learning; 3.2.3 Classifier Refining); Chapter 4 Evaluation (4.1 Data collections; 4.2 Experiments; 4.3 Experiment 1; 4.4 Experiment 2; 4.5 Results and Discussion); Chapter 5 Conclusion; Bibliography
dc.language.iso	en
dc.subject	interlanguage (中間語言)	zh_TW
dc.subject	interlanguage (中介語)	zh_TW
dc.subject	auto editing (自動編輯)	zh_TW
dc.subject	language model (語言模型)	zh_TW
dc.subject	language model	en
dc.subject	auto editing	en
dc.subject	interlanguage	en
dc.title	中間語言特徵及語句之偵測與擷取	zh_TW
dc.title	Retrieving and Identifying Interlanguage Signatures and Sentences	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張俊盛,劉昭麟,林軒田,鄭卜壬
dc.subject.keyword	interlanguage (中間語言), interlanguage (中介語), auto editing (自動編輯), language model (語言模型)	zh_TW
dc.subject.keyword	interlanguage, auto editing, language model	en
dc.relation.page	34
dc.rights.note	Not authorized
dc.date.accepted	2009-08-20
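dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering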
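
The three-stage framework summarized in the abstract (approximate interlanguage, train a classifier, refine it by retraining) can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `approximate`, `train`, and `refine`, the toy phrase table, the bigram Naive Bayes scorer, and the use of native English sentences as the negative class in both training rounds are all hypothetical choices made here for illustration. The record itself does not specify the features or the classifier.

```python
import math
from collections import Counter


def approximate(source_phrases, phrase_table):
    """Stage 1 (assumed form): translate phrase by phrase, keeping source order."""
    return " ".join(phrase_table.get(p, p) for p in source_phrases)


def bigrams(sentence):
    """Lowercased word bigrams with boundary markers (illustrative feature choice)."""
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(toks, toks[1:]))


def train(positive, negative):
    """Stage 2: fit a tiny add-one-smoothed Naive Bayes model over bigrams.

    Returns a scoring function; a higher score means the sentence looks
    more like the positive (interlanguage-like) class.
    """
    pos = Counter(b for s in positive for b in bigrams(s))
    neg = Counter(b for s in negative for b in bigrams(s))
    vocab = len(set(pos) | set(neg)) + 1  # +1 reserves mass for unseen bigrams
    pos_total, neg_total = sum(pos.values()), sum(neg.values())

    def score(sentence):
        feats = bigrams(sentence)
        total = 0.0
        for b in feats:
            p = (pos[b] + 1) / (pos_total + vocab)
            n = (neg[b] + 1) / (neg_total + vocab)
            total += math.log(p / n)
        return total / len(feats)  # length-normalized log-odds

    return score


def refine(simulated, native, unlabeled, top_k):
    """Stage 3: rank unlabeled sentences, then retrain on the top_k hits."""
    first = train(simulated, native)
    ranked = sorted(unlabeled, key=first, reverse=True)
    pseudo_positive = ranked[:top_k]
    # Assumption: native sentences remain the negative class when retraining.
    return train(pseudo_positive, native), pseudo_positive
```

In this sketch, `approximate` would generate the simulated interlanguage passed to `refine` as `simulated`, and the scoring function returned by `refine` plays the role of the refined classifier; a decision threshold on its score would still have to be chosen separately.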

Files in This Item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	1.76 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.