Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51388

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳信希 | |
| dc.contributor.author | Yen-Chi Shao | en |
| dc.contributor.author | 邵衍綺 | zh_TW |
| dc.date.accessioned | 2021-06-15T13:32:32Z | - |
| dc.date.available | 2016-03-08 | |
| dc.date.copyright | 2016-03-08 | |
| dc.date.issued | 2016 | |
| dc.date.submitted | 2016-02-02 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51388 | - |
| dc.description.abstract | 目前漢語的學習人數激增,知道中文詞彙;明瞭漢語語法;通曉華語文化儼然成為優勢。這樣的技能水平標準,是未來潮流必備的。然而,縱查自然語言之研究,卻極少有關於中文語法的相關課題,再者,中文文法上介繫詞的選詞,決定了中文所要表達的含意。反觀英文的介繫詞研究在近年來成長許多,也應用的相當廣泛,使得英文在學習上不再受空間或時間所限,進展到可以由人工智慧來協助。本研究期望針對外國人學習漢語時,介繫詞詞彙選擇錯誤,所造成意境上的誤差,來做為研究目標對象。 研究中以HSK(漢語水平標準考試)語料庫為出發點,把外國人學習中文的真實情境語料資源作為研究目標,並擇以介繫詞偏誤為主的句子,透過不同語言模型修正其錯誤。而參照的標準語料庫為中研院所收集之巨量資料集,CGW(Chinese Giga Word),其遍及的中文極廣,含括華爾街日報、中央社新聞等等,透過此大數據的資料,藉以不同策略建立語言模型,建立一套專門修正介繫詞錯誤的中文語言模組。 針對研究的重點介繫詞,在無需人工修正下,最佳模組在介繫詞選擇上,於漢語為母語的文章裡,可以達到將近68%的準確率,而在外國人撰寫的文句則可以修正句子達到45%。 | zh_TW |
| dc.description.abstract | The number of learners of Chinese is growing rapidly, and knowing Chinese vocabulary, understanding Chinese grammar, and comprehending Chinese culture have become a real advantage; these skills are likely to become a standard expectation. However, comparatively little research has been devoted to detecting and correcting Chinese grammatical errors, even though prepositions carry much of the intended meaning of a Chinese sentence. For English, preposition research has grown considerably in recent years and has been applied widely, so that learning English is no longer restricted by time or space and can be assisted by artificial intelligence. This research therefore focuses on preposition errors, in particular the mistakes made by learners of Chinese as a second language (CSL). The experimental dataset is extracted from the HSK dynamic composition corpus built by Beijing Language and Culture University, which contains authentic writing by CSL learners; sentences annotated with preposition errors are selected. Chinese Gigaword (CGW) serves as the reference corpus, and different language models are trained on it in order to choose the correct preposition. Without any manual or rule-based correction, the best model reaches about 68% accuracy in preposition selection on native (L1) text and corrects about 45% of the sentences in the learner (L2) dataset. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T13:32:32Z (GMT). No. of bitstreams: 1 ntu-105-R02922040-1.pdf: 8970772 bytes, checksum: 84bd95e25f819d8ee082bef292536ded (MD5) Previous issue date: 2016 | en |
| dc.description.tableofcontents | 口試委員審定書 I 誌謝 II 摘要 III ABSTRACT IV 目錄 V 圖目錄 IX 表目錄 XI 第一章 緒論 1 1.1 研究背景 1 1.2 研究動機及目的 1 1.3 論文架構 2 第二章 相關研究 4 2.1 中文文法改正相關研究 4 2.2 英文文法改正相關研究 5 第三章 資料集分析 7 3.1 介繫詞詞性分佈情形 7 3.1.1 中英文資料及來源 7 3.1.2 分析步驟圖 7 3.1.3 詞彙與詞性關連性 8 3.2 中文與英文介詞詞彙詞性歧義度比較 10 3.3 外語學習者語料庫 13 3.3.1 資料前置處理 13 3.3.2 中研院窮舉詞彙 14 3.3.3 候選測試集 14 3.3.4 重點詞彙分析 16 3.3.5 重點詞彙與外語學習者使用分佈 16 第四章 實驗規劃及架構 18 4.1 實驗規劃 18 4.2 實驗架構 18 4.2.1 語言模型 18 4.2.2 實驗整體架構 19 4.2.3 策略及參數 22 4.2.4 驗證方式 22 第五章 N-gram Language結果分析 24 5.1 L1 Testing Set 1總體結果 24 5.2 詞彙表現結果 26 5.2.1 Word w POS 26 5.2.2 Word w/o POS 26 5.2.3 CharBase 28 5.3 最佳模型詞彙混淆分析 30 第六章 NgramLM 與 RNNLM結果分析 40 6.1 語言模型結果 40 6.1.1 L1 Testing Set 2 - Accuracy 40 6.1.2 L1 Testing Set 2 - MRR 44 6.2 詞彙表現度分析 46 6.2.1 詞彙搭配詞性結果分析(Word w POS) 46 6.2.2 詞彙獨立基礎結果分析(Word w/o POS) 47 6.2.3 字元基礎結果分析(CharBase) 47 6.3 相同類型詞彙合併推薦結果 52 第七章 第二外語(HSK)測試結果 55 7.1 第二外語測資結果 55 7.1.1 總體結果 55 7.1.2 L2合併推薦結果 59 7.2 語言模型與詞彙結果分析 59 7.2.1 RNNLM於第二外語結果分析 59 7.2.2 NgramLM於第二外語結果分析 60 7.3 L2最佳模型推薦結果分析 62 第八章 結論及未來研究 64 附錄一:CTB Word#Tag appear-time count 65 附錄二:PTB Word#Tag appear-time count 76 附錄三:中研院介繫詞窮舉列表 85 附錄四:訓練資料(CGW)和介繫詞POS分佈 87 附錄五:重點介繫詞彙於資料整體分佈 89 附錄六:L1 Testing Set 1 - NgramLM - Accuracy 91 附錄七:L1 Testing Set 1 - NgramLM - MRR 92 附錄八:NgramLM-Word w POS-L1 Testing Data 1-Pre. 93 附錄九:NgramLM-Word w POS-L1 Testing Data 1-Recall 95 附錄十:NgramLM-Word w POS-L1 Testing Data 1-F1 97 附錄十一:L1 Testing Data 1 詞彙分佈 99 附錄十二:NgramLM-Word w/o POS-L1 Testing Data 1-Pre. 100 附錄十三:NgramLM-Word w/o POS-L1 Testing Data 1-Rec. 102 附錄十四:NgramLM-Word w/o POS-L1 Testing Data 1-F1. 104 附錄十五:NgramLM-CharacterBase-L1 Testing Set 1-Pre. 106 附錄十六:NgramLM-CharacterBase-L1 Testing Data 1-Rec. 108 附錄十七:NgramLM-CharacterBase-L1 Testing Data 1-F1. 110 附錄十八:L1 Testing Data 2 詞彙分佈 112 附錄十九:L1 Testing Set 2 - Accuracy 113 附錄二十:L1 Testing Set 2 - MRR 114 附錄二十一:L1 Testing Set 2 - RNNLM-Word w POS 115 附錄二十二:L1 Testing Set 2 - RNNLM-Word w/o POS 117 附錄二十三:L1 Testing Set 2 - RNNLM-CharBase 119 附錄二十四:L1 Testing Set 2 - NgramLM 121 附錄二十五:C.M.-L1 Testing Set 2 -Word w POS-10gramLM 123 附錄二十六:C.M.-L1 Testing Set 2 -Word w POS-RNNLM 124 附錄二十七:C.M.-L1 Testing Set 2-Word w/o POS-10gramLM 125 附錄二十八:C.M.-L1 Testing Set 2-Word w/o POS-RNNLM 126 附錄二十九:C.M.-L1 Testing Set 2-CharBase-13gramLM 127 附錄三十:C.M.-L1 Testing Set 2-CharBase-RNNLM 128 附錄三十一:L1 Testing Set 2 -Combination- Accuracy 129 附錄三十二:L2 Testing Set(236)-Accuracy & Sen.count 130 附錄三十三:L2 Testing Set-MRR 131 附錄三十四:L2 Testing Set-RNNLM-Word w POS 132 附錄三十五:L2 Testing Set(236)-RNNLM-Word w/o POS 134 附錄三十六:L2 Testing Set(236)-RNNLM-CharBase 136 附錄三十七:L2 Testing Set-NgramLM-Word w POS 138 附錄三十八:L2 Testing Set-NgramLM-Word w/o POS 140 附錄三十九:L2 Testing Set-NgramLM-CharBase 142 附錄四十:C.M.-L2 Testing Set 2-5gramLM-Word w POS 144 附錄四十一:C.M.-L2 Testing Set 2-5gramLM-Word w/o POS 145 附錄四十二:C.M.-L2 Testing Set 2-6gramLM-CharBase 146 附錄四十三:C.M.-L2 Testing Set-RNNLM hd512-Word w POS 147 附錄四十四:C.M.-L2 Testing Set-RNNLM hd512-Word w/o POS 148 附錄四十五:C.M.-L2 Testing Set-RNNLM hd512-CharBase 149 附錄四十六:L2 Testing Set-合併推薦結果 150 附錄四十七:本論文使用之符號對照表 151 參考文獻 153 | |
| dc.language.iso | zh-TW | |
| dc.subject | 語言模型 | zh_TW |
| dc.subject | HSK語料庫 | zh_TW |
| dc.subject | 中文介繫詞選詞 | zh_TW |
| dc.subject | 中文文法改正 | zh_TW |
| dc.subject | Chinese Preposition Selection | en |
| dc.subject | Chinese Grammar Correction | en |
| dc.subject | Language Model | en |
| dc.subject | HSK Corpus | en |
| dc.title | 協助非中文母語學習者修正介繫詞選詞錯誤 | zh_TW |
| dc.title | Chinese Preposition Error Correction for Non-Native Chinese Language Learners | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 104-1 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 林川傑,古倫維 | |
| dc.subject.keyword | 中文文法改正,中文介繫詞選詞,HSK語料庫,語言模型 | zh_TW |
| dc.subject.keyword | Chinese Grammar Correction, Chinese Preposition Selection, HSK Corpus, Language Model | en |
| dc.relation.page | 156 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2016-02-02 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in collections: | 資訊工程學系 | |
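
The abstracts above describe selecting the correct Chinese preposition by scoring candidates with language models trained on Chinese Gigaword (n-gram models and RNNLMs, per the table of contents and references). The Python sketch below illustrates that general idea only: the toy corpus, the candidate preposition list, and the tiny add-one-smoothed bigram model are illustrative assumptions and do not reproduce the thesis's actual SRILM/RNNLM setup or data.

```python
# Minimal sketch of language-model-based preposition selection:
# insert each candidate preposition into the slot, score the resulting
# sentence with a language model, and keep the highest-scoring candidate.
# Toy corpus, candidate list, and the bigram model are illustrative only.

from collections import defaultdict
import math

# Hypothetical toy corpus of pre-segmented Chinese sentences.
TOY_CORPUS = [
    ["我", "在", "圖書館", "看", "書"],
    ["他", "在", "學校", "上課"],
    ["我", "從", "台北", "出發"],
    ["她", "從", "家裡", "出發"],
    ["我們", "對", "這個", "問題", "很", "有", "興趣"],
]

# A small illustrative subset of candidate prepositions.
CANDIDATES = ["在", "從", "對", "向", "跟"]


class BigramLM:
    """A tiny add-one-smoothed bigram language model (illustrative only)."""

    def __init__(self, sentences):
        self.unigrams = defaultdict(int)
        self.bigrams = defaultdict(int)
        self.vocab = set()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for w in tokens:
                self.unigrams[w] += 1
                self.vocab.add(w)
            for a, b in zip(tokens, tokens[1:]):
                self.bigrams[(a, b)] += 1

    def logprob(self, sent):
        """Log probability of a segmented sentence under add-one smoothing."""
        tokens = ["<s>"] + sent + ["</s>"]
        v = len(self.vocab)
        lp = 0.0
        for a, b in zip(tokens, tokens[1:]):
            lp += math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + v))
        return lp


def select_preposition(lm, left_context, right_context, candidates=CANDIDATES):
    """Insert each candidate into the slot and return the best-scoring one."""
    scored = [
        (lm.logprob(left_context + [prep] + right_context), prep)
        for prep in candidates
    ]
    scored.sort(reverse=True)
    return scored[0][1], scored


if __name__ == "__main__":
    lm = BigramLM(TOY_CORPUS)
    # Learner sentence with the preposition slot left open: 我 _ 圖書館 看 書
    best, ranking = select_preposition(lm, ["我"], ["圖書館", "看", "書"])
    print("suggested preposition:", best)
    for score, prep in ranking:
        print(f"{prep}\t{score:.3f}")
```

In the thesis's setting, the same ranking step would instead use large n-gram or recurrent neural network language models trained on Chinese Gigaword, with word-with-POS, word-only, and character-based variants as described in the table of contents.
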
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-105-1.pdf (restricted; not available for public access) | 8.76 MB | Adobe PDF |
