Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51388
Full metadata record
DC Field | Value | Language
dc.contributor.advisor  陳信希
dc.contributor.author  Yen-Chi Shao  en
dc.contributor.author  邵衍綺  zh_TW
dc.date.accessioned  2021-06-15T13:32:32Z
dc.date.available  2016-03-08
dc.date.copyright  2016-03-08
dc.date.issued  2016
dc.date.submitted  2016-02-02
dc.identifier.citation  Buys, J., & Merwe, B. V. D. (2013). A tree transducer model for grammatical error correction. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task (pp. 43-50). Association for Computational Linguistics.
Chang, P. C., Tseng, H., Jurafsky, D., & Manning, C. D. (2009). Discriminative reordering with Chinese grammatical relations features. In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (pp. 51-59). Association for Computational Linguistics.
Chen, S. F., & Goodman, J. (1996). An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting on Association for Computational Linguistics (pp. 310-318). Association for Computational Linguistics.
Cheng, S. M., Yu, C. H., & Chen, H. H. (2014). Chinese word ordering errors detection and correction for non-native Chinese language learners. In Proceedings of COLING 2014 (pp. 279-289).
Chodorow, M., Tetreault, J. R., & Han, N. R. (2007). Detection of grammatical errors involving prepositions. In Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions (pp. 25-30). Association for Computational Linguistics.
De Felice, R., & Pulman, S. G. (2007). Automatically acquiring models of preposition use. In Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions (pp. 45-50). Association for Computational Linguistics.
De Felice, R., & Pulman, S. G. (2008). A classifier-based approach to preposition and determiner error correction in L2 English. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1 (pp. 169-176). Association for Computational Linguistics.
Felice, M., Yuan, Z., Andersen, Ø. E., Yannakoudakis, H., & Kochmar, E. (2014). Grammatical error correction using hybrid systems and type filtering. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task (pp. 15-24). Association for Computational Linguistics.
Gamon, M., Gao, J., Brockett, C., Klementiev, A., Dolan, W. B., Belenko, D., & Vanderwende, L. (2008). Using contextual speller techniques and language modeling for ESL error correction. In IJCNLP (Vol. 8, pp. 449-456).
Johnson, D. H. (1999). The insignificance of statistical significance testing. The Journal of Wildlife Management, 763-772.
Kao, T. H., Chang, Y. W., Chiu, H. W., Yen, T. H., Boisson, J., Wu, J. C., & Chang, J. S. (2013). CoNLL-2013 shared task: Grammatical error correction NTHU system description. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task (pp. 20-25). Association for Computational Linguistics.
Lee, J., & Knutsson, O. (2008). The role of PP attachment in preposition generation. In Computational Linguistics and Intelligent Text Processing (pp. 643-654). Springer Berlin Heidelberg.
Lin, C. J., & Chan, S. H. (2014). Description of NTOU Chinese grammar checker in CFL 2014. In ICCE-2014 Workshop on Natural Language Processing Techniques for Educational Applications (pp. 75-78).
Mikolov, T. (2012). Statistical language models based on neural networks. Presentation at Google, Mountain View, 2nd April.
Mikolov, T., Kombrink, S., Deoras, A., Burget, L., & Černocký, J. H. (2011). RNNLM - Recurrent Neural Network Language Modeling Toolkit. Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA.
Ng, H. T., Wu, S. M., Wu, Y., Hadiwinoto, C., & Tetreault, J. (2013). The CoNLL-2013 shared task on grammatical error correction. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task (pp. 1-12). Association for Computational Linguistics.
Ng, H. T., Wu, S. M., Briscoe, T., Hadiwinoto, C., Susanto, R. H., & Bryant, C. (2014). The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task (pp. 1-14). Association for Computational Linguistics.
Pearson, K. (1900). X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157-175.
Rozovskaya, A., Chang, K. W., Sammons, M., Roth, D., & Habash, N. (2014). The Illinois-Columbia system in the CoNLL-2014 shared task. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task (pp. 34-42). Association for Computational Linguistics.
Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In INTERSPEECH.
Tetreault, J. R., & Chodorow, M. (2008). The ups and downs of preposition error detection in ESL writing. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1 (pp. 865-872). Association for Computational Linguistics.
Tetreault, J., Foster, J., & Chodorow, M. (2010). Using parse features for preposition selection and error detection. In Proceedings of the ACL 2010 Conference Short Papers (pp. 353-358). Association for Computational Linguistics.
Wu, S. H., Chen, Y. Z., Yang, P. C., Ku, T., & Liu, C. L. (2010). Reducing the false alarm rate of Chinese character error detection and correction. In Proceedings of CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP 2010) (pp. 54-61).
Yoshimoto, I., Kose, T., Mitsuzawa, K., Sakaguchi, K., Mizumoto, T., Hayashibe, Y., Komachi, M., & Matsumoto, Y. (2013). NAIST at 2013 CoNLL grammatical error correction shared task. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task (pp. 26-33). Association for Computational Linguistics.
Yuan, Z., & Felice, M. (2013). Constrained grammatical error correction using statistical machine translation. CoNLL-2013, 52.
Zampieri, M., & Tan, L. (2014). Grammatical error detection with limited training data: the case of Chinese. In ICCE-2014 Workshop on Natural Language Processing Techniques for Educational Applications (pp. 69-74).
Zhang, H., & Wang, L. (2014). A unified framework for grammar error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task (pp. 96-102). Association for Computational Linguistics.
Zhao, Y., Kimachi, M., & Ishikawa, H. (2014). Extracting a Chinese learner corpus from the web: grammatical error correction for learning Chinese as a foreign language with statistical machine translation. In ICCE-2014 Workshop on Natural Language Processing Techniques for Educational Applications (pp. 56-62).
dc.identifier.uri  http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51388
dc.description.abstract  The number of Chinese language learners is growing rapidly; knowing Chinese vocabulary, understanding Chinese grammar, and being familiar with Chinese culture have become clear advantages, and such skills are likely to be an essential standard in the future.
However, across natural language processing research there has been little work on Chinese grammar. Moreover, in Chinese the choice of preposition often determines the meaning a sentence conveys. By contrast, research on English prepositions has grown considerably in recent years and has been applied widely, so that learning English is no longer bound by time or place and can now be assisted by artificial intelligence. This study therefore targets the errors in meaning that arise when foreign learners of Chinese choose the wrong preposition.
The study starts from the HSK (Hanyu Shuiping Kaoshi, the Chinese proficiency test) corpus, taking authentic texts written by foreign learners of Chinese as the research material and selecting sentences whose main error is a misused preposition, which are then corrected with different language models. The reference corpus is Chinese Giga Word (CGW), the very large collection compiled by Academia Sinica, which covers a wide range of Chinese text including the Wall Street Journal and Central News Agency newswire. From this large dataset, language models are built under several strategies to form a Chinese module dedicated to correcting preposition errors.
For the target prepositions, without any manual correction, the best model reaches nearly 68% accuracy at preposition selection on texts written by native Chinese speakers, and corrects up to 45% of the sentences written by foreign learners.
zh_TW
dc.description.abstract  The number of Chinese language learners is rising rapidly. Knowing Chinese words, understanding Chinese grammar, and comprehending Chinese culture is an advantage in today's world, and such skills will become a necessary standard.
However, there is not much research dedicated to detecting and correcting Chinese grammatical errors. Moreover, Chinese prepositions carry much of the meaning of a sentence. For prepositions, there has been extensive research on the English side, and its applications are widespread, making English learning no longer restricted by time and space. This research focuses on preposition errors, chiefly investigating the mistakes made by learners of Chinese as a second language.
In this research, the experimental dataset is extracted from the HSK dynamic composition corpus built by Beijing Language and Culture University, a collection of authentic CSL writing from which sentences annotated with "preposition error" are extracted. Chinese Giga Word serves as the standard dataset, on which different language models are trained in order to choose the correct preposition.
Without any rule-based model, our model for selecting the proper preposition reaches 68% accuracy on the L1 dataset and 45% on the L2 dataset.
en
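The pipeline the abstract describes, training language models on a large reference corpus and then picking the preposition the model scores highest, can be sketched roughly as below. This is a toy illustration, not the thesis's system: the candidate list, corpus, and add-one-smoothed bigram model here are stand-ins for the exhaustive Academia Sinica preposition list, Chinese Giga Word, and the SRILM/RNNLM models actually used.

```python
import math
from collections import Counter

# Hypothetical candidate set; the thesis uses an exhaustive preposition
# list from Academia Sinica (Appendix 3), not reproduced here.
PREPOSITIONS = ["在", "從", "對", "向", "跟"]

def train_bigram_lm(corpus):
    """Train an add-one-smoothed bigram LM from tokenized sentences and
    return a sentence log-probability function."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def logprob(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
            for a, b in zip(tokens, tokens[1:]))

    return logprob

def correct_preposition(logprob, tokens, slot):
    """Substitute every candidate preposition at position `slot` and keep
    the one whose sentence the language model scores highest."""
    def score(p):
        return logprob(tokens[:slot] + [p] + tokens[slot + 1:])
    return max(PREPOSITIONS, key=score)
```

The Word w POS, Word w/o POS, and CharBase settings evaluated in the thesis appear to vary how the input is tokenized and tagged before scoring; the selection-by-LM-score idea stays the same.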
dc.description.provenance  Made available in DSpace on 2021-06-15T13:32:32Z (GMT). No. of bitstreams: 1
ntu-105-R02922040-1.pdf: 8970772 bytes, checksum: 84bd95e25f819d8ee082bef292536ded (MD5)
Previous issue date: 2016
en
dc.description.tableofcontents  Oral Defense Committee Approval I
Acknowledgements II
Abstract (in Chinese) III
ABSTRACT IV
Table of Contents V
List of Figures IX
List of Tables XI
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Motivation and Objectives 1
1.3 Thesis Organization 2
Chapter 2 Related Work 4
2.1 Research on Chinese Grammar Correction 4
2.2 Research on English Grammar Correction 5
Chapter 3 Dataset Analysis 7
3.1 POS Distribution of Prepositions 7
3.1.1 Chinese and English Data Sources 7
3.1.2 Analysis Flowchart 7
3.1.3 Correlation between Words and POS Tags 8
3.2 POS Ambiguity of Prepositions in Chinese and English 10
3.3 Foreign Learner Corpus 13
3.3.1 Data Preprocessing 13
3.3.2 Academia Sinica Exhaustive Preposition List 14
3.3.3 Candidate Test Sets 14
3.3.4 Analysis of Target Prepositions 16
3.3.5 Distribution of Target Prepositions in Foreign Learner Usage 16
Chapter 4 Experimental Design and Architecture 18
4.1 Experimental Design 18
4.2 Experimental Architecture 18
4.2.1 Language Models 18
4.2.2 Overall Architecture 19
4.2.3 Strategies and Parameters 22
4.2.4 Evaluation Method 22
Chapter 5 N-gram Language Model Results 24
5.1 Overall Results on L1 Testing Set 1 24
5.2 Per-Word Results 26
5.2.1 Word w POS 26
5.2.2 Word w/o POS 26
5.2.3 CharBase 28
5.3 Confusion Analysis of the Best Model 30
Chapter 6 NgramLM and RNNLM Results 40
6.1 Language Model Results 40
6.1.1 L1 Testing Set 2 - Accuracy 40
6.1.2 L1 Testing Set 2 - MRR 44
6.2 Per-Word Analysis 46
6.2.1 Results with POS Tags (Word w POS) 46
6.2.2 Word-Only Results (Word w/o POS) 47
6.2.3 Character-Based Results (CharBase) 47
6.3 Combined Recommendation for Words of the Same Type 52
Chapter 7 Second-Language (HSK) Test Results 55
7.1 L2 Test Results 55
7.1.1 Overall Results 55
7.1.2 L2 Combined Recommendation Results 59
7.2 Language Model and Per-Word Analysis 59
7.2.1 RNNLM Results on L2 Data 59
7.2.2 NgramLM Results on L2 Data 60
7.3 Recommendation Results of the Best L2 Model 62
Chapter 8 Conclusion and Future Work 64
Appendix 1: CTB Word#Tag appear-time count 65
Appendix 2: PTB Word#Tag appear-time count 76
Appendix 3: Academia Sinica Exhaustive Preposition List 85
Appendix 4: Training Data (CGW) and Preposition POS Distribution 87
Appendix 5: Overall Distribution of Target Prepositions in the Data 89
Appendix 6: L1 Testing Set 1 - NgramLM - Accuracy 91
Appendix 7: L1 Testing Set 1 - NgramLM - MRR 92
Appendix 8: NgramLM-Word w POS-L1 Testing Data 1-Pre. 93
Appendix 9: NgramLM-Word w POS-L1 Testing Data 1-Recall 95
Appendix 10: NgramLM-Word w POS-L1 Testing Data 1-F1 97
Appendix 11: Word Distribution of L1 Testing Data 1 99
Appendix 12: NgramLM-Word w/o POS-L1 Testing Data 1-Pre. 100
Appendix 13: NgramLM-Word w/o POS-L1 Testing Data 1-Rec. 102
Appendix 14: NgramLM-Word w/o POS-L1 Testing Data 1-F1 104
Appendix 15: NgramLM-CharacterBase-L1 Testing Set 1-Pre. 106
Appendix 16: NgramLM-CharacterBase-L1 Testing Data 1-Rec. 108
Appendix 17: NgramLM-CharacterBase-L1 Testing Data 1-F1 110
Appendix 18: Word Distribution of L1 Testing Data 2 112
Appendix 19: L1 Testing Set 2 - Accuracy 113
Appendix 20: L1 Testing Set 2 - MRR 114
Appendix 21: L1 Testing Set 2 - RNNLM-Word w POS 115
Appendix 22: L1 Testing Set 2 - RNNLM-Word w/o POS 117
Appendix 23: L1 Testing Set 2 - RNNLM-CharBase 119
Appendix 24: L1 Testing Set 2 - NgramLM 121
Appendix 25: C.M.-L1 Testing Set 2-Word w POS-10gramLM 123
Appendix 26: C.M.-L1 Testing Set 2-Word w POS-RNNLM 124
Appendix 27: C.M.-L1 Testing Set 2-Word w/o POS-10gramLM 125
Appendix 28: C.M.-L1 Testing Set 2-Word w/o POS-RNNLM 126
Appendix 29: C.M.-L1 Testing Set 2-CharBase-13gramLM 127
Appendix 30: C.M.-L1 Testing Set 2-CharBase-RNNLM 128
Appendix 31: L1 Testing Set 2-Combination-Accuracy 129
Appendix 32: L2 Testing Set (236)-Accuracy & Sen. count 130
Appendix 33: L2 Testing Set-MRR 131
Appendix 34: L2 Testing Set-RNNLM-Word w POS 132
Appendix 35: L2 Testing Set (236)-RNNLM-Word w/o POS 134
Appendix 36: L2 Testing Set (236)-RNNLM-CharBase 136
Appendix 37: L2 Testing Set-NgramLM-Word w POS 138
Appendix 38: L2 Testing Set-NgramLM-Word w/o POS 140
Appendix 39: L2 Testing Set-NgramLM-CharBase 142
Appendix 40: C.M.-L2 Testing Set 2-5gramLM-Word w POS 144
Appendix 41: C.M.-L2 Testing Set 2-5gramLM-Word w/o POS 145
Appendix 42: C.M.-L2 Testing Set 2-6gramLM-CharBase 146
Appendix 43: C.M.-L2 Testing Set-RNNLM hd512-Word w POS 147
Appendix 44: C.M.-L2 Testing Set-RNNLM hd512-Word w/o POS 148
Appendix 45: C.M.-L2 Testing Set-RNNLM hd512-CharBase 149
Appendix 46: L2 Testing Set-Combined Recommendation Results 150
Appendix 47: Table of Symbols Used in This Thesis 151
References 153
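The result chapters above report Accuracy alongside MRR (mean reciprocal rank) over the models' ranked preposition suggestions. For reference, MRR over ranked candidate lists can be computed as in this small sketch (the function name and data are illustrative, not taken from the thesis):

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """MRR: average of 1/rank of the gold preposition in each
    model-ranked candidate list (0 contribution when it never appears)."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(ranked_lists)
```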
dc.language.iso  zh-TW
dc.subject  語言模型 (Language Model)  zh_TW
dc.subject  HSK語料庫 (HSK Corpus)  zh_TW
dc.subject  中文介繫詞選詞 (Chinese Preposition Selection)  zh_TW
dc.subject  中文文法改正 (Chinese Grammar Correction)  zh_TW
dc.subject  Chinese Preposition Selection  en
dc.subject  Chinese Grammar Correction  en
dc.subject  Language Model  en
dc.subject  HSK Corpus  en
dc.title  協助非中文母語學習者修正介繫詞選詞錯誤  zh_TW
dc.title  Chinese Preposition Error Correction for Non-Native Chinese Language Learners  en
dc.type  Thesis
dc.date.schoolyear  104-1
dc.description.degree  碩士 (Master)
dc.contributor.oralexamcommittee  林川傑, 古倫維
dc.subject.keyword  中文文法改正, 中文介繫詞選詞, HSK語料庫, 語言模型  zh_TW
dc.subject.keyword  Chinese Grammar Correction, Chinese Preposition Selection, HSK Corpus, Language Model  en
dc.relation.page  156
dc.rights.note  有償授權 (paid authorization)
dc.date.accepted  2016-02-02
dc.contributor.author-college  電機資訊學院 (College of Electrical Engineering and Computer Science)  zh_TW
dc.contributor.author-dept  資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)  zh_TW
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-105-1.pdf (not authorized for public access) | 8.76 MB | Adobe PDF

All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
