Heterogeneous Recycle Generation for Chinese Grammatical Error Correction

Charles Hinson; 查爾斯

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58372

Title:	Heterogeneous Recycle Generation for Chinese Grammatical Error Correction (基於異質回收式生成的中文文法錯誤更正)
Author:	Charles Hinson 查爾斯
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	Chinese Grammatical Error Correction, GEC, Heterogeneous Recycle Generation
Year of publication:	2020
Degree:	Master's
Abstract:	In recent years, grammatical error correction (GEC) systems have relied predominantly on models based on neural machine translation (NMT). Although these models can yield impressive results, they have several major drawbacks: not only do they require a massive amount of data to train properly, but they also work by translating a source sentence into a target sentence and are unable to simply edit it. In this thesis, we propose a system for Chinese GEC that consists of an NMT-based model, a sequence editing model, and a spell checker. This heterogeneous system of three models is combined using recycle generation, in which the output of one model serves as the input to another. The method not only achieves new state-of-the-art performance on the NLPCC2018 dataset but also does so without GEC-specific architecture changes or data augmentation. We experiment with the order in which the models are composed and with the number of generation iterations to find the optimal way to compose our system. Furthermore, we modify the ERRANT scorer for English GEC so that it can automatically annotate and score Chinese sentences, giving not only us but also future researchers the ability to report model performance by error type.
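The recycle-generation idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: each model is treated as a plain function from a sentence to a corrected sentence, the real NMT model, sequence editing model, and spell checker are replaced by two hypothetical toy string-rewrite functions (`spell_checker`, `sequence_editor`), and the early stop when a full pass changes nothing is an assumption added here. The names `recycle_generate` and `make_replacer` are likewise hypothetical.

```python
from typing import Callable, Sequence

# Every model is treated as a function: sentence in, (possibly) corrected sentence out.
Corrector = Callable[[str], str]


def recycle_generate(sentence: str, models: Sequence[Corrector], iterations: int = 2) -> str:
    """Feed each model's output into the next model, repeating the whole pass."""
    current = sentence
    for _ in range(iterations):
        before_pass = current
        for model in models:
            current = model(current)  # output of one model becomes input to the next
        if current == before_pass:  # assumption: stop once a full pass makes no edits
            break
    return current


def make_replacer(wrong: str, right: str) -> Corrector:
    """Hypothetical toy 'model' that rewrites one fixed substring."""
    def correct(sentence: str) -> str:
        return sentence.replace(wrong, right)
    return correct


# Toy stand-ins (hypothetical) for the real spell checker and sequence editing model.
spell_checker = make_replacer("炮", "跑")               # toy character-level spelling fix
sequence_editor = make_replacer("跑的很快", "跑得很快")  # toy 的 -> 得 fix for one phrase

if __name__ == "__main__":
    source = "他炮的很快。"
    orders = {
        "editor -> spell": [sequence_editor, spell_checker],
        "spell -> editor": [spell_checker, sequence_editor],
    }
    for label, order in orders.items():
        for n in (1, 2):
            print(f"{label}, {n} iteration(s): {recycle_generate(source, order, iterations=n)}")
```

With the order editor, then spell checker, one iteration only fixes the misspelled 炮; the 的 to 得 edit needs a second pass, because the editor sees the spell-corrected sentence only on the next iteration. Running the spell checker first gets both edits in a single pass, which illustrates why composition order and iteration count are worth tuning together.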
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58372
DOI:	10.6342/NTU202001483
Full-text license:	Paid license
Department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
U0001-1307202019160200.pdf (not currently available for public access)	1.89 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
