Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16055
Title: | 準則式中文句子重組還原 A Principle Based Approach for Chinese Word Reordering |
Author: | Daniel Li 李德堯 |
Advisor: | 顏嗣鈞 (Hsu-Chun Yen) |
Co-advisor: | 許聞廉 (Wen-Lian Hsu) |
Keywords: | Semantic Understanding, NLP, Word Reordering, Language Model, EHowNet |
Year of publication: | 2020 |
Degree: | Master's |
Abstract: | Sentence reordering (word reordering) takes a fluent sentence whose words have been randomly shuffled into a bag of words and rearranges those words to restore a fluent sentence. The goal is to improve the grammaticality and fluency of machine-generated text. Although several studies have addressed this problem for English, none has yet done so for Chinese. Building on the English-language work, this thesis changes the language model and, instead of the beam search used in earlier approaches, builds part-of-speech (POS) templates from the training corpus. The templates effectively reduce search cost and integrate more closely with the BERT language model used here; the method achieves a BLEU score of 0.82 on a Chinese treebank dataset. In the course of this research, an EHowNet classifier was also trained: it clusters BERT word vectors and projects them onto the correct EHowNet category, which effectively addresses the out-of-vocabulary problem in the database and is of great help to knowledge-based NLP. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16055 |
DOI: | 10.6342/NTU202002141 |
Full-text authorization: | Not authorized |
Appears in department: | Department of Electrical Engineering |
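The template-constrained search described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy corpus, the coarse `N`/`V` tags, and above all the bigram-count scorer (a stand-in for the BERT language model) are assumptions made for illustration. The sketch shows only the core idea, that POS templates collected from training data restrict which word orders are ever scored.

```python
from collections import Counter
from itertools import permutations, product


def build_templates(tagged_corpus):
    """Count the POS-tag sequences (templates) seen in training sentences."""
    return Counter(tuple(pos for _, pos in sent) for sent in tagged_corpus)


def build_bigrams(tagged_corpus):
    """Count adjacent word pairs; a toy stand-in for a BERT-based scorer."""
    counts = Counter()
    for sent in tagged_corpus:
        words = [word for word, _ in sent]
        counts.update(zip(words, words[1:]))
    return counts


def candidates(bag, templates):
    """Yield word orders whose POS sequence matches a known template."""
    bag_tags = Counter(pos for _, pos in bag)
    by_tag = {}
    for word, pos in bag:
        by_tag.setdefault(pos, []).append(word)
    tags = sorted(by_tag)
    for template in templates:
        if Counter(template) != bag_tags:
            continue  # the template must use exactly the bag's POS tags
        # Try every distinct ordering of the words that share a tag.
        orderings = [sorted(set(permutations(by_tag[t]))) for t in tags]
        for choice in product(*orderings):
            slots = {t: list(p) for t, p in zip(tags, choice)}
            yield tuple(slots[pos].pop(0) for pos in template)


def reorder(bag, templates, bigrams):
    """Return the best-scoring template-consistent order, or None."""
    def score(order):
        return sum(bigrams[pair] for pair in zip(order, order[1:]))
    return max(candidates(bag, templates), key=score, default=None)


# Hypothetical toy data: pre-segmented words with coarse POS tags.
corpus = [
    [("我", "N"), ("喜歡", "V"), ("貓", "N")],
    [("貓", "N"), ("吃", "V"), ("魚", "N")],
]
templates = build_templates(corpus)
bigrams = build_bigrams(corpus)
shuffled = [("魚", "N"), ("貓", "N"), ("吃", "V")]
print(reorder(shuffled, templates, bigrams))
```

Under these assumptions only two of the six possible orders of the shuffled bag are scored, because the single `N V N` template rules out the rest. In the thesis the scoring role is played by BERT rather than by bigram counts.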
Files in this item:
File | Size | Format | |
---|---|---|---|
U0001-3107202002091400.pdf (currently not authorized for public access) | 6.79 MB | Adobe PDF |
All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated.