Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52042
Title: 以自動問題生成實現機器閱讀理解之半監督式學習與轉移學習
Explore the use of Automatic Question Generation in Semi-supervised learning and Transfer learning on Machine Reading Comprehension
Authors: Shang-Ming Wang
王上銘
Advisor: 李宏毅(Hung-Yi Lee)
Keyword: Question Generation, Semi-Supervised Learning, Transfer Learning
Publication Year: 2020
Degree: Master's
Abstract:
The transmission of large amounts of information has become part of daily life in today's society: people collect and disseminate knowledge and techniques through the Internet, and large quantities of data are therefore easy to obtain. How to use unlabeled data to improve model performance has become a focus of attention in recent years.
Pre-training on unlabeled data followed by fine-tuning on labeled data has become standard practice in natural language processing. In natural language generation in particular, this recipe has brought substantial progress, making it feasible to build downstream tasks on top of generated text.
This thesis uses question generation to implement semi-supervised learning and transfer learning for machine reading comprehension. Generating labeled data with a model is much harder for reading comprehension than for classification: classification labels come from a unique, fixed set of classes, whereas reading comprehension requires generating both an answer and a question for each article. Because questions are expressed in natural language, infinitely many surface forms can share the same meaning, and answers and questions are not in a simple one-to-one correspondence. All of this makes generating reading comprehension data considerably more difficult.
We first studied how different input features and question selection methods affect model performance when a pre-trained model is used for question generation, and found that pre-training substantially improves both the quality and the answerability of the generated questions, making it practical to produce training data via question generation.
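One common way to realize such an answerability-based question selection step is round-trip filtering: keep a generated question only if a reader model, given the same passage, recovers the answer the question was generated from. The sketch below illustrates the idea only; `generate_question` and `answer_question` are hypothetical stand-ins for the thesis's pre-trained generator and QA model, reduced to trivial stubs so the example runs on its own.

```python
# Round-trip answerability filter for generated questions.
# Both model functions are stubs standing in for real pre-trained models.

def generate_question(passage: str, answer: str) -> str:
    # Stub: a real system would run a seq2seq model conditioned on the
    # passage with the answer span highlighted.
    return f"What does the passage say about {answer}?"

def answer_question(passage: str, question: str) -> str:
    # Stub: a real extractive reader would predict an answer span.
    # Here we "answer" with the first passage token mentioned in the question.
    for word in passage.split():
        token = word.strip(".,?")
        if token and token in question:
            return token
    return ""

def roundtrip_filter(passage, answer_spans):
    """Keep (question, answer) pairs the reader answers consistently."""
    kept = []
    for ans in answer_spans:
        q = generate_question(passage, ans)
        if answer_question(passage, q) == ans:
            kept.append((q, ans))
    return kept
```

Questions the reader cannot answer (or answers differently) are discarded, which trades data quantity for answerability.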
Second, we applied question generation to semi-supervised learning for machine reading comprehension, letting the machine generate additional training data over a large number of articles by asking and answering its own questions. We compared two common answer generation methods and different amounts of generated data, and found that this approach does push the performance of the question answering model further.
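The self-questioning-and-answering augmentation step described above can be sketched as a standard self-training loop. All names below are illustrative assumptions, not the thesis's actual code: the two stubs stand in for an answer-candidate extractor and a pre-trained question generator, and `per_article` mimics the varying generation budgets the thesis compares.

```python
# Self-training sketch: grow a labeled QA set by letting the model
# ask and answer its own questions on unlabeled articles.
# Both model functions are stubs so the example runs standalone.

def extract_answer_candidates(article: str) -> list:
    # Stub for an answer extractor (e.g. NER spans or a learned model):
    # here, simply take capitalized tokens as candidate answers.
    return [w.strip(".,") for w in article.split() if w[:1].isupper()]

def generate_question(article: str, answer: str) -> str:
    # Stub for a pre-trained question generator.
    return f"Which entity is described here: {answer}?"

def augment(labeled, unlabeled_articles, per_article=2):
    """Extend the labeled set with generated (article, question, answer) triples."""
    augmented = list(labeled)
    for article in unlabeled_articles:
        for ans in extract_answer_candidates(article)[:per_article]:
            augmented.append((article, generate_question(article, ans), ans))
    return augmented
```

The QA model is then retrained on the augmented set; in this framing, the thesis's comparison of answer generation methods corresponds to swapping `extract_answer_candidates`, and its data-quantity study to varying `per_article`.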
Finally, we applied the same data generation method to question answering datasets from other domains, so that the model learns questioning skills on one dataset and, when the target domain differs substantially, improves itself there through the same self-questioning and self-answering procedure. The results show that this method also brings a measurable degree of improvement when applied across datasets.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52042
DOI: 10.6342/NTU202002649
Fulltext Rights: Paid access (有償授權)
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: U0001-0708202016225500.pdf (Restricted Access)
Size: 5.38 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
