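Semi-supervised method for Improving Stance Classification on Insufficient Labeled Chinese Newspaper (半監督學習模型以改善標籤數稀少的中文新聞的立場偵測分類)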

Yu Ran; 冉昱

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59974

Title:	半監督學習模型以改善標籤數稀少的中文新聞的立場偵測分類 Semi-supervised method for Improving Stance Classification on Insufficient Labeled Chinese Newspaper
Authors:	Yu Ran 冉昱
Advisor:	林守德(Shou-De Lin)
Keyword:	Stance Classification (立場偵測), Semi-supervised Learning (半監督學習), Ladder Network (梯子網絡), Deep Learning (深度學習)
Publication Year:	2017
Degree:	Master's
Abstract:	This thesis aims to develop a program that classifies the stance of Chinese news articles on several controversial topics, using a previously collected corpus of controversial news. The main difficulty is that labeled articles are scarce, so a model cannot learn enough from them. Earlier work by Wei-Ming, a senior student, addressed this problem with a supervised method: it partitions and clusters features to reduce the feature dimensionality and thereby improve accuracy. Our goal is to surpass that method by making full use of unlabeled data and by using deep-learning representation vectors as features. We first use paragraph vectors (document vectors) as article features and compare them with ordinary word features and dependency features. We then apply semi-supervised learning, mainly a self-learning model and a ladder network, on the paragraph-vector features. Our self-learning model outperforms Wei-Ming's method on topic 2, and the ladder network outperforms it on the other three topics.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59974
DOI:	10.6342/NTU201700163
Fulltext Rights:	Paid license (有償授權)
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)
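
The self-learning (self-training) idea named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the nearest-centroid classifier, the softmax-style confidence score, the `threshold` and `max_rounds` parameters, and the "for"/"against" labels are all assumptions chosen for illustration, and the short numeric lists stand in for paragraph vectors.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(X, y):
    """Return {label: centroid} computed from the labeled examples."""
    by_label = {}
    for vec, label in zip(X, y):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict_with_confidence(centroids, vec):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {label: math.exp(-math.dist(vec, c)) for label, c in centroids.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Hypothetical self-training loop: pseudo-label confident unlabeled
    vectors, add them to the training set, refit, and repeat."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for vec in pool:
            label, conf = predict_with_confidence(centroids, vec)
            if conf >= threshold:
                # Confident prediction: treat it as a label for the next round.
                X_lab.append(vec)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(vec)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or nothing is left to label
    return fit_centroids(X_lab, y_lab), pool

# Toy data standing in for paragraph vectors of labeled and unlabeled articles.
model, leftover = self_train(
    [[0.0, 0.0], [4.0, 4.0]], ["against", "for"],
    [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]],
)
```

In this toy run the two unlabeled vectors near a class centroid are pseudo-labeled, while the ambiguous vector midway between the classes stays in `leftover`. A real system would swap in a stronger base classifier and tune the confidence threshold on held-out labeled data.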

Files in This Item:

File	Size	Format
ntu-106-1.pdf Restricted Access	3.63 MB	Adobe PDF
