Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61314
| Title: | 以分類學為基礎之跨領域情緒分析方法研究 (Taxonomy-based Cross Domain Sentiment Classification) |
| Authors: | Cong-Kai Lin (林琮凱) |
| Advisor: | Hsin-Hsi Chen (陳信希) |
| Keyword: | cross domain sentiment classification, taxonomy, ensemble modeling, regression model, transfer learning |
| Publication Year : | 2013 |
| Degree: | Master's |
| Abstract: | In the Internet era, people routinely share their experiences and opinions on online platforms, and these posts serve as useful references for others with similar needs. Sentiment classification draws on such past experience to predict whether a document expresses positive or negative polarity; it has many practical applications and has attracted considerable attention in recent years. However, when the available training resources come from a domain that differs greatly from the domain of the data to be labeled, for example when data from the electrical appliances domain are used to predict sentiment in the books domain, classification performance can drop sharply. Sentiment classification in which the source and target domains differ is called cross-domain sentiment classification.
Many cross-domain sentiment classification methods have been proposed in recent years. They treat a domain as a single, indivisible category, which does not reflect real-world conditions: many online shopping sites, such as Amazon and eBay, organize their products in a taxonomy-based tree structure. In this thesis, we depart from this coarse-grained view of domains and use the tree-structured taxonomy as the basis for studying both in-domain and cross-domain sentiment classification. We first analyze the tree-structured data in detail and show that diversity in the training data benefits cross-domain prediction. Building on this observation, we propose a taxonomy-based model combination algorithm (TBMC), which combines several models and adjusts their weights according to the tree structure. For the problem of selecting a source within the taxonomy, we also propose a taxonomy-based regression model (TBRM) to help select the best source node.
The experimental results show that TBMC achieves better performance in cross-domain sentiment classification, and that TBRM outperforms regression models that do not use taxonomy information on the best-source selection problem. Finally, we combine the two methods and integrate a transfer learning method to achieve better performance. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61314 |
| Fulltext Rights: | Paid license |
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-102-1.pdf (Restricted Access) | 4.29 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
