Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38881
| Title: | 中文情緒詞彙自動學習及在意見擷取之應用 Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction |
| Authors: | Tung-Ho Wu 吳東和 |
| Advisor: | 陳信希 |
| Keyword: | Sentiment Word (情緒詞), Opinion Extraction (意見擷取) |
| Publication Year: | 2005 |
| Degree: | Master's (碩士) |
| Abstract: | Opinions may be expressed explicitly or implicitly in documents. They offer valuable information and a range of viewpoints that can help government agencies improve their efficiency and companies improve their products. In this thesis, we define an opinion passage as a statement that its holder or author makes about a specific topic and that contains sentiment, and we call words that carry sentiment "sentiment words". The sentiment words in an opinion passage determine its opinion tendency, and the sentiment words in a document determine the overall opinion tendency of the document. Sentiment words are therefore key features for opinion extraction.
After building an initial sentiment dictionary, we propose three approaches to determine whether an unknown word is positive, negative, or non-sentiment: the Thesaurus-Based Approach, the Character-Based Approach, and the Combined Approach. The Thesaurus-Based Approach uses synonym information to classify an unknown word. The Character-Based Approach computes the sentiment score of a Chinese word from its component characters and classifies the word by that score. The Combined Approach uses both synonym information and sentiment scores, and it performs best of the three. Under strict assessment by human assessors, its F-measure is 73.18% for verbs and 63.75% for nouns, with an average of 70.40%. Based on the Combined Approach, we then propose the Sentiment Miner to acquire new positive and negative sentiment words from documents.
For opinion extraction, we propose the Passage Level Algorithm, which uses sentiment words and context information to detect the opinion passages in a document. We also propose the Document Level Algorithm, which determines the overall opinion tendency of a document from the opinion passages it contains. In experiments, the best F-measure is 62.16% at the passage level and 76.56% at the document level. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38881 |
| Fulltext Rights: | Paid license (有償授權) |
| Appears in Collections: | Department of Computer Science and Information Engineering (資訊工程學系) |
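
The abstract describes the Character-Based Approach only at a high level: a word's sentiment score is computed from its component characters. The sketch below is one possible reading of that idea and is not the formula used in the thesis. The seed word lists, the frequency normalization, the averaging rule, the `threshold` value, and the function names `char_scores`, `word_score`, and `classify` are all assumptions made for illustration.

```python
from collections import Counter

def char_scores(pos_words, neg_words):
    """Score each character in [-1, 1] from its relative frequency
    in positive versus negative seed words (assumed scoring rule)."""
    pos_counts = Counter(ch for w in pos_words for ch in w)
    neg_counts = Counter(ch for w in neg_words for ch in w)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    scores = {}
    for ch in set(pos_counts) | set(neg_counts):
        # Normalize by list size so a larger seed list does not dominate.
        p = pos_counts[ch] / pos_total if pos_total else 0.0
        n = neg_counts[ch] / neg_total if neg_total else 0.0
        scores[ch] = (p - n) / (p + n)
    return scores

def word_score(word, scores):
    """Average the character scores; unseen characters count as 0."""
    if not word:
        return 0.0
    return sum(scores.get(ch, 0.0) for ch in word) / len(word)

def classify(word, scores, threshold=0.3):
    """Map a word score to one of the three classes named in the abstract.
    The threshold is a hypothetical value, not taken from the thesis."""
    s = word_score(word, scores)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "non-sentiment"

# Tiny hypothetical seed dictionary, for demonstration only.
seed_positive = ["快樂", "歡樂", "喜歡"]
seed_negative = ["悲傷", "傷心", "痛苦"]
scores = char_scores(seed_positive, seed_negative)

for unknown in ["歡喜", "悲痛", "桌子"]:
    print(unknown, classify(unknown, scores))
```

With such a small seed list every seen character scores exactly 1 or -1, so the example only shows the mechanics. A realistic dictionary would yield graded scores, and the Combined Approach described in the abstract would additionally consult synonym information.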
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-94-1.pdf (Restricted Access) | 192.16 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
