Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction (中文情緒詞彙自動學習及在意見擷取之應用)

Tung-Ho Wu; 吳東和

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38881

Title:	Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction (中文情緒詞彙自動學習及在意見擷取之應用)
Author:	Tung-Ho Wu (吳東和)
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	Sentiment Word (情緒詞), Opinion Extraction (意見擷取)
Publication Year:	2005
Degree:	Master's
Abstract:	Opinions may be expressed explicitly or implicitly in documents. They offer valuable information and a range of viewpoints for improving government services and company products. In this thesis, we define an opinion passage as a declarative statement that its holder or author makes about a specific topic and that contains sentiment, and we call words that carry sentiment "sentiment words". Sentiment words determine the opinion tendency of an opinion passage and the overall opinion tendency of a document, so they are key features for opinion extraction.
After building an initial sentiment dictionary, we propose three approaches for deciding whether an unknown word is positive, negative, or non-sentiment: the Thesaurus-Based Approach, the Character-Based Approach, and the Combined Approach. The Thesaurus-Based Approach uses synonym information to classify an unknown word. The Character-Based Approach computes the sentiment score of a Chinese word from its component characters and classifies the word by that score. The Combined Approach uses both synonym information and sentiment scores, and it performs best of the three. Under strict human assessment, its F-measure is 73.18% for verbs and 63.75% for nouns, with an average of 70.40%.
Finally, based on the Combined Approach, we propose the Sentiment Miner to acquire new positive and negative sentiment words from documents. For opinion extraction, we propose the Passage Level Algorithm, which uses sentiment words and context information to detect the opinion passages in a document, and the Document Level Algorithm, which determines the overall opinion tendency of a document from the opinion passages it contains. In experiments, the best F-measure is 62.16% at the passage level and 76.56% at the document level.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38881
Full-Text License:	Paid license
Appears in Collections:	Department of Computer Science and Information Engineering
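
The abstract describes the Combined Approach only at a high level, so the following is a minimal sketch of the idea rather than the thesis's actual method. It assumes a character's score is the normalised difference between how often it appears in positive and in negative seed words, that a word's score is the mean of its characters' scores, and that a synonym lookup is tried before the score. The seed words, the synonym table, the threshold, and all function names are hypothetical.

```python
from collections import Counter

# Toy seed dictionary (hypothetical entries, not taken from the thesis).
SEED_POSITIVE = ["快樂", "歡樂", "喜歡", "美好"]
SEED_NEGATIVE = ["悲傷", "傷害", "痛苦", "討厭"]

# Toy synonym table standing in for a thesaurus (hypothetical).
SYNONYMS = {"愉快": ["快樂"], "難過": ["悲傷"]}


def char_scores(positive, negative):
    """Score each character in [-1, 1] by its relative frequency in
    positive versus negative seed words (an assumed scoring rule)."""
    pos_counts = Counter(ch for word in positive for ch in word)
    neg_counts = Counter(ch for word in negative for ch in word)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    scores = {}
    for ch in set(pos_counts) | set(neg_counts):
        p = pos_counts[ch] / pos_total
        n = neg_counts[ch] / neg_total
        scores[ch] = (p - n) / (p + n)
    return scores


def word_score(word, scores):
    """Mean character score; characters unseen in the seeds count as 0."""
    if not word:
        return 0.0
    return sum(scores.get(ch, 0.0) for ch in word) / len(word)


def classify(word, scores, threshold=0.3):
    """Thesaurus step first, then fall back to the character-based score."""
    for synonym in SYNONYMS.get(word, []):
        if synonym in SEED_POSITIVE:
            return "positive"
        if synonym in SEED_NEGATIVE:
            return "negative"
    score = word_score(word, scores)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "non-sentiment"


SCORES = char_scores(SEED_POSITIVE, SEED_NEGATIVE)
for candidate in ["愉快", "歡喜", "傷痛", "電腦"]:
    print(candidate, classify(candidate, SCORES))
```

With these toy seeds, a word whose characters never occur in the seed dictionary scores 0 and falls into the non-sentiment class. The threshold controls how strongly a word's characters must lean one way before the word is treated as a sentiment word.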

Files in This Item:

File	Size	Format
ntu-94-1.pdf (not currently authorized for public access)	192.16 kB	Adobe PDF


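Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.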
