Affective Lexicon in Chinese – Construction and Annotation (中文情緒詞庫的建造與標記)

Pei-Yu Lu; 呂珮瑜

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49107

Title:	Affective Lexicon in Chinese – Construction and Annotation (中文情緒詞庫的建造與標記)
Author:	Pei-Yu Lu (呂珮瑜)
Advisor:	謝舒凱 (Shu-Kai Shie)
Keywords:	emotion-denoting words (情緒指稱詞), emotion-signaling words (情緒示意詞), emotion words (情緒詞), semantic prosody (語意韻律), chunk (詞組塊)
Publication Year:	2015
Degree:	Master's
Abstract:	An affective lexicon is a fundamental resource for emotion detection. Most openly available Chinese lexicons consist mainly of affect-denoting words and include few affect-signaling words, even though, from the perspective of cognitive semantics and pragmatics, affect-signaling words play a critical role in how emotion is expressed in language use. Semantic prosody shows that seemingly neutral words can carry an implicit positive or negative association, and functional theory holds that the mapping between words and meaning is not one-to-one; neither is the mapping between words and emotion, which often extends beyond word boundaries to chunks. This study therefore consolidates and categorizes existing Chinese affect-denoting word lists and manually collects and annotates Chinese affect-signaling words, organizing both into a multi-dimensional Chinese affective lexicon. The result is intended as an open resource for emotion detection research and as evidence of how functional grammar contributes to emotion recognition in text.
The study has two phases: (1) manual collection, annotation, and categorization; (2) evaluation and application. In the first phase, affect-denoting words from existing lexicons are merged and sorted into five categories (happy, sad, scared, angry, and surprised), then subdivided into three levels (emotion, mood, and temperament) according to the intensity and duration of the emotion they denote. Affect-signaling words are collected from two kinds of corpora: author-classified emotional posts (900 posts from the PTT mood board, a BBS) and reader-classified emotional news (1,000 articles from Yahoo mood news), in which chunks are manually annotated and categorized. Common emotional expressions are also included, such as interjections, emoticons, expletives, and insults.
In the second phase, the emotion-prediction ability of each affect-signaling word is first estimated as the mean emotion score of the ten words that follow each of its occurrences in the text corpus. To test this ability, ten positive and ten negative groups of affect-signaling words are randomly sampled and added to NTUSD as the lexicon for a simple method that sums emotion-word scores; accuracy with and without the affect-signaling words is then compared, using human-rated emotional texts as the standard. Accuracy improves by 4.78% on average for the positive groups and by 18.18% for the negative groups. Finally, as an application, the full lexicon is used in the unsupervised machine learning approach of Magistry et al. (2015) to emotion detection in Chinese micro-blog texts, where it improves the original F1-score by nearly 2%.
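The two evaluation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which this record does not include. The function names, the toy data, and the ASCII tokens standing in for segmented Chinese words are all hypothetical. The sketch also assumes that words missing from the lexicon score 0 and that the scores of all following words are pooled across occurrences before averaging; the abstract does not specify either detail.

```python
from statistics import mean


def prediction_scores(corpus, word_scores, signaling_words, window=10):
    """Score each signaling word by the mean emotion score of the tokens
    that follow it (up to `window` tokens per occurrence).

    Assumptions (not from the source): unlisted tokens score 0.0, and
    follower scores are pooled over all occurrences before averaging.
    """
    followers = {}
    for tokens in corpus:
        for i, token in enumerate(tokens):
            if token in signaling_words:
                following = tokens[i + 1 : i + 1 + window]
                followers.setdefault(token, []).extend(
                    word_scores.get(t, 0.0) for t in following
                )
    # Skip words that never had any following token.
    return {word: mean(scores) for word, scores in followers.items() if scores}


def polarity(tokens, lexicon):
    """Simple additive classifier: sum the lexicon scores of all tokens."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Hypothetical toy data: "sad"/"glad" play the role of affect-denoting
# words, "sigh" the role of an affect-signaling word.
TOY_SCORES = {"sad": -1.0, "glad": 1.0}
TOY_CORPUS = [["sigh", "sad", "today"], ["sigh", "sad"]]

signal_scores = prediction_scores(TOY_CORPUS, TOY_SCORES, {"sigh"})

# Lexicon with the signaling words added, for a with/without comparison.
extended_lexicon = {**TOY_SCORES, **signal_scores}
```

With the toy data, a text such as `["sigh", "monday"]` contains no affect-denoting word, so `polarity` can only assign it a non-neutral label once the signaling-word scores are merged into the lexicon. That mirrors, in miniature, the with/without comparison reported in the abstract.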
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49107
DOI:	10.6342/NTU201602978
Full-Text License:	Paid license
Appears in Collections:	Graduate Institute of Linguistics (語言學研究所)

Files in This Item:

File	Size	Format
ntu-104-1.pdf (not authorized for public access)	3.39 MB	Adobe PDF


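Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.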
