Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42572
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳信希(Hsin-Hsi Chen) | |
dc.contributor.author | Chia-Ying Lee | en |
dc.contributor.author | 李佳穎 | zh_TW |
dc.date.accessioned | 2021-06-15T01:16:37Z | - |
dc.date.available | 2009-07-29 | |
dc.date.copyright | 2009-07-29 | |
dc.date.issued | 2009 | |
dc.date.submitted | 2009-07-28 | |
dc.identifier.citation | [1] Avrim Blum and Tom Mitchell. “Combining labeled and unlabeled data with co-training.” Conference on Computational Learning Theory, pp. 92-100, 1998
[2] Eric Breck, Yejin Choi and Claire Cardie. “Identifying expressions of opinion in context.” Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2683-2688, 2007
[3] Chih-Chung Chang and Chih-Jen Lin. “LIBSVM: a library for support vector machines.” 2001
[4] Yejin Choi, Claire Cardie, Ellen Riloff and Siddharth Patwardhan. “Identifying sources of opinions with conditional random fields and extraction patterns.” Proceedings of EMNLP conference, pp. 355-362, 2005
[5] Soo-Min Kim and Eduard Hovy. “Determining the sentiment of opinions.” Proceedings of the COLING conference, pp. 1367-1374, 2004
[6] Soo-Min Kim and Eduard Hovy. “Extracting opinions, opinion holders, and topics expressed in online news media text.” Proceedings of the Workshop on Sentiment and Subjectivity in Text at the joint COLING-ACL conference, pp. 1-8, 2006
[7] Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng. “Identifying opinion holders in opinion text from online newspapers.” International Conference on Granular Computing, pp. 699-702, 2007
[8] Youngho Kim, Seongchan Kim and Sung-Hyon Myaeng. “Extracting topic-related opinions and their targets in NTCIR-7.” Proceedings of the Seventh NTCIR Workshop, pp. 247-254, 2008
[9] Lun-Wei Ku and Hsin-Hsi Chen. “Mining opinions from the web: beyond relevance retrieval.” Journal of American Society for Information Science and Technology, pp. 1838-1850, 2007
[10] Lun-Wei Ku, I-Chien Liu, Chia-Ying Lee, Kuan-hua Chen and Hsin-Hsi Chen. “Sentence-level opinion analysis by CopeOpi in NTCIR-7.” Proceedings of the Seventh NTCIR Workshop, pp. 260-267, 2008
[11] Taku Kudo. “CRF++: yet another CRF toolkit.” http://crfpp.sourceforge.net/ , 2003
[12] John Lafferty, Andrew McCallum and Fernando Pereira. “Conditional random fields: probabilistic models for segmenting and labeling sequence data.” In Proceedings of International Conference on Machine Learning, pp. 282-289, 2001
[13] Kang Liu and Jun Zhao. “NLPR at Multilingual Opinion Analysis Task in NTCIR7.” Proceedings of the Seventh NTCIR Workshop, pp. 226-231, 2008
[14] Xinfan Meng and Houfeng Wang. “Detecting opinionated sentences by extracting context information.” Proceedings of the Seventh NTCIR Workshop, pp. 268-271, 2008
[15] Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz and Timm Euler. “YALE: rapid prototyping for complex data mining tasks.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 935-940, 2006
[16] Bo Pang and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval, Vol. 2, pp. 1-135, 2008
[17] Guang Qiu, Kangmiao Liu, Jiajun Bu, Chun Chen and Zhiming Kang. “Extracting opinion topics for Chinese opinions using dependence grammar.” Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, pp. 40-45, 2007
[18] Yohei Seki, David Kirk Evans, Lun-Wei Ku, Le Sun, Hsin-Hsi Chen and Noriko Kando. “Overview of multilingual opinion analysis task at NTCIR-7.” Proceedings of the Seventh NTCIR Workshop, pp. 185-203, 2008
[19] Yohei Seki, Noriko Kando and Masaki Aono. “Multilingual opinion holder identification using author and authority viewpoints.” Journal of Information Processing and Management, pp. 189-199, 2009
[20] Yu-Chieh Wu, Li-Wei Yang, Jeng-Yan Shen, Liang-Yu Chen and Shih-Tung Wu. “Tornado in multilingual opinion analysis: a transductive learning approach for Chinese sentimental polarity recognition.” Proceedings of the Seventh NTCIR Workshop, pp. 301-306, 2008
[21] Ruifeng Xu and Kam-Fai Wong. “Coarse-Fine opinion mining – WIA in NTCIR-7 MOAT task.” Proceedings of the Seventh NTCIR Workshop, pp. 307-313, 2008
[22] 羅永聖. “A study of a Chinese word segmentation and part-of-speech tagging system combining multiple types of dictionaries and conditional random fields.” Master's thesis, National Taiwan University, 2008 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42572 | - |
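dc.description.abstract | An opinion is a person's subjective view of an issue. People often express opinions in articles, and opinion mining techniques can help users automatically analyze the opinions in large collections of articles. An opinion has four elements: opinion polarity, opinion strength, opinion holder, and opinion target. The person or organization that expresses an opinion is called the opinion holder, and the opinions a holder has expressed on different issues represent that holder's viewpoint. In opinion mining, opinion holder identification is particularly important for learning which people or organizations are expressing opinions, what viewpoint a given holder takes, and whether the viewpoints of two holders are similar. Opinion holder identification poses five main challenges: coreference resolution, nested structure, handling ambiguous annotations, identifying complete opinion holders, and candidate selection.
The goal of opinion holder identification is to extract, from opinionated sentences, the person or organization expressing the opinion. This study proposes a machine-learning-based method that divides the task into two parts: author's-opinion identification and opinion holder labeling. For author's-opinion identification, it proposes lexical, part-of-speech, named entity, punctuation, sentence-composition, and opinion-related features and uses a support vector machine. For opinion holder labeling, it proposes lexical, part-of-speech, named entity, punctuation, sentence-composition, context, and opinion-related features and uses a conditional random field model. The results of the two parts are then combined to produce the final opinion holders. Evaluated on the Traditional Chinese corpus of the NTCIR7 Multilingual Opinion Analysis Task, the implemented system achieves an F-score of 0.734, the best among the participating teams that used machine learning methods and fairly close to the best system to date. This study analyzes the annotation ambiguity in the opinion holder identification corpus and proposes a way to train models on this corpus; it also analyzes the system's errors and proposes two remedies: named entity repair and labeling of the final word of an opinion holder. This study applies opinion holder identification to opinion viewpoint analysis. We represent a holder's viewpoint by the holder's opinion polarity on different topics, and use cosine similarity to measure how similar the viewpoints of two holders are. We perform viewpoint analysis with both the gold-standard answers and the answers produced by the system; although the system's identification performance is not the best, applying its output to viewpoint analysis yields results similar to those obtained from the gold standard. | zh_TW |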
dc.description.abstract | People write articles to express their opinions. An opinion comprises opinion polarity, opinion strength, an opinion target, and an opinion holder. This thesis focuses on identifying opinion holders. In an article, the opinion holder may be the post-author or a nominal (a noun, noun phrase, or named entity) that expresses an opinion in the article. We divide opinion holder identification into two subtasks: identifying the author's opinions and labeling opinion holders. We apply an SVM (support vector machine) and a CRF (conditional random field) to extract opinion holders automatically: the SVM identifies the author's opinions, and the CRF labels opinion holders (i.e., nominals). The proposed features for the SVM and the CRF include lexical, part-of-speech, named entity, punctuation mark, position, context, and opinion-word features. Finally, the results of the SVM and the CRF are combined. In experiments, the proposed method achieves an F-score of 0.734 on the Traditional Chinese side of the NTCIR7 MOAT task, outperforming the other teams that used machine learning methods. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T01:16:37Z (GMT). No. of bitstreams: 1 ntu-98-R96922084-1.pdf: 2593263 bytes, checksum: abb46190d0c34e14a33c16879f5d3ccb (MD5) Previous issue date: 2009 | en |
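dc.description.tableofcontents | Acknowledgements I
Chinese Abstract II
English Abstract III
Table of Contents IV
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Research Objectives 2
1.3 Challenges 2
1.4 Related Work 3
1.4.1 Heuristic-Based Approaches 4
1.4.2 Machine-Learning-Based Approaches 4
1.4.3 The Proposed Method 5
1.5 Thesis Organization 5
Chapter 2 Opinion Holder Identification Method 6
2.1 Identification Workflow 6
2.2 Special Handling of Word Segmentation and Part-of-Speech Tagging 7
2.3 Author's-Opinion Identification 8
2.3.1 Feature Extraction 9
2.3.2 Support Vector Machine 10
2.4 Opinion Holder Labeling 11
2.4.1 Feature Extraction 11
2.4.2 Decision Tree Algorithm 14
2.4.3 Conditional Random Field Model 14
2.4.4 Co-Training 15
2.5 Post-Processing 16
2.5.1 Special Handling of Phrasal Opinion Holders 16
2.5.2 Named Entity Repair 17
2.6 Merging the Results of Author's-Opinion Identification and Opinion Holder Labeling 18
Chapter 3 Experiments and Discussion 20
3.1 Experimental Corpus 20
3.1.1 Introduction to the NTCIR 7 Multilingual Opinion Analysis Task 20
3.1.2 Opinion Analysis Annotation Tool 21
3.1.3 Opinion Holder Annotation Guidelines and Gold-Standard Generation 22
3.2 Experimental Resources 23
3.2.1 Opinion Word Dictionary 23
3.2.2 Named Entity Dictionary 23
3.3 Author's-Opinion Identification Experiments 24
3.3.1 Experiment 1: Effect of Feature Sets on Author's-Opinion Identification Performance 26
3.3.2 Experiment 2: Effect of Training Corpora on Author's-Opinion Identification Performance 27
3.3.3 Experiment 3: Effect of Ambiguously Annotated Data on Author's-Opinion Identification Performance 29
3.4 Opinion Holder Labeling Experiments 30
3.4.1 Experiment 1: Effect of Classification Algorithms on Opinion Holder Labeling Performance 32
3.4.2 Experiment 2: Effect of Feature Sets on Opinion Holder Labeling Performance 35
3.4.3 Experiment 3: Effect of CRF Tag Sets on Opinion Holder Labeling Performance 36
3.4.4 Experiment 4: Effect of Co-Training and Named Entity Repair on Opinion Holder Labeling Performance 39
3.5 Overall Opinion Holder Identification Experiments 40
3.5.1 Effect of Result-Merging Strategies on Performance 40
3.5.2 Comparison with NTCIR 7 Participants 41
Chapter 4 Opinion Viewpoint Analysis 44
4.1 Opinion Viewpoint Analysis Method 44
4.2 Opinion Viewpoint Analysis Results 46
4.2.1 Analysis over All Opinion Holder Entities 46
4.2.2 Analysis over Opinion Holder Entities Representing National Governments 47
4.3 Comparison and Discussion of Opinion Viewpoint Analysis 50
Chapter 5 Conclusions and Future Work 51
5.1 Conclusions 51
5.2 Future Work 52
References 53
Appendix 1: List of Near-Synonyms for Opinion Holder Entities 55 | |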
dc.language.iso | zh-TW | |
dc.title | 意見持有者辨識及其意見立場分析 | zh_TW |
dc.title | A Study on Identification of Opinion Holders and Analysis of Their Viewpoints | en |
dc.type | Thesis | |
dc.date.schoolyear | 97-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 鄭卜壬,林川傑,郭俊桔 | |
dc.subject.keyword | 意見持有者辨識,意見立場分析,意見探勘,條件隨機域,支援向量機 | zh_TW |
dc.subject.keyword | opinion holder identification, opinion viewpoint analysis, opinion mining, conditional random field, CRF, support vector machine, SVM | en |
dc.relation.page | 55 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2009-07-28 | |
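dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |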
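
The abstract describes a two-stage pipeline: an SVM decides whether a sentence expresses the author's own opinion, a CRF labels the tokens that form opinion holders, and the two results are merged. The merging step might look like the minimal sketch below. Everything concrete in it is an assumption made for illustration: the BIO-style tags (`B-H`, `I-H`, `O`), the `POST_AUTHOR` placeholder, the function names, and the fallback rule. The record does not state the thesis's actual tag set or merging strategy, and its table of contents indicates that several were compared.

```python
from typing import List

AUTHOR = "POST_AUTHOR"  # hypothetical placeholder meaning "the post-author is the holder"


def decode_holders(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect holder spans from BIO-style tags (B-H starts a span, I-H continues it)."""
    holders: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-H":
            if current:  # close any span that is still open
                holders.append("".join(current))
            current = [token]
        elif tag == "I-H" and current:
            current.append(token)
        else:  # "O", or an I-H with no open span, ends the current span
            if current:
                holders.append("".join(current))
            current = []
    if current:
        holders.append("".join(current))
    return holders


def merge_holders(is_author_opinion: bool, tokens: List[str], tags: List[str]) -> List[str]:
    """Assumed rule: prefer labeled holders, otherwise fall back to the author."""
    labeled = decode_holders(tokens, tags)
    if labeled:
        return labeled
    return [AUTHOR] if is_author_opinion else []
```

Here `is_author_opinion` stands in for the SVM's decision and `tags` for the CRF's output; tokens are joined without spaces because the corpus is Chinese.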
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
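ntu-98-1.pdf (not currently authorized for public access) | 2.53 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.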
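
The abstract states that a holder's viewpoint is represented by the holder's opinion polarity across topics and that two viewpoints are compared with cosine similarity. The sketch below illustrates that comparison under assumptions of its own: polarity is encoded as a number per topic (for example +1 for positive, -1 for negative), a missing topic counts as 0, and the function name `viewpoint_similarity` is made up for illustration. The record does not say how the thesis encodes polarity.

```python
import math
from typing import Dict


def viewpoint_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two topic -> polarity mappings."""
    topics = set(a) | set(b)
    # Topics a holder never commented on contribute 0 to the dot product.
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in topics)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no expressed polarity: treat the similarity as 0
    return dot / (norm_a * norm_b)
```

Two holders with the same polarity on every shared topic score close to 1, holders with opposite polarity score close to -1, and holders with no topics in common score 0.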