Semi-supervised Review Classification with a Few Polarity Keywords

Ya-Ting Chen; 陳雅婷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61493

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	鄭卜壬
dc.contributor.author	Ya-Ting Chen	en
dc.contributor.author	陳雅婷	zh_TW
dc.date.accessioned	2021-06-16T13:04:13Z	-
dc.date.available	2016-08-09
dc.date.copyright	2013-08-09
dc.date.issued	2013
dc.date.submitted	2013-08-05
dc.identifier.citation	[1] HC Yu, TH Huang, HH Chen, “Domain Dependent Word Polarity Analysis for Sentiment Classification”, ROCLING, 2012.
[2] Justin Martineau and Tim Finin, “Delta TFIDF: An Improved Feature Space for Sentiment Analysis”, Proceedings of the Third AAAI International Conference on Weblogs and Social Media, San Jose, CA, May 2009, AAAI Press. (Poster paper)
[3] Youngjoong Ko, “A Study of Weighting Schemes Using Class Information for Text Classification”, Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’12), pp. 1029-1030. (Poster paper)
[4] CC Huang, SL Chuang, LF Chien, “LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora”, Proceedings of the 13th International Conference on World Wide Web (WWW’04), pp. 184-192.
[5] CC Chang and CJ Lin, “LIBSVM: A Library for Support Vector Machines”, ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
[6] Lun-Wei Ku and Hsin-Hsi Chen, “Mining Opinions from the Web: Beyond Relevance Retrieval”, Journal of the American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, 58(12), pp. 1838-1850, 2007.
[7] Pekar, V. and Ou, S., “Discovery of Subjective Evaluations of Product Features in Hotel Reviews”, Journal of Vacation Marketing, 14: 145-155, 2008. (Previously published in Proceedings of the First Conference on Blogs in Tourism, Kitzbuehel, Austria)
[8] Turney, P.D. and Littman, M.L., “Measuring Praise and Criticism: Inference of Semantic Orientation from Association”, ACM Transactions on Information Systems, Vol. 21, pp. 315-346, 2003.
[9] Turney, P.D., “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), Philadelphia, Pennsylvania, USA.
[10] Chaovalit, P. and Zhou, L., “Movie Review Mining: a Comparison between Supervised and Unsupervised”, Proceedings of the 38th Hawaii International Conference on System Sciences, pp. 112c-112c, 2005.
[11] Zhongchao Fei, Jian Liu, Gengfeng Wu, “Sentiment Classification Using Phrase Patterns”, The Fourth International Conference on Computer and Information Technology (CIT’04), pp. 1147-1152, 14-16 Sept. 2004.
[12] Joachims, T., “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”, Proceedings of the European Conference on Machine Learning, pp. 21-24 (pp. 137-142), 1998.
[13] Yang, Y. and Liu, X., “A Re-examination of Text Categorization Methods”, Proceedings of the 22nd ACM International Conference on Research and Development in Information Retrieval, pp. 42-49, 1999.
[14] G. Salton, A. Wong, and C. S. Yang, “A Vector Space Model for Automatic Indexing”, Communications of the ACM, vol. 18, nr. 11, pages 613-620, 1975. (Article in which a vector space model was presented)
[15] G. Salton, C. Buckley, “Term Weighting Approaches in Automatic Text Retrieval”, Information Processing and Management, 24(5):513-523, 1988.
[16] Pang, B., Lee, L., and Vaithyanathan, S., “Thumbs up? Sentiment Classification Using Machine Learning Techniques”, The 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP’02), pp. 79-86, 2002.
[17] P Chesley, B Vincent, L Xu, RK Srihari, “Using Verbs and Adjectives to Automatically Classify Blog Sentiment”, American Association for Artificial Intelligence, 2006.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61493	-
dc.description.abstract	評論文件分類(Review Classification)跟情緒分析有些類似，而情緒分析(Sentiment Analysis)主要是探討撰寫者的情緒狀態，具有高度領域相關(Domain Dependent)的特性。評論文件的極性分類最主要的問題是希望能夠自動化的分類那些沒有被標記過的評論文件極性，針對不同領域的文章進行分析，可能有不同的結果。本論文利用少量分類的關鍵字進行文件分類，使用兩種完全不同領域的語料進行研究，探討不同領域的代表詞彙，進而分類文件極性，實驗結果顯示分類效能有不錯的效果。	zh_TW
dc.description.abstract	Review classification is similar to sentiment analysis, which mainly explores the emotional state of writers and is highly domain dependent. The goal of review polarity classification is to automatically classify the polarity of unlabeled review documents; analyzing articles from different domains may yield different results. In this study, we use a small number of positive and negative keywords for each domain to classify the sentiment of articles, working with corpora from two entirely different domains and examining the representative vocabulary of each. The experiments show that the proposed methods achieve good classification performance.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T13:04:13Z (GMT). No. of bitstreams: 1 ntu-102-R00922066-1.pdf: 1360291 bytes, checksum: 460e83499c175cdea43554f1372990bc (MD5) Previous issue date: 2013	en
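dc.description.tableofcontents	Front matter: Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; Table of Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives; 1.3 Thesis Organization
Chapter 2 Literature Review: 2.1 Unsupervised Learning; 2.1.1 Sentiment Lexicons; 2.2 Supervised Machine Learning; 2.2.1 Feature Extraction; 2.2.2 Support Vector Machines; 2.3 Semi-supervised Learning
Chapter 3 Methodology: 3.1 Experimental Corpora; 3.1.1 Hotel Reviews; 3.1.2 Restaurant Reviews; 3.2 Word Segmentation System; 3.3 Experimental Framework; 3.4 Experimental Settings; 3.5 Experimental Methods; 3.5.1 Counting Classification; 3.5.2 Dual Naïve Bayes Classification; 3.5.3 Feedback; 3.6 Evaluation Methods
Chapter 4 Experimental Results: 4.1 Polarity Classification Results and Analysis; 4.1.1 Hotel Review Classification Results; 4.1.2 Analysis of Hotel Review Classification; 4.1.3 Restaurant Review Classification Results; 4.1.4 Analysis of Restaurant Review Classification; 4.1.5 Overall Comparison; 4.2 Effect of Feedback and Analysis; 4.3 Error Analysis; 4.3.1 Hotel Error Analysis; 4.3.2 Restaurant Error Analysis; 4.4 Training Corpus Analysis; 4.4.1 Hotel Training Corpus Analysis; 4.4.2 Restaurant Training Corpus Analysis; 4.5 Analysis of Keywords Extracted from the Web; 4.5.1 Hotel Keyword Analysis; 4.5.2 Restaurant Keyword Analysis; 4.5.3 Overall Analysis; 4.6 Analysis of Keywords Extracted from the Pseudo-labeled Corpus; 4.6.1 Hotel Keyword Analysis; 4.6.2 Restaurant Keyword Analysis; 4.6.3 Overall Analysis; 4.7 Threshold Analysis; 4.7.1 Hotel Threshold Analysis; 4.7.2 Restaurant Threshold Analysis; 4.8 LibSVM Performance Analysis and Comparison; 4.8.1 Hotel Performance Analysis and Comparison; 4.8.2 Restaurant Performance Analysis and Comparison
Chapter 5 Conclusions and Future Work: 5.1 Conclusions; 5.2 Future Work
References
List of Figures: Figure 1 Corpus Classification Workflow; Figure 2 Linear Classifier; Figure 3 Support Vector Machine Model; Figure 4 Experimental Framework Flowchart; Figure 5 Vocabulary Expansion via a Search Engine; Figure 6 Illustration of Classification Results; Figure 7 Pseudo Feedback Method; Figure 8 Hotel Review Classification Comparison; Figure 9 Distribution of Hotel Vocabulary Extracted from the Web; Figure 10 Restaurant Review Classification Comparison; Figure 11 Distribution of Restaurant Vocabulary Extracted from the Web; Figure 12 Overall Classification Comparison (Accuracy); Figure 13 Effect of Feedback (Accuracy); Figure 14 Analysis of Hotel Keywords Extracted from the Web; Figure 15 Analysis of Restaurant Keywords Extracted from the Web; Figure 16 Analysis of Hotel Keywords Extracted from the Pseudo-labeled Corpus; Figure 17 Analysis of Restaurant Keywords Extracted from the Pseudo-labeled Corpus; Figure 18 LibSVM Hotel Prediction Analysis; Figure 19 LibSVM Restaurant Prediction Analysis
List of Tables: Table 1 Corpus Comparison; Table 2 Hotel Review Examples; Table 3 Restaurant Review Examples; Table 4 Comparison of Word Segmentation Methods; Table 5 Word Segmentation Result Examples; Table 6 User-provided Polarity Words; Table 7 Classification Results Table; Table 8 Hotel Review Polarity Classification Results; Table 9 Restaurant Review Polarity Classification Results; Table 10 Overall Classification Comparison (Accuracy); Table 11 Comparison of Feedback Effects (Accuracy); Table 12 Word Occurrence Counts and Proportions; Table 13 Reviews Misclassified as Negative; Table 14 Reviews Misclassified as Positive; Table 15 Reviews Misclassified as Negative; Table 16 Reviews Misclassified as Positive; Table 17 Hotel Training Corpus Analysis; Table 18 Restaurant Training Corpus Analysis; Table 19 Analysis of Keywords Extracted from the Web: Hotel; Table 20 Analysis of Keywords Extracted from the Web: Restaurant; Table 21 Analysis of Keywords Extracted from the Pseudo-labeled Corpus: Hotel; Table 22 Analysis of Keywords Extracted from the Pseudo-labeled Corpus: Restaurant; Table 23 Hotel Threshold Comparison; Table 24 Hotel Feedback Threshold Comparison; Table 25 Hotel Overall Comparison; Table 26 Restaurant Threshold Comparison; Table 27 Restaurant Feedback Threshold Comparison; Table 28 Restaurant Overall Comparison; Table 29 LibSVM Hotel Prediction Analysis; Table 30 LibSVM Restaurant Prediction Analysis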
dc.language.iso	zh-TW
dc.subject	評論文件分類	zh_TW
dc.subject	情緒分析	zh_TW
dc.subject	關鍵字	zh_TW
dc.subject	Review Classification	en
dc.subject	Sentiment Analysis	en
dc.subject	Category Keyword	en
dc.title	基於少數關鍵字之半監督式學習法進行評論文件分類	zh_TW
dc.title	Semi-supervised Review Classification with a Few Polarity Keywords	en
dc.type	Thesis
dc.date.schoolyear	101-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	林正偉,邱志義
dc.subject.keyword	評論文件分類, 情緒分析, 關鍵字	zh_TW
dc.subject.keyword	Review Classification, Sentiment Analysis, Category Keyword	en
dc.relation.page	49
dc.rights.note	Paid license
dc.date.accepted	2013-08-05
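dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering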

Files in this item:

File	Size	Format
ntu-102-1.pdf (not authorized for public access)	1.33 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.