A Semi-supervised Approach for Profiling Online Reviewers (以半監督式方法來萃取線上評論者的用戶資訊)

Shih-Yu Shu; 書世祐

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67411

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	魏志平(Chih-Ping Wei)
dc.contributor.author	Shih-Yu Shu	en
dc.contributor.author	書世祐	zh_TW
dc.date.accessioned	2021-06-17T01:31:08Z	-
dc.date.available	2017-08-04
dc.date.copyright	2017-08-04
dc.date.issued	2017
dc.date.submitted	2017-08-03
dc.identifier.citation	Cheung, C. M., Shek, S. P., and Sia, C. L. 2004. 'Virtual Community of Consumers: Why People Are Willing to Contribute,' Proceedings of the 8th Pacific-Asia Conference on Information Systems, pp. 2100-2107. Dellarocas, C. 2003. 'The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms,' Management Science (49:10), pp. 1407-1424. Goldenberg, J., Libai, B., and Muller, E. 2001. 'Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth,' Marketing Letters (12:3), pp. 211-223. Gou, L., Zhou, M. X., and Yang, H. 2014. 'KnowMe and ShareMe: Understanding Automatically Discovered Personality Traits from Social Media and User Sharing Preferences,' Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems: ACM, pp. 955-964. Miyazaki, A. D., and Fernandez, A. 2001. 'Consumer Perceptions of Privacy and Security Risks for Online Shopping,' Journal of Consumer Affairs (35:1), pp. 27-44. Oliver, M. B., Weaver, J. B., III, and Sargent, S. L. 2000. 'An Examination of Factors Related to Sex Differences in Enjoyment of Sad Films,' Journal of Broadcasting & Electronic Media (44:2), pp. 282-300. Otterbacher, J. 2010. 'Inferring Gender of Movie Reviewers: Exploiting Writing Style, Content and Metadata,' Proceedings of the 19th ACM International Conference on Information and Knowledge Management: ACM, pp. 369-378. Pan, S. J., and Yang, Q. 2010. 'A Survey on Transfer Learning,' IEEE Transactions on Knowledge and Data Engineering (22:10), pp. 1345-1359. Pennacchiotti, M., and Popescu, A.-M. 2011. 'A Machine Learning Approach to Twitter User Classification,' ICWSM (11:1), pp. 281-288. Rao, D., Yarowsky, D., Shreevats, A., and Gupta, M. 2010. 'Classifying Latent User Attributes in Twitter,' Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents: ACM, pp. 37-44. Schler, J., Koppel, M., Argamon, S., and Pennebaker, J. W. 2006. 'Effects of Age and Gender on Blogging,' AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pp. 199-205. Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., and Seligman, M. E. 2013. 'Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach,' PLoS ONE (8:9), p. e73791. Ye, Q., Law, R., Gu, B., and Chen, W. 2011. 'The Influence of User-Generated Content on Traveler Behavior: An Empirical Investigation on the Effects of E-Word-of-Mouth to Hotel Online Bookings,' Computers in Human Behavior (27:2), pp. 634-639.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67411	-
dc.description.abstract	在這個資訊爆炸的時代，人們可以輕易地取得各種資訊，而線上評論是其中一個重要的資訊來源，且會深深影響使用者的決策。如果能進一步知道評論者的個人資料，則對消費者和電商業者都會很有幫助。然而，大部分的線上評論網站基於個人隱私的關係，並沒有提供這些資訊，因此我們只能從評論內容來推測評論者的個人資訊。「用戶分析」這個研究領域專門透過文本或其他資訊來萃取使用者的相關特徵，但是用來訓練分類器的標準答案通常難以取得。因此許多研究人員決定請專家幫忙標注，但人為標注十分耗時又費工。在這篇論文中，我們提出了一個半監督式的方法，希望不用借助人為標注就可以取得標準答案來訓練分類器。我們進行了幾組實驗來比較我們的方法和傳統方法的表現，並描述所觀察到的現象。希望有一天，我們的方法能被實際應用在用戶資訊萃取中，幫助研究人員省下收集標準答案的時間，能更專注在特徵值的萃取和分類器的訓練。	zh_TW
dc.description.abstract	In an age when anyone can easily access a wide variety of information, online reviews have become an important information source that strongly influences people’s decisions. Knowing reviewers’ profiles is helpful to both customers and online retailers in many ways. However, most online review websites do not provide reviewers’ personal information because of privacy concerns, so the only available clue is the content of the reviews. A research field called ‘user profiling’ focuses on extracting user-profile attributes from text corpora by training classifiers on labeled datasets. Nevertheless, gold-standard datasets are hard to obtain because ground truth is lacking. As a result, many researchers have asked experts to label datasets, but manual annotation is time-consuming and laborious. In this paper, we propose a semi-supervised approach that aims to obtain labeled datasets without manual annotation. We conduct experiments that compare the performance of our approach with the ideal performance, and we describe our observations. We hope that our method can one day be applied in user profiling, helping researchers save time on collecting gold-standard datasets and focus on feature extraction and classifier building.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T01:31:08Z (GMT). No. of bitstreams: 1 ntu-106-R04725054-1.pdf: 964639 bytes, checksum: ffb5f63409dd09ba795bf7691b10a439 (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Tables; List of Figures; Chapter 1 Introduction; 1.1 Background; 1.2 Research Motivation and Objective; Chapter 2 Literature Review; 2.1 Overview of User Profiling; 2.2 Traditional Approaches of User Profiling; Chapter 3 Methodology; 3.1 Semi-supervised Approach; 3.2 Description of Features Employed; 3.2.1 Style-Based Features; 3.2.2 Content-Based Features; Chapter 4 Evaluation; 4.1 Description of Datasets; 4.1.1 Datasets for Our Gender Classification Task; 4.1.2 Datasets for Our Nationality Classification Task; 4.2 Evaluation Criteria and Procedure; 4.3 Benchmark (Ideal Performance); 4.4 Experiment Results; 4.4.1 Comparing with Ideal Performance; 4.4.2 Comparing with Sampled Benchmark; 4.4.3 Random Sampling of Proxy Dataset; Chapter 5 Conclusion and Discussion; References
dc.language.iso	en
dc.subject	半監督式學習	zh_TW
dc.subject	產品評論	zh_TW
dc.subject	文字探勘	zh_TW
dc.subject	用戶資訊萃取	zh_TW
dc.subject	semi-supervised	en
dc.subject	user profiling	en
dc.subject	text mining	en
dc.subject	online reviews	en
dc.title	以半監督式方法來萃取線上評論者的用戶資訊	zh_TW
dc.title	A Semi-supervised Approach for Profiling Online Reviewers	en
dc.type	Thesis
dc.date.schoolyear	105-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	盧信銘(Hsin-Min Lu),吳基逞(Chi-Cheng Wu)
dc.subject.keyword	用戶資訊萃取,半監督式學習,文字探勘,產品評論	zh_TW
dc.subject.keyword	user profiling, semi-supervised, text mining, online reviews	en
dc.relation.page	42
dc.identifier.doi	10.6342/NTU201702262
dc.rights.note	Paid authorization
dc.date.accepted	2017-08-03
dc.contributor.author-college	管理學院	zh_TW
dc.contributor.author-dept	資訊管理學研究所	zh_TW
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
ntu-106-1.pdf (not authorized for public access)	942.03 kB	Adobe PDF

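Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.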