Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73825
Full metadata record
DC field: Value [Language]
dc.contributor.advisor: 魏志平
dc.contributor.author: Hung-Chen Chen [en]
dc.contributor.author: 陳泓志 [zh_TW]
dc.date.accessioned: 2021-06-17T08:11:11Z
dc.date.available: 2021-08-20
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-15
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73825
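dc.description.abstract: Many prior studies have shown that a person's personality is strongly related to their life, behavior, and preferences. Given these relationships, knowing a person's personality helps firms manage human resources, find their target customers, and support any other task that requires a preliminary understanding of people. To detect personality efficiently, many methods already use user-generated data to infer personality automatically. With the rapid development of artificial intelligence, prior studies have begun applying deep learning to extract complex semantic features from text and build stronger classification models. However, training a deep learning model usually requires a very large amount of data, while labeled data in this field are increasingly hard to obtain. Deep learning methods must therefore account for insufficient data. Long documents also need special handling in this field: user-generated data are sometimes very long texts, but some deep learning architectures, such as recurrent neural networks (RNNs), cannot retain such long content and may therefore produce unsatisfactory results.
In this study, we propose a model architecture that combines deep learning, traditional predefined features, and an extreme gradient boosting (XGBoost) classifier. We use transfer learning to address the shortage of data for deep learning. We use two different ways of selecting important sentences to enlarge our data set and to solve the long-document problem. The final results show that every component of our model helps improve its performance, and that our method achieves higher accuracy than existing techniques. [zh_TW, translated]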
dc.description.abstract: Human personality has been shown to be highly correlated with an individual's life, behaviors, and preferences. Because of these relationships, knowing a person's personality helps firms manage human resources effectively, find their target customers, and carry out other tasks that can be supported by user profiles. To detect a person's personality traits efficiently, several methods have been proposed to infer personality automatically from user-generated content (UGC). With the rapid development of AI, prior studies have started to exploit deep learning to discover latent and complex linguistic features and to develop more effective classification models. However, training a deep learning model usually requires a very large set of training data, and in this specific task labeled data are hard to obtain. Therefore, the use of deep learning methods for personality prediction needs to address the limited training data problem. Another problem in this task is that UGC data are sometimes long documents, while some deep learning models, such as Recurrent Neural Networks, cannot retain such a long context.
In this work, we propose a hybrid model structure combining deep learning, traditional hand-crafted features, and an XGBoost classifier. We employ transfer learning to address the insufficient training data problem for deep learning models. We propose two sentence selection schemes to enlarge our training data set and, at the same time, to address the long document problem. Our empirical evaluation results show that each part of our proposed method helps to improve prediction effectiveness and that the method outperforms our benchmark method. [en]
dc.description.provenance: Made available in DSpace on 2021-06-17T08:11:11Z (GMT). No. of bitstreams: 1
ntu-108-R06725012-1.pdf: 1370623 bytes, checksum: b5360634b15cf6cf35c445d5ef5fcc7f (MD5)
Previous issue date: 2019 [en]
dc.description.tableofcontents:
Chapter 1 Introduction
  1.1 Background
  1.2 Research Motivations and Objectives
Chapter 2 Literature Review
  2.1 Overview of Human Personality
  2.2 Measuring Human Personality by Questionnaire
  2.3 Measuring Human Personality by UGC
    2.3.1 Leveraging the User Generated Content
    2.3.2 Open Dataset
    2.3.3 Overview of Personality Prediction Methods
    2.3.4 Previous Research Using Deep Learning
Chapter 3 Methodology
  3.1 Overall Process of Our Proposed Method
  3.2 Text Preprocessing
    3.2.1 Data Cleaning and Sentence Splitting
    3.2.2 Sentence Selection
    3.2.3 Padding
  3.3 Mairesse’s Features
  3.4 Fine-tune Language Model
  3.5 Train Personality Classifiers
    3.5.1 Classifier Model Structure
    3.5.2 Language Model to Document Representation
    3.5.3 XGBoost Model
    3.5.4 Using Data Augmentation
Chapter 4 Empirical Evaluations
  4.1 Empirical Setup
    4.1.1 Dataset
    4.1.2 Mairesse’s Features
    4.1.3 Sentence Selection
    4.1.4 Language Model
    4.1.5 Feature Concatenation and Classification
    4.1.6 Benchmarks
    4.1.7 Variants of Our Method
    4.1.8 Evaluation Criterion and Procedure
  4.2 Evaluation Results
  4.3 Other Experimental Results
    4.3.1 Effects of Fine-tuning Language Model
    4.3.2 Effects of Document Length
    4.3.3 Effects of Sentence Selection Schemes
    4.3.4 How to Merge Two Sentence Selectors
    4.3.5 Effects of Mairesse’s Features
Chapter 5 Conclusions and Future Works
  5.1 Contributions
  5.2 Future Works
References
Appendix
  A Top 1% Chi-square Words for Each Factor
    A.1 Extraversion
    A.2 Neuroticism
    A.3 Agreeableness
    A.4 Conscientiousness
    A.5 Openness
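dc.language.iso: en
dc.subject: Personality (個性) [zh_TW]
dc.subject: Deep learning (深度學習) [zh_TW]
dc.subject: Transfer learning (遷移學習) [zh_TW]
dc.subject: Small dataset (少量資料集) [zh_TW]
dc.subject: User-generated content (使用者生成資料) [zh_TW]
dc.subject: Text mining (文字探勘) [zh_TW]
dc.subject: Transfer learning [en]
dc.subject: Personality [en]
dc.subject: Deep learning [en]
dc.subject: Text mining [en]
dc.subject: Small dataset [en]
dc.subject: User-generated content [en]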
dc.title: 基於深度學習方法根據使用者生成資料進行個性評估 (Personality Assessment from User-Generated Data Based on Deep Learning Methods) [zh_TW]
dc.title: A Deep Learning Based Approach for Personality Detection from User Generated Content [en]
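dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 簡立峰, 楊錦生
dc.subject.keyword: Personality (個性), Deep learning (深度學習), Transfer learning (遷移學習), Small dataset (少量資料集), User-generated content (使用者生成資料), Text mining (文字探勘) [zh_TW]
dc.subject.keyword: Personality, Deep learning, Transfer learning, Small dataset, User-generated content, Text mining [en]
dc.relation.page: 57
dc.identifier.doi: 10.6342/NTU201901942
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2019-08-16
dc.contributor.author-college: College of Management (管理學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Information Management (資訊管理學研究所) [zh_TW]
Appears in collections: Department of Information Management (資訊管理學系)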

Files in this item:
File: Size, Format
ntu-108-1.pdf (not authorized for public access): 1.34 MB, Adobe PDF
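Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.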
