Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84724
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 盧信銘 | zh_TW |
dc.contributor.advisor | Hsin-Min Lu | en |
dc.contributor.author | 陳君儒 | zh_TW |
dc.contributor.author | Chun-Ju Chen | en |
dc.date.accessioned | 2023-03-19T22:22:24Z | - |
dc.date.available | 2023-11-10 | - |
dc.date.copyright | 2022-09-16 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84724 | - |
dc.description.abstract | 近年來,網路仇恨性言論受到各社群平臺的重視,仇恨性言論偵測亦成為國際上之一項研究主題;然而,仇恨性言論偵測在華語使用圈中受到的重視遠不及西方社會,目前仇恨性言論偵測所使用的資料集仍多為推特上的英文留言,忽略了其他語系。有鑑於此,本研究欲建立華文仇恨性言論偵測模型並進行華文仇恨性用戶之分析。本研究提出了新的華文仇恨性言論資料集 PTT_HateSpeech,使用臺灣之電子布告欄系統 PTT 作為資料來源,收集 12 個月的資料,並人工標記了 38,950 則推文。我們根據此資料集訓練了分類模型來進行性別歧視相關之仇恨性言論偵測,最終之平均 F1-score 為 0.5976。透過訓練結果,我們將模型應用在使用者分析上,探討仇恨性用戶和一般用戶在網路使用習慣上之不同,並發現仇恨性用戶有群聚現象,且傾向於使用較為激烈和情緒性之字眼。本研究旨在提供臺灣本地之仇恨性言論資料集,補齊現階段研究中缺乏的華文資料集,並且從自然語言處理的角度分析臺灣網路社群之生態。 | zh_TW |
dc.description.abstract | Hate speech (HS) is a growing problem worldwide, and HS detection is an urgent issue. However, only a few related studies have addressed East Asian societies. Most research resources are English-language Twitter corpora, which may capture only hateful content generated by English-speaking users and ignore diverse cultural features. Therefore, in this work, we develop PTT_HateSpeech, a novel sexism HS dataset collected from a Taiwanese bulletin board system named PTT (telnet://ptt.cc), to analyze Chinese linguistic patterns in the HS detection task. The dataset contains 38,950 comments spanning 12 months, each hand-annotated as “hateful” or “non-hateful”. We train classification models to detect HS, and the average F1-score is 0.5976. Equipped with the proposed model, we conduct further user behavior analysis and sentiment analysis to compare hateful users and normal users. We find that (1) intensive interactions can be observed within the group of hateful users, and (2) hateful users tend to use fiercer and angrier words, showing low-valence but high-arousal emotions. Our research addresses the lack of Chinese-language datasets by including local Taiwanese data, and we present a comprehensive study of the Taiwanese online ecology from the perspective of natural language processing. | en |
dc.description.provenance | Made available in DSpace on 2023-03-19T22:22:24Z (GMT). No. of bitstreams: 1 U0001-0409202202565600.pdf: 2381681 bytes, checksum: a228880ac3abc2b54313b3f32cf228a8 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | Acknowledgements; Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Literature Review (2.1 Corpus Collection, Annotation, and Datasets; 2.2 Hate Speech Detection Models; 2.3 Hate Speech Analysis); Chapter 3 Research Gaps and Research Questions (3.1 Research Gaps; 3.2 Research Questions); Chapter 4 Data (4.1 Data Overview; 4.2 Corpus Collection; 4.3 Data Annotation); Chapter 5 Methodology (5.1 Model Training; 5.2 Baseline Model); Chapter 6 Results and Discussion; Chapter 7 Error Analysis (7.1 False Negative; 7.2 False Positive); Chapter 8 User Analysis (8.1 Account Characteristics; 8.2 User Interaction; 8.3 Linguistic Characteristics); Chapter 9 Conclusion; References | - |
dc.language.iso | en | - |
dc.title | 仇恨性言論偵測及仇恨性用戶分析 | zh_TW |
dc.title | Who is the Hate-Speech Speaker? Hate Speech Detection and User-Level Analysis | en |
dc.type | Thesis | - |
dc.date.schoolyear | 110-2 | - |
dc.description.degree | Master's (碩士) | - |
dc.contributor.oralexamcommittee | 洪茂蔚;畢南怡 | zh_TW |
dc.contributor.oralexamcommittee | Mao-Wei Hung;Nan-Yi Bi | en |
dc.subject.keyword | 仇恨性言論偵測,分類模型,用戶分析,社群平臺,自然語言處理 | zh_TW |
dc.subject.keyword | hate speech detection, text classification task, user analysis, online social network, natural language processing | en |
dc.relation.page | 56 | - |
dc.identifier.doi | 10.6342/NTU202203125 | - |
dc.rights.note | Authorization granted (access restricted to campus) | - |
dc.date.accepted | 2022-09-06 | - |
dc.contributor.author-college | College of Management (管理學院) | - |
dc.contributor.author-dept | Department of Information Management (資訊管理學系) | - |
dc.date.embargo-lift | 2022-09-06 | - |
Appears in Collections: | Department of Information Management (資訊管理學系) |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-110-2.pdf (access limited to NTU IP range) | 2.33 MB | Adobe PDF |