
DSpace

The DSpace institutional repository system is dedicated to preserving digital materials of all kinds (e.g., text, images, PDFs) and making them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84184
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 蔡政安 (Chen-An Tsai)
dc.contributor.author: Yi-Ling Hsu [en]
dc.contributor.author: 許苡鈴 [zh_TW]
dc.date.accessioned: 2023-03-19T22:05:56Z
dc.date.copyright: 2022-07-08
dc.date.issued: 2022
dc.date.submitted: 2022-07-03
dc.identifier.citation:
[1] Oren Melamud, Jacob Goldberger, and Ido Dagan. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 51–61, 2016.
[2] Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv preprint arXiv:1611.06639, 2016.
[3] Che-Wen Chen, Shih-Pang Tseng, Ta-Wen Kuan, and Jhing-Fa Wang. Outpatient text classification using attention-based bidirectional LSTM for robot-assisted servicing in hospital. Information, 11(2):106, 2020.
[4] Weijiang Li, Fang Qi, Ming Tang, and Zhengtao Yu. Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification. Neurocomputing, 387:63–77, 2020.
[5] Chenbin Li, Guohua Zhan, and Zhihua Li. News text classification based on improved Bi-LSTM-CNN. In 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pages 890–893. IEEE, 2018.
[6] Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis Lau. A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630, 2015.
[7] Liang Yao, Chengsheng Mao, and Yuan Luo. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7370–7377, 2019.
[8] Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, and Houfeng Wang. Text level graph neural network for text classification. arXiv preprint arXiv:1910.02356, 2019.
[9] Ralitsa Angelova and Gerhard Weikum. Graph-based text classification: learn from your neighbors. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 485–492, 2006.
[10] Zeynep Hilal Kilimci and Selim Akyokuş. The analysis of text categorization represented with word embeddings using homogeneous classifiers. In 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), pages 1–6. IEEE, 2019.
[11] Hu Linmei, Tianchi Yang, Chuan Shi, Houye Ji, and Xiaoli Li. Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4821–4830, 2019.
[12] Marie Wehenkel, Antonio Sutera, Christine Bastin, Pierre Geurts, and Christophe Phillips. Random forests based group importance scores and their statistical interpretation: application for Alzheimer's disease. Frontiers in Neuroscience, 12:411, 2018.
[13] Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, and Yan Liu. Combating fake news: A survey on identification and mitigation techniques. ACM Transactions on Intelligent Systems and Technology, 2019.
[14] Sarah A. Alkhodair, Steven H. H. Ding, Benjamin C. M. Fung, and Junqiang Liu. Detecting breaking news rumors of emerging topics in social media. Information Processing & Management, 57(2):102018, 2020.
[15] Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. Detecting rumors from microblogs with recurrent neural networks. 2016.
[16] Dongxu Zhang and Dong Wang. Relation classification via recurrent neural network. arXiv preprint arXiv:1508.01006, 2015.
[17] Shu Zhang, Dequan Zheng, Xinchen Hu, and Ming Yang. Bidirectional long short-term memory networks for relation classification. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, pages 73–78, 2015.
[18] Ghada Alfattni, Niels Peek, and Goran Nenadic. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries. Journal of Biomedical Informatics, 123:103915, 2021.
[19] Yixuan Chen, Jie Sui, Liang Hu, and Wei Gong. Attention-residual network with CNN for rumor detection. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1121–1130, 2019.
[20] Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.
[21] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. A convolutional approach for misinformation identification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI'17, pages 3901–3907. AAAI Press, 2017.
[22] Jeff Z. Pan, Siyana Pavlova, Chenxi Li, Ningxi Li, Yangmei Li, and Jinshuo Liu. Content based fake news detection using knowledge graphs. In D. Vrandečić, editor, The Semantic Web – ISWC 2018, 17th International Semantic Web Conference, Proceedings, Lecture Notes in Computer Science, pages 669–683. Springer Verlag, Germany, December 2018.
[23] Vaibhav Vaibhav, Raghuram Mandyam, and Eduard Hovy. Do sentence interactions matter? Leveraging sentence level representations for fake news classification. In Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13), pages 134–139, Hong Kong, November 2019. Association for Computational Linguistics.
[24] Xiaofan Zhi, Li Xue, Wengang Zhi, Ziye Li, Bo Zhao, Yanzhen Wang, and Zhen Shen. Financial fake news detection with multi fact CNN-LSTM model. In 2021 IEEE 4th International Conference on Electronics Technology (ICET), pages 1338–1341. IEEE, 2021.
[25] Yang Liu and Yi-Fang Wu. Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[26] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.
[27] Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 312–320, 2019.
[28] William Yang Wang. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 422–426, Vancouver, Canada, July 2017. Association for Computational Linguistics.
[29] Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 549–556, 2020.
[30] Jing Ma, Wei Gao, and Kam-Fai Wong. Rumor detection on Twitter with tree-structured recursive neural networks. Association for Computational Linguistics, 2018.
[31] Jiawei Han, Micheline Kamber, and Jian Pei. Data mining trends and research frontiers. Data Mining, pages 585–631, 2012.
[32] Xinyi Zhou and Reza Zafarani. Network-based fake news detection: A pattern-driven approach. ACM SIGKDD Explorations Newsletter, 21(2):48–60, 2019.
[33] Jiawei Zhang, Bowen Dong, and S. Yu Philip. FakeDetector: Effective fake news detection with deep diffusive neural network. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pages 1826–1829. IEEE, 2020.
[34] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[35] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association, 2012.
[36] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[37] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[38] Michael Neumann and Ngoc Thang Vu. Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. arXiv preprint arXiv:1706.00612, 2017.
[39] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 207–212, 2016.
[40] Juan Ramos et al. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, volume 242, pages 29–48. Citeseer, 2003.
[41] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[42] Jiexiong Tang, Chenwei Deng, and Guang-Bin Huang. Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems, 27(4):809–821, 2015.
[43] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V. Chawla. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 793–803, 2019.
[44] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. Heterogeneous graph attention network. In The World Wide Web Conference, pages 2022–2032, 2019.
[45] Elena Kochkina, Maria Liakata, and Arkaitz Zubiaga. All-in-one: Multi-task learning for rumour verification. arXiv preprint arXiv:1806.03713, 2018.
[46] Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter Tolmie. Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS ONE, 11(3):e0150989, 2016.
[47] Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, and Sameena Shah. Real-time rumor debunking on Twitter. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 1867–1870, 2015.
[48] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. FakeNewsNet: A data repository with news content, social context and spatialtemporal information for studying fake news on social media. arXiv preprint arXiv:1809.01286, 2018.
[49] William S. Noble. What is a support vector machine? Nature Biotechnology, 24(12):1565–1567, 2006.
[50] Iulia Turc, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Well-read students learn better: On the importance of pre-training compact models. arXiv preprint arXiv:1908.08962, 2019.
[51] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[52] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
[53] Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. BERTweet: A pre-trained language model for English tweets. arXiv preprint arXiv:2005.10200, 2020.
[54] Martin Müller, Marcel Salathé, and Per E. Kummervold. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503, 2020.
[55] John Giorgi, Osvald Nitski, Bo Wang, and Gary Bader. DeCLUTR: Deep contrastive learning for unsupervised textual representations. arXiv preprint arXiv:2006.03659, 2020.
[56] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
[57] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[58] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84184
dc.description.abstract: In recent years, with the rapid development of the Internet, social media has become an inseparable part of people's lives. The Internet makes it easier to obtain information or express one's own views, but it has also produced a flood of information, so the authenticity of articles has become an issue of wide concern. In this research, we propose a model based on a heterogeneous graph that combines the advantages of content-based and network-based methods and exploits the interactions among articles, keywords, and retweeters to detect fake articles on social media. The model not only uses two tokenization methods (a general tokenizer and BertTokenizer) to access the information in the raw text, but also records the propagation paths of retweeters, capturing the patterns of fake news from different aspects. In addition, we represent each article by several keywords, since some fake articles contain specific words, and use a graph attention network to learn the interactions among articles, retweeters, and keywords. Finally, Bidirectional Long Short-Term Memory and the graph attention network are combined, and the updated vectors are used to predict the authenticity of an article. Extensive experiments in the thesis compare different methods against our model comprehensively. The results show that our proposed model generally performs best on different datasets (Twitter 15&16, FakeNewsNet), reaching up to 95% accuracy in fake-article detection and clearly outperforming other methods by 4%. [zh_TW]
dc.description.abstract: In recent years, with the rapid development of Internet communication, social media has become an inseparable part of people's lives. The Internet makes it easier for people to obtain information or express their thoughts, so the authenticity of articles has risen to become a primary public concern. In this research, we propose a novel method to detect fake news on social media. Given an article and its retweeters (without text comments), we aim to predict whether the article is fake. We propose a graph-based model that combines the advantages of both content-based and network-based learning models. In this model, we not only consider the information of the raw text via two tokenization methods, namely a general tokenizer and BertTokenizer, but also record the propagation of retweeters to capture the patterns of fake news from different aspects. In addition, we represent each article by several keywords, because some fake news contains specific words. We construct a heterogeneous graph with a graph attention network to capture the interactions of news, retweeters, and keywords. Finally, we apply Bidirectional Long Short-Term Memory (BiLSTM) and a Graph Attention Network (GAT) to learn article representations and determine whether an incoming article is fake. We perform a comprehensive comparison of different content-based and network-based methods via extensive experiments. Results show that our proposed model generally achieves superior performance on different datasets (Twitter 15&16, FakeNewsNet), with accuracy up to 95%, significantly outperforming baseline methods by 4%. [en]
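The graph attention mechanism the abstract relies on can be illustrated with a minimal sketch. This is not the thesis's implementation: it is a generic single-head GAT-style layer in plain NumPy, in the spirit of Veličković et al.'s graph attention networks [56]; the function name, dimensions, and toy adjacency are invented for the example.

```python
import numpy as np

def gat_layer(h, adj, W, a, leaky_slope=0.2):
    """Single-head graph attention layer (illustrative sketch).

    h   : (N, F)  node features (e.g., news, retweeter, keyword nodes)
    adj : (N, N)  binary adjacency matrix, self-loops included
    W   : (F, F') shared linear projection
    a   : (2*F',) attention vector
    """
    z = h @ W                                   # project features: (N, F')
    n = z.shape[0]
    # Attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = np.concatenate([z[i], z[j]]) @ a
            e[i, j] = s if s > 0 else leaky_slope * s
    # Mask non-neighbours, then softmax over each node's neighbourhood
    e = np.where(adj > 0, e, -1e9)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z                            # aggregate: (N, F')
```

Each output row is a weighted average of the projected features of a node and its neighbours; masking with a large negative number before the softmax confines attention to actual edges, which is what lets the model learn news-retweeter-keyword interactions over the heterogeneous graph.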
dc.description.provenance: Made available in DSpace on 2023-03-19T22:05:56Z (GMT). No. of bitstreams: 1. U0001-1906202200284800.pdf: 2453237 bytes, checksum: 911d71f262c7246a4ad8ac436b2b5db4 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese abstract)
Abstract
Contents
List of Figures
List of Tables
Denotation
Chapter 1 INTRODUCTION
Chapter 2 RELATED WORK
  2.1 Content-based Fake News Detection
  2.2 Network-based Fake News Detection
Chapter 3 METHODOLOGY
  3.1 Terminology Definition
  3.2 Problem Statement
  3.3 Content-based Representation
    3.3.1 Tokenization & Word Embeddings
    3.3.2 Bidirectional Network
    3.3.3 Attention
  3.4 Keywords Embedding
    3.4.1 Keywords Filtering
    3.4.2 Representation of Keywords
    3.4.3 Connection of Keywords
  3.5 Retweeter Embedding
    3.5.1 Representation of Retweeter
  3.6 Network-based Representation
    3.6.1 Heterogeneous Graph Construction
    3.6.2 Heterogeneous Convolution Layer
  3.7 Embedding Fusion
  3.8 Make Prediction
Chapter 4 Experimental Results
  4.1 Datasets
  4.2 Competing Methods
    4.2.1 Content-based Baseline
    4.2.2 Network-based Baseline
  4.3 Setup
  4.4 Evaluation Metrics
  4.5 Main experimental results
    4.5.1 Main results
    4.5.2 Visualization
  4.6 Discussion of selected keywords
    4.6.1 Conditional fake ratio
    4.6.2 Co-occurrence keywords
    4.6.3 Hierarchical clustering
  4.7 Hyperparameter Analysis
    4.7.1 Ratio of keywords
    4.7.2 Quantile of chi-square statistics
  4.8 Ablation Study
Chapter 5 CONCLUSION
References
List of Figures:
  1.1 A toy example of Network Architecture
  3.1 The architecture of our proposed model
  3.2 Chi-square statistic
  4.1 Confusion Matrix
  4.2 Visualization of results by t-SNE
  4.3 Conditional fake ratio of keywords on different datasets
  4.4 Heatmap of keywords
  4.5 Highly relevant keywords
  4.6 Hierarchical clustering of keywords
  4.7 Performance by varying ratio of keywords
  4.8 Performance by varying quantile of chi-square statistics
  4.9 Results on ablation study of the proposed model
List of Tables:
  4.1 Statistics of datasets
  4.2 Setup on different datasets
  4.3 Experimental results on PHEME
  4.4 Experimental results on Twitter15 and Twitter16
  4.5 Experimental results on FakeNewsNet
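Section 3.4.1 of the contents lists keyword filtering via a chi-square statistic (Figure 3.2). As a hedged sketch of how such filtering is commonly done (the thesis's exact procedure is not reproduced here; the function name and the toy corpus are invented), each candidate keyword can be scored with a 2x2 contingency table of keyword presence against the fake/real label:

```python
def chi_square_keyword(articles, labels, keyword):
    """Chi-square statistic for one keyword over a 2x2 table:
    rows = article contains / lacks the keyword,
    cols = article is fake / real.
    """
    # Observed counts
    obs = [[0, 0], [0, 0]]
    for text, is_fake in zip(articles, labels):
        row = 0 if keyword in text else 1
        col = 0 if is_fake else 1
        obs[row][col] += 1
    n = len(articles)
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            # Expected count under independence: row total * column total / n
            expected = sum(obs[r]) * (obs[0][c] + obs[1][c]) / n
            if expected > 0:
                chi2 += (obs[r][c] - expected) ** 2 / expected
    return chi2
```

Keywords with the largest statistic depend most strongly on the label; a quantile cutoff over these scores (the hyperparameter studied in Section 4.7.2) would then select the final keyword set.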
dc.language.iso: en
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: 假新聞偵測 (fake news detection) [zh_TW]
dc.subject: 圖注意力神經網絡 (graph attention network) [zh_TW]
dc.subject: 異質圖 (heterogeneous graph) [zh_TW]
dc.subject: 雙向長短期記憶 (bidirectional long short-term memory) [zh_TW]
dc.subject: Fake News Detection [en]
dc.subject: Deep Learning [en]
dc.subject: Bidirectional Long Short Term Memory [en]
dc.subject: Heterogeneous Graph [en]
dc.subject: Graph Attention Network [en]
dc.title: 基於異質圖神經網路與使用者文章關鍵字交互學習應用於假新聞偵測 [zh_TW]
dc.title: Fake news detection based on heterogeneous graph neural network via user-post-keyword interaction learning [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳春樹 (Chun-Shu Chen), 陳錦華 (Jin-Hua Chen)
dc.subject.keyword: 假新聞偵測, 深度學習, 異質圖, 圖注意力神經網絡, 雙向長短期記憶 [zh_TW]
dc.subject.keyword: Fake News Detection, Deep Learning, Heterogeneous Graph, Graph Attention Network, Bidirectional Long Short Term Memory [en]
dc.relation.page: 54
dc.identifier.doi: 10.6342/NTU202200999
dc.rights.note: Access authorized (restricted to on-campus use)
dc.date.accepted: 2022-07-04
dc.contributor.author-college: 共同教育中心 (Center for General Education) [zh_TW]
dc.contributor.author-dept: 統計碩士學位學程 (Master's Program in Statistics) [zh_TW]
dc.date.embargo-lift: 2022-07-08
Appears in Collections: 統計碩士學位學程 (Master's Program in Statistics)

Files in This Item:
File: U0001-1906202200284800.pdf — 2.4 MB, Adobe PDF
Access is restricted to NTU on-campus IP addresses (off-campus users should connect via the library's VPN service).


Except where otherwise noted in their individual copyright terms, all items in this system are protected by copyright, with all rights reserved.
