Contrastively Learning Participant Representations Per Round in Thread-Based Debates (基於對比學習的討論串辯論參與者回合狀態表示)

刁彥斌; Yan-Bin Diau

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88856

Title:	Contrastively Learning Participant Representations Per Round in Thread-Based Debates (基於對比學習的討論串辯論參與者回合狀態表示)
Author:	Yan-Bin Diau (刁彥斌)
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	Argument Mining, Debates, Texts in Discussion Threads, Persuasiveness Prediction, Popularity Prediction, Representation Learning, Contrastive Learning
Year of publication:	2023
Degree:	Master's
Abstract:	The way humans debate has been gradually transforming as online forums and social media platforms have flourished over the past decades. The widely used thread-based discussions differ from traditional face-to-face debates in many ways. Debate is no longer confined to particular venues such as academic conferences or political debates: anyone with network access can participate asynchronously, free from limitations of location and occasion, and sometimes even of language. This new way of communicating and discussing undoubtedly has the potential to help humanity progress and prosper. However, as society, issues, and policies grow increasingly complex, and as the amount of information on online platforms becomes more overwhelming day by day, an online debate can become unprecedentedly large in scale, with many participants each attending to a different focus. The demand for automatic and efficient processing and analysis of debates is therefore growing, to help people in their various roles (debaters, spectators, moderators, and so on) engage in or moderate debates and contribute their points of view to public affairs as citizens of the new, networked era.
A discussion thread of a debate usually starts with someone expressing an opinion, a question, or an argument. Others can then engage by replying, commenting, or interacting in other ways to express their own points of view. This approach produces tree-like structures whose roots are the original arguments or topics. Although this lets participants engage with and focus on specific parts of the debate, it also inevitably poses new challenges. In particular, as a discussion grows to include many participants, its information becomes too scattered and too voluminous for a single human’s limited cognitive abilities; analyzing and making sense of a discussion thread with thousands of comments becomes difficult. Such challenges call for natural language processing techniques, which raises the question of which lines of research are relevant and what exactly we hope to achieve. Several techniques are essential to helping humans engage in and understand debates. Natural Language Processing, Machine Learning, Information Retrieval, and Argumentation Mining are examples of relevant research fields. Text embedding techniques have developed over the years into the current state-of-the-art Transformer-based models. These, together with argument extraction and organization, discourse analysis, text summarization, sentiment analysis, stance prediction, reasoning strategy classification, hate speech detection, knowledge base construction and fact-checking, debate winner prediction, argumentative graph neural networks, and a large body of feature engineering work (on syntactic, semantic, and lexical features), among others not mentioned here, can all be critical for automatically modeling a debate and making use of its underlying information. However, many lines of argument mining research suffer from high data annotation costs and from the heterogeneity of corpora, which often renders methods inapplicable to other data formulations.
This work therefore explores whether the naturally formed structures of thread-based debates contain enough information to construct effective debate representations in a self-supervised manner. We use persuasiveness prediction and popularity prediction as downstream tasks to verify that the approach captures the information needed to model debates and discussions. We also minimize our assumptions about the data by limiting our information sources, and examine the model performance achievable under such simplified inputs, hoping to alleviate the corpus heterogeneity issue.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88856
DOI:	10.6342/NTU202302592
Full-text authorization:	Not authorized
Appears in department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	1.9 MB	Adobe PDF

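Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.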