Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21409

| Title: | 非監督式事實查核檢索模型 Unsupervised Fact-checking Retrieval Model: A Real Case Study |
| Author: | Ying-Hui Wu 吳盈慧 |
| Advisor: | Pu-Jen Cheng 鄭卜壬 |
| Keywords: | Fake News, Fact-Checking, Unsupervised, Information Retrieval System, Word Embedding, Query Expansion, Named Entity Recognition, Wikipedia Titles, Okapi BM25 |
| Year of publication: | 2019 |
| Degree: | Master's |
| Abstract: | The rapid growth of the Internet and social media has fundamentally changed how people receive information. People can spread and read messages on online platforms anytime and anywhere, and as a result fake news is rampant and spreads extremely quickly. Its harm to society goes beyond inaccurate health information and rumors that affect individual lives: misinterpreted assertions and fabricated claims also obstruct normal dialogue on public issues and can develop into a serious national security risk. This study explores how to design a high-precision unsupervised information retrieval system tailored to the characteristics of fact-checking, with the aim of applying it in practice to help reduce the spread of fake news over the Internet and telecommunication platforms.
The dataset combines the rumor databases and fact-check reports of two major fact-checking organizations in Taiwan, "Cofacts" (真的假的) and the "Taiwan Fact Check Center" (台灣事實查核中心). A word embedding model is trained on a large corpus of Taiwanese news and used as the basis for query expansion. In addition, named entity recognition is applied to automatically label roughly one million Chinese Wikipedia titles with named entity categories; the labeled titles serve both to improve the accuracy of Chinese word segmentation and to weight query keywords. Finally, taking the classic information retrieval model Okapi BM25 as the base, the study builds a hybrid fact-checking retrieval model that combines word-embedding query expansion with named-entity keyword weighting. Experiments and parameter tuning show that the overall performance of the hybrid model exceeds the baseline, which indicates that the design is feasible to a certain degree. The study also reports related findings and directions for improvement. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21409 |
| DOI: | 10.6342/NTU201902687 |
| Full-text authorization: | Not authorized |
| Appears in collections: | Department of Computer Science and Information Engineering |
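
The hybrid model summarized in the abstract (Okapi BM25 with query expansion and named-entity weighting) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy English tokens, the hand-written `neighbors` map standing in for word-embedding neighbors, and the values of `k1`, `b`, `expand_weight`, and `entity_boost` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_query(tokens, neighbors, entities, expand_weight=0.5, entity_boost=2.0):
    """Return {term: weight}. All weight values here are illustrative assumptions."""
    weights = {}
    for t in tokens:
        # Original query terms get weight 1.0; named entities get a boost.
        weights[t] = entity_boost if t in entities else 1.0
    for t in tokens:
        # Expansion terms (e.g. embedding neighbors) enter at a lower weight
        # and never overwrite an original query term.
        for n in neighbors.get(t, []):
            weights.setdefault(n, expand_weight)
    return weights


def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Score each tokenized document with Okapi BM25, scaled per query term."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, weight in query_weights.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += weight * idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy, already-tokenized corpus (hypothetical data).
docs = [
    ["vaccine", "causes", "illness", "rumor"],
    ["shot", "safety", "report", "ministry"],
    ["typhoon", "holiday", "announcement"],
]
neighbors = {"vaccine": ["shot"]}  # stand-in for embedding nearest neighbors
entities = {"ministry"}            # stand-in for NER-labeled terms

query = build_query(["vaccine", "ministry"], neighbors, entities)
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the second document matches no literal occurrence of "vaccine", yet it ranks first because the expansion term "shot" and the boosted entity "ministry" both contribute to its score. A real system would replace `neighbors` with lookups in a trained embedding model and `entities` with the output of a named entity recognizer.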
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-108-1.pdf (not authorized for public access) | 10.6 MB | Adobe PDF | |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.
