Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94212

Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 鄭卜壬 (zh_TW)
dc.contributor.advisor: Pu-Jen Cheng (en)
dc.contributor.author: 董光立 (zh_TW)
dc.contributor.author: Guang-Li Dong (en)
dc.date.accessioned: 2024-08-15T16:15:01Z
dc.date.available: 2024-10-05
dc.date.copyright: 2024-08-15
dc.date.issued: 2024
dc.date.submitted: 2024-08-08
dc.identifier.citation:
Office of the President (Taiwan), "Inaugural address of ROC 14th-term President Tsai Ing-wen," 2016. [Online]. Available: https://www.president.gov.tw/news/20444.
The Central News Agency, "World Press Freedom Index ranking: Taiwan rises to 27th and China ranks 9th from bottom," 2024.
Reporters Without Borders, "2024 World Press Freedom Index ranking," 2024. [Online]. Available: https://rsf.org/en/country/taiwan.
Ralf Krestel, Sabine Bergler and René Witte, "Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles," in LREC, 2008.
David K. Elson and Kathleen R. McKeown, "Automatic Attribution of Quoted Speech in Literary Narrative," in AAAI, 2010.
Chris Newell, Tim Cowlishaw and David Man, "Quote Extraction and Analysis for News," in KDD, 2018.
Timoté Vaucher, Andreas Spitz, Michele Catasta and Robert West, "Quotebank: A Corpus of Quotations from a Decade of News," in WSDM, 2021.
Kuan-Lin Lee, Yu-Chung Cheng, Pai-Lin Chen and Hen-Hsen Huang, "Keeping Their Words: Direct and Indirect Chinese Quote Attribution from Newspapers," in WWW, 2020.
Xiaoxiao Shang, Zhiyuan Peng, Qiming Yuan, Sabiq Khan, Lauren Xie, Yi Fang and Subramaniam Vincent, "DIANES: A DEI Audit Toolkit for News Sources," in SIGIR, 2022.
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan et al., "Language Models are Few-Shot Learners," in NeurIPS, 2020.
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in NAACL, 2019.
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma et al., "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT," in AAAI, 2020.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang and Weizhu Chen, "LoRA: Low-Rank Adaptation of Large Language Models," in ICLR, 2022.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le and Denny Zhou, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," in NeurIPS, 2022.
Guidance AI, "Guidance: A language for controlling large language models." [Online]. Available: https://github.com/guidance-ai/guidance.
Marti A. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora," in COLING, 1992.
John Lafferty, Andrew McCallum and Fernando Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in ICML, 2001.
Sunita Sarawagi, "Information Extraction," Foundations and Trends in Databases, 2008.
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami and Chris Dyer, "Neural Architectures for Named Entity Recognition," in NAACL, 2016.
Monica Agrawal, Stefan Hegselmann and Hunter Lang, "Large language models are few-shot clinical information extractors," in EMNLP, 2022.
Minqing Hu and Bing Liu, "Mining and Summarizing Customer Reviews," in SIGKDD, 2004.
Bo Pang, Lillian Lee and Shivakumar Vaithyanathan, "Thumbs up? Sentiment Classification Using Machine Learning Techniques," in EMNLP, 2002.
Peter D. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," in ACL, 2002.
Yoon Kim, "Convolutional Neural Networks for Sentence Classification," in EMNLP, 2014.
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi and Luke Zettlemoyer, "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?," in EMNLP, 2022.
Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan and Lidong Bing, "Sentiment Analysis in the Era of Large Language Models: A Reality Check," in NAACL, 2024.
Central News Agency (CNA), Focus Taiwan "About Us" page. [Online]. Available: https://focustaiwan.tw/aboutus.
OpenAI, "GPT-3.5 Turbo fine-tuning and API updates," 22 Aug. 2023. [Online]. Available: https://openai.com/index/gpt-3-5-turbo-fine-tuning-and-api-updates/.
NARLabs, "Llama3-TAIDE-LX-8B-Chat-Alpha1." [Online]. Available: https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1.
National Applied Research Laboratories, "TAIDE." [Online]. Available: https://en.taide.tw/.
OpenAI, "DALL·E 3." [Online]. Available: https://openai.com/index/dall-e-3/.
United Nations, "UNRWA Situation Report #1 on the Situation in the Gaza Strip," 7 Oct. 2023. [Online]. Available: https://www.un.org/unispal/document/unrwa-situation-report-1-on-the-situation-in-the-gaza-strip/.
BBC, "US and China agree to resume military communications after summit," 2024. [Online]. Available: https://www.bbc.com/news/world-us-canada-67411191.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94212
dc.description.abstract: News opinion mining aims to collect and process the statements released by interviewees and organizations in the news, objectively summarizing and analyzing the similarities and differences among various stances. To achieve more diverse, equitable, transparent, and comprehensive opinion analysis, we have designed and implemented a news opinion extraction and analysis system based on controllable large language models (LLMs), which offers the following three main benefits:
Transparent Process: By converting pieces of unstructured information into structured tables, the system not only displays the final summarized results but also exposes the entire computational process, making it easier for experts to review, analyze, intervene in, and adjust the workflow.
Simplified Design: Using LLMs to connect multiple natural language processing tasks simplifies the algorithmic design of the analysis system and improves overall computational efficiency.
Efficient Extraction: Through in-context learning, model fine-tuning, and efficient programming templates for guiding language models, even small models can achieve performance comparable to state-of-the-art (SOTA) LLMs on quote extraction tasks.
This system not only improves the accuracy and efficiency of news opinion extraction and analysis but also provides experts with more intuitive and operable analytical tools, bringing more comprehensive and objective opinion analysis to the news domain. (en)
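The "Transparent Process" benefit described in the abstract — converting unstructured LLM output into an inspectable structured table before summarization — can be sketched as follows. This is a minimal illustration under assumed names: the JSON schema (speaker/title/quote/sentiment) and the sample records are hypothetical, not the thesis's actual format, and no real model call is made.

```python
import json
from collections import defaultdict

# Stand-in for a controllable LLM's response: one JSON record per extracted
# quote. The schema here is an illustrative assumption, not the thesis's own.
llm_output = json.dumps([
    {"speaker": "Spokesperson A", "title": "Ministry official",
     "quote": "We welcome the resumption of talks.", "sentiment": "positive"},
    {"speaker": "Analyst B", "title": "Think-tank researcher",
     "quote": "The agreement changes little on the ground.", "sentiment": "negative"},
])

def to_table(raw_json: str) -> list[dict]:
    """Parse the model's JSON into structured rows, skipping malformed records."""
    rows = []
    for rec in json.loads(raw_json):
        if {"speaker", "quote", "sentiment"} <= rec.keys():
            rows.append(rec)
    return rows

def group_by_sentiment(rows: list[dict]) -> dict[str, list[str]]:
    """Aggregate speakers by stance; the intermediate table stays inspectable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["sentiment"]].append(row["speaker"])
    return dict(groups)

table = to_table(llm_output)
print(group_by_sentiment(table))
# prints {'positive': ['Spokesperson A'], 'negative': ['Analyst B']}
```

Because the structured table is an explicit intermediate artifact rather than hidden model state, an expert can review or correct individual rows before the stance-level summary is produced.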
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:15:01Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-08-15T16:15:01Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Workflow 2
1.3 Thesis Structure 5
Chapter 2 Background and Related Work 7
2.1 Quote Extraction 7
2.2 Large Language Models 9
2.3 Information Extraction 11
2.4 Sentiment Analysis 12
Chapter 3 Quote Extraction and Experiment Results 15
3.1 Quotation Types 15
3.2 Data Source and Crawler 16
3.3 JSON Format 18
3.4 Supervised Fine-Tuning 19
3.5 Programming Paradigm 22
3.6 Experiment Results 26
Chapter 4 News Opinion Analysis System 29
4.1 Workflow 29
4.2 Title and Name Extraction 30
4.3 Three Classification Methods 31
4.4 Five Categories 34
4.5 Name and Title Classification 36
4.6 Aspect Extraction 37
4.7 Sentiment Classification 39
4.8 Aspect Summarization 40
Chapter 5 User Interface and Demo 41
5.1 Index 41
5.2 Israel–Hamas War 42
5.3 Biden-Xi Meeting 46
Chapter 6 Conclusion and Future Work 50
6.1 Conclusion 50
6.2 Future Work 51
References 53
dc.language.iso: en
dc.title: 設計與實作基於可控制大型語言模型的新聞意見萃取與分析系統 (zh_TW)
dc.title: Design and Implementation of a News Opinion Extraction and Analysis System Based on Controllable Large Language Models (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 邱志義; 魏志達 (zh_TW)
dc.contributor.oralexamcommittee: Chih-Yi Chiu; Jyh-Da Wei (en)
dc.subject.keyword: 引述抽取, 意見探勘, 資訊抽取, 大型語言模型, 新聞分析 (zh_TW)
dc.subject.keyword: Quote Extraction, Sentiment Analysis, Information Extraction, Large Language Model, News Analysis (en)
dc.relation.page: 55
dc.identifier.doi: 10.6342/NTU202403256
dc.rights.note: Authorized (open access worldwide)
dc.date.accepted: 2024-08-10
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: 2027-08-10
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: ntu-112-2.pdf (publicly available online after 2027-08-10) | Size: 6.71 MB | Format: Adobe PDF

