Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94095
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 曹承礎 | zh_TW |
dc.contributor.advisor | Seng-Cho Chou | en |
dc.contributor.author | 邱語謙 | zh_TW |
dc.contributor.author | Yu-Chien Chiu | en |
dc.date.accessioned | 2024-08-14T16:40:00Z | - |
dc.date.available | 2024-08-15 | - |
dc.date.copyright | 2024-08-14 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-07-31 | - |
dc.identifier.citation | Introducing Meta Llama 3: The most capable openly available LLM to date. https://ai.meta.com/blog/meta-llama-3/, 2024.
O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10):1533–1545, 2014.
J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
S. Alshaina, A. John, and A. G. Nath. Multi-document abstractive summarization based on predicate argument structure. In 2017 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pages 1–6. IEEE, 2017.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
L. Dong, S. Xu, and B. Xu. Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5884–5888. IEEE, 2018.
G. Erkan and D. R. Radev. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22:457–479, Dec. 2004.
A. Graves, N. Jaitly, and A.-r. Mohamed. Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 273–278. IEEE, 2013.
A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649. IEEE, 2013.
S. Gupta and S. K. Gupta. Abstractive summarization: An overview of the state of the art. Expert Systems with Applications, 121:49–65, 2019.
G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
A. Khan, N. Salim, H. Farman, M. Khan, B. Jan, A. Ahmad, I. Ahmed, and A. Paul. Abstractive text summarization based on improved semantic graph approach. International Journal of Parallel Programming, 46:992–1016, 2018.
C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, 2004.
Y. Liu. Fine-tune BERT for extractive summarization. arXiv preprint arXiv:1903.10318, 2019.
R. Mihalcea and P. Tarau. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, 2004.
H. Nori, N. King, S. M. McKinney, D. Carignan, and E. Horvitz. Capabilities of GPT-4 on medical challenge problems. arXiv preprint arXiv:2303.13375, 2023.
OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. GPT-4 technical report, 2024.
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever. Robust speech recognition via large-scale weak supervision, 2022.
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. 2018.
M. Razavi, R. Rasipuram, and M. Magimai-Doss. On modeling context-dependent clustered states: Comparing HMM/GMM, hybrid HMM/ANN and KL-HMM approaches. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7659–7663. IEEE, 2014.
M. Reid, N. Savinov, D. Teplyashin, D. Lepikhin, T. Lillicrap, J.-B. Alayrac, R. Soricut, A. Lazaridou, O. Firat, J. Schrittwieser, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024.
H. Sak, A. Senior, and F. Beaufays. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv preprint arXiv:1402.1128, 2014.
J. Savelka. Unlocking practical applications in legal domain: Evaluation of GPT for zero-shot semantic annotation of legal texts. In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, pages 447–451, 2023.
P. Swietojanski, A. Ghoshal, and S. Renals. Revisiting hybrid and GMM-HMM system combination techniques. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6744–6748. IEEE, 2013.
Gemini Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
D. Wang, P. Liu, Y. Zheng, X. Qiu, and X. Huang. Heterogeneous graph neural networks for extractive document summarization. arXiv preprint arXiv:2004.12393, 2020.
J. Ye, X. Chen, N. Xu, C. Zu, Z. Shao, S. Liu, Y. Cui, Z. Zhou, C. Gong, Y. Shen, J. Zhou, S. Chen, T. Gui, Q. Zhang, and X. Huang. A comprehensive capability analysis of GPT-3 and GPT-3.5 series models, 2023.
J. Zhang, Y. Zhao, M. Saleh, and P. Liu. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning, pages 11328–11339. PMLR, 2020.
Q. Zhou, N. Yang, F. Wei, S. Huang, M. Zhou, and T. Zhao. Neural document summarization by jointly learning to score and select sentences. arXiv preprint arXiv:1807.02305, 2018. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94095 | - |
dc.description.abstract | 隨著現代企業和組織日益依賴會議來進行溝通和決策,會議記錄和摘要的生成變得尤為重要。手動撰寫會議記錄和摘要過程繁瑣且容易出錯,並且品質往往依賴於紀錄者的能力,因此過去便有相關研究使用語言模型進行會議摘要。而近年來,大型語言模型崛起,如 GPT、Gemini 和 Llama 系列,目前已經有相關研究證明大型語言模型在自然語言處理相關任務(如文本生成、翻譯、問答系統等)有優異的表現。會議記錄文本通常較長且結構複雜,包含多個發言者的對話,涉及話題廣泛,導致會議摘要的生成與一般文章摘要有所不同。然而目前針對使用大型語言模型生成會議摘要的研究仍然相對較少,因此本研究旨在填補這一空缺。
本研究評估多種大型語言模型在生成會議摘要方面的效果。通過使用 AMI 會議語料庫,並結合不同的預處理方法(如 Google 語音識別和 Whisper Base 轉錄)及不同的 Prompt 設計,來比較這些模型的表現。研究結果顯示,GPT-4 在大多數情況下表現最佳,但在需要高度精確率的情境中,GPT-3.5 更具優勢,而 Gemini 1.5 Pro 在召回率方面表現突出。本研究提供使用不同大型語言模型在實際生成會議摘要時的建議,可以依據不同需求選擇相應解決方案,期望這些發現能夠幫助企業和組織選擇適合的技術來提高會議記錄和摘要的效率與準確性。 | zh_TW |
dc.description.abstract | As enterprises and organizations increasingly rely on meetings for communication and decision-making, generating meeting minutes and summaries has become particularly important. Writing them manually is cumbersome and error-prone, and the quality often depends on the note-taker's ability, so earlier research has applied language models to meeting summarization. In recent years, large language models such as the GPT, Gemini, and Llama series have emerged, and related studies have shown that they perform exceptionally well on natural language processing tasks such as text generation, translation, and question answering. Meeting transcripts, however, are usually long and structurally complex, containing dialogue from multiple speakers and covering a wide range of topics, which makes meeting summarization different from summarizing ordinary articles. Research on using large language models for meeting summarization remains relatively scarce, and this study aims to fill that gap.
This study evaluates the effectiveness of several large language models at generating meeting summaries. Using the AMI meeting corpus, we combine different preprocessing methods (such as Google Speech Recognition and Whisper transcription) with different prompt designs and compare the models' performance. The results show that GPT-4 performs best in most cases, while GPT-3.5 has the advantage in scenarios requiring high precision and Gemini 1.5 Pro stands out in recall. The study offers recommendations for applying different large language models to practical meeting summarization, so that an appropriate solution can be chosen for a given need. We hope these findings help enterprises and organizations select suitable technologies to improve the efficiency and accuracy of meeting minutes and summaries. | en |
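The workflow described in the abstract (speech-to-text transcription, LLM prompting, and precision/recall scoring) can be pictured with a minimal Python sketch. This is not the thesis's actual code: the file names, prompt wording, and model identifiers below are illustrative assumptions, and the openai-whisper, openai, and rouge-score packages are one plausible toolchain among several.

```python
# Minimal sketch of an ASR -> LLM -> ROUGE pipeline like the one the abstract
# describes. File names, prompt text, and model choices are hypothetical.

import whisper                        # openai-whisper package
from openai import OpenAI             # official OpenAI Python client
from rouge_score import rouge_scorer  # rouge-score package

# 1. Speech-to-text: Whisper "base" as one of the preprocessing options.
asr_model = whisper.load_model("base")
transcript = asr_model.transcribe("meeting_audio.wav")["text"]

# 2. Abstractive summarization: a simple zero-shot prompt to an LLM.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You summarize meeting transcripts."},
        {"role": "user", "content": f"Summarize this meeting transcript:\n\n{transcript}"},
    ],
)
candidate_summary = response.choices[0].message.content

# 3. Evaluation: ROUGE-1/2/L against a human-written reference summary
#    (e.g., from the AMI corpus annotations).
reference_summary = open("reference_summary.txt", encoding="utf-8").read()
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference_summary, candidate_summary)
for metric, result in scores.items():
    print(f"{metric}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```

In ROUGE terms, precision is the fraction of generated n-grams that also appear in the reference summary and recall is the fraction of reference n-grams that the generated summary recovers, which is the sense in which the abstract contrasts GPT-3.5 (precision) with Gemini 1.5 Pro (recall).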
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-14T16:39:59Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-08-14T16:40:00Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | 誌謝
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Automatic Speech Recognition
2.2 Text Summarization
Chapter 3 Methodology
3.1 Research Framework
3.2 Dataset
3.3 Data Preprocessing
3.4 Model Selection
3.5 Experimental Design
3.6 Evaluation Method
Chapter 4 Experiments
4.1 Dataset
4.2 Experimental Analysis: Impact of Prompts
4.3 Comparison of Language Model and Large Language Models
4.4 Evaluation of Practical Applications of Language Models and Large Language Models for Meeting Summarization
Chapter 5 Conclusion
References | - |
dc.language.iso | en | - |
dc.title | 大型語言模型在語音到文字摘要生成任務中的評估: 基於會議語料庫的研究 | zh_TW |
dc.title | Evaluation of LLM on Verbal to Text Summarization: A Study Using the Meeting Corpus | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 陳建錦;杜志挺 | zh_TW |
dc.contributor.oralexamcommittee | Chien-Chin Chen;Timon Du | en |
dc.subject.keyword | 大型語言模型、會議摘要、自動語音識別、提示工程 | zh_TW |
dc.subject.keyword | Large Language Models, Meeting Summarization, Automatic Speech Recognition, Prompt Engineering | en |
dc.relation.page | 50 | - |
dc.identifier.doi | 10.6342/NTU202402787 | - |
dc.rights.note | 同意授權(全球公開) | - |
dc.date.accepted | 2024-08-02 | - |
dc.contributor.author-college | 管理學院 | - |
dc.contributor.author-dept | 資訊管理學系 | - |
dc.date.embargo-lift | 2025-07-30 | - |
Appears in collections: | 資訊管理學系
Files in this item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (publicly available online after 2025-07-30) | 718.69 kB | Adobe PDF |