Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92122
Title: | Expected Question Generation in Earnings Call Conferences Enhanced by Prompt-Based Fine-Tuning for Relevant Paragraph Retrieval with Alpaca-Lora |
Authors: | 阮羿寧 Yi-Ning Juan |
Advisor: | 陳信希 Hsin-Hsi Chen |
Co-Advisor: | 古倫維 Lun-Wei Ku |
Keyword: | Prompt-based Fine-tuning, Question Generation, Retrieval-Augmented, Large Language Model, Natural Language Understanding |
Publication Year: | 2024 |
Degree: | Master's |
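Abstract: | Oral presentations have become an essential part of everyday work across many professional fields, from academic conferences to the earnings calls that companies hold every quarter. Presenters try to anticipate the questions their audience may ask by analyzing the audience's background, interests, and familiarity with the topic. They then use those predictions to refine their presentations. However, this process is time-consuming, and it becomes especially difficult when the audience is large or diverse. Natural language processing (NLP) has made progress in question generation in recent years, but the main focus has remained on factoid questions or on assessment questions for educational settings. Such methods often do not transfer directly to business or other real-world work settings. In these fields, questions tend to be more complex, more context-dependent, and more reliant on domain knowledge. The questions that financial analysts raise during earnings calls are a typical example. To address this gap, this study is the first to propose a multi-question generation (MQG) task designed for professional settings such as earnings calls. We collected and curated a dataset of nearly 6,000 earnings call transcripts. It contains the presentations given by company management and about 100,000 questions raised by analysts during the calls. Following our proposed annotation framework, these questions are further classified into categories. We also propose a co-trained retriever-generator framework and a training strategy for retrieval-augmented multi-question generation. The strategy uses prompt-based fine-tuning to retrieve relevant paragraphs from management's presentations, which strengthens the generator's ability to predict the questions analysts are likely to ask. Experimental results show that our method outperforms other competitive training strategies on the accuracy, diversity, consistency, and perplexity of the generated questions, as well as in human evaluation, demonstrating its effectiveness. This thesis explores a new approach to automatic question generation and validates its effectiveness in real professional application scenarios through experiments and human evaluation. In doing so, it opens new possibilities for applying natural language processing.
In diverse professional environments, from academic conferences to corporate earnings calls, being able to predict audience questions is crucial. Traditional methods rely on manual assessment of an audience's background, interests, and subject knowledge. These methods often fall short, particularly when facing large or heterogeneous groups, which leads to inaccuracies and inefficiencies. Natural Language Processing (NLP) has advanced in generating questions from text, but its focus has largely been on academic applications. As a result, it does not fully address the complex needs of professional scenarios like earnings calls. To fill this gap, our paper introduces a new task: multi-question generation (MQG), tailored for the context of earnings calls. Our approach includes collecting a large number of earnings call transcripts and developing a unique annotation method to categorize potential questions. Additionally, we introduce a "Co-Trained Retriever-Generator Framework" that enhances the generator's ability to produce a variety of questions likely to be asked by financial analysts. This is achieved by supplying the generator with information relevant to the questions during the training phase, focusing on the content of earnings calls.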
Our empirical evaluations demonstrate the effectiveness of our method, showcasing strong performance across question accuracy, diversity, consistency, perplexity, and human evaluation. We believe this approach could improve the quality of conference presentations and make communication more efficient, better meeting stakeholder needs. It represents a notable advancement in preparation tools within the financial communication field. |
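The title and abstract describe a method that retrieves relevant paragraphs from management's presentation and passes them to an Alpaca-LoRA decoder. The Python sketch below covers only one step of that method: building the retrieval-augmented input. It rests on several assumptions. In the thesis, the retriever is a prompt-based fine-tuned model. Here, `overlap_score` is a hypothetical lexical-overlap scorer swapped in so the example runs without any model. The function names and the prompt template are also hypothetical. The template is only loosely modeled on the Alpaca instruction/input/response layout. None of this is the author's implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, paragraph: str) -> float:
    """HYPOTHETICAL relevance score, NOT the thesis's prompt-based retriever:
    the fraction of query tokens (with multiplicity) that also occur in the paragraph."""
    q = Counter(tokenize(query))
    p = set(tokenize(paragraph))
    total = sum(q.values())
    if total == 0:
        return 0.0
    hits = sum(count for tok, count in q.items() if tok in p)
    return hits / total


def retrieve_top_k(query: str, paragraphs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring paragraphs; ties keep document order."""
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: (-overlap_score(query, paragraphs[i]), i),
    )
    return [paragraphs[i] for i in ranked[:k]]


def build_generator_input(query: str, paragraphs: list[str], k: int = 2) -> str:
    """Place the retrieved paragraphs into a HYPOTHETICAL Alpaca-style prompt
    that a decoder could complete with candidate analyst questions."""
    context = "\n".join(f"- {p}" for p in retrieve_top_k(query, paragraphs, k))
    return (
        "### Instruction:\n"
        "Generate questions an analyst might ask about the following remarks.\n"
        f"### Input:\n{context}\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    # Toy management remarks, invented purely for illustration.
    remarks = [
        "Revenue grew 12% year over year, driven by data center demand.",
        "We opened two new offices in Europe this quarter.",
        "Gross margin declined due to higher memory costs.",
    ]
    print(build_generator_input("What drove revenue growth in the data center?", remarks, k=1))
```

In the co-trained framework described in the abstract, the retriever is trained jointly with the generator rather than fixed. A static scorer like this one only shows where the retrieved paragraphs enter the generator's input.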
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92122 |
DOI: | 10.6342/NTU202400722 |
Fulltext Rights: | Not authorized |
Appears in Collections: | Data Science Degree Program |
Files in This Item:
File | Size | Format
---|---|---
ntu-112-1.pdf (Restricted Access) | 2 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.