Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83513
Title: | Analyzing the Coherence Structure of Earnings Conference Call Transcripts (法說會逐字稿的連貫性分析)
Author: | Sheng-Dian Lin (林聖典)
Advisor: | Hsin-Min Lu (盧信銘)
Keywords: | earnings conference calls, coherence analysis, RoBERTa, few-shot learning, self-supervised learning, triplet loss
Publication Year: | 2022
Degree: | Master's
Abstract: | An earnings conference call is one of the most important channels through which investors obtain first-hand information about a company. Both management's discussion of the quarter's business conditions and financial performance and the questions raised by invited analysts are information that investors follow closely. However, most research on earnings call transcripts has focused on disclosure tone, narrative structure, or even the selection of analysts, and has examined either the management prepared narrative (MPN) or the question-and-answer (Q&A) section in isolation; little work has studied the coherence between the two, even though the gap between a company's disclosures and analysts' questions may itself be information of interest to investors. Meanwhile, coherence analysis has produced substantial results in other domains but has rarely been applied to earnings call transcripts. This study therefore builds a natural language processing framework based on the RoBERTa transformer to analyze the coherence between the main sections of earnings call transcripts. We concatenate paragraphs from the main sections of each transcript to construct coherence datasets and define rules for different degrees of coherence. We then use the RoBERTa transformer with a pooling strategy to convert each paragraph combination into a sentence representation, which is passed through fully connected layers to predict coherence. To improve model performance, we design an auxiliary task based on the relative similarity between positive and negative samples, giving the RoBERTa transformer's parameters a better starting point. Our experiments show that a RoBERTa model pretrained on this auxiliary task outperforms other models in both detecting coherence and predicting its degree. This study makes three main contributions: (1) transforming raw earnings call transcripts into structured coherence datasets with expert annotations; (2) defining rules that distinguish degrees of coherence; and (3) proposing a RoBERTa-based model architecture that combines self-supervised and few-shot learning and achieves good performance. In future work, we hope to confirm through empirical research that the coherence between the main sections of earnings conference calls is important information for investors, and to improve prediction accuracy with newer model architectures.
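The abstract outlines a concrete pipeline: a RoBERTa encoder, pooling into sentence representations, fully connected prediction layers, and a triplet-style auxiliary task over positive and negative samples. The sketch below illustrates how such a pipeline could look in PyTorch with Hugging Face `transformers`; it is not the thesis code, and the checkpoint name (`roberta-base`), mean pooling, head sizes, triplet margin, number of coherence degrees, and the example paragraphs are all illustrative assumptions.

```python
# A minimal sketch, not the thesis implementation, of the pipeline the
# abstract describes: RoBERTa encodes an MPN/Q&A paragraph pair, mean
# pooling turns token states into a sentence representation, fully
# connected layers predict the coherence degree, and a triplet-loss
# auxiliary task pretrains the encoder on relative similarity.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast

class CoherenceClassifier(nn.Module):
    def __init__(self, num_degrees: int = 3):  # degree count is assumed
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size
        # Fully connected layers over the pooled sentence representation.
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_degrees),
        )

    def represent(self, input_ids, attention_mask):
        # Mean-pool non-padding token states into one representation.
        states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        return (states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, input_ids, attention_mask):
        return self.head(self.represent(input_ids, attention_mask))

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = CoherenceClassifier()

def embed(mpn: str, qa: str) -> torch.Tensor:
    # Encode an (MPN paragraph, Q&A paragraph) combination as one input.
    batch = tokenizer(mpn, qa, return_tensors="pt",
                      padding=True, truncation=True)
    return model.represent(batch["input_ids"], batch["attention_mask"])

# Auxiliary task: the anchor pairing should sit closer to a coherent
# (positive) pairing than to an incoherent (negative) one.
triplet_loss = nn.TripletMarginLoss(margin=1.0)  # margin is assumed
loss = triplet_loss(
    embed("Revenue grew 12% this quarter.", "What drove the revenue growth?"),
    embed("Revenue grew 12% this quarter.", "Can you break revenue down by segment?"),
    embed("Revenue grew 12% this quarter.", "When will the new office open?"),
)

# Main task: logits over the defined coherence degrees.
batch = tokenizer("Revenue grew 12% this quarter.",
                  "What drove the revenue growth?",
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```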
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83513 |
DOI: | 10.6342/NTU202203102 |
Full-Text Access: | Not authorized
Appears in Collections: | Department of Information Management
Files in This Item:
File | Size | Format |
---|---|---|
ntu-110-2.pdf (not authorized for public access) | 3.45 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.