Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83513
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 盧信銘 | zh_TW |
dc.contributor.advisor | Hsin-Min Lu | en |
dc.contributor.author | 林聖典 | zh_TW |
dc.contributor.author | Sheng-Dian Lin | en |
dc.date.accessioned | 2023-03-19T21:09:20Z | - |
dc.date.available | 2023-11-10 | - |
dc.date.copyright | 2022-09-06 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83513 | - |
dc.description.abstract | 法說會是投資人取得公司第一手資訊的重要管道之一,無論是公司管理階層對於該季的業務現況、財務表現的說明,還是受邀分析師所提出的問題,都是投資者關注的重要資訊。然而,有關法說會逐字稿的研究,大多聚焦在公司披露的語氣、敘述結構,甚或是分析師的選角。上述研究大多聚焦在管理階層準備敘述(MPN)或問答部分(Q&A)其中之一,鮮少研究兩者之間的連貫性,但是公司的資訊揭露與分析師提問的差距可能是投資者感興趣的資訊。另一方面,連貫性分析在其他領域已有不少研究成果,卻極少應用在法說會的逐字稿分析中。 因此,本研究建立一個基於RoBERTa transformer的自然語言處理框架去分析法說會逐字稿的主要章節之間的連貫性。我們將逐字稿原文的主要章節段落串聯,建構成連貫性資料集,並定義了不同連貫程度的規則,接著利用RoBERTa transformer、池化策略將各段落組合轉化成句子表示(sentence representations),最後經過全連接層來預測連貫性。其中,為了改善模型表現,我們利用正負樣本之間的相對相似度設計了輔助任務(auxiliary task),讓RoBERTa transformer的模型參數取得更好的起始點,進一步提升模型效能。我們的實驗結果表明,事先通過輔助任務預訓練的RoBERTa模型,其分辨連貫性與預測連貫性程度的表現比其他模型更好。 本研究主要的三項貢獻是:(1) 將原始法說會逐字稿轉換成結構化的連貫性資料集,並標記人工標籤;(2) 定義區分連貫性程度的規則;(3) 提出結合了自監督學習、少樣本學習的RoBERTa模型架構來預測連貫性,並得到不錯的成效。 未來,我們希望透過實證研究證實法說會各章節之間的連貫性對於投資者而言是重要的資訊,也期待利用更新穎的模型架構提升預測的準確度。 | zh_TW |
dc.description.abstract | An earnings conference call is one of the important channels through which investors obtain first-hand information about a company. Investors pay close attention both to management's explanation of the quarter's business status and financial performance and to the questions asked by the invited analysts. However, most research on earnings conference call transcripts has focused on disclosure tone, narrative structure, or even the casting of analysts, and on either the management prepared narrative (MPN) or the question-and-answer (Q&A) section in isolation; little work has examined the coherence between the two, even though the gap between company disclosures and analyst questions may be of interest to investors. Coherence analysis, meanwhile, has produced many results in other fields but is seldom applied to earnings conference call transcripts. Our study therefore establishes a natural language processing framework based on the RoBERTa transformer to analyze the coherence between the main sections of earnings conference call transcripts. We pair paragraphs from the main sections of each transcript to construct a coherence dataset and define rules for different degrees of coherence. We then use the RoBERTa transformer with a pooling strategy to convert each paragraph pair into sentence representations, which are passed through fully connected layers to predict coherence. To improve performance, we design an auxiliary task that distinguishes the relative similarity between positive and negative data points, giving the RoBERTa transformer's parameters a better starting point. Our experimental results show that the RoBERTa model pre-trained on this auxiliary task outperforms other models in detecting coherence and predicting the degree of coherence. 
This paper makes three main contributions: (1) transforming original transcripts into coherence datasets annotated with expert labels, (2) defining coherence relation rules to distinguish degrees of coherence, and (3) proposing a RoBERTa-based model architecture that combines self-supervised and few-shot learning approaches and achieves good performance. In the future, we hope to confirm through empirical research that the coherence between the main sections of earnings conference calls is important information for investors, and to explore more novel model architectures to improve predictive accuracy. | en |
dc.description.provenance | Made available in DSpace on 2023-03-19T21:09:20Z (GMT). No. of bitstreams: 1 U0001-0209202212442300.pdf: 3529223 bytes, checksum: 0824a3d3542442d413c64b5139d206ad (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | Contents 誌謝 i 摘要 ii ABSTRACT iv LIST OF FIGURES ix LIST OF TABLES xi LIST OF EQUATIONS xii Chapter 1 Introduction 1 Chapter 2 Literature Review 9 2.1 Text Analysis on Earnings Call Transcripts 9 2.1.1 Tone & Uncertainty 9 2.1.2 Tone Dispersion & Average Reduced Frequency (ARF) 10 2.1.3 Casting 12 2.2 Coherence Analysis 14 2.2.1 Text-mining Approach 14 2.2.2 Neural Network Approach 19 2.3 Small Data Improvement Approach 28 2.3.1 Self-supervised learning 28 2.3.2 Few-shot learning 29 2.4 Research Gap 36 2.5 Research Questions 36 Chapter 3 System Design 37 3.1 RoBERTa Backbone 37 3.2 Multi-Task Learning (MTL) 40 3.3 Few-Shot Learning (FSL) 41 3.3.1 Semi-supervised learning 41 3.4 Triplet Loss 42 3.5 Self-Supervised Learning + FSL Approach 44 3.6 Rule-Based Method 46 3.7 Linear Regression and Logistic Regression 46 3.8 Roberta Baseline 47 Chapter 4 Dataset 48 4.1 Data Format 48 4.2 Annotation Rules 51 4.3 Annotation Examples 54 4.3.1 MPN_Q: (0, 1) – Completely irrelevant among the two sessions 54 4.3.2 MPN_Q: (0, 2) – irrelevant but some keywords in MPN are mentioned 56 4.3.3 MPN_Q: (0, 3) – not really relevant but it should be answered by the person who presents this 59 4.3.4 MPN_Q: (1, 3) – weakly relevant 62 4.3.5 MPN_Q: (1, 4) – somewhat relevant but should not be answered by the person who presents this MPN 65 4.3.6 MPN_Q: (1, 5) – The question is exactly what this paragraph is about 67 4.3.7 Q_A: (0, 1) – confidential (agreement) 70 4.3.8 Q_A: (0, 2) – Not fully answer to the point 71 4.3.9 Q_A: (0, 3) – Not sure if it answers to the point 71 4.3.10 Q_A: (1, 3) – Not sure if it answers to the point 73 4.3.11 Q_A: (1, 4) – only answer some questions (1~n-1) 74 4.3.12 Q_A: (1, 5) – All questions answered 76 4.4 Descriptive Statistics 78 Chapter 5 Experimental Design 82 Chapter 6 Experimental Results 85 Chapter 7 Conclusion 93 7.1 Main Contributions 93 7.2 Future Work 94 Chapter 8 Reference 95 | - |
dc.language.iso | en | - |
dc.title | 法說會逐字稿的連貫性分析 | zh_TW |
dc.title | Analyzing the Coherence Structure of Earnings Conference Call Transcripts | en |
dc.type | Thesis | - |
dc.date.schoolyear | 110-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 洪茂蔚;畢南怡 | zh_TW |
dc.contributor.oralexamcommittee | Mao-Wei Hung;Nan-Yi Bi | en |
dc.subject.keyword | 法說會,連貫性分析,RoBERTa,少樣本學習,自監督學習,triplet loss, | zh_TW |
dc.subject.keyword | earnings conference calls, coherence analysis, RoBERTa, few-shot learning, self-supervised learning, triplet loss | en |
dc.relation.page | 98 | - |
dc.identifier.doi | 10.6342/NTU202203102 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2022-09-02 | - |
dc.contributor.author-college | College of Management | - |
dc.contributor.author-dept | Department of Information Management | - |
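The abstract above describes pooling RoBERTa outputs into sentence representations and pre-training on an auxiliary task that separates coherent (positive) from incoherent (negative) paragraph pairs via their relative similarity. A minimal sketch of that idea in PyTorch, with a hypothetical `ToyEncoder` standing in for the actual RoBERTa backbone (names and dimensions here are illustrative assumptions, not the thesis's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the RoBERTa encoder: any module that maps
    token ids to contextual vectors works for this sketch."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.ffn = nn.Linear(dim, dim)

    def forward(self, ids, mask):
        h = self.ffn(self.emb(ids))                # (batch, seq, dim)
        # Mean pooling over non-padding tokens -> one sentence representation
        m = mask.unsqueeze(-1).float()
        return (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-9)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the coherent pair together and push the incoherent pair at least
    `margin` further apart in embedding space (hinge on the distance gap)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

torch.manual_seed(0)
enc = ToyEncoder()
# Toy batch: an MPN paragraph, its matching Q&A turn, and a mismatched turn
ids = torch.randint(0, 1000, (3, 8))
mask = torch.ones(3, 8, dtype=torch.long)
a, p, n = (enc(ids[i:i + 1], mask[i:i + 1]) for i in range(3))
loss = triplet_loss(a, p, n)
```

PyTorch also ships this objective as `torch.nn.TripletMarginLoss`; the hand-rolled version is shown only to make the "relative similarity between positive and negative data points" idea from the abstract explicit.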
Appears in Collections: | Department of Information Management
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-110-2.pdf (currently not authorized for public access) | 3.45 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.