Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8502
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林嬋娟(Chan-Jane Lin) | |
dc.contributor.author | Yu-Te Chen | en |
dc.contributor.author | 陳予得 | zh_TW |
dc.date.accessioned | 2021-05-20T00:56:03Z | - |
dc.date.available | 2021-05-20T00:56:03Z | - |
dc.date.issued | 2021 | |
dc.date.submitted | 2021-04-08 | |
dc.identifier.citation | Antweiler, W., and Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259-1294.
Biddle, G. C., Hilary, G., and Verdi, R. S. (2009). How does financial reporting quality relate to investment efficiency? Journal of Accounting and Economics, 48, 112-131.
Bochkay, K., and Levine, C. B. (2019). Using MD&A to improve earnings forecasts. Journal of Accounting, Auditing & Finance, 34(3), 458-482. doi:10.1177/0148558X17722919.
Bryan, S. H. (1997). Incremental information content of required disclosures contained in management discussion and analysis. The Accounting Review, 72(2), 285-301.
Campbell, J. L., Chen, H., Dhaliwal, D. S., Lu, H., and Steele, L. B. (2014). The information content of mandatory risk factor disclosures in corporate filings. Review of Accounting Studies, 19, 396-455. doi:10.1007/s11142-013-9258-3.
Cole, C. J., and Jones, C. L. (2004). The usefulness of MD&A disclosures in the retail industry. Journal of Accounting, Auditing & Finance, 19(4), 361-388.
Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Ding, K., Peng, X., and Wang, Y. (2019). A machine learning-based peer selection method with financial ratios. Accounting Horizons, 33(3), 75-87. doi:10.2308/acch-52454.
Elwany, E., Moore, D., and Oberoi, G. (2019). BERT goes to law school: Quantifying the competitive advantage of access to large legal corpora in contract understanding. In Workshop on Document Intelligence at NeurIPS 2019.
Henry, E. (2008). Are investors influenced by how earnings press releases are written? The Journal of Business Communication, 45(4), 363-407. doi:10.1177/0021943608319388.
Hiew, J. Z. G., Huang, X., Mou, H., Li, D., Wu, Q., and Xu, Y. (2019). BERT-based financial sentiment index and LSTM-based stock return predictability. arXiv:1906.09024.
Hoberg, G., and Phillips, G. (2015). Text-based network industries and endogenous product differentiation. Journal of Political Economy, forthcoming.
Huang, A., Zang, A., and Zheng, R. (2014). Evidence on the information content of text in analyst reports. The Accounting Review, 89, 2151-2180. doi:10.2308/accr-50833.
Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics, 45(2-3), 221-247.
Li, F. (2010a). The information content of forward-looking statements in corporate filings: A naive Bayesian machine learning approach. Journal of Accounting Research, 48, 1049-1102.
Li, F. (2010b). Textual analysis of corporate disclosures: A survey of the literature. Journal of Accounting Literature, 29, 143-165.
Li, F., Lundholm, R. J., and Minnis, M. (2012). A measure of competition based on 10-K filings. Chicago Booth Research Paper No. 11-30, Journal of Accounting Research.
Li, M., Li, W., Wang, F., Jia, X., and Rui, G. (2020). Applying BERT to analyze investor sentiment in stock market. Neural Computing and Applications. doi:10.1007/s00521-020-05411-7.
Loughran, T., and McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance, 66(1), 35-65.
Loughran, T., and McDonald, B. (2014). Measuring readability in financial disclosures. Journal of Finance, 69(4), 1643-1671.
Loughran, T., and McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.
Miller, B. P. (2010). The effects of reporting complexity on small and large investor trading. The Accounting Review, 85, 2107-2143.
Pan, S. J., and Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359. doi:10.1109/TKDE.2009.191.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv:1802.05365.
Price, S. M., Doran, J. S., Peterson, D. R., and Bliss, B. A. (2012). Earnings conference calls and stock returns: The incremental informativeness of textual tone. Journal of Banking & Finance, 36(4), 992-1011.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving language understanding by generative pre-training.
Rogers, J. L., Van Buskirk, A., and Zechman, S. L. C. (2011). Disclosure tone and shareholder litigation. The Accounting Review, 86(6), 2155-2183.
Securities and Exchange Commission (SEC). (1987). Concept release on management's discussion and analysis of financial condition and results of operations. Securities Act Release No. 6711. Washington, D.C.: SEC.
Securities and Exchange Commission (SEC). (2003). Interpretation: Commission guidance regarding management's discussion and analysis of financial condition and results of operations. Securities Act Release No. 8350. Washington, D.C.: SEC.
Siano, F., and Wysocki, P. (2018). The primacy of numbers in financial and accounting disclosures: Implications for textual analysis research.
Siano, F., and Wysocki, P. (2020). Transfer learning and textual analysis of accounting disclosures: Applying big data methods to small(er) data sets. doi:10.2139/ssrn.3560355.
Sun, Y. (2010). Do MD&A disclosures help users interpret disproportionate inventory increases? The Accounting Review, 85(4), 1411-1440. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8502 | - |
dc.description.abstract | This study first builds a text-mining model using BERT (Bidirectional Encoder Representations from Transformers), a natural language processing method, and fine-tunes it on Reports to Shareholders. It then examines whether BERT resolves the shortcomings of earlier text-mining methods, and finally applies sentiment analysis to the tone of the reports to study its effect on companies' future performance. The empirical results show that the mixed Chinese-English text in the reports requires preprocessing; after selecting hyperparameters by validation-set performance, the BERT classifier reaches 86% accuracy. Visualizing the model's operation shows that it captures words modified by negations as well as nouns modified by adjectives. The context test shows that BERT's performance falls sharply once word order is randomly shuffled, indicating that BERT has indeed learned linguistic structure. However, the tone of the current-year (t) report has no significant effect on next-year (t+1) earnings; possible reasons are that the sample is not sufficiently representative, or that Taiwan's Reports to Shareholders differ from the US MD&A in information content, so that the reports show no significant association with future earnings. | zh_TW |
dc.description.abstract | First, this study applies BERT (Bidirectional Encoder Representations from Transformers) to construct a text-mining model, fine-tuning BERT on Reports to Shareholders. Next, it examines whether BERT can overcome some weaknesses of traditional text-mining techniques. Finally, it assesses, via sentiment analysis, the impact of the tone of Reports to Shareholders on a company's future performance. The empirical results show that the mixing of Chinese and English in Reports to Shareholders must be handled in preprocessing, and that after choosing hyperparameters by validation performance, classification accuracy reaches 86%. Visualizing the operation of BERT, we find that it captures not only the relation between a word and its negation but also the relation between an adjective and the noun it modifies. The context test further shows that BERT's performance drops significantly after the word order is randomly shuffled, indicating that BERT has learned the structure of the Chinese language. However, regarding the impact of tone on future performance, the empirical results show that the sentiment of the Report to Shareholders has no significant effect on the next year's earnings. This suggests that the sample may not be representative enough, or that Taiwan's Reports to Shareholders carry less information content than the US MD&A, so that tone is not significantly related to next-year earnings. | en |
dc.description.provenance | Made available in DSpace on 2021-05-20T00:56:03Z (GMT). No. of bitstreams: 1 U0001-0804202112442900.pdf: 2262656 bytes, checksum: 398f80674174deaffaee7cd5a50b9034 (MD5) Previous issue date: 2021 | en |
dc.description.tableofcontents | Thesis Committee Approval i Acknowledgments ii Chinese Abstract iii ABSTRACT iv Table of Contents v List of Figures vii List of Tables viii Chapter 1 Introduction 1 Section 1 Research Motivation and Purpose 1 Chapter 2 Literature Review 4 Section 1 Applications of Text Mining in Accounting and Finance 4 (1) Applications of Text Mining in Accounting and Finance 4 (2) Comparative Analysis of Text-Mining Methods 7 Section 2 An NLP Method: BERT 9 (1) Transfer Learning 9 (2) BERT 11 Section 3 Information Value of Reports to Shareholders 12 Chapter 3 Research Design 15 Section 1 Research Procedure 15 Section 2 Supervised Learning 16 Section 3 Model Selection and Visualization 17 (1) Model Selection 17 (2) Cross-Validation 18 (3) BERT Visualization 19 (4) Context Test 19 (5) Earnings Predictive Ability 19 Section 4 Sample Selection 20 (1) Sample Period 20 (2) Sample Screening 22 (3) Sample Labeling 23 Chapter 4 Empirical Results 26 Section 1 Classification Results 26 (1) Model Selection 27 (2) Cross-Validation 28 Section 2 Visualization and Context Test 32 (1) BERT Visualization 32 (2) Context Test 35 Section 3 Earnings Predictive Ability 37 (1) Descriptive Statistics 37 (2) Regression Results 38 Section 4 Additional Tests 41 (1) Dictionary Method 41 (2) TF-IDF 43 Chapter 5 Conclusions and Limitations 46 References 48 Appendix 51 Appendix 1 BERTViz 51 | |
dc.language.iso | zh-TW | |
dc.title | 應用遷移學習與文字探勘分析致股東報告書 | zh_TW |
dc.title | Application of Transfer Learning and Text Mining on Reports to Shareholders | en |
dc.type | Thesis | |
dc.date.schoolyear | 109-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 周濟群(Chi-Chun Chou),盧信銘(Hsin-Min Lu) | |
dc.subject.keyword | 深度學習,BERT,文字探勘,情緒分析,盈餘預測,致股東報告書, | zh_TW |
dc.subject.keyword | Deep Learning,BERT,Text Mining,Sentiment Analysis,Earnings Prediction,Report to Shareholders, | en |
dc.relation.page | 53 | |
dc.identifier.doi | 10.6342/NTU202100823 | |
dc.rights.note | Authorized for release (worldwide open access) | |
dc.date.accepted | 2021-04-08 | |
dc.contributor.author-college | College of Management | zh_TW |
dc.contributor.author-dept | Graduate Institute of Accounting | zh_TW |
Appears in Collections: | Department of Accounting |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0804202112442900.pdf | 2.21 MB | Adobe PDF | View/Open |
Except where otherwise noted, all items in this repository are protected by copyright, with all rights reserved.
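The context test described in the abstract keeps the bag of words fixed while destroying word order, so any accuracy drop on the shuffled text must come from lost syntactic structure rather than changed vocabulary. A minimal sketch of the shuffling step only; the tokenizer, classifier, and function name here are our illustrative assumptions, not reproduced from the thesis:

```python
import random

def shuffle_context(tokens, seed=0):
    """Return a copy of `tokens` with word order randomly destroyed.

    The multiset of tokens is preserved, so a bag-of-words model scores
    the shuffled text identically; only order-sensitive models such as
    BERT should lose accuracy on the shuffled version.
    """
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    shuffled = list(tokens)    # copy: the original sequence is untouched
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical tokenized sentence from a Report to Shareholders.
tokens = ["營收", "較", "去年", "大幅", "成長"]
shuffled = shuffle_context(tokens)
```

Comparing the classifier's accuracy on `tokens` versus `shuffled` inputs is then the whole of the context test.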
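Among the additional tests listed in the table of contents, the thesis compares BERT against a dictionary method and TF-IDF features. A self-contained sketch of the TF-IDF weighting such a baseline relies on; the toy corpus and the smoothed idf variant (idf = log(N/df) + 1) are our assumptions, not taken from the thesis:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = raw count of term t in document d
    idf = log(N / df_t) + 1, where df_t is the number of documents
          containing t (the +1 keeps ubiquitous terms at nonzero weight)
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * (math.log(n / df[t]) + 1)
                        for t, c in tf.items()})
    return weights

# Toy corpus: a term in every document gets idf = log(1) + 1 = 1,
# so its weight equals its raw count; rarer terms are up-weighted.
docs = [["growth", "profit"], ["growth", "loss"],
        ["growth", "profit", "profit"]]
w = tf_idf(docs)
```

These per-document weight vectors would then feed a conventional classifier, in contrast to BERT's contextual representations.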