Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81609

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 盧信銘(Hsin-Min Lu) | |
| dc.contributor.author | Yu-Hsuan Chuang | en |
| dc.contributor.author | 莊于萱 | zh_TW |
| dc.date.accessioned | 2022-11-24T09:24:42Z | - |
| dc.date.copyright | 2021-11-11 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-08-25 | |
| dc.identifier.citation | Alsabti, K., Ranka, S., & Singh, V. (1997). An Efficient K-Means Clustering Algorithm. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. Boureau, Y.-L., Ponce, J., & LeCun, Y. (2010). A Theoretical Analysis of Feature Pooling in Visual Recognition. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 111-118). Brown, S. V., & Tucker, J. W. (2011). Large-Sample Evidence on Firms' Year-over-Year MD&A Modifications. Journal of Accounting Research, 49(2), 309-346. Cai, D., He, X., & Han, J. (2005). Document Clustering Using Locality Preserving Indexing. IEEE Transactions on Knowledge and Data Engineering, 17(12), 1624-1637. Cho, H., & Muslu, V. (2021). How Do Firms Change Investments Based on MD&A Disclosures of Peer Firms? The Accounting Review, 96(2), 177-204. Cohen, L., Malloy, C., & Nguyen, Q. (2020). Lazy Prices. The Journal of Finance, 75(3), 1371-1415. Davis, A. K., Piger, J. M., & Sedor, L. M. (2012). Beyond the Numbers: Measuring the Information Content of Earnings Press Release Language. Contemporary Accounting Research, 29(3), 845-868. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. Deza, M. M., & Deza, E. (2009). Encyclopedia of Distances. In Encyclopedia of Distances (pp. 1-583). Springer. Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2625-2634). Dos Santos, C., & Zadrozny, B. (2014). Learning Character-Level Representations for Part-of-Speech Tagging. In International Conference on Machine Learning (pp. 1818-1826). Durnev, A., & Mangen, C. (2020). The Spillover Effects of MD&A Disclosures for Real Investment: The Role of Industry Competition. Journal of Accounting and Economics, 70(1), 101299. Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The Evolution of 10-K Textual Disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and Economics, 64(2-3), 221-245. Feldman, R., Govindaraj, S., Livnat, J., & Segal, B. (2010). Management's Tone Change, Post Earnings Announcement Drift and Accruals. Review of Accounting Studies, 15(4), 915-953. Graves, A., & Schmidhuber, J. (2005). Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks, 18(5-6), 602-610. Gupta, P. (2012). Design and Analysis of Algorithms. PHI Learning. He, Z., Wang, Z., Wei, W., Feng, S., Mao, X., & Jiang, S. (2020). A Survey on Recent Advances in Sequence Labeling from Deep Learning Models. arXiv preprint arXiv:2011.06727. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. Huang, A. (2008). Similarity Measures for Text Document Clustering. In Proceedings of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand (Vol. 4, pp. 9-56). Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint arXiv:1508.01991. Hunt, J. W., & McIlroy, M. D. (1976). An Algorithm for Differential File Comparison. Bell Laboratories, Murray Hill. Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980. Kontostathis, A., & Pottenger, W. M. (2006). A Framework for Understanding Latent Semantic Indexing (LSI) Performance. Information Processing & Management, 42(1), 56-73. Kothari, S. P., Li, X., & Short, J. E. (2009). The Effect of Disclosures by Management, Analysts, and Business Press on Cost of Capital, Return Volatility, and Analyst Forecasts: A Study Using Content Analysis. The Accounting Review, 84(5), 1639-1670. Krestel, R., Fankhauser, P., & Nejdl, W. (2009). Latent Dirichlet Allocation for Tag Recommendation. In Proceedings of the Third ACM Conference on Recommender Systems (pp. 61-68). Kullback, S., & Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86. Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition. arXiv preprint arXiv:1603.01360. Lau, J. H., & Baldwin, T. (2016). An Empirical Evaluation of Doc2Vec with Practical Insights into Document Embedding Generation. arXiv preprint arXiv:1607.05368. Le, Q., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. In International Conference on Machine Learning (pp. 1188-1196). LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324. Li, F. (2008). Annual Report Readability, Current Earnings, and Earnings Persistence. Journal of Accounting and Economics, 45(2-3), 221-247. Li, F. (2010). The Information Content of Forward-Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach. Journal of Accounting Research, 48(5), 1049-1102. Ling, W., Luís, T., Marujo, L., Astudillo, R. F., Amir, S., Dyer, C., Black, A. W., & Trancoso, I. (2015). Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation. arXiv preprint arXiv:1508.02096. Liu, L., Shang, J., Ren, X., Xu, F., Gui, H., Peng, J., & Han, J. (2018). Empower Sequence Labeling with Task-Aware Neural Language Model. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1). Lo, K., Ramos, F., & Rogo, R. (2017). Earnings Management and Annual Report Readability. Journal of Accounting and Economics, 63(1), 1-25. Loughran, T., & McDonald, B. (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65. Loughran, T., & McDonald, B. (2014). Measuring Readability in Financial Disclosures. The Journal of Finance, 69(4), 1643-1671. Loughran, T., & McDonald, B. (2016). Textual Analysis in Accounting and Finance: A Survey. Journal of Accounting Research, 54(4), 1187-1230. Lundholm, R. J., Rogo, R., & Zhang, J. L. (2014). Restoring the Tower of Babel: How Foreign Firms Communicate with US Investors. The Accounting Review, 89(4), 1453-1485. Ma, X., & Hovy, E. (2016). End-to-End Sequence Labeling via Bi-Directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781. Miller, B. P. (2010). The Effects of Reporting Complexity on Small and Large Investor Trading. The Accounting Review, 85(6), 2107-2143. Muslu, V., Radhakrishnan, S., Subramanyam, K., & Lim, D. (2015). Forward-Looking MD&A Disclosures and the Information Environment. Management Science, 61(5), 931-948. Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the Difficulty of Training Recurrent Neural Networks. In International Conference on Machine Learning (pp. 1310-1318). Plank, B., Søgaard, A., & Goldberg, Y. (2016). Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. arXiv preprint arXiv:1604.05529. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-Propagating Errors. Nature, 323(6088), 533-536. Salton, G., & Buckley, C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Information Processing & Management, 24(5), 513-523. Salton, G., Wong, A., & Yang, C.-S. (1975). A Vector Space Model for Automatic Indexing. Communications of the ACM, 18(11), 613-620. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958. Tamine, L., Soulier, L., Nguyen, G.-H., & Souf, N. (2019). Offline Versus Online Representation Learning of Documents Using External Knowledge. ACM Transactions on Information Systems (TOIS), 37(4), 1-34. U.S. Securities and Exchange Commission. (2021). How to Read a 10-K/10-Q. Retrieved July 12, 2021, from https://www.sec.gov/fast-answers/answersreada10khtm.html Zhang, Y., Chen, H., Zhao, Y., Liu, Q., & Yin, D. (2018). Learning Tag Dependencies for Sequence Tagging. In International Joint Conference on Artificial Intelligence (pp. 4581-4587). 陳妍秀 (2018). 財報項目全文的擷取和效能評估 [Extraction and performance evaluation of full-text financial report items]. Master's thesis, Graduate Institute of Information Management, National Taiwan University, Taipei. https://hdl.handle.net/11296/2nww2b | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81609 | - |
| dc.description.abstract | "Purpose: Management's Discussion and Analysis (MD&A) is one of the key items in 10-K annual reports, and year-over-year modifications of the MD&A text have been used in many studies, including evaluating firm performance and predicting stock prices. However, the preprocessing steps for studying MD&A modifications, including extracting the MD&A from the 10-K report and removing unwanted text from the extracted MD&A, still rely on traditional text-analysis methods, which degrades the analysis of MD&A modifications. In addition, existing representations of MD&A modifications cannot fully capture textual semantics: they are usually reported as numeric scores and rarely show the actual content that was modified. Methods: This study builds a natural language processing framework (EPSC) to analyze MD&A modifications, consisting of Item Extraction, Item Prettification, Sentence-Level Document Difference Based on Semantic Changes (SDDSC), and clustering to explore MD&A modification trends. EPSC addresses the limitations of prior work on item extraction, item prettification, and the presentation of MD&A modifications, and applies advanced natural language processing techniques to improve the analysis of MD&A modifications. EPSC comprises four steps: first, a Conditional Random Field (CRF) extracts items from 10-K annual reports; second, a Bidirectional Long Short-Term Memory model (Bi-LSTM) prettifies the extracted items; third, our SDDSC algorithm presents detailed year-over-year MD&A modifications; and fourth, K-Means clustering identifies MD&A modification trends within industries (a simplified sketch of steps 3 and 4 appears after the metadata table below). Results: Our experiments show that Bi-LSTM outperforms the other models on item prettification. SDDSC presents detailed MD&A modification information under different semantic-similarity thresholds. In addition, K-Means clustering successfully identifies MD&A modification trends within an industry, presenting each trend with the five sentences most similar to the cluster center. Conclusion: This study adopts advanced natural language processing techniques to improve the analysis of MD&A modifications. EPSC provides more detailed content of MD&A textual modifications, offering valuable information to researchers and investors. In future work, we hope to enlarge the manually annotated data for item extraction to improve model performance, and to convert SDDSC into a non-recursive algorithm to remove the recursion-depth limit and improve its runtime efficiency." | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-24T09:24:42Z (GMT). No. of bitstreams: 1 U0001-2408202115393500.pdf: 2166818 bytes, checksum: 46c9bdfa9256e1581b2042af195ec2bf (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Acknowledgements i Chinese Abstract ii ABSTRACT iv TABLE OF CONTENTS vi LIST OF FIGURES x LIST OF TABLES xii Chapter 1 Introduction 1 Chapter 2 Literature Review 5 2.1 Text Analysis on MD&A 6 2.1.1 Tone 6 2.1.2 Readability 7 2.1.3 MD&A Modifications 8 2.1.3.1 Tone Changes 8 2.1.3.2 Overall Changes 9 2.1.4 Limitation of Text Analysis on MD&A 9 2.2 Text Representation 10 2.2.1 Vector Space Model 11 2.2.1.1 Term Frequency 11 2.2.1.2 Term Frequency-Inverse Document Frequency (TF-IDF) 11 2.2.1.3 Properties of Vector Space Model 12 2.2.2 Distributed Representation 12 2.2.2.1 Latent Semantic Indexing (LSI) 13 2.2.2.2 Latent Dirichlet Allocation (LDA) 13 2.2.2.3 Word2Vec 15 2.2.2.4 Doc2Vec 16 2.2.2.5 Bidirectional Encoder Representations from Transformers (BERT) 16 2.2.2.6 Summary for Distributed Representation 17 2.3 Text Similarity 17 2.3.1 Euclidean Distance 17 2.3.2 Cosine Similarity 18 2.3.3 Jaccard Coefficient 18 2.3.4 Averaged KL Divergence 19 2.4 Sequence Labeling Model 19 2.4.1 Conditional Random Field (CRF) 20 2.4.2 Convolutional Neural Networks (CNN) 20 2.4.3 Long Short-Term Memory (LSTM) 22 2.4.4 Bidirectional Long Short-Term Memory (Bi-LSTM) 23 2.5 Summary for Literature Review 24 Chapter 3 Methodology 25 3.1 Overview of EPSC 25 3.2 Step 1: Item Extraction 26 3.2.1 Data Annotation for Item Extraction 27 3.3 Step 2: Item Prettification 29 3.3.1 Data Annotation for Item Prettification 30 3.3.2 Bi-LSTM Model Structure 33 3.3.3 Evaluation Metric 34 3.3.3.1 Bilingual Evaluation Understudy (BLEU) 34 3.4 Step 3: MD&A Modification Detection 35 3.4.1 Transformation to Sentence Embedding 35 3.4.2 Sentence-Level Document Difference Based on Semantic Changes (SDDSC) 36 3.5 Step 4: MD&A Modification Trend Exploration 41 3.5.1 K-Means Clustering 42 3.5.2 Process of MD&A Modification Trend Exploration 43 Chapter 4 Experiment 44 4.1 Item Prettification 44 4.1.1 Annotated Dataset for Item Prettification 44 4.1.2 Experimental Setting of Bi-LSTM 45 4.1.3 Baseline Models and Others 46 4.1.4 Experiment Results 47 4.2 MD&A Modification Detection 50 4.3 MD&A Modification Trend Exploration 57 4.3.1 Experiment Setting on K-Means Clustering 62 4.3.2 Experimental Results 63 4.3.2.1 SIC 73 63 4.3.2.2 SIC 60 64 4.3.2.3 SIC 28 65 4.3.2.4 SIC 67 66 4.3.2.5 SIC 36 67 4.3.2.6 SIC 61 68 4.3.2.7 SIC 38 69 4.3.2.8 SIC 13 70 4.3.2.9 SIC 49 71 4.3.2.10 SIC 35 72 Chapter 5 Conclusion 74 REFERENCE 76 | |
| dc.language.iso | en | |
| dc.subject | BERT | zh_TW |
| dc.subject | 10-K Reports | zh_TW |
| dc.subject | Management's Discussion and Analysis | zh_TW |
| dc.subject | MD&A Modifications | zh_TW |
| dc.subject | Natural Language Processing | zh_TW |
| dc.subject | CRF | zh_TW |
| dc.subject | Bi-LSTM | zh_TW |
| dc.subject | 10-K Reports | en |
| dc.subject | BERT | en |
| dc.subject | Bi-LSTM | en |
| dc.subject | CRF | en |
| dc.subject | Natural Language Processing | en |
| dc.subject | MD&A Modifications | en |
| dc.subject | MD&A | en |
| dc.title | Analyzing Modifications of Management's Discussion and Analysis in Annual Reports Using Natural Language Processing Methods | zh_TW |
| dc.title | A Novel Natural Language Processing Framework for Analyzing Management's Discussion and Analysis Modifications in 10-K Reports | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 簡宇泰 (Hsin-Tsai Liu), 顏如君 (Chih-Yang Tseng) | |
| dc.subject.keyword | 10-K Reports, Management's Discussion and Analysis, MD&A Modifications, Natural Language Processing, CRF, Bi-LSTM, BERT | zh_TW |
| dc.subject.keyword | 10-K Reports, MD&A, MD&A Modifications, Natural Language Processing, CRF, Bi-LSTM, BERT | en |
| dc.relation.page | 82 | |
| dc.identifier.doi | 10.6342/NTU202102680 | |
| dc.rights.note | Not authorized for public access | |
| dc.date.accepted | 2021-08-26 | |
| dc.contributor.author-college | College of Management | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Information Management | zh_TW |
| dc.date.embargo-lift | 2022-10-23 | - |
| Appears in Collections: | Department of Information Management | |
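The abstract above outlines the four EPSC steps; steps 3 and 4 (semantic sentence-level comparison and K-Means trend exploration) are the most algorithmic. The sketch below is a minimal illustration of those two ideas only, assuming Sentence-BERT embeddings via the `sentence-transformers` package (the record cites Reimers & Gurevych, 2019). The greedy best-match pairing and all function names here are hypothetical simplifications; the thesis's actual SDDSC is a recursive algorithm and is not reproduced here.

```python
# Minimal sketch, NOT the thesis's recursive SDDSC: a greedy semantic diff of
# two MD&A versions, plus K-Means trend exploration. All names hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer  # assumed embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the thesis's SBERT model


def flag_modified_sentences(old_sents, new_sents, threshold=0.8):
    """Return sentences in the new MD&A whose best cosine-similarity match
    in the old MD&A falls below `threshold`, plus their embeddings."""
    old_emb = model.encode(old_sents)
    new_emb = model.encode(new_sents)
    best = cosine_similarity(new_emb, old_emb).max(axis=1)  # best match per new sentence
    mask = best < threshold
    return [s for s, m in zip(new_sents, mask) if m], new_emb[mask]


def modification_trends(embeddings, sentences, k=5, top_n=5):
    """Cluster flagged-sentence embeddings (rows aligned with `sentences`);
    for each cluster, return the top_n sentences closest to its center."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    trends = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        sims = cosine_similarity(embeddings[idx], km.cluster_centers_[c:c + 1]).ravel()
        trends.append([sentences[i] for i in idx[np.argsort(-sims)[:top_n]]])
    return trends


# Example (hypothetical data):
# flagged, emb = flag_modified_sentences(sents_2019, sents_2020, threshold=0.8)
# trends = modification_trends(emb, flagged, k=3)
```

In use, flagged sentences from many firm-year pairs within one SIC industry could be pooled and passed to `modification_trends`, mirroring the abstract's presentation of each trend by the five sentences closest to its cluster center.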
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-2408202115393500.pdf (not authorized for public access) | 2.12 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.