Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92361
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 盧信銘 | zh_TW
dc.contributor.advisor | Hsin-Min Lu | en
dc.contributor.author | 吳琦艾 | zh_TW
dc.contributor.author | Chi-Ai Wu | en
dc.date.accessioned | 2024-03-21T16:47:44Z | -
dc.date.available | 2024-10-31 | -
dc.date.copyright | 2024-03-21 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-10-05 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92361 | -
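dc.description.abstract | With the rise of 10-K text analysis, it has become critical to determine how to compute readability scores accurately and which text-cleaning method for 10-K files is most effective. We propose a machine-learning-based text-tidying method (LTT-Better) that uses a Bi-LSTM model to clean text for readability computation. Most readability formulas assume that the text contains only complete sentences, without headings or page numbers; few raw 10-K files meet this condition. LTT-Better uses a Bi-LSTM to remove unnecessary characters from 10-Ks, reducing noise and improving the quality of text cleaning. When LTT-Better is used in place of traditional text cleaning, most readability scores are statistically closer to those of manually cleaned 10-K reports. We further conduct an empirical study on 10-Ks from 1994 to 2022 to investigate whether readability-induced information uncertainty affects stock price volatility after the 10-K filing date. The results show that, compared with traditional rule-based text cleaning, readability computed with LTT-Better achieves higher t-values in most cases. Moreover, when a regression model includes both the Fog index from traditional text cleaning and the LTT-Better Fog index, both are significant, and the LTT-Better Fog index has the higher t-value. Our findings indicate that LTT-Better is an effective method when a study needs to clean 10-K reports for readability analysis. Future research should apply this cleaning method to 10-K files before analyzing their linguistic features. In addition, we offer researchers recommendations on which readability formulas to use after different text-cleaning approaches. | zh_TW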
dc.description.abstract | With the growth of 10-K text analysis, it has become essential to determine how to compute readability scores reliably and which text preparation method for 10-K files is effective. We propose the Better Learning-Based Text Tidying (LTT-Better) approach, which leverages Bi-LSTM models to prepare text for readability computation. Most readability measures assume correct sentence boundaries and text chunks without headings or dangling page numbers; these conditions are rarely satisfied in the original 10-K files. LTT-Better uses Bi-LSTM to remove unnecessary text chunks, which reduces noise and improves text preparation and text analysis of 10-K reports. When LTT-Better is used instead of traditional rule-based preparation, the majority of the readability scores are statistically closer to those of human-prepared 10-Ks. We further estimate empirical models that investigate whether readability-induced information uncertainty contributes to stock price volatility after the filing date, using 10-Ks from 1994 to 2022. Our empirical results show that, compared to rule-based text preparation, readability from LTT-Better achieves a higher t-value in most cases. Moreover, when the regression models contain both the rule-based Fog index and the LTT-Better Fog index, both are significant, with the LTT-Better Fog index achieving a higher t-value. Our findings suggest that LTT-Better is a promising approach to preparing 10-K reports for readability analysis. Future research should apply such an approach to 10-Ks before analyzing their linguistic attributes. Moreover, we give researchers guidance on which readability measures to use in future research. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-21T16:47:44Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-03-21T16:47:44Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Acknowledgements i
Chinese Abstract ii
Abstract iii
List of Figures vii
List of Tables viii
1 Introduction 1
2 Literature Review 4
2.1 Text Analysis of Financial Report 4
2.2 Readability Measures 6
2.3 Preparing Financial Reports for Text Analysis 9
3 Methodology 12
3.1 Research Testbed 12
3.2 Text Preparation Approaches 14
3.2.1 Rule-Based Approach (RB) 15
3.2.2 Learning-Based Text Tidying (LTT) and Better Learning-Based Text Tidying (LTT-Better) 15
3.3 Reliable Readability 16
3.4 Experimental Design 17
3.4.1 First Experiment: Paired t-Test 17
3.4.2 Second Experiment: Regression 19
4 Experimental Results 22
4.1 Summary Statistics 22
4.2 Paired t-Test Result 32
4.3 Regression Results 35
5 Conclusion 46
Reference 48
A Summary Statistics of Reproduced Regression 51
B Assumptions for Statistical Tests 53 | -
dc.language.iso | en | -
dc.subject | Text Analysis | zh_TW
dc.subject | Text Cleaning | zh_TW
dc.subject | Machine Learning | zh_TW
dc.subject | Readability | zh_TW
dc.subject | Financial Statements | zh_TW
dc.subject | 10-K | zh_TW
dc.subject | Text Analysis | en
dc.subject | Readability | en
dc.subject | 10-K | en
dc.subject | Bi-LSTM | en
dc.subject | Text Preparation | en
dc.title | 建構可靠的10-K財報可讀性衡量法-利用機器學習的文本清理減少可讀性中的雜訊 | zh_TW
dc.title | Reliable Readability for 10-K Reports: Reducing Noise in Readability by Learning-Based Text Tidying | en
dc.type | Thesis | -
dc.date.schoolyear | 112-1 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 張景宏;簡宇泰 | zh_TW
dc.contributor.oralexamcommittee | Ching-Hung Chang; Yu-Tai Chien | en
dc.subject.keyword | 10-K, Financial Statements, Readability, Text Analysis, Text Cleaning, Machine Learning | zh_TW
dc.subject.keyword | 10-K, Readability, Text Analysis, Text Preparation, Bi-LSTM | en
dc.relation.page | 59 | -
dc.identifier.doi | 10.6342/NTU202304256 | -
dc.rights.note | Authorization granted (access restricted to campus) | -
dc.date.accepted | 2023-10-11 | -
dc.contributor.author-college | College of Management | -
dc.contributor.author-dept | Department of Information Management | -
dc.date.embargo-lift | 2024-10-31 | -
Appears in collections: Department of Information Management

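Files in this item:
File | Size | Format
ntu-112-1.pdf | 924.94 kB | Adobe PDF
Access is restricted to NTU campus IP addresses (off campus, please use the VPN remote-access service).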
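
The abstract notes that most readability measures assume clean sentence boundaries, with no headings or dangling page numbers. The following is a minimal Python sketch of why that matters, using the standard Gunning Fog formula, 0.4 × (words per sentence + 100 × share of complex words). It is not the thesis's Bi-LSTM method or its rule-based baseline: the function names, the vowel-group syllable heuristic, and the sample text are all assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count runs of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fog_index(text: str) -> float:
    """Gunning Fog: 0.4 * (words per sentence + 100 * complex-word share)."""
    # Sentences are split on terminal punctuation only, so a heading with no
    # period is silently merged into the sentence that follows it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more (estimated) syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))


# Hypothetical sample text: the same two sentences, with and without a
# leftover section heading and page number in front of them.
clean = "The company sold more units this year. Revenue rose as a result."
noisy = "ITEM 7 MANAGEMENT DISCUSSION AND ANALYSIS 23 " + clean

print("clean:", round(fog_index(clean), 2))
print("noisy:", round(fog_index(noisy), 2))
```

In this toy case the leftover heading lengthens the first sentence and adds polysyllabic words, so the noisy text scores as harder to read even though the prose itself is unchanged. Removing that kind of text chunk before scoring is the distortion the abstract says LTT-Better targets.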