Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92361

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 盧信銘 | zh_TW |
| dc.contributor.advisor | Hsin-Min Lu | en |
| dc.contributor.author | 吳琦艾 | zh_TW |
| dc.contributor.author | Chi-Ai Wu | en |
| dc.date.accessioned | 2024-03-21T16:47:44Z | - |
| dc.date.available | 2024-10-31 | - |
| dc.date.copyright | 2024-03-21 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-10-05 | - |
| dc.identifier.citation | Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. Journal of Reading, 26(6):490–496. Biddle, G. C., Hilary, G., and Verdi, R. S. (2009). How does financial reporting quality relate to investment efficiency? Journal of Accounting and Economics, 48:112–131. Björnsson, C.-H. (1968). Lesbarkeit durch Lix. Pedagogiskt Centrum. Blankespoor, E. (2019). The impact of information processing costs on firm disclosure choice: Evidence from the XBRL mandate. Journal of Accounting Research, 57:919–967. Bonsall, S. B., Leone, A. J., Miller, B. P., and Rennekamp, K. (2017). A plain English measure of financial reporting readability. Journal of Accounting and Economics, 63:329–357. Chen, Y. H. (2018). Item extraction for annual financial report: Annotation and evaluation. Master’s thesis, National Taiwan University. Chuang, Y. H. (2021). A novel natural language processing framework for analyzing management’s discussion and analysis modifications in 10-K reports. Master’s thesis, National Taiwan University. Cohen, L., Malloy, C., and Nguyen, Q. (2020). Lazy prices. The Journal of Finance, 75:1371–1415. Coleman, M. and Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2):283. Craja, P., Kim, A., and Lessmann, S. (2020). Deep learning for detecting financial statement fraud. Decision Support Systems, 139:113421. Dyer, T., Lang, M., and Stice-Lawrence, L. (2017). The evolution of 10-K textual disclosure: Evidence from latent Dirichlet allocation. Journal of Accounting and Economics, 64:221–245. Feldman, R., Govindaraj, S., Livnat, J., and Segal, B. (2010). Management’s tone change, post earnings announcement drift and accruals. Review of Accounting Studies, 15:915–953. Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3):221. Griffin, P. A. (2003). Got information? Investor response to Form 10-K and Form 10-Q EDGAR filings. Review of Accounting Studies, 8:433–460. Gunning, R. (1952). The Technique of Clear Writing. McGraw-Hill. Jegadeesh, N. and Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110:712–729. Lawrence, A. (2013). Individual investors and financial disclosure. Journal of Accounting and Economics, 56:130–147. Lehavy, R., Li, F., and Merkley, K. (2011). The effect of annual report readability on analyst following and the properties of their earnings forecasts. The Accounting Review, 86:1087–1115. Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics, 45:221–247. Li, F. (2010). The information content of forward-looking statements in corporate filings: A naïve Bayesian machine learning approach. Journal of Accounting Research, 48:1049–1102. Loughran, T. and McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66:35–65. Loughran, T. and McDonald, B. (2014). Measuring readability in financial disclosures. The Journal of Finance, 69:1643–1671. Loughran, T. and McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54:1187–1230. McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8):639–646. Miller, B. P. (2010). The effects of reporting complexity on small and large investor trading. The Accounting Review, 85:2107–2143. Senter, R. and Smith, E. A. (1967). Automated readability index. Technical report, DTIC document. Smith, M. and Taffler, R. (1992). Readability and understandability: Different measures of the textual complexity of accounting narrative. Accounting, Auditing & Accountability Journal, 5:84–98. You, H. F. and Zhang, X. J. (2009). Financial reporting complexity and investor underreaction to 10-K information. Review of Accounting Studies, 14:559–586. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92361 | - |
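| dc.description.abstract | With the rise of 10-K text analysis, it has become crucial to determine how to compute readability scores accurately and which text-cleaning method for 10-K filings is most effective. We propose a machine-learning-based text-tidying method (LTT-Better) that uses a Bi-LSTM model to clean text for readability computation. Most readability formulas assume that the text contains only complete sentences, with no headings or page numbers, and few raw 10-K filings satisfy this condition. LTT-Better uses a Bi-LSTM to remove unnecessary characters from 10-Ks, reducing noise and improving the quality of text cleaning. When LTT-Better is used in place of traditional text cleaning, most readability measures are statistically closer to those of manually cleaned 10-K reports. We further conduct an empirical study using 10-Ks from 1994 to 2022 to investigate whether readability-induced information uncertainty affects stock price volatility after the 10-K filing date. Our experimental results show that, compared with traditional rule-based text cleaning, readability measures from LTT-Better achieve higher t-values in most cases. In addition, when the regression model includes both the Fog index from traditional text cleaning and the LTT-Better Fog index, both are significant, and the LTT-Better Fog index has the higher t-value. Our findings show that LTT-Better is an effective method when a study needs to clean 10-K reports for readability analysis. Future research should apply this cleaning method to 10-K filings before analyzing their linguistic features. We also offer researchers recommendations on which readability formulas to use under different text-cleaning approaches. | zh_TW |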
| dc.description.abstract | With the growth of 10-K text analysis, it has become essential to determine how to compute readability scores reliably and which text preparation methods for 10-K files are effective. We propose the Better Learning-Based Text Tidying (LTT-Better) approach, which leverages Bi-LSTM models to prepare text for readability computation. Most readability measures assume correct sentence boundaries and text chunks without headings or dangling page numbers, conditions that are rarely satisfied in the original 10-K files. LTT-Better uses Bi-LSTM to remove unnecessary text chunks, reducing noise and improving text preparation for the analysis of 10-K reports. When LTT-Better is used instead of traditional rule-based preparation, most readability scores are statistically closer to those of human-prepared 10-Ks. We further estimate empirical models that investigate whether readability-induced information uncertainty contributes to stock price volatility after the filing date, using 10-Ks from 1994 to 2022. Our empirical results show that, compared with rule-based text preparation, readability from LTT-Better achieves a higher t-value in most cases. Moreover, when the regression models contain both the rule-based Fog index and the LTT-Better Fog index, both are significant, with the LTT-Better Fog index achieving the higher t-value. Our findings suggest that LTT-Better is a promising approach to preparing 10-K reports for readability analysis. Future research should apply such an approach to 10-Ks before analyzing their linguistic attributes. We also give researchers guidance on which readability measures to use in future research. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-21T16:47:44Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-03-21T16:47:44Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements; Chinese Abstract; Abstract; List of Figures; List of Tables; 1 Introduction; 2 Literature Review; 2.1 Text Analysis of Financial Report; 2.2 Readability Measures; 2.3 Preparing Financial Reports for Text Analysis; 3 Methodology; 3.1 Research Testbed; 3.2 Text Preparation Approaches; 3.2.1 Rule-Based Approach (RB); 3.2.2 Learning-Based Text Tidying (LTT) and Better Learning-Based Text Tidying (LTT-Better); 3.3 Reliable Readability; 3.4 Experimental Design; 3.4.1 First Experiment: Paired t-Test; 3.4.2 Second Experiment: Regression; 4 Experimental Results; 4.1 Summary Statistics; 4.2 Paired t-Test Result; 4.3 Regression Results; 5 Conclusion; References; A Summary Statistics of Reproduced Regression; B Assumptions for Statistical Tests | - |
| dc.language.iso | en | - |
| dc.subject | Text Analysis | zh_TW |
| dc.subject | Text Cleaning | zh_TW |
| dc.subject | Machine Learning | zh_TW |
| dc.subject | Readability | zh_TW |
| dc.subject | Financial Statements | zh_TW |
| dc.subject | 10-K | zh_TW |
| dc.subject | Text Analysis | en |
| dc.subject | Readability | en |
| dc.subject | 10-K | en |
| dc.subject | Bi-LSTM | en |
| dc.subject | Text Preparation | en |
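| dc.title | Constructing a Reliable Readability Measure for 10-K Financial Reports: Reducing Noise in Readability with Machine-Learning-Based Text Cleaning | zh_TW |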
| dc.title | Reliable Readability for 10-K Reports: Reducing Noise in Readability by Learning-Based Text Tidying | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 張景宏;簡宇泰 | zh_TW |
| dc.contributor.oralexamcommittee | Ching-Hung Chang;Yu-Tai Chien | en |
| dc.subject.keyword | 10-K, Financial Statements, Readability, Text Analysis, Text Cleaning, Machine Learning | zh_TW |
| dc.subject.keyword | 10-K, Readability, Text Analysis, Text Preparation, Bi-LSTM | en |
| dc.relation.page | 59 | - |
| dc.identifier.doi | 10.6342/NTU202304256 | - |
| dc.rights.note | Authorization granted (access restricted to campus) | - |
| dc.date.accepted | 2023-10-11 | - |
| dc.contributor.author-college | College of Management | - |
| dc.contributor.author-dept | Department of Information Management | - |
| dc.date.embargo-lift | 2024-10-31 | - |
| Appears in Collections: | Department of Information Management | |
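
The abstract's point that headings and dangling page numbers distort readability formulas can be illustrated with the Gunning Fog index, computed as 0.4 × (average words per sentence + percentage of words with three or more syllables). The following is a minimal sketch only: it is not the thesis's LTT-Better pipeline or its rule-based baseline, the function names `count_syllables` and `fog_index` are hypothetical, the syllable counter is a crude vowel-run heuristic, and the sample strings are invented for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text: str) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + percent of complex words)."""
    # Sentences are naively split on terminal punctuation; digits are not counted as words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


# Invented sample text: the same two sentences, with and without a heading and page number.
clean = "The company sold more units. Revenue increased this year."
noisy = "ITEM 7 MANAGEMENT DISCUSSION AND ANALYSIS 23 " + clean

print(f"clean: {fog_index(clean):.2f}")
print(f"noisy: {fog_index(noisy):.2f}")
```

In this sketch the heading is absorbed into the first sentence, which lengthens it and adds long words, so the noisy text scores higher than the clean text even though the prose itself is unchanged.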
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-1.pdf Access restricted to NTU campus IP addresses (from off campus, please use the VPN remote-access service) | 924.94 kB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
