NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93360
Full metadata record:
dc.contributor.advisor: 謝舒凱 (zh_TW)
dc.contributor.advisor: Shu-Kai Hsieh (en)
dc.contributor.author: 張鈺琳 (zh_TW)
dc.contributor.author: Yu-Lin Chang (en)
dc.date.accessioned: 2024-07-30T16:07:16Z
dc.date.available: 2024-07-31
dc.date.copyright: 2024-07-30
dc.date.issued: 2024
dc.date.submitted: 2024-07-25
dc.identifier.citation:
Araci, D. (2019). FinBERT: financial sentiment analysis with pre-trained language models. https://arxiv.org/abs/1908.10063
Barrasa, J., & Webber, J. (2023). Building knowledge graphs. O'Reilly Media Inc.
Biderman, S., Schoelkopf, H., Anthony, Q., Bradley, H., O’Brien, K., Hallahan, E., Khan, M. A., Purohit, S., Prashanth, U. S., Raff, E., Skowron, A., Sutawika, L., & van der Wal, O. (2023). Pythia: a suite for analyzing large language models across training and scaling.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. http://dx.doi.org/10.1162/tacl_a_00051
Camacho-Collados, J., & Pilehvar, M. T. (2020). Embeddings in natural language processing. In L. Specia & D. Beck (Eds.), Proceedings of the 28th international conference on computational linguistics: tutorial abstracts (pp. 10–15). International Committee for Computational Linguistics. https://aclanthology.org/2020.coling-tutorials.2
Cheng, D., Yang, F., Wang, X., Zhang, Y., & Zhang, L. (2020). Knowledge graph-based event embedding framework for financial quantitative investments. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2221–2230. https://doi.org/10.1145/3397271.3401427
Cheng, Z., Wu, L., Lukasiewicz, T., Sallinger, E., & Gottlob, G. (2022). Democratizing financial knowledge graph construction by mining massive brokerage research reports. In M. Ramanath & T. Palpanas (Eds.), Proceedings of the workshops of the EDBT/ICDT 2022 joint conference, Edinburgh, UK, March 29, 2022. CEUR-WS.org. http://ceur-ws.org/Vol-3135/EcoFinKG_2022_paper5.pdf
Cimiano, P., & Paulheim, H. (2017). Knowledge graph refinement: a survey of approaches and evaluation methods. Semant. Web, 8(3), 489–508. https://doi.org/10.3233/SW-160218
DeLong, L. N., Mir, R. F., & Fleuriot, J. D. (2024). Neurosymbolic ai for reasoning over knowledge graphs: a survey.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: pre-training of deep bidirectional transformers for language understanding.
Dou, J., Mao, H., Bao, R., Liang, P., Tan, X., Zhang, S., Jia, M., Zhou, P., & Mao, Z.-H. (2022). The measurement of knowledge in knowledge graphs.
Elhammadi, S., Lakshmanan, L. V. S., Ng, R., Simpson, M., Huai, B., Wang, Z., & Wang, L. (2020). A high precision pipeline for financial knowledge graph construction. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th international conference on computational linguistics (pp. 967–977). International Committee on Computational Linguistics. https://aclanthology.org/2020.coling-main.84
Gao, J., Li, X., Xu, E., Sisman, B., Dong, X. L., & Yang, J. (2019). Efficient knowledge graph accuracy evaluation. VLDB 2019. https://www.amazon.science/publications/efficient-knowledge-graph-accuracy-evaluation
Guo, Y., Xu, Z., & Yang, Y. (2023). Is ChatGPT a financial expert? Evaluating language models on financial natural language processing.
Issa, S., Adekunle, O., Hamdi, F., Cherfi, S. S.-S., Dumontier, M., & Zaveri, A. (2021). Knowledge graph completeness: a systematic literature review. IEEE Access, 9, 31322–31339.
Kalyan, K. S. (2023). A survey of GPT-3 family large language models including ChatGPT and GPT-4. SSRN Electronic Journal.
Kejriwal, M. (2019). Domain-specific knowledge graph construction. Springer.
Kertkeidkachorn, N., Nararatwong, R., Xu, Z., & Ichise, R. (2023). Finkg: a core financial knowledge graph for financial analysis. 2023 IEEE 17th International Conference on Semantic Computing (ICSC), 90–93.
Lee, J., Stevens, N., Han, S. C., & Song, M. (2024). A survey of large language models in finance (finllms).
Lin, C.-Y. (2004). ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out, 74–81. https://aclanthology.org/W04-1013
Maia, M., Handschuh, S., Freitas, A., Davis, B., McDermott, R., Zarrouk, M., & Balahur, A. (2018). Www’18 open challenge: financial opinion mining and question answering. Companion Proceedings of the The Web Conference 2018, 1941–1942. https://doi.org/10.1145/3184558.3192301
Malo, P., Sinha, A., Takala, P., Korhonen, P., & Wallenius, J. (2013). FinancialPhraseBank-v1.0.
Miao, R., Zhang, X., Yan, H., & Chen, C. (2019). A dynamic financial knowledge graph based on reinforcement learning and transfer learning. 2019 IEEE International Conference on Big Data (Big Data), 5370–5378.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space.
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In P. Isabelle, E. Charniak, & D. Lin (Eds.), Proceedings of the 40th annual meeting of the association for computational linguistics (pp. 311–318). Association for Computational Linguistics. https://aclanthology.org/P02-1040
Pau, R. I., Mariam, N., Raheel, Q., Gaëtan, C., & Jingshu, L. (2023). Large language model adaptation for financial sentiment analysis.
Pujara, J. (2017). Extracting knowledge graphs from financial filings: extended abstract. Proceedings of the 3rd International Workshop on Data Science for Macro–Modeling with Financial and Economic Datasets. https://doi.org/10.1145/3077240.3077246
Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In J. Su, K. Duh, & X. Carreras (Eds.), Proceedings of the 2016 conference on empirical methods in natural language processing (pp. 2383–2392). Association for Computational Linguistics. https://aclanthology.org/D16-1264
Serrano, S., Brumbaugh, Z., & Smith, N. A. (2023). Language models: a guide for the perplexed.
Song, D., Schilder, F., Hertz, S., Saltini, G., Smiley, C., Nivarthi, P., Hazai, O., Landau, D., Zaharkin, M., Zielund, T., Molina-Salgado, H., Brew, C., & Bennett, D. (2019). Building and querying an enterprise knowledge graph. IEEE Transactions on Services Computing, 12(3), 356–369.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: open and efficient foundation language models.
TSMC. (2023). Taiwan semiconductor manufacturing company, 2022 annual report. https://investor.tsmc.com/sites/ir/annual-report/2022/2022_Business_Overview_E.pdf
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., & Wilkins, D. (2010). A comparison of a graph database and a relational database: a data provenance perspective. The 48th ACM Southeast Conference, 10, 42.
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. (2018). GLUE: a multi-task benchmark and analysis platform for natural language understanding. In T. Linzen, G. Chrupała, & A. Alishahi (Eds.), Proceedings of the 2018 EMNLP workshop BlackboxNLP: analyzing and interpreting neural networks for NLP (pp. 353–355). Association for Computational Linguistics. https://aclanthology.org/W18-5446
Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., & Mann, G. (2023). BloombergGPT: a large language model for finance.
Xie, Y., Aggarwal, K., & Ahmad, A. (2023). Efficient continual pre-training for building domain specific large language models.
Yang, H., Liu, X.-Y., & Wang, C. D. (2023). FinGPT: open-source financial large language models.
Yang, Y., Tang, Y., & Tam, K. Y. (2023). InvestLM: a large language model for investment using financial domain instruction tuning.
Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., Ma, X., Efrat, A., Yu, P., Yu, L., Zhang, S., Ghosh, G., Lewis, M., Zettlemoyer, L., & Levy, O. (2023). LIMA: less is more for alignment.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93360
dc.description.abstract (zh_TW): 自然語言處理(NLP)和金融數據分析的交匯處代表了一個新興的研究領域,這主要由非結構化金融數據的指數級增長以及對精確解讀這些資訊的複雜工具需求所驅動。本論文探討了大型語言模型(LLMs)和知識圖譜的整合,作為增強金融文件分析的新方法。這項研究的動機源於金融分析不斷演變的格局,越來越依賴於對文本數據的細微解釋來協助決策過程。

論文強調了 LLMs 的作用,它們在這一典範轉移中發揮了重要作用,實現了更準確的情感分析、欺詐檢測和自動化財務報告。文獻綜述追溯了 NLP 應用在金融領域的發展,突顯 LLMs 在從金融文本中提取有意義洞察的關鍵作用。文獻回顧也介紹了知識圖譜的概念,作為一種構建和增強金融數據可解釋性的方法,為後續的實證研究奠定了基礎。

本論文的實證部分提供了一個案例研究,展示在金融數據分析中結合 LLMs 和知識圖譜的實際應用。這部分說明了如何將理論概念應用於解決現實世界的挑戰,重點關注將非結構化數據轉換為結構化格式(即知識圖譜),從而促進更深入的分析和解釋。討論部分深入探討了這種轉換的影響,特別是資訊損失的問題以及減輕這種損失的策略。本部分探討了如何在金融語境中保持原始數據的語義完整性對於準確分析和決策制定的重要性。

論文通過綜合文獻綜述、方法論和案例研究得出的見解作為結論,反思了整合 LLMs 和知識圖譜以革新金融數據分析的潛力,提供了對金融市場更細緻和全面的理解。這項研究為金融領域高級 NLP 技術應用的持續討論做出了貢獻,提出未來可進行研究的方向,並強調在開發和部署這些技術時考慮倫理因素的重要性。
dc.description.abstract (en): The intersection of Natural Language Processing (NLP) and financial data analysis represents a burgeoning field of study, driven by the exponential growth of unstructured financial data and the need for sophisticated tools to interpret this information accurately. This thesis explores the integration of Large Language Models (LLMs) and knowledge graphs as a novel approach to enhance the analysis of financial documents. The motivation behind this research stems from the evolving landscape of financial analysis, which increasingly relies on the nuanced interpretation of textual data to inform decision-making processes.

The thesis emphasizes the role of LLMs, which have been instrumental in this paradigm shift, enabling more accurate sentiment analysis, fraud detection, and automated financial reporting. A review of the literature traces the development of NLP applications within the financial domain, highlighting the critical role of LLMs in extracting meaningful insights from financial texts. This section also introduces the concept of knowledge graphs as a means to structure and enhance the interpretability of financial data, providing a foundation for the subsequent empirical investigation.

The empirical component of the thesis presents a case study that exemplifies the practical application of combining LLMs with knowledge graphs in financial data analysis. This section illustrates how theoretical concepts can be applied to address real-world challenges, focusing on the transformation of unstructured data into structured formats, i.e., knowledge graphs, that facilitate deeper analysis and interpretation. The discussion section delves into the implications of this transformation, particularly the issue of information loss and strategies to mitigate it. It explores how maintaining the semantic integrity of the original data is crucial for accurate analysis and decision-making in financial contexts.

The thesis concludes by synthesizing the insights gained from the literature review, methodology, and case study. It reflects on the potential of integrating LLMs with knowledge graphs to revolutionize financial data analysis, offering a more nuanced and comprehensive understanding of financial markets. This research contributes to the ongoing discourse on the application of advanced NLP techniques in finance, suggesting directions for future inquiry and highlighting the importance of ethical considerations in the development and deployment of these technologies.
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-30T16:07:15Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-07-30T16:07:16Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
致謝 i
摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables xi
1 Introduction 1
  1.1 Background and Motivation 1
  1.2 Organization of the Thesis 4
2 Literature Review 5
  2.1 Large Language Models (LLMs) 6
  2.2 LLMs in the Financial Domain 8
    2.2.1 Continual pre-training, fine-tune and RAG in finance domain 8
    2.2.2 Evaluations of Fin-LLMs 13
  2.3 Knowledge Graphs (KGs) 14
  2.4 Financial KGs 18
3 Research Methods 21
  3.1 Financial Statements 21
  3.2 Graph Database 24
    3.2.1 Neo4j 26
  3.3 LangChain Framework 27
4 Case Study 29
  4.1 Single KG 29
  4.2 Schema Selection 31
    4.2.1 Claude 3 Sonnet 33
    4.2.2 GPT-4 36
    4.2.3 Interim Summary 39
  4.3 Schema to KGs 40
  4.4 Query via LangChain 40
  4.5 Demo: Neo4j Knowledge Graph 43
5 Discussion 45
  5.1 Evaluation 45
  5.2 Information loss 49
  5.3 Limitations 52
6 Conclusion 55
Appendix A TSMC Balance Sheet 57
References 59
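The abstract and the chapter list above (Neo4j, the LangChain framework, and querying via LangChain) describe a pipeline that converts unstructured financial text into a knowledge graph and then answers natural-language questions over it. The thesis's own code is not reproduced in this record, so the following is only a minimal sketch of such a pipeline using the publicly documented LangChain/Neo4j integrations; the class names, connection parameters, and the sample sentence paraphrased from the TSMC 2022 annual report are illustrative assumptions, not the author's implementation.

```python
# Illustrative sketch only: package layout, parameters, and sample text are
# assumptions based on public LangChain/Neo4j documentation, not the thesis code.
import os

from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.chains import GraphCypherQAChain

# Connect to a local Neo4j instance (credentials are placeholders).
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password=os.environ["NEO4J_PASSWORD"],
)

llm = ChatOpenAI(model="gpt-4", temperature=0)

# 1) Unstructured financial text -> graph documents via an LLM.
report_text = (
    "TSMC reported net revenue of NT$2,263.9 billion for 2022, "
    "an increase of 42.6% over 2021."
)
transformer = LLMGraphTransformer(llm=llm)
graph_documents = transformer.convert_to_graph_documents(
    [Document(page_content=report_text)]
)

# 2) Load the extracted nodes and relationships into Neo4j.
graph.add_graph_documents(graph_documents)
graph.refresh_schema()

# 3) Natural-language question -> Cypher -> answer, via LangChain.
chain = GraphCypherQAChain.from_llm(llm=llm, graph=graph, verbose=True)
print(chain.invoke({"query": "What was TSMC's net revenue in 2022?"}))
```

This sketch does not cover the schema-selection step the thesis performs with Claude 3 Sonnet and GPT-4 (Chapter 4.2), nor its evaluation of information loss (Chapter 5).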
dc.language.iso: en
dc.subject: 大型語言模型 (zh_TW)
dc.subject: 知識圖譜 (zh_TW)
dc.subject: LangChain (zh_TW)
dc.subject: FinNLP (zh_TW)
dc.subject: Large Language Models (en)
dc.subject: Knowledge Graphs (en)
dc.subject: LangChain (en)
dc.subject: FinNLP (en)
dc.title: 大型語言模型與知識圖譜在財經文件上之應用 (zh_TW)
dc.title: An Application of Large Language Models and Knowledge Graphs in Financial Documents (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 謝吉隆;張瑜芸 (zh_TW)
dc.contributor.oralexamcommittee: Ji-Lung Hsieh;Yu-Yun Chang (en)
dc.subject.keyword: 大型語言模型, 知識圖譜, LangChain, FinNLP (zh_TW)
dc.subject.keyword: Large Language Models, Knowledge Graphs, LangChain, FinNLP (en)
dc.relation.page: 63
dc.identifier.doi: 10.6342/NTU202401582
dc.rights.note: 同意授權(限校園內公開) (access granted, restricted to on-campus use)
dc.date.accepted: 2024-07-26
dc.contributor.author-college: 文學院 (College of Liberal Arts)
dc.contributor.author-dept: 語言學研究所 (Graduate Institute of Linguistics)
Appears in Collections: 語言學研究所 (Graduate Institute of Linguistics)

Files in This Item:
File: ntu-112-2.pdf (access restricted to NTU campus IP addresses; off-campus users should use the VPN service)
Size: 5.41 MB
Format: Adobe PDF