Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94218
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 楊睿中 | zh_TW
dc.contributor.advisor | Jui-Chung Yang | en
dc.contributor.author | 葉浩霖 | zh_TW
dc.contributor.author | Hao-Lin Yeh | en
dc.date.accessioned | 2024-08-15T16:16:53Z | -
dc.date.available | 2024-08-16 | -
dc.date.copyright | 2024-08-15 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-05 | -
dc.identifier.citation | Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin, “A Neural Probabilistic Language Model,” Journal of Machine Learning Research, March 2003, 3, 1137–1155.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 2003, 3, 993–1022.
Campello, Ricardo J. G. B., Davoud Moulavi, and Joerg Sander, “Density-Based Clustering Based on Hierarchical Density Estimates,” Advances in Knowledge Discovery and Data Mining, 2013, pp. 160–172.
Chen, Kuan Chen, Chung I Lin, and Hong Ming Chen, “Relationship between News Sentiment Indicator and the Taiwan Weighted Stock Index,” Journal of Social Sciences and Philosophy, 2021, 33 (3), 383–423.
Churchill, Rob and Lisa Singh, “The Evolution of Topic Modeling,” ACM Computing Surveys, 2022, 54 (10s), 1–35.
Cowles, Alfred, “Can Stock Market Forecasters Forecast?,” Econometrica, 1933, 1 (3), 309–324.
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, 1990, 41 (6), 391–407.
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), 1977, 39 (1), 1–38.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv preprint arXiv:1810.04805, 2019.
Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei, “Topic Modeling in Embedding Spaces,” Transactions of the Association for Computational Linguistics, 2020, 8, 439–453.
Faccini, Renato, Rastin Matin, and George Skiadopoulos, “Dissecting Climate Risks: Are They Reflected in Stock Prices?,” Journal of Banking & Finance, 2023, 155, 106948.
Friedman, Jerome H., Trevor Hastie, and Rob Tibshirani, “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 2010, 33 (1), 1–22.
Grootendorst, Maarten, “BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure,” arXiv preprint arXiv:2203.05794, 2022.
Gu, Shihao, Bryan Kelly, and Dacheng Xiu, “Empirical Asset Pricing via Machine Learning,” The Review of Financial Studies, February 2020, 33 (5), 2223–2273.
Hochreiter, Sepp and Jürgen Schmidhuber, “Long Short-Term Memory,” Neural Computation, November 1997, 9 (8), 1735–1780.
Hofmann, Thomas, “Probabilistic Latent Semantic Indexing,” Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999, pp. 50–57.
Jiang, Fuwei, Joshua Lee, Xiumin Martin, and Guofu Zhou, “Manager Sentiment and Stock Returns,” Journal of Financial Economics, 2019, 132 (1), 126–149.
Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul, “An Introduction to Variational Methods for Graphical Models,” Machine Learning, 1999, 37, 183–233.
Kingma, Diederik P. and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2017.
Ku, Lun‐Wei and Hsin‐Hsi Chen, “Mining Opinions from the Web: Beyond Relevance Retrieval,” Journal of the American Society for Information Science and Technology, October 2007, 58 (12), 1838–1850.
Kullback, Solomon and Richard A. Leibler, “On Information and Sufficiency,” The Annals of Mathematical Statistics, 1951, 22 (1), 79–86.
Li, Peng-Hsuan, Tsu-Jui Fu, and Wei-Yun Ma, “Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER,” arXiv preprint arXiv:1908.11046, 2020.
Lin, Chenghua and Yulan He, “Joint Sentiment/Topic Model for Sentiment Analysis,” Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 375–384.
Loughran, Tim and Bill McDonald, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks,” Journal of Finance, February 2011, 66 (1), 35–65.
McInnes, Leland, John Healy, and James Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv preprint arXiv:1802.03426, 2020.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv preprint arXiv:1301.3781, 2013.
Prim, Robert C., “Shortest Connection Networks and Some Generalizations,” The Bell System Technical Journal, 1957, 36 (6), 1389–1401.
Rahmadeyan, Akhas and Mustakim, “Long Short-Term Memory and Gated Recurrent Unit for Stock Price Prediction,” Procedia Computer Science, 2024, 234, 204–212. Seventh Information Systems International Conference (ISICO 2023).
Reimers, Nils and Iryna Gurevych, “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks,” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, November 2019.
Röder, Michael, Andreas Both, and Alexander Hinneburg, “Exploring the Space of Topic Coherence Measures,” WSDM 2015 - Proceedings of the 8th ACM International Conference on Web Search and Data Mining, February 2015, pp. 399–408.
Sharpe, William F., “The Sharpe Ratio,” The Best of The Journal of Portfolio Management, 1998, pp. 169–178.
Tang, Wenjin, Hui Bu, Yuan Zuo, and Junjie Wu, “Unlocking the Power of the Topic Content in News Headlines: BERTopic for Predicting Chinese Corporate Bond Defaults,” Finance Research Letters, 2024, 62, 105062.
Tetlock, Paul C., “Giving Content to Investor Sentiment: The Role of Media in the Stock Market,” The Journal of Finance, 2007, 62 (3), 1139–1168.
Tibshirani, Robert, “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), 1996, 58 (1), 267–288.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention Is All You Need,” arXiv preprint arXiv:1706.03762, 2017.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94218 | -
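dc.description.abstract | Using "台積電" (TSMC) as the keyword, this study selects financial news published from 2016 to 2023 by the Liberty Times and China Times News Network. News from 2016 to 2021 serves as training data, and news from 2022 to 2023 as test data. After automated screening and cleaning, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and BERTopic (Grootendorst, 2022) models are used to classify the news. Combined with sentiment scores, significant topic categories are identified through regression analysis and Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996) regression. A Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) model is then trained to predict TSMC's next-day one-day stock return (the gap between the closing and opening prices), and a trading strategy is designed to evaluate the trading performance of the different methods. The results show that classifying financial news with BERTopic, selecting significant topics through regression and LASSO regression, and then training and predicting with LSTM yields the best trading results, with a return on investment of 56%. This study shows that BERTopic can effectively handle automatically screened news data and, combined with traditional regression and deep learning methods, can be successfully applied to stock trading strategies. By contrast, although LDA can classify news, it cannot predict whether stock returns rise or fall from automatically screened news data. | zh_TW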
dc.description.abstract | This study uses "TSMC" as the search keyword to select financial news published by the Liberty Times and China Times News Network from 2016 to 2023. News from 2016 to 2021 is used as training data, and news from 2022 to 2023 as test data. After automated screening and cleaning, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and BERTopic (Grootendorst, 2022) models are employed for news classification. Combined with sentiment scores, significant topic categories are identified through regression analysis and Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996) regression. A Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) model is then trained to predict TSMC's next-day stock return (the difference between closing and opening prices). A trading strategy is designed to evaluate the performance of the different methods. The results show that using BERTopic to classify financial news, selecting significant topics through regression and LASSO regression, and then training and predicting with LSTM yields the best trading results, with an investment return rate of 56%. This study demonstrates that BERTopic can effectively handle relatively coarse news data and, combined with traditional regression and deep learning methods, can be successfully applied to stock trading strategies. In contrast, while LDA can classify news, it cannot predict the rise and fall of stock returns from automatically screened news data. | en
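The data split and return definition stated in the abstract can be illustrated with a minimal Python sketch. Everything beyond those two points is an assumption made for illustration: the price values are made up, the names `PRICES`, `one_day_return`, `split_by_year`, and `strategy_pnl` are hypothetical, and the trading rule (hold from open to close only on days with a positive predicted return) is a placeholder, since the record does not describe the thesis's actual strategy. The record also does not say whether the return is normalized by the opening price, so the sketch uses the raw difference.

```python
from datetime import date

# Hypothetical daily price records: (trading day, open, close). Values are made up.
PRICES = [
    (date(2021, 12, 29), 610.0, 615.0),
    (date(2021, 12, 30), 616.0, 612.0),
    (date(2022, 1, 3), 619.0, 631.0),
    (date(2022, 1, 4), 645.0, 656.0),
]


def one_day_return(open_price, close_price):
    """Difference between closing and opening price, as the abstract describes."""
    return close_price - open_price


def split_by_year(rows, last_train_year=2021):
    """Chronological split: rows up to last_train_year train, later rows test."""
    train = [r for r in rows if r[0].year <= last_train_year]
    test = [r for r in rows if r[0].year > last_train_year]
    return train, test


def strategy_pnl(test_rows, predicted_returns):
    """Assumed rule: hold from open to close only on days with a positive prediction."""
    return sum(
        one_day_return(o, c)
        for (_, o, c), pred in zip(test_rows, predicted_returns)
        if pred > 0
    )


train_rows, test_rows = split_by_year(PRICES)
# Placeholder predictions standing in for a model's output on the test days.
pnl = strategy_pnl(test_rows, predicted_returns=[1.5, -0.3])
```

Splitting by calendar year rather than at random keeps every test observation later than every training observation, which matches the 2016 to 2021 training window and 2022 to 2023 test window given in the abstract.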
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:16:53Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-08-15T16:16:53Z (GMT). No. of bitstreams: 0 | en
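dc.description.tableofcontents | Oral Examination Committee Approval Certificate i
Acknowledgements ii
Chinese Abstract iii
English Abstract iv
Table of Contents vi
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Literature Review 4
2.1 Development of Topic Models 4
2.2 Financial Market Research Using Text Mining Techniques 6
Chapter 3 Introduction to Topic Models: Latent Dirichlet Allocation (LDA) and BERTopic 8
3.1 Latent Dirichlet Allocation (LDA) 8
3.1.1 Joint Probability Distribution of Latent Dirichlet Allocation (LDA) 8
3.1.2 Inference of the Posterior Distribution 11
3.2 BERTopic 14
3.2.1 SBERT (Sentence-BERT) 15
3.2.2 Uniform Manifold Approximation and Projection (UMAP) 19
3.2.3 Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) 23
3.2.4 Class-Based Term Frequency–Inverse Document Frequency (c-TF-IDF) 25
Chapter 4 Research Methods 27
4.1 Text Data Description and Processing 27
4.1.1 News Screening Process 27
4.1.2 Text Cleaning Process 28
4.1.3 Word Segmentation 29
4.1.4 Stopwords 29
4.2 Sentiment Scores 30
4.3 Data Splitting and Topic Model Parameter Settings 31
4.4 Stock Data 31
4.5 Empirical Model Specification 32
4.6 Deep Learning Model: Long Short-Term Memory (LSTM) 34
4.7 Investment Strategy 40
Chapter 5 Empirical Results 42
5.1 Model Classification Results 42
5.1.1 Latent Dirichlet Allocation (LDA) Model Classification Results 42
5.1.2 BERTopic Model Classification Results 43
5.2 Regression Results 43
5.2.1 Model 1 and Model 2 43
5.2.2 Model 3 and Model 4 44
5.3 Prediction Results 47
5.4 Investment Strategy Performance 49
Chapter 6 Results and Future Outlook 53
6.1 Conclusion 53
6.2 Future Outlook 54
References 55
Appendix A: Tables 60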
-
dc.language.iso | zh_TW | -
dc.title | 使用 LDA 和 BERTopic 模型分類財經新聞並預測股票報酬-以台積電為例 | zh_TW
dc.title | Using LDA and BERTopic Models to Classify Financial News and Predict Stock Returns - Evidence from TSMC | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 陳由常; 林昌平 | zh_TW
dc.contributor.oralexamcommittee | Yu-Chang Chen; Chang-Ping Lin | en
dc.subject.keyword | BERTopic, Latent Dirichlet Allocation, topic model, stock return prediction, TSMC | zh_TW
dc.subject.keyword | BERTopic, LDA, topic model, stock return prediction, TSMC | en
dc.relation.page | 67 | -
dc.identifier.doi | 10.6342/NTU202403036 | -
dc.rights.note | Not authorized | -
dc.date.accepted | 2024-08-07 | -
dc.contributor.author-college | College of Social Sciences | -
dc.contributor.author-dept | Department of Economics | -
Appears in collections: Department of Economics

Files in this item:
File | Size | Format
ntu-112-2.pdf (not currently authorized for public access) | 4.16 MB | Adobe PDF