Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94218
Title: | Using LDA and BERTopic Models to Classify Financial News and Predict Stock Returns - Evidence from TSMC |
Author: | 葉浩霖 Hao-Lin Yeh |
Advisor: | 楊睿中 Jui-Chung Yang |
Keywords: | BERTopic, Latent Dirichlet Allocation (LDA), topic model, stock return prediction, TSMC |
Publication year: | 2024 |
Degree: | Master's |
Abstract: | This study uses "TSMC" as the search keyword to collect financial news published between 2016 and 2023 by the Liberty Times and the China Times News Network. News from 2016 to 2021 serves as training data, and news from 2022 to 2023 serves as test data. After automated screening and cleaning, the news is classified with Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and BERTopic (Grootendorst, 2022). The topic assignments are combined with sentiment scores, and significant topic categories are identified through regression analysis and Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996) regression. A Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) model is then trained to predict TSMC's next-day one-day stock return (the difference between the closing and opening prices), and a trading strategy is designed to compare the trading performance of the different methods. The results show that classifying financial news with BERTopic, selecting significant topics through regression and LASSO regression, and then training and predicting with LSTM yields the best trading results, with an investment return of 56%. This study demonstrates that BERTopic can effectively handle automatically screened, relatively coarse news data and, combined with traditional regression and deep learning methods, can be successfully applied to stock trading strategies. In contrast, although LDA can classify the news, it cannot predict whether stock returns rise or fall when given automatically screened news data. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94218 |
DOI: | 10.6342/NTU202403036 |
Full-text authorization: | Not authorized |
Appears in collections: | Department of Economics |
Files in this item:
File | Size | Format | |
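
The abstract defines the prediction target as the gap between the closing and opening price and evaluates models through a trading strategy, but it does not spell out the strategy itself. The following is a minimal sketch only, under stated assumptions: the gap is scaled by the opening price, the rule is a hypothetical long-only one (buy at the open and sell at the close on days with a positive predicted return), and transaction costs are ignored. The names `Day`, `intraday_return`, and `backtest_long_only` are illustrative and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Day:
    open_price: float
    close_price: float
    predicted_return: float  # hypothetical model output for this day


def intraday_return(open_price: float, close_price: float) -> float:
    """Close-minus-open gap, scaled by the open (the scaling is an assumption)."""
    return (close_price - open_price) / open_price


def backtest_long_only(days: Iterable[Day]) -> float:
    """Hypothetical rule: trade open-to-close only when the prediction is positive.

    Returns the compounded return over the period, ignoring
    transaction costs and taxes.
    """
    wealth = 1.0
    for day in days:
        if day.predicted_return > 0:
            wealth *= 1.0 + intraday_return(day.open_price, day.close_price)
    return wealth - 1.0


# Made-up prices and predictions, for illustration only.
sample = [
    Day(100.0, 110.0, 0.02),   # traded: gains 10%
    Day(110.0, 99.0, -0.01),   # skipped: prediction is negative
    Day(100.0, 95.0, 0.01),    # traded: loses 5%
]
total = backtest_long_only(sample)
print(f"{total:.4f}")
```

With the made-up sample, only the first and third days are traded, so the compounded result is 1.10 × 0.95 − 1. A rule of this kind makes the comparison between methods depend only on the sign of each prediction, which matches the abstract's emphasis on predicting whether returns rise or fall.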
---|---|---|---|
ntu-112-2.pdf (not currently authorized for public access) | 4.16 MB | Adobe PDF |
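Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.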