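Using Text Mining to Analyze the Relationship between Keywords in Market News and the Taiwan Stock Market: The Cases of Technology and Food Stocks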

Hung-Chieh Lee; 李宏杰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69466

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪茂蔚(Mao-Wei Hung, Ph.D.)
dc.contributor.author	Hung-Chieh Lee	en
dc.contributor.author	李宏杰	zh_TW
dc.date.accessioned	2021-06-17T03:16:30Z	-
dc.date.available	2028-07-02
dc.date.copyright	2018-07-06
dc.date.issued	2018
dc.date.submitted	2018-07-03
dc.identifier.citation	English references [1] Ahmad, K., Oliveira, P. C. F. D., Manomaisupat, P., Casey, M. & Taskaya, T. (2002). Description of events: An analysis of keywords and indexical names. [2] Eugene F. Fama (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. [3] Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D. & Allan, J. (2000). Mining of concurrent text and time series. In: Proceedings of the 6th international conference on knowledge discovery and data mining, pp. 37-44. [4] Ma, Wei-Yun and Keh-Jiann Chen (2003). Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff. Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168-171. [5] Mittermayer, Knolmayer (2016). NewsCATS: A News Categorization and Trading System. [6] ST Dumais, GW Furnas, TK Landauer (1988). Using latent semantic analysis to improve access to textual information. [7] Victor Lavrenko (2001). Topical Language Models, An Overview of Estimation Techniques. [8] Yu-Ming Hsieh, Wei-Yun Ma (2016). N-best Parse Rescoring Based on Dependency-Based Word Embeddings. International Journal of Computational Linguistics and Chinese Language Processing, volume 21, number 2, pages 19-34, December. Chinese references [1] 柯禹伸，北台灣科學技術學院，使用文字探勘技術預測股票漲跌之研究 (2001)，p. 19-80 [2] 吳昀錚，利用文字探勘技術預測台股加權指數之漲跌趨勢 (2008)，p. 20-56 [3] 吳振和，應用文字探勘技術於概念股股價共同之研究 (2011)，p. 15-22 [4] 吳佳儒，新聞報導對現金增資宣告時之股價衝擊 (2013)，p. 26-27 [5] 黃于珊，文字探勘在總體經濟上之應用－以美國聯準會會議紀錄為例 (2017)，p. 6-7 [6] 楊德倫，文字探勘之前處理與TF-IDF 介紹 (2014)，臺灣大學計算機及資訊網路中心教學研究組 [7] 簡智宏，應用文字探勘技術於概念股輿情與股價共同移動之研究-以蘋果供應鏈為例 (2015) [8] 鍾任明、李維平、吳澤民，運用文字探勘於日內股價漲跌趨勢預測之研究 (2007)，中華管理評論國際學報，第十卷第一期，p. 3-10
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69466	-
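dc.description.abstract	In today's Taiwan stock market, fundamental analysis and technical analysis are the methods the investing public most often uses to forecast market performance, and many scholars have likewise used these two approaches to forecast stock prices. Both still face many difficulties and limitations: problems such as overly short analysis windows and short-term volatility are hard to handle; internal corporate actions are hard for outsiders to know about; corporate behavior admits more than one interpretation (a capital reduction, for example, does not necessarily offset losses and may instead mean that the company temporarily cannot find suitable investment opportunities); and many indicators are difficult to apply and quantify. In practice, for a given investment strategy or under specific conditions, these two methods offer useful reference value, but their explanatory power remains limited. The Taiwan stock market has generally been regarded as close to a low-efficiency market, so many investors try to obtain first-hand information circulating in the market as quickly as possible to support their immediate decisions; this study adopts the same assumption. However, previous studies often ran into questions about the objectivity of the reference lexicon they chose, media outlets differ in their interpretation and choice of words, and the volume of data is very large. Forecasting stock market trends from news is therefore difficult and requires many complex factors to be considered. To address these difficulties, many earlier studies collected news items from the Internet for sampling and word-segmentation analysis. This study mainly uses the Pandas toolbox to select individual stocks in the same category (for example, technology stocks: TSMC and HTC) and retrieves each stock's historical material-information announcements from the Market Observation Post System (公開資訊觀測站). Announcements are collected and analyzed daily, because much material news on Taiwan stocks is reflected at Monday's open or is released in the morning of the same day. Each announcement's time (within the past three years, counting back from April 10 of this year) is matched against the stock's historical daily closing prices, and the announcement is labeled according to whether the price rose, stayed flat, or fell afterward. The compiled text is segmented with the lexicon-based Chinese word segmentation system developed by the Institute of Information Science, Academia Sinica (CKIP), and the Jieba Chinese word segmentation system (the Jieba package for Python) is used for keyword detection, term frequency, part-of-speech filtering, and matching. The aim is to find and score keywords that have a causal relationship with price movements in each stock category (for technology stocks, for example: rising cost prices, plant expansion, plunge), and thereby give investors another reference for observing and forecasting stock price movements. In addition, this study proposes using singular value decomposition (SVD) to make causal predictions from the text of announcements and to build a classification lexicon of words that signal rises and falls, in an attempt to address the objectivity problem of the reference lexicons used in earlier research. The empirical results show that, except for HTC, for which the keywords selected with TF-IDF could not effectively predict future closing-price movements, keywords (10 or fewer) selected from the material-information announcements, including titles and detailed content, that the other three companies (TSMC, Uni-President, and Wei Chuan) published on the Market Observation Post System could broadly predict closing-price movements.	zh_TW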
dc.description.abstract	Fundamental and technical analysis are the techniques most often used by the public to analyze the Taiwanese stock market. Research by many scholars shows that both techniques play a large part in stock price forecasting, but considerable difficulties and restrictions arise. For example, scholars have found it hard to handle short analysis intervals and short-term fluctuations, or to learn the details of companies' internal policies. There are also measurement problems in setting indicators. In practice, when forming an investment strategy or under certain conditions, both methods prove valuable for forecasting the market trend, but they remain insufficient to explain it. The Taiwanese stock market is often considered a nearly inefficient market, so many investors want to gather first-hand information as soon as possible to support their immediate decisions. This research adopts the same hypothesis. However, it is hard to forecast the market from news, both because time is short and because of several issues, such as the large volume of data and differences in how texts choose their words. In short, many complicated factors need to be considered. To address these problems, many scholars have applied text mining to analyze stock market performance, sampling the needed text from websites. This research focuses mainly on data mining, using the Pandas toolbox (Python 3.6) to select stocks in the same category; for example, TSMC and HTC both belong to technology stocks. We use the Taiwanese Market Observation Post System to collect daily historical announcements over a three-year period and compare them with the corresponding daily stock prices. We then mark selected nouns and verbs as positive or negative to support the later analysis.
We use the CKIP and Jieba systems in this research. By finding keywords and scoring them, we aim to improve this way of forecasting stock prices and help investors understand in more detail how text mining applies to stock price forecasting.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T03:16:30Z (GMT). No. of bitstreams: 1 ntu-107-R05724003-1.pdf: 7868420 bytes, checksum: 7d55cd1144f80869cba8ccc39cd9c26f (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Table of contents. Chapter 1 Introduction: Section 1 Research background, 7; Section 2 Research motivation, 9; Section 3 Research objectives and anticipated problems, 10; Section 4 Research framework, 11; Section 5 Research process, 12. Chapter 2 Review of prior literature and related theory: Section 1 Review of prior theory, 13; Section 2 Summary of related research by earlier scholars, 14. Chapter 3 Scope of the study: Section 1 Research subjects and data sources, 16; Section 2 Choice of the data collection period, 16; Section 3 Assumptions about the research setting, 17. Chapter 4 Research methods: Section 1 Overview of research methods, 18; Section 2 Brief description of the research model, 18; Section 3 Retrieving research data from the Market Observation Post System (Pandas), 19; Section 4 Further processing of retrieved data with ConvertZ (Big5 to UTF-8), 21; Section 5 Word segmentation of the data (Academia Sinica CKIP system), 22; Section 6 Building a matchable rise/fall lexicon with latent semantic analysis (LSA), 26; Section 7 Selecting keywords with the TF-IDF algorithm in Python Jieba, 29. Chapter 5 Empirical study and analysis of results: Section 1 Scoring the labeled text data, 38; Section 2 Analysis with the language model and explanation of the forecast results (EViews 8), 41. Chapter 6 Conclusions and recommendations: Section 1 Conclusions, 58; Section 2 Suggestions for future research, 61. English references, 63. Chinese references, 64. Appendix, 65.
dc.language.iso	zh-TW
dc.subject	stock price fluctuation	zh_TW
dc.subject	news-driven factors	zh_TW
dc.subject	price-change labeling	zh_TW
dc.subject	data mining	zh_TW
dc.subject	semantics	zh_TW
dc.subject	keywords	zh_TW
dc.subject	verbs	en
dc.subject	text mining	en
dc.subject	forecast	en
dc.subject	stock price	en
dc.subject	nouns	en
dc.subject	data mining	en
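dc.title	Using Text Mining to Analyze the Relationship between Keywords in Market News and the Taiwan Stock Market: The Cases of Technology and Food Stocks	zh_TW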
dc.title	An analysis in the relations between the key phrases and the stock performance in Taiwanese stock market (e.g., Technology stocks and food industry stocks)	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	蔡豐澤,蔡佳芬,邱琦倫
dc.subject.keyword	news-driven factors,keywords,semantics,data mining,price-change labeling,stock price fluctuation	zh_TW
dc.subject.keyword	data mining,text mining,forecast,stock price,nouns,verbs	en
dc.relation.page	89
dc.identifier.doi	10.6342/NTU201801080
dc.rights.note	Licensed for a fee
dc.date.accepted	2018-07-04
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of International Business	zh_TW
Appears in Collections:	Department of International Business
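
The labeling and keyword-scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the thesis reports using Pandas, CKIP, and Jieba, whereas this sketch uses only the standard library and a plain TF-IDF formula on text that is assumed to be segmented already. The function names, the 0.5% band used to call a move "flat", and the sample tokens and labels are all hypothetical.

```python
import math
from collections import Counter


def label_move(prev_close, next_close, flat_band=0.005):
    """Label the close-to-close move around an announcement.

    flat_band is an assumed threshold (0.5%); the record does not state one.
    """
    change = (next_close - prev_close) / prev_close
    if change > flat_band:
        return "up"
    if change < -flat_band:
        return "down"
    return "flat"


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs, labels, target, k=10):
    """Sum TF-IDF weights over documents with the target label; keep the top k."""
    totals = Counter()
    for doc_weights, label in zip(tfidf(docs), labels):
        if label == target:
            totals.update(doc_weights)  # Counter.update adds the weights
    return [term for term, _ in totals.most_common(k)]


# Hypothetical, already-segmented announcements and their price labels.
sample_docs = [
    ["plant", "expansion", "announced"],
    ["revenue", "plunge", "announced"],
    ["plant", "expansion", "capacity"],
]
sample_labels = ["up", "down", "up"]
print(top_keywords(sample_docs, sample_labels, "up", k=3))
```

The cap of 10 keywords mirrors the "10 or fewer" reported in the abstract. In practice the token lists would come from a word segmenter rather than being typed in by hand.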

Files in this item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	7.68 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.