Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55466
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 蘇雅韻(Ya-Yunn Su) | |
dc.contributor.author | Roy Guanyu Lin | en |
dc.contributor.author | 林冠宇 | zh_TW |
dc.date.accessioned | 2021-06-16T04:04:02Z | - |
dc.date.available | 2018-02-03 | |
dc.date.copyright | 2015-02-03 | |
dc.date.issued | 2014 | |
dc.date.submitted | 2014-10-02 | |
dc.identifier.citation | [1] Gabriel Pui Cheong Fung, Jeffrey Xu Yu, and Wai Lam. Stock prediction: Integrating text mining approach using real-time news. In Computational Intelligence for Financial Engineering, 2003. Proceedings. 2003 IEEE International Conference on, pages 395–402. IEEE, 2003.
[2] Robert P. Schumaker and Hsinchun Chen. Textual analysis of stock market prediction using breaking financial news: The AZFin Text system. ACM Transactions on Information Systems (TOIS), 27(2):12, 2009.
[3] Victor Lavrenko, Matt Schmill, Dawn Lawrie, Paul Ogilvie, David Jensen, and James Allan. Mining of concurrent text and time series. In KDD-2000 Workshop on Text Mining, pages 37–44. Citeseer, 2000.
[4] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela H. Byers. Big data: The next frontier for innovation, competition, and productivity. 2011.
[5] CKIP Chinese text segmentation tool. (2014). http://ckipsvr.iis.sinica.edu.tw/. Retrieved July 3, 2014.
[6] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.
[7] Upendra Sharma, Prashant Shenoy, Sambit Sahu, and Anees Shaikh. A cost-aware elasticity provisioning system for the cloud. In Distributed Computing Systems (ICDCS), 2011 31st International Conference on, pages 559–570. IEEE, 2011.
[8] Tian Guo, Upendra Sharma, Timothy Wood, Sambit Sahu, and Prashant J. Shenoy. Seagull: Intelligent cloud bursting for enterprise applications. In USENIX Annual Technical Conference, pages 361–366, 2012.
[9] Archana Ganapathi, Yanpei Chen, Armando Fox, Randy Katz, and David Patterson. Statistics-driven workload modeling for the cloud. In Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on, pages 87–92. IEEE, 2010.
[10] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Naive Bayes classification. In Introduction to Information Retrieval, chapter 13, page 258. Cambridge University Press, 2008.
[11] Ruby on Rails web framework. (2014). http://rubyonrails.org/. Retrieved July 2, 2014.
[12] Mechanize Ruby gem. (2014). https://rubygems.org/gems/mechanize. Retrieved July 2, 2014.
[13] Watir Ruby gem. (2014). https://rubygems.org/gems/watir. Retrieved July 2, 2014.
[14] Amazon Web Services. (2014). http://aws.amazon.com/. Retrieved July 3, 2014.
[15] Amazon Web Services Elastic Compute Cloud (EC2). (2014). http://aws.amazon.com/ec2/. Retrieved July 3, 2014.
[16] Peter Mell and Tim Grance. The NIST definition of cloud computing. 2011.
[17] Amazon Web Services S3. (2014). http://aws.amazon.com/s3/. Retrieved July 3, 2014.
[18] Apache Mahout project. (2014). https://mahout.apache.org/. Retrieved July 4, 2014.
[19] Apache Hadoop project. (2014). http://hadoop.apache.org/. Retrieved July 4, 2014.
[20] Jieba project for Chinese text segmentation. (2014). https://github.com/fxsjy/jieba. Retrieved July 4, 2014.
[21] Takashi Kimoto, Kazuo Asakawa, Morio Yoda, and Masakazu Takeoka. Stock market prediction system with modular neural networks. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on, pages 1–6. IEEE, 1990.
[22] Mahout naïve Bayes. (2014). https://mahout.apache.org/users/classification/bayesian.html. Retrieved July 5, 2014.
[23] Google Compute Engine. (2014). https://cloud.google.com/compute/. Retrieved July 10, 2014.
[24] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 10–10, 2010. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55466 | - |
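dc.description.abstract | Stock trend prediction is a popular research topic whose main goal is to anticipate whether future stock prices will rise or fall. For short-term investors, news is an important indicator for predicting whether a stock price will go up or down. In recent years, the rise of social networks has made even more real-time textual information available, and it has been considered as another factor for stock price prediction. However, the volume of social-network content keeps growing, beyond what conventional text-processing prediction systems can handle. At the same time, the maturity of open-source software has made it easier to set up computing platforms for processing large amounts of data. Based on these observations, the main topic of this research is to build a scalable stock trend prediction system that processes Chinese text, using a large corpus of Chinese news articles as a first step toward predicting stock price trends. This system speeds up building prediction models and validating their performance. Furthermore, combined with the cloud computing services that have emerged in recent years, the platform can be deployed on the cloud more easily and promptly: the system is packaged as a machine image, and when needed, resources are leased from a cloud provider to launch the platform service from that image. Another issue that arises with the cloud is how to make full use of existing resources and, when demand exceeds the system's capacity, lease additional resources from the cloud to maintain quality of service. Our results show that, in the system we built from open-source software, the Jieba open-source Chinese word segmentation project, after the modifications made in this research, improved its parallel performance by 80% when processing 2.3 GB of news text on four cores. However, the Mahout project, which handles the machine learning part, showed no performance improvement. | zh_TW |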
dc.description.abstract | Stock market prediction is the problem of forecasting the market trend. For short-term investment, news is one of the most important factors that influence stock prices. Based on this idea, our goal is to build a scalable stock market prediction system that processes Chinese news articles to produce a prediction model. With this system, we can speed up model training and take more training sources into account, e.g., posts from China's microblog service, Sina Weibo. Also, with the emergence of cloud computing, a scalable system can lease more resources from the cloud to serve a growing workload. Our approach is to build this system from mature open-source projects: Hadoop for parallel computing, Mahout for scalable machine learning, and Jieba for Chinese text segmentation. We provide a basic algorithm for stock trend prediction, build the software stack, collect news published in Taiwan from March 2009 to May 2014, and run experiments to evaluate the scalability of the system. The results show that in this application, the Jieba Chinese text segmentation tool scales well with multiprocessing: it is 80 percent faster with four parallel processes than in sequential mode. However, Mahout does not show significant speedup in this scenario. | en |
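The abstract reports that Jieba segmentation ran faster with four parallel processes than in sequential mode, but the record does not describe how the thesis modified Jieba. The following is therefore only a minimal Python sketch of the general data-parallel pattern, under stated assumptions: the corpus has one article per line, and `segment`, `split_into_chunks`, `segment_chunk`, and `segment_parallel` are hypothetical names. The whitespace-splitting `segment` merely stands in for a real Chinese segmenter.

```python
"""Sketch: split a news corpus into line-aligned chunks and segment them in parallel.

Assumptions (not from the thesis): one article per line, and `segment` is a
hypothetical stand-in for a real Chinese word segmenter such as Jieba.
"""
from multiprocessing import Pool
from typing import List


def segment(line: str) -> List[str]:
    # Hypothetical placeholder segmenter: splits on whitespace.
    # A real pipeline would call a Chinese segmenter here instead.
    return line.split()


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    # Contiguous partitions whose lengths differ by at most one line,
    # so each worker process receives a similar number of articles.
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    base, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def segment_chunk(chunk: List[str]) -> List[List[str]]:
    # Runs inside a worker process: segment every article in one chunk.
    return [segment(line) for line in chunk]


def segment_parallel(lines: List[str], processes: int = 4) -> List[List[str]]:
    # One chunk per worker process, e.g. four processes as in the abstract.
    chunks = split_into_chunks(lines, processes)
    with Pool(processes) as pool:
        # map() returns results in chunk order, so the flattened output
        # lines up with the input articles.
        results = pool.map(segment_chunk, chunks)
    return [tokens for chunk in results for tokens in chunk]


if __name__ == "__main__":
    corpus = ["股市 今日 上漲", "外資 賣超", "台積電 股價 創新高"]
    print(segment_parallel(corpus, processes=2))
```

The `__main__` guard is required on platforms that start workers with the `spawn` method. Jieba itself also offers a multiprocessing-based `jieba.enable_parallel()` mode; whether the thesis used it or its own modification is not stated in this record.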
dc.description.provenance | Made available in DSpace on 2021-06-16T04:04:02Z (GMT). No. of bitstreams: 1 ntu-103-R00922096-1.pdf: 4132833 bytes, checksum: 0440d4fbe23a7908b72424ef33909116 (MD5) Previous issue date: 2014 | en |
dc.description.tableofcontents | Acknowledgements ii
Abstract (Chinese) iv
Abstract v
1 Introduction 1
2 Overview of the System 3
2.1 System Workflow 3
2.2 System Software Stack Design and Implementation 5
2.3 Scalability Issues 7
3 Literature Review 9
4 Basic Stock Market Prediction Algorithm 12
5 Preliminary Experiments 15
6 Experiment on Cloud 18
6.1 Experiment Description 18
6.2 Profiling Jieba Components 19
6.3 High-CPU VM with SSD 19
7 Future Direction and Conclusion 24
Bibliography 26 | |
dc.language.iso | en | |
dc.title | Scalable Stock Trend Prediction System Based on Textual Analysis | zh_TW |
dc.title | Scalable System for Textual Analysis based Stock Market Prediction | en |
dc.type | Thesis | |
dc.date.schoolyear | 103-1 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 蔡子傑(Tzu-Chieh Tsai),諶家蘭(Jia-Lang Seng) | |
dc.subject.keyword | Distributed System, Scalability, Stock Trend Prediction, Cloud Computing | zh_TW |
dc.subject.keyword | Distributed System, Scalability, Stock Market Prediction, Cloud Computing | en |
dc.relation.page | 28 | |
dc.rights.note | Fee-based license | |
dc.date.accepted | 2014-10-03 | |
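dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |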
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
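ntu-103-1.pdf (not currently authorized for public access) | 4.04 MB | Adobe PDF |
Unless otherwise specified, all items in this system are protected by copyright, with all rights reserved.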