Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17212
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 曹承礎 (Seng-cho Chou)
dc.contributor.author: Jung-Hsuan Chou [en]
dc.contributor.author: 周容萱 [zh_TW]
dc.date.accessioned: 2021-06-08T00:01:15Z
dc.date.copyright: 2013-08-23
dc.date.issued: 2013
dc.date.submitted: 2013-08-15
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17212
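dc.description.abstract: With the development of mobile devices and wireless networks, social networking sites have become an indispensable part of people's lives. On well-known sites such as Facebook, Twitter, LinkedIn, and Plurk, thousands upon thousands of users maintain relationships, express themselves, and share anecdotes every day. Many studies have used the vast amount of information generated in this process for analysis and applications in marketing, social behavior, financial forecasting, and disease and disaster prevention. Among social networking sites, Twitter is classified as a microblog and is the world's second-largest social networking site. Because each post may not exceed 140 characters, it works much like a text-messaging service for the Internet, which indirectly increases how often users post and improves the efficiency of information propagation. Retweeting is an important mechanism for information propagation on Twitter: it lets useful and interesting messages spread explosively. Given Twitter's diverse users and the ease of spreading messages, understanding which characteristics make a post more likely to be retweeted has become an important question.
This study uses twitter4j to collect data from Twitter and applies data mining methods to build predictive models, based on content features, post features, and author features, that predict the level of retweet count a post receives. In the pretest stage, we chose three data mining methods, Support Vector Machine, Naive Bayes, and Decision Tree, built a predictive model with each, compared their predictive ability, and selected the best-performing method. We then built separate predictive models for content features, post features, author features, and the special variables proposed in this study, and compared their predictive ability.
Our experiments use the data mining tool Weka to build the predictive models, with 10-fold cross-validation for model training. The results show that Decision Tree was the best-performing data mining method in the pretest stage, and that the model built with the ten special variables proposed in this study has better overall predictive power than any of the models built on content features, post features, or author features alone. However, within individual retweet-count levels each model has its own strengths, so it is inadvisable to rely on only one group of features.
[zh_TW]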
dc.description.abstract: With the development of mobile devices and wireless networks, social networking has become an indispensable part of people's lives. Well-known social networking sites such as Facebook, Twitter, LinkedIn, and Plurk have numerous users who maintain interpersonal relationships, present themselves, and share anecdotes through these sites every day. Many studies have used the huge amount of information generated in this process for analysis and applications in marketing, social behavior, financial forecasting, and disease and disaster prevention. Twitter is classified as a microblogging service and is the second-largest social networking site in the world. Because each post is limited to 140 characters, Twitter works like an SMS service for the Internet, which tends to increase how often users post and improves the efficiency of information propagation. Retweeting is the key mechanism for information propagation on Twitter; it emerged as a simple yet powerful way of disseminating useful information. Given Twitter's large and varied user base and the ease of propagating information, understanding what kinds of posts are more likely to be retweeted has become an important issue.
In this study, we collected data from Twitter using twitter4j and used data mining techniques to build predictive models that predict the level of a post's retweet count, based on content features, post features, author features, and the special variables proposed in this study. In the pretest stage, we built predictive models with Support Vector Machine, Naive Bayes, and Decision Tree and compared their performance. We then selected the best-performing method and used it to build a separate predictive model for each of the feature groups mentioned above.
Our experiments used Weka, a data mining tool, to build the predictive models, and applied 10-fold cross-validation when training them. The results show that Decision Tree was the best-performing data mining method in the pretest stage. After comparing the models, we found that the model built on the special variables proposed in this study performed best overall. However, each model's predictive power differs across retweet-count levels, so building a predictive model from only one feature group would be biased.
[en]
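The 10-fold cross-validation procedure named in the abstract can be sketched as follows. This is an illustration only: the thesis ran its experiments in Weka, and the function names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the toy labels below are assumptions made for this sketch, not the author's code or data.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds (assumes n >= k)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Hypothetical stand-in classifier: always predicts the most common training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature):
    return model


# Toy data (not from the thesis): 70 "low" and 30 "high" retweet-level labels.
features = list(range(100))
labels = ["low"] * 70 + ["high"] * 30
mean_accuracy = cross_validate(features, labels, fit_majority, predict_majority)
```

Swapping `fit_majority` and `predict_majority` for a real classifier leaves the fold logic unchanged, which is what makes the comparison between methods fair: every method is scored on the same held-out folds.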
dc.description.provenance: Made available in DSpace on 2021-06-08T00:01:15Z (GMT). No. of bitstreams: 1
ntu-102-R00725043-1.pdf: 1285491 bytes, checksum: 9ab43d224b8f8eb05423b54e8308a995 (MD5)
Previous issue date: 2013
[en]
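dc.description.tableofcontents: Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Structure
Chapter 2 Literature Review
2.1 Social Networks: The Case of Twitter
2.2 Influence Analysis in Social Networks
Chapter 3 Research Method
3.1 Variable Definitions
3.1.1 Content Features
3.1.2 Post Features
3.1.3 Author Features
3.2 Research Process
3.2.1 Data Preprocessing
3.2.2 Method Selection
3.2.3 Model Building
Chapter 4 Experiments
4.1 Data Collection
4.2 Experiment 1: Method Selection
4.3 Experiment 2: Feature Evaluation
4.4 Experiment 3: Testing the Special Variables
Chapter 5 Conclusion and Future Work
References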
dc.language.iso: zh-TW
dc.title: 社群網路中資訊傳播預測之探討:以Twitter為例 [zh_TW]
dc.title: Investigation of Predicting Information Propagation on Social Network: Using Data from Twitter [en]
dc.type: Thesis
dc.date.schoolyear: 101-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 陳文國, 王貞雅
dc.subject.keyword: Twitter, information propagation, influence analysis, prediction [zh_TW]
dc.subject.keyword: Twitter, Information propagation, influence power analyze, predict [en]
dc.relation.page: 42
dc.rights.note: Not authorized
dc.date.accepted: 2013-08-15
dc.contributor.author-college: College of Management [zh_TW]
dc.contributor.author-dept: Graduate Institute of Information Management [zh_TW]
Appears in collections: Department of Information Management

Files in this item:
File | Size | Format
ntu-102-1.pdf (not authorized for public access) | 1.26 MB | Adobe PDF