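Applications of Text Mining to Studying the Association between Responses and Opinions from Opinion Web System of the National Taiwan University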

Tzu-Han Liao; 廖子涵

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4081

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉仁沛(Jen-Pei Liu)
dc.contributor.author	Tzu-Han Liao	en
dc.contributor.author	廖子涵	zh_TW
dc.date.accessioned	2021-05-13T09:20:33Z	-
dc.date.available	2018-08-23
dc.date.available	2021-05-13T09:20:33Z	-
dc.date.copyright	2016-08-23
dc.date.issued	2016
dc.date.submitted	2016-08-19
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4081	-
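dc.description.abstract	With the spread of online information and the trend toward heavy use of mobile communications, the public leaves large amounts of data on the internet, signaling the arrival of the Big Data era. According to statistics from the International Data Corporation (IDC), the total global data volume will reach 40 zettabytes (ZB) by 2020, equivalent to about 43 trillion gigabytes (GB), a more than fifty-fold increase over 2010, and unstructured data such as text, images, video and audio will be used more and more frequently. In particular, the "keyboard power" that has risen with the influence of the internet has made text the medium through which people communicate and discuss in the online world, and its importance is indispensable. In today's society, therefore, beyond the analysis of quantitative data, qualitative data contain an even larger amount of information, and the results of analyzing them are also more valuable. This study extends the quantitative findings of "Statistical Analysis of the Data from University Assembly Meetings and the Opinion Web System of the National Taiwan University" and goes on to analyze the content of NTU's opinion submissions qualitatively. Through text mining, using Chinese word segmentation, latent semantic analysis and sentiment analysis, it identifies important information hidden between the lines of large volumes of complex, unprocessed text. It explores what communicative purposes and needs lead students to use the opinion web system to express their views; how the university responds to these questions and suggestions; and whether it truly answers and handles the problems or merely gives students perfunctory replies. It also asks whether the two sides genuinely achieve two-way "rational" communication. If text mining analysis of this channel can identify the problems in communication between the two sides and offer suggestions, students can raise suggestions and questions with more rational reasoning and attitude, and the university can handle and respond to them with more active sincerity. Constructive interaction between administrative units and students through the opinion web system, an online platform for exchanging views, will help the university operate and develop better.	zh_TW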
dc.description.abstract	Advances in network information and the growth of mobile communications have increased the amount of publicly available data on the internet, marking the arrival of the Big Data era. According to the International Data Corporation (IDC), the total volume of data worldwide will reach 40 zettabytes (ZB), or about 43 trillion gigabytes (GB), by 2020, more than a 50-fold increase over 2010. Unstructured data such as text, images, video and audio will also be used more and more frequently. In particular, the rising influence of online "keyboard power" has made text an essential medium for public discussion and information exchange online. Beyond the commonly used quantitative analysis, qualitative data carry extensive information whose analysis provides additional value. This study follows Yao's work, "Statistical Analysis of the Data from University Assembly Meetings and the Opinion Web System of the National Taiwan University," and builds on its quantitative results to conduct a qualitative analysis of the content of the National Taiwan University's opinion web system. This research aims to uncover information hidden in complex, unprocessed text through text mining, which here involves text segmentation, latent semantic analysis and sentiment analysis. Using these approaches, it examines the purposes for which students used the system to express their opinions, and whether the university's responses effectively and adequately addressed and resolved the issues raised. It also examines whether both sides actually communicated in a rational manner. To improve communication between students and the university on the opinion web system, this research used text mining techniques to uncover the problems that occurred in the process of information exchange. It offers recommendations for students to raise questions through rational and critical thinking and for the university to respond with a positive and genuine attitude. This can improve the operation of the university and support its future development.	en
dc.description.provenance	Made available in DSpace on 2021-05-13T09:20:33Z (GMT). No. of bitstreams: 1 ntu-105-R03621204-1.pdf: 5283421 bytes, checksum: 09571e3e184f1cc06a4d1f45a5ad86cb (MD5) Previous issue date: 2016	en
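dc.description.tableofcontents	Oral Examination Committee Certification i; Acknowledgements ii; Chinese Abstract iii; English Abstract iv; Chapter 1 Introduction 1; Section 1 Research Background and Motivation 1; Section 2 Research Objectives 3; Section 3 Research Framework 4; Chapter 2 Literature Review 5; Section 1 Big Data 5; Section 2 Data Mining 12; Section 3 Text Mining 18; Section 4 Quantitative Research on the National Taiwan University Opinion Web System 26; Chapter 3 Research Methods 28; Section 1 Research Process 28; Section 2 Research Tools 29; Section 3 Encoding Handling 30; Section 4 Word Segmentation Methods 33; Section 5 Term Processing: Building a Custom Dictionary 35; Section 6 Document-Term Matrix (Term-Document Matrix) 37; Section 7 Latent Semantic Analysis 39; Section 8 Correlation Analysis 47; Section 9 Sentiment Analysis 52; Chapter 4 Empirical Analysis 54; Section 1 Data Description 54; Section 2 Descriptive Statistics 62; Section 3 Keyword Extraction 67; Section 4 Association between Opinions and Responses 76; Section 5 Sentiment Analysis 80; Chapter 5 Conclusions and Recommendations 82; Section 1 Conclusions 82; Section 2 Research Recommendations and Future Outlook 83; References 85; Appendix 1: Part-of-Speech Tag Reference Table 91; Appendix 2: Custom Dictionary 94; Appendix 3: Raw Opinion Web System Data 100; Appendix 4: Similarity Results for Test Data 101; Appendix 5: LSA and Manual Labeling Results 127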
dc.language.iso	zh-TW
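dc.subject	相關分析 (Correlation analysis)	zh_TW
dc.subject	資料採礦 (Data mining)	zh_TW
dc.subject	文字探勘 (Text mining)	zh_TW
dc.subject	詞頻統計 (Word frequency)	zh_TW
dc.subject	詞雲 (Word cloud)	zh_TW
dc.subject	中文斷詞 (Chinese word segmentation)	zh_TW
dc.subject	潛在語意分析 (Latent semantic analysis)	zh_TW
dc.subject	情緒分析 (Sentiment analysis)	zh_TW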
dc.subject	Latent semantic analysis	en
dc.subject	Sentiment analysis	en
dc.subject	Data mining	en
dc.subject	Text mining	en
dc.subject	Word frequency	en
dc.subject	Word cloud	en
dc.subject	Text segmentation	en
dc.title	以文字探勘方法探討臺灣大學校務建言與回覆關聯性之研究	zh_TW
dc.title	Applications of Text Mining to Studying the Association between Responses and Opinions from Opinion Web System of the National Taiwan University	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	季瑋珠(Wei-Chu Chie),謝舒凱(Shu-Kai Hsieh),林志榮(Jr-Rung Lin)
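dc.subject.keyword	資料採礦 (Data mining),文字探勘 (Text mining),詞頻統計 (Word frequency),詞雲 (Word cloud),中文斷詞 (Chinese word segmentation),潛在語意分析 (Latent semantic analysis),情緒分析 (Sentiment analysis),相關分析 (Correlation analysis)	zh_TW
dc.subject.keyword	Data mining,Text mining,Word frequency,Word cloud,Text segmentation,Latent semantic analysis,Sentiment analysis	en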
dc.relation.page	138
dc.identifier.doi	10.6342/NTU201603448
dc.rights.note	Authorization granted (worldwide open access)
dc.date.accepted	2016-08-21
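dc.contributor.author-college	生物資源暨農學院 (College of Bioresources and Agriculture)	zh_TW
dc.contributor.author-dept	農藝學研究所 (Graduate Institute of Agronomy)	zh_TW
Appears in Collections:	Department of Agronomy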

Files in This Item:

File	Size	Format
ntu-105-1.pdf	5.16 MB	Adobe PDF	View/Open

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.