Application of Machine Learning to Marking and Classification of Test Problems (機器學習應用於試題之標記與分類)

Kai-Lin Yen; 顏楷霖

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69726

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	葉仲基(Chung-Kee Yeh)
dc.contributor.author	Kai-Lin Yen	en
dc.contributor.author	顏楷霖	zh_TW
dc.date.accessioned	2021-06-17T03:25:23Z	-
dc.date.available	2023-07-02
dc.date.copyright	2018-07-02
dc.date.issued	2018
dc.date.submitted	2018-05-22
dc.identifier.citation	1. 林宗勳. 2000. Support Vector Machines 簡介. 國立台灣大學資訊工程研究所. 2. 林大貴. 2016. Machine Learning 介紹. Available at: http://hadoopspark.blogspot.tw/2016/02/blog-post.html. Accessed 5 September 2016. 3. 博客園. 2015. Python TF-IDF計算100份文檔關鍵詞權重. 中國廣東省中山大學信息科學與技術學院. Available at: http://www.cnblogs.com/chenbjin/p/3851165.html. Accessed 15 October 2016. 4. Andrew Ng and John Duchi. 2016. CS229: Machine Learning. Available at: http://cs229.stanford.edu/. Accessed 13 September 2000. 5. Brink, Henrik, Richards, Joseph W., and Fetherolf, Mark. 2016. Real-World Machine Learning: Model Evaluation and Optimization. 6. Fxsjy. 2012. Jieba. Available at: https://github.com/fxsjy/jieba. Accessed 11 September 2016. 7. Google Open Source. 2013. Learning the meaning behind words. Google. Available at: https://opensource.googleblog.com/2013/08/learning-meaning-behind-words.html. Accessed 30 November 2016. 8. Kenter, Tom, Borisov, Alexey, and de Rijke, Maarten. 2016. Siamese CBOW: Optimizing Word Embeddings for Sentence Representations. 9. Mark Chang. 2016. Vector Space of Semantics. Available at: http://cpmarkchang.logdown.com/posts/772665-nlp-vector-space-semantics. Accessed 22 October 2016. 10. Srivastava, Tavish. 2014. Introduction to Random forest – Simplified. Available at: https://www.analyticsvidhya.com/blog/2014/06/introduction-random-forest-simplified/. Accessed 11 September 2016. 11. Socher, Richard. 2016. CS224d: Deep Learning for Natural Language Processing. The Stanford Natural Language Processing Group. Available at: http://cs224d.stanford.edu/syllabus.html. Accessed 25 November 2016. 12. Tsai, Sheng-Wen. 2015. 玩轉文字探勘以 word2vec 以及 ptt 資料為例. Available at: https://www.etusolution.com/index.php/tw/news/blog/97-blog/technical-point-of-view/632-word2vec. Accessed 15 October 2016.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69726	-
dc.description.abstract	隨著時代進步，資訊量越來越大，越來越多的中文文章與試題出現，如用過去的分類方式，人工成本太高，也會造成無法即時性的判斷試題。而本研究的目標是完成自動化分類中文試題的系統。以本研究為例，先利用機器學習各種演算法，並在正式系統中，能快速從眾多題目中以科目、章節甚至觀念分類，找尋學生要的試題，針對學生的學習狀況來給出不同的試題。目前機器學習應用的層面非常廣，用在語意分析的研究上也不少，但目前大部份研究還是偏向於英文字的深度學習，中文字的研究較少，然而準確應用在題目分析上是沒有相關研究的，所以本研究將會強調這部分。本論文主要是在各種機器學習演算法，來測試出最適當的演算法，來用在教育平台上，自動且精確的分類各科題目，各章節以及各觀念。本研究主要是探討四種機器學習演算法，對題目的觀念分析進行比較，所使用的方法有支持向量機、邏輯回歸、決策樹及隨機森林。首先將使用結巴分詞，維基百科語料庫訓練文字維度，然後開始進行題目中的機器學習，期望結果輸出為精準度高的題目分類。本研究結果，以支持向量機演算法用於中文試題分類為最佳化，此研究結果將可用在教育平台的自動分類中文試題上，幫助更多的試題分類能夠省去人力成本。	zh_TW
dc.description.abstract	As the volume of information grows, more and more Chinese articles and test questions are being produced. Classifying them manually, as was done in the past, is too labor-intensive and makes it impossible to label questions in a timely way. The goal of this study is to build a system that classifies Chinese test questions automatically. The study first applies several machine learning algorithms; in the production system, questions can then be classified quickly by subject, chapter, and even concept, so that the questions a student needs can be found and different questions can be offered according to the student's learning progress. Machine learning is now applied very widely, including in many studies of semantic analysis, but most of that research concerns deep learning on English text. Studies on Chinese text are fewer, and there is no related research on applying these methods accurately to the analysis of test questions, so this study focuses on that area. This thesis tests a range of machine learning algorithms to identify the most suitable one for an education platform that automatically and accurately classifies questions by subject, chapter, and concept. Four machine learning algorithms are compared on the concept-level analysis of questions: Support Vector Machine, Logistic Regression, Decision Tree, and Random Forest. Questions are first segmented with Jieba, a Wikipedia corpus is used to train the word vector dimensions, and machine learning is then carried out on the questions, with the aim of producing highly accurate question classifications. The results show that the Support Vector Machine algorithm performs best for classifying Chinese test questions. These results can be applied to the automatic classification of Chinese test questions on education platforms, reducing the labor cost of question classification.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T03:25:23Z (GMT). No. of bitstreams: 1 ntu-107-R03631038-1.pdf: 12011565 bytes, checksum: ff26a5bd040e46e08d3d809810807022 (MD5) Previous issue date: 2018	en
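dc.description.tableofcontents	Acknowledgments 3; Chinese Abstract 4; Abstract 5; List of Figures 9; List of Tables 11; Chapter 1 Introduction 12; 1.1 Research Motivation 12; 1.2 Research Objectives 12; Chapter 2 Literature Review 13; 2.1 Machine Learning 13; 2.1.1 Machine Learning Framework 13; 2.1.2 Machine Learning Categories 14; 2.2 Semantic Analysis 17; 2.2.1 Natural Language Processing: Word Vector 17; 2.2.2 Dimensionality Reduction 18; 2.2.3 Implementation and Visualization 19; 2.3 Packages Used for Corpus Training 20; 2.3.1 Word2Vec 20; 2.3.2 Jieba 22; 2.4 Introduction to Machine Learning Principles 24; 2.4.1 Machine Learning Examples 24; 2.4.2 Machine Learning Steps 28; 2.4.3 When to Use Machine Learning 28; 2.5 Support Vector Machine Algorithm 28; 2.5.1 Support Vector Machine Theory 28; 2.5.2 Kernel 30; 2.6 Logistic Regression Algorithm 30; 2.7 Decision Tree Algorithm 31; 2.7.1 Introduction to Decision Trees 31; 2.7.2 Decision Tree Model and Learning 32; 2.8 Random Forest Algorithm 33; 2.9 scikit-learn 34; Chapter 3 Research Methods 36; 3.1 System Architecture 36; 3.2 Text Preprocessing Algorithm 36; 3.2.1 Training the Corpus 36; 3.2.2 Converting Questions to Vector Space 37; 3.3 Machine Learning Algorithms 40; 3.3.1 Training Architecture 40; 3.3.2 SVM / Decision Tree / Logistic Regression / Random Forest Algorithms 44; 3.4 Testing the System 45; 3.5 Optimizing the System 47; 3.6 Website Setup 48; Chapter 4 Experimental Results and Discussion 50; 4.1 Comparison of Different Dimensions 50; 4.1.1 Experimental Data 50; 4.1.2 Analysis and Discussion 54; 4.2 Results for Different window Parameters 61; 4.2.1 Experimental Data 61; 4.2.2 Analysis and Discussion 63; 4.3 Results With and Without a Stop-Word List 64; 4.3.1 Experimental Data 64; 4.3.2 Analysis and Discussion 64; 4.4 Results for Different Numbers of Keywords 65; 4.4.1 Experimental Data 65; 4.4.2 Analysis and Discussion 67; 4.5 Overall Discussion of the Best Optimization 67; Chapter 5 Conclusions and Recommendations 68; 5.1 Conclusions 68; 5.2 Recommendations 68; References 69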
dc.language.iso	zh-TW
dc.subject	字詞轉向量套件	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	中文語意分析	zh_TW
dc.subject	教育平台	zh_TW
dc.subject	Software(Word2Vec)	en
dc.subject	Machine Learning	en
dc.subject	Support Vector Machine	en
dc.subject	Chinese Semantic Analysis	en
dc.subject	Education Platform	en
dc.title	機器學習應用於試題之標記與分類	zh_TW
dc.title	Application of Machine Learning to Marking and Classification of Test Problems	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張森富,吳剛智
dc.subject.keyword	機器學習,中文語意分析,教育平台,字詞轉向量套件	zh_TW
dc.subject.keyword	Machine Learning,Support Vector Machine,Chinese Semantic Analysis,Education Platform,Software(Word2Vec)	en
dc.relation.page	70
dc.identifier.doi	10.6342/NTU201800822
dc.rights.note	Paid license
dc.date.accepted	2018-05-22
dc.contributor.author-college	生物資源暨農學院	zh_TW
dc.contributor.author-dept	生物產業機電工程學研究所	zh_TW
Appears in Collections:	Department of Biomechatronics Engineering (生物機電工程學系)

Files in this item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	11.73 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.