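A learning-based framework to exploit E-HowNet ontology and Wikipedia sources to generate multiple choice questions (利用廣義知網與維基百科產生單一選擇題之學習式架構)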

Min-Huang Chu; 朱民晃

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64151

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林守德(Shou-De Lin)
dc.contributor.author	Min-Huang Chu	en
dc.contributor.author	朱民晃	zh_TW
dc.date.accessioned	2021-06-16T17:32:16Z	-
dc.date.available	2013-08-19
dc.date.copyright	2012-08-19
dc.date.issued	2012
dc.date.submitted	2012-08-15
dc.identifier.citation	REFERENCE
[1] M. Heilman and N. A. Smith. Extracting Simplified Statements for Factual Question Generation. In Proc. of the 3rd Workshop on Question Generation. 2010.
[2] Rodney D. Nielsen, Jason Buckingham, Gary Knoll, Ben Marsh and Leysia Palen. A Taxonomy of Questions for Question Generation. Proceedings of the Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, Virginia, September 25-26, 2008.
[3] Takuya Goto et al. Automatic Generation System of Multiple-Choice Cloze Questions and its Evaluation. Knowledge Management & E-Learning: An International Journal, Vol. 2, No. 3. 2010.
[4] Huang, Chih-Bin et al. Computer assisted test-item generation for sentence reconstruction. Master Thesis. 2009.
[5] Chao-Shainn Huang et al. Using Linguistic Features to Classify Texts for Reading Comprehension Tests at the High School Levels. Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010), pages 98-112.
[6] Michael Heilman and Noah A. Smith. Good Question! Statistical Ranking for Question Generation. In Proc. of NAACL/HLT. 2010.
[7] Ruslan Mitkov and Le An Ha. Computer-Aided Generation of Multiple-Choice Tests. HLT-NAACL-EDUC '03 Proceedings of the HLT-NAACL 03 workshop on Building educational applications using natural language processing, Vol. 2, pages 17-22. 2003.
[8] Jonathan C. Brown, Gwen A. Frishkoff and Maxine Eskenazi. Automatic question generation for vocabulary assessment. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 819–82. 2005.
[9] Ming-Hsiung Ying and Heng-Li Yang. Computer-Aided Generation of Item Banks Based on Ontology and Bloom's Taxonomy. 2008.
[10] Huang, Shu-Ling, You-Shan Chung, and Keh-Jiann Chen. E-HowNet: the Expansion of HowNet. Proceedings of the First National HowNet workshop, pages 10-22, 2008.
[11] Academia Sinica: A Chinese Word Segmentation System with Unknown Word Extraction and Pos Tagging. http://ckipsvr.iis.sinica.edu.tw/
[12] Roger Levy and Christopher D. Manning. “Is it harder to parse Chinese, or the Chinese Treebank?” ACL 2003, pp. 439-446.
[13] Thomas Hofmann. Probabilistic Latent Semantic Analysis. UAI’99.
[14] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[15] M. Heilman and N. A. Smith. Extracting Simplified Statements for Factual Question Generation. In Proc. of the 3rd Workshop on Question Generation. 2010.
[16] Agarwal, M. and Mannem, P. Automatic Gap-fill Question Generation from Text Books. Proc. of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications. 2011.
[17] Ayako Hoshino and Hiroshi Nakagawa. A real-time multiple-choice question generation for language testing: a preliminary study. EdAppsNLP 05 Proceedings of the second workshop on Building Educational Applications Using NLP, pages 17-20, 2005.
[18] Gruber, T. R. A Translation Approach to Portable Ontology Specifications. 1993.
[19] Andreas Papasalouros. Automatic generation of multiple-choice questions from domain ontologies. International Conference e-Learning. 2008.
[20] Mitkov, R. and Le An Ha. A computer-aided environment for generating multiple-choice test items. Natural Language Engineering 12 (2): 177–194. 2006.
[21] Hu, Xiaohua et al. Exploiting Wikipedia as External Knowledge for Document Clustering. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 2009.
[22] Huang, Anna et al. Clustering Documents Using a Wikipedia-Based Concept Representation. Pacific Asia Knowledge Discovery and Data Mining. 2009.
[23] Hofmann, Thomas. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, Volume 42, Numbers 1-2. 2001.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64151	-
dc.description.abstract	自動出題系統是建立e-Learning學習環境的一種重要工具，也能減少教師出題考試的負擔。目前已經有許多文獻在探討自動出題的可行性，現有的文獻指出，自動出題系統的確可以減少教師們出題的時間，並且在某些特殊的題型，例如英文克漏字測驗，已經可以達到作為課堂考試用的材料的程度。在本論文中，我們將利用知識本體的資料庫來建立一個能夠跨領域的自動出題系統，並且利用維基百科的條文補足知識本體資料庫本身的不足，來克服人工產生的知識本體資料庫本身的缺陷。本論文著重在單一選擇題的出題設計，利用機器學習的方法來判斷怎樣的句子包含事實的知識，找出適合出題的句子，同樣再利用機器學習的方法來找出適合挖空成為考題的詞，並且利用知識本體資料庫裡的語義定義以及知識結構，來取出意思相近的干擾項，若是我們要產生的干擾項的字詞並沒有預先被定義在知識本體的資料庫裡，將會查詢維基百科是否有該字詞的描述條文，若有則取回，同時取回所有在知識本體裡的字詞的維基百科描述，將相同字詞利用非監督式的分群方法歸類該字詞，再產生與其語意上相近之干擾項。因為每一個題目只能有一個正確答案，所以接下來我們要確認其他的干擾項是否也可以是正確答案，若是的話，我們要丟棄該干擾項並找其他的選項來取代。我們是利用Google 搜尋的結果數量當作檢查標準，當搜尋的結果數量超過某一個閾值或是比正確答案的選項高或在同個數量級時，便視該干擾項很可能也是正確答案，便會將其刪除。最後，我們將系統展示給一般使用者測試，搜集實際答題紀錄，並做系統滿意度調查。使用者測試的結果顯示，我們系統達到70.4%的可接受率。	zh_TW
dc.description.abstract	Automatic question generation is an important tool for e-Learning environments, and it also reduces teachers' workload in preparing questions. Several prior works have examined the feasibility of automatic question generation. The existing literature shows that an automatic question generation system can save teachers considerable time and effort. Moreover, for some specific question types, such as English cloze tests, it can already produce material usable in real classes. In this thesis, we use the E-HowNet ontology database to construct a cross-domain automatic question generation system. To overcome the limited coverage of the manually built ontology database, we retrieve descriptions from Wikipedia for words missing from E-HowNet. This thesis focuses on generating multiple-choice questions. We use machine learning methods to decide which sentences contain factual knowledge. After identifying suitable sentences, we apply machine learning methods again to identify which words or phrases are suitable to be blanked out. Then, we use the semantic definitions and knowledge structure of the ontology database to choose, as distractors, words that are close in meaning to the answer but not the same. If the blanked word is not defined in the ontology database, we look up its description in Wikipedia, together with the Wikipedia descriptions of the words that do exist in the ontology. By applying an unsupervised clustering method, we discover the relationship between the missing word and the existing words, and use that relationship together with the existing ontology database to retrieve suitable distractors. Because each question has only one correct answer, we need to verify whether any distractor could also be a correct answer. If so, we discard it and find another one. We retrieve the number of Google search results for the original sentence with the correct answer filled back in, and compare it with the number of results for the sentence with each distractor filled in. If the number of results for a distractor exceeds a certain threshold, or is higher than or of the same order of magnitude as the number for the correct answer, we assume the distractor is likely to be a correct answer as well, discard it, and choose another one. Finally, we conduct a user study to analyze the usability of our results, inviting general users to evaluate the system. The evaluation indicates that our prototype system generates multiple-choice questions with an acceptance rate of 70.4%.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T17:32:16Z (GMT). No. of bitstreams: 1 ntu-101-R96943077-1.pdf: 995557 bytes, checksum: edbea32f3fd2391db12564a41555b9aa (MD5) Previous issue date: 2012	en
dc.description.tableofcontents	CONTENTS
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Question generation system
2.2 Ontology database
2.3 Wikipedia clustering
Chapter 3 Methodology
3.1 System overview
3.2 Sentence classifier
3.3 Blank parts classifier
3.4 Distractors generation
3.4.1 Introduction of E-HowNet ontology database [10]
3.4.2 Generate distractor from E-HowNet ontology database
3.4.3 Cluster missing words via Wikipedia description
3.5 Unsuitable distractors filter
3.6 User evaluation
Chapter 4 Experiments
4.1 Data sets
4.1.1 十萬個為什麼 (A Hundred Thousand Whys)
4.1.2 Google query data
4.1.3 Wikipedia description
4.2 Results
4.2.1 Sentence classifier
4.2.2 Blank part classifier
4.3 User study
4.3.1 Testing theory
4.3.2 User evaluation
Chapter 5 Discussion and future work
Chapter 6 Conclusion
REFERENCE
dc.language.iso	en
dc.subject	干擾項	zh_TW
dc.subject	自動出題系統	zh_TW
dc.subject	單一選擇題	zh_TW
dc.subject	維基百科	zh_TW
dc.subject	知識本體	zh_TW
dc.subject	question generation system	en
dc.subject	Wikipedia	en
dc.subject	ontology	en
dc.subject	distractor	en
dc.subject	multiple choice question	en
dc.title	利用廣義知網與維基百科產生單一選擇題之學習式架構	zh_TW
dc.title	A learning-based framework to exploit E-HowNet ontology and Wikipedia sources to generate multiple choice questions	en
dc.type	Thesis
dc.date.schoolyear	100-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	林川傑,劉昭麟,禹良治
dc.subject.keyword	自動出題系統,單一選擇題,干擾項,知識本體,維基百科	zh_TW
dc.subject.keyword	question generation system,multiple choice question,distractor,ontology,Wikipedia	en
dc.relation.page	31
dc.rights.note	Paid license
dc.date.accepted	2012-08-15
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
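
The distractor check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the threshold value, and the sample hit counts are hypothetical, the hit counts are passed in as plain integers rather than fetched from any search API, and "same order of magnitude" is assumed here to mean "same number of decimal digits".

```python
def is_suspect_distractor(distractor_hits, answer_hits, threshold):
    """Return True if a distractor might also be a correct answer.

    Follows the three conditions stated in the abstract:
    the distractor's result count exceeds a threshold, is at least as
    high as the correct answer's count, or is of the same order of
    magnitude (assumed here: same number of decimal digits).
    """
    if distractor_hits > threshold:
        return True
    if distractor_hits >= answer_hits:
        return True
    return len(str(distractor_hits)) == len(str(answer_hits))


def filter_distractors(answer_hits, candidate_hits, threshold, needed=3):
    """Keep the first `needed` candidates that pass the check.

    candidate_hits maps each candidate word to the (hypothetical) number
    of search results for the sentence with that word filled in.
    """
    kept = []
    for word, hits in candidate_hits.items():
        if not is_suspect_distractor(hits, answer_hits, threshold):
            kept.append(word)
        if len(kept) == needed:
            break
    return kept


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    candidates = {"a": 150, "b": 30000, "c": 11000,
                  "d": 2000000, "e": 900, "f": 0, "g": 40}
    print(filter_distractors(12000, candidates, threshold=1000000))
```

Candidates that fail the check are skipped and the next candidate is tried, which mirrors the "discard it and choose another one" step in the abstract.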
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	972.22 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.