Resolving Relation Ambiguity of Template-Based Knowledge Acquisition for Commonsense Knowledge Bases

Yu-An Tai; 戴佑安

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70796

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	許永真(Jane Yung-jen Hsu)
dc.contributor.author	Yu-An Tai	en
dc.contributor.author	戴佑安	zh_TW
dc.date.accessioned	2021-06-17T04:38:49Z	-
dc.date.available	2018-08-08
dc.date.copyright	2018-08-08
dc.date.issued	2018
dc.date.submitted	2018-08-07
dc.identifier.citation	[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. In The semantic web, pages 722–735. Springer, 2007. [2] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250. ACM, 2008. [3] F. Bond and R. Foster. Linking and extending an open multilingual wordnet. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1352–1362, 2013. [4] C. Havasi, R. Speer, and J. Alonso. ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge. In Recent advances in natural language processing, pages 27–29. Citeseer, 2007. [5] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997. [6] C.-R. Huang, S.-K. Hsieh, J.-F. Hong, Y.-Z. Chen, I.-L. Su, Y.-X. Chen, and S.-W. Huang. Chinese WordNet: Design, implementation, and application of an infrastructure for cross-lingual knowledge processing. Journal of Chinese Information Processing, 24(2):14–23, 2010. [7] Y.-l. Kuo, J.-C. Lee, K.-y. Chiang, R. Wang, E. Shen, C.-w. Chan, and J. Y.-j. Hsu. Community-based game design: experiments on social games for commonsense data collection. In Proceedings of the ACM SIGKDD workshop on human computation, pages 15–22. ACM, 2009. [8] D. B. Lenat. CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995. [9] H. Lieberman, D. Smith, and A. Teeters. Common consensus: a web-based game for collecting commonsense goals. In ACM Workshop on Common Sense for Intelligent Interfaces, 2007. [10] H. Liu and P. Singh. ConceptNet-a practical commonsense reasoning tool-kit. BT technology journal, 22(4):211–226, 2004. [11] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013. [12] G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995. [13] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. [14] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu. Open mind common sense: Knowledge acquisition from the general public. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", pages 1223–1237. Springer, 2002. [15] A. Singhal. Introducing the knowledge graph: things, not strings. Official Google blog, 2012. [16] R. Speer, J. Chin, and C. Havasi. ConceptNet 5.5: An open multilingual graph of general knowledge. In AAAI, pages 4444–4451, 2017. [17] R. Speer and C. Havasi. Representing general relational knowledge in ConceptNet 5. In LREC, pages 3679–3686, 2012. [18] L. Von Ahn, M. Kedia, and M. Blum. Verbosity: a game for collecting common-sense facts. In Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 75–78. ACM, 2006. [19] M. E. Winston, R. Chaffin, and D. Herrmann. A taxonomy of part-whole relations. Cognitive science, 11(4):417–444, 1987.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70796	-
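dc.description.abstract	Intelligent systems need commonsense knowledge to cope better with the varied situations users present. Acquiring knowledge through sentence templates is a common approach today, and many studies have used templates to collect commonsense data from the general public. To make collection easier, such systems map each template, written with common words of natural language, to a specific relation. Common words, however, are often ambiguous: a word may have several senses, or it may be used informally through borrowing, extension, or ellipsis. In these cases the system may receive unexpected content and misinterpret it. This study therefore proposes a method that trains long short-term memory (LSTM) classifiers with the help of word embeddings and, without changing the architecture of the original knowledge acquisition system, reassigns data that the original system could not interpret correctly to the correct relation category. Taking Chinese ConceptNet as an example, the method reduces the original system's average error rate on the experimental dataset from 21.50% to 12.66%, and on a single template reduces the error rate from 22.0% to 5.0%, effectively improving the quality of the commonsense knowledge base.	zh_TW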
dc.description.abstract	Intelligent systems require commonsense knowledge to handle a variety of situations. Template-based knowledge acquisition is one of the most common approaches to constructing commonsense knowledge bases, and numerous studies have collected commonsense data from the general public through templates. To make collection easier, the designers of knowledge acquisition systems tend to write templates in common words. However, common words are often polysemous or used informally. These factors can cause a system to receive unexpected content and misinterpret it. Therefore, this study proposes a method for training relation classifiers using LSTM models with word embeddings. The classifiers can redistribute data that the system cannot interpret correctly to the correct relation without changing the system structure. Taking Chinese ConceptNet as an example, this study shows that the proposed method reduces the average error rate on the experimental dataset from 21.50% to 12.66%. For a single template, the best case reduces the error rate from 22.0% to 5.0%. The method effectively improves the quality of the commonsense knowledge base.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T04:38:49Z (GMT). No. of bitstreams: 1 ntu-107-R03944035-1.pdf: 1609573 bytes, checksum: edb2adfc60078476f9245ba290212d61 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Oral Examination Committee Certification (p. i); Acknowledgements (p. ii); Abstract in Chinese (p. iii); Abstract (p. iv); 1 Introduction (p. 1); 1.1 Motivation (p. 1); 1.1.1 Intelligent Systems Need Common Sense (p. 1); 1.1.2 The Relation Ambiguity Problem in Template-Based Knowledge Acquisition (p. 2); 1.2 Problem Definition (p. 3); 1.2.1 Notation (p. 3); 1.2.2 Template-Relation Disambiguation Problem (p. 3); 1.3 Proposed Method (p. 3); 1.4 Thesis Organization (p. 4); 2 Background (p. 5); 2.1 Construction of Commonsense Knowledge Bases (p. 5); 2.2 Template-Based Knowledge Acquisition (p. 7); 2.2.1 Semantic Networks (p. 7); 2.2.2 How the Template Method Works (p. 8); 2.2.3 Advantages of the Template Method (p. 10); 2.3 Other Related Work on Commonsense Knowledge Bases (p. 10); 2.3.1 WordNet (p. 11); 2.3.2 Wikipedia (p. 11); 2.3.3 Google Knowledge Graph (p. 11); 3 Auxiliary Relation Classifier (p. 13); 3.1 Data Preprocessing (p. 13); 3.1.1 Data Labeling (p. 14); 3.1.2 Format Adjustment (p. 14); 3.1.3 Pre-trained Word Embedding (p. 15); 3.2 Training the Classifier (p. 16); 3.3 Workflow Example (p. 16); 4 Experiments and Evaluation (p. 18); 4.1 Experimental Dataset (p. 18); 4.1.1 Chinese ConceptNet (p. 18); 4.1.2 Selection and Adjustment of Experimental Targets (p. 19); 4.2 Experimental Setup (p. 20); 4.2.1 Data Labeling (p. 20); 4.2.2 Word Segmentation Tool (segmentation / tokenization) (p. 21); 4.2.3 Word Embedding (p. 21); 4.2.4 Neural Network Model Implementation (p. 21); 4.3 Evaluation Method (p. 22); 4.4 Experiment 1: Comparison of Word Embeddings and Character Embeddings (p. 22); 4.4.1 Experimental Setup (p. 22); 4.4.2 Results and Discussion (p. 23); 4.5 Experiment 2: Results on Individual Templates and Discussion (p. 24); 4.5.1 Experimental Setup (p. 24); 4.5.2 Results and Discussion (p. 24); 4.6 Experiment 3: Semi-supervised Learning (SSL) (p. 34); 4.6.1 Implementation (p. 35); 4.6.2 Results and Discussion (p. 35); 5 Conclusion (p. 36); 5.1 Summary of Contributions (p. 36); 5.2 Limitations (p. 36); 5.3 Future Work (p. 37); Bibliography (p. 38)
dc.language.iso	zh-TW
dc.title	解決常識知識庫中基於句型模板的知識獲取法所遭遇的關係歧義之研究	zh_TW
dc.title	Resolving Relation Ambiguity of Template-Based Knowledge Acquisition for Commonsense Knowledge Bases	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	蔡宗翰(Richard Tzong-Han Tsai),黃乾綱(Chien-Kang Huang),謝舒凱(Shu-Kai Hsieh),馬偉雲(Wei-Yun Ma)
dc.subject.keyword	常識知識,知識獲取,關係,句型模板,群眾外包,消歧義,中文概念網,	zh_TW
dc.subject.keyword	Commonsense Knowledge,Knowledge Acquisition,Relation,Template,Crowdsourcing,Disambiguation,Chinese ConceptNet	en
dc.relation.page	40
dc.identifier.doi	10.6342/NTU201802563
dc.rights.note	Paid license
dc.date.accepted	2018-08-07
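dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia	zh_TW
Appears in Collections:	Graduate Institute of Networking and Multimedia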

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not currently authorized for public access)	1.57 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.