Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51780
Full metadata record
Field | Value | Language
dc.contributor.advisor | 許永真 (Jane Yung-jen Hsu) |
dc.contributor.author | Yu-Ju Chen | en
dc.contributor.author | 陳昱儒 | zh_TW
dc.date.accessioned | 2021-06-15T13:49:24Z | -
dc.date.available | 2015-12-01 |
dc.date.copyright | 2015-12-01 |
dc.date.issued | 2015 |
dc.date.submitted | 2015-10-23 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51780 | -
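dc.description.abstract | "Relation extraction" means learning, from text, concept pairs that hold a semantic relation; for example, the relation of (Taipei, Taiwan) is "... is located in ...". This thesis explores how to expand a commonsense knowledge base through relation extraction. Supervised learning is one of the well-developed approaches, but it needs a large amount of labeled data to perform well. Distant supervision is used instead. Distant supervision is a form of semi-supervised learning that has been applied to relation extraction from unlabeled data. For a given relation in the knowledge base, the related concept pairs serve as seeds for automatically assigning weak labels to a large amount of unlabeled text, which then becomes the training data. These concept pairs are labeled with a relation in advance, and every sentence in the text that mentions a pair is automatically labeled with the same relation as the pair. This method can label large amounts of data quickly. However, when the text is unrelated to the source of the knowledge base, the resulting labels are very unreliable.
To reduce the learning errors caused by wrong labels, we add the multiple instance learning assumption to distant supervision. Training data for multiple instance learning must be organized into bags and is used to learn a binary classifier. Every training bag is labeled +1 or -1. A bag labeled +1 contains at least one +1 instance; a bag labeled -1 contains only -1 instances. We put the sentences that mention the same concept pair into one bag and use multiple instance learning to classify unseen bags.
Using ConceptNet as the basis for labeling and text from the Academia Sinica Balanced Corpus (Sinica Corpus) as training data, we run Chinese relation extraction experiments and compare single-instance learning with several multiple instance learning algorithms. The experiments extract concept pairs for four relations from the text: AtLocation, CapableOf, HasProperty, and IsA. This study shows that our method can improve a knowledge base with another corpus. | zh_TW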
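The weak labeling and bag construction that the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `build_bags`, the toy English sentences, the seed and candidate pairs, the naive substring test for detecting a mention, and the choice to label every non-seed pair -1 are all hypothetical.

```python
from collections import defaultdict

def build_bags(sentences, seed_pairs, candidate_pairs):
    """Group sentences by the concept pair they mention and weakly label each bag.

    A bag gets +1 if its pair is a known seed for the relation, otherwise -1.
    Mention detection is a naive substring check (an assumption for this sketch).
    """
    bags = defaultdict(list)
    for sentence in sentences:
        for head, tail in candidate_pairs:
            if head in sentence and tail in sentence:
                bags[(head, tail)].append(sentence)
    labels = {pair: (1 if pair in seed_pairs else -1) for pair in bags}
    return dict(bags), labels

# Hypothetical toy data for an AtLocation-style relation.
sentences = [
    "Taipei is the largest city in Taiwan.",
    "She flew from Taipei to Tokyo, then back to Taiwan.",
    "Tokyo hosted a conference on Taiwan studies.",
]
seed_pairs = {("Taipei", "Taiwan")}
candidate_pairs = [("Taipei", "Taiwan"), ("Tokyo", "Taiwan")]

bags, labels = build_bags(sentences, seed_pairs, candidate_pairs)
for pair, bag in bags.items():
    print(pair, labels[pair], len(bag))
```

The second toy sentence lands in the positive bag even though it does not state that Taipei is in Taiwan, which is the kind of label noise that motivates learning at the bag level instead of trusting every sentence label.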
dc.description.abstract | This thesis investigates relation extraction, which learns semantic relations of concept pairs from text, as an approach to mining commonsense knowledge. To perform well, state-of-the-art supervised learning requires a large labeled training set, which is often expensive to prepare. As an alternative, distant supervision, a semi-supervised learning method, was adopted to extract relations from unlabeled corpora. Given a set of concept pairs for a relation in a knowledge base, a training set consisting of a large number of sentences can be weakly labeled automatically.
Labels generated with heuristics can be quite noisy. When the sources of the sentences in the training set are not correlated with the knowledge base, the automatic labeling mechanism is unreliable. Instead of assuming that all sentences in the training set are labeled correctly, multiple instance learning learns from bags of instances, under the assumption that each positive bag contains at least one positive instance while negative bags contain only negative instances.
We conducted experiments on relation extraction in Chinese using concept pairs in ConceptNet, a commonsense knowledge base, as the seeds for labeling a set of predefined relations. The training bags were generated from the Sinica Corpus. The performance of multiple instance learning is compared with that of single-instance learning and a few other learning algorithms. Our experiments extracted new pairs for the relations “AtLocation”, “CapableOf”, “HasProperty” and “IsA”. This study showed that a knowledge base can be improved by another corpus using the proposed approach. | en
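The bag-level assumption in the abstract (a positive bag holds at least one positive instance, a negative bag holds none) can be written down directly. The sketch below is illustrative only: `bag_label`, `predict_bag`, `single_instance_training_set`, and `toy_scorer` are hypothetical names, the max-over-instances decision rule is one common way to apply the assumption and is not claimed to be the algorithm used in the thesis, and bags are assumed to be non-empty.

```python
def bag_label(instance_labels):
    """Bag label under the multiple instance assumption: +1 iff any instance is +1."""
    return 1 if any(y == 1 for y in instance_labels) else -1

def predict_bag(bag, score_instance, threshold=0.0):
    """Label a non-empty bag +1 if its best-scoring instance exceeds the threshold."""
    best = max(score_instance(x) for x in bag)
    return 1 if best > threshold else -1

def single_instance_training_set(bags, labels):
    """Naive single-instance baseline: every instance inherits its bag's label."""
    return [(x, labels[key]) for key, bag in bags.items() for x in bag]

def toy_scorer(x):
    """Hypothetical instance scorer over numeric toy instances."""
    return x - 0.5

print(bag_label([-1, 1, -1]))
print(predict_bag([0.1, 0.9], toy_scorer))
print(predict_bag([0.1, 0.2], toy_scorer))
print(single_instance_training_set({"a": [1, 2]}, {"a": -1}))
```

The single-instance baseline treats every weak label as correct, while the bag-level rule needs only one instance in a positive bag to support the label, so it is less sensitive to wrongly labeled sentences.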
dc.description.provenance | Made available in DSpace on 2021-06-15T13:49:24Z (GMT). No. of bitstreams: 1
ntu-104-R01922049-1.pdf: 1518099 bytes, checksum: 070e11d39d34366490889cd928690702 (MD5)
Previous issue date: 2015 | en
dc.description.tableofcontents | Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
1.1 Motivation
1.1.1 Knowledge in Natural Language
1.1.2 Situation of Chinese ConceptNet
1.2 Problem Description
1.3 Proposed Solution
1.4 Thesis Organization
2 Background
2.1 Knowledge
2.1.1 Knowledge Representation
2.1.2 Knowledge Extraction
2.2 Related Work of Relation Extraction
2.2.1 Supervised Learning
2.2.2 Distant Supervision
2.2.3 Multiple Instance Learning
2.3 Relation Extraction in Chinese
2.3.1 Characteristics in Chinese Relation Extraction
2.3.2 Related Work in Chinese
3 Methodology
3.1 Problem Definition
3.1.1 Notations
3.1.2 Relation Extraction Problem
3.2 Framework
3.2.1 Bag Generator
3.2.2 Relation Predictor
3.2.3 Pair Evaluator
3.3 Features of Data
3.4 Assistant Labelling
3.5 Multiple Instance Learning
3.5.1 Single Instance Learning: a Naive Approach
3.5.2 Semi-Supervised Approach
3.5.3 Multiple-Instance Classification Algorithm
3.5.4 Support Vector Machine for Multiple Instance Learning
3.5.5 Multiple Instance Learning for Sparse Positive Bags
3.6 Evaluation Process
4 Experiment and Result
4.1 Dataset
4.1.1 ConceptNet
4.1.2 Sinica Corpus
4.2 Experiment Setting
4.2.1 Multiple Relations
4.2.2 Bag Size Selection
4.2.3 Multiple Instance Learning
4.2.4 Experiment Description
4.3 Evaluation
4.4 Result and Discussion
4.4.1 Parameter Selection
4.4.2 System Evaluation
5 Conclusion
5.1 Summary and Contribution
5.2 Future Work
Bibliography
A List of CKIP Part-Of-Speech Tag
B List of Relations in Chinese ConceptNet
dc.language.iso | en |
dc.subject | 知識庫 | zh_TW
dc.subject | 關係抽取 | zh_TW
dc.subject | 多實例學習 | zh_TW
dc.subject | Knowledge Base | en
dc.subject | Relation Extraction | en
dc.subject | Multiple Instance Learning | en
dc.title | 半監督式學習於中文關係抽取以擴充知識庫之研究 | zh_TW
dc.title | Chinese Relation Extraction by Semi-Supervised Learning for Knowledge Base Expansion | en
dc.type | Thesis |
dc.date.schoolyear | 104-1 |
dc.description.degree | Master's (碩士) |
dc.contributor.oralexamcommittee | 陳信希 (Hsin-Hsi Chen), 李育杰 (Yuh-Jye Lee), 張嘉惠 (Chia-Hui Chang), 蔡宗翰 (Richard Tzong-Han Tsai) |
dc.subject.keyword | 關係抽取, 多實例學習, 知識庫 | zh_TW
dc.subject.keyword | Relation Extraction, Multiple Instance Learning, Knowledge Base | en
dc.relation.page | 56 |
dc.rights.note | Paid license (有償授權) |
dc.date.accepted | 2015-10-23 |
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File | Size | Format
ntu-104-1.pdf (Restricted Access) | 1.48 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
