Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5919

Full metadata record (DC field: value [language])
dc.contributor.advisor: 許永真 (Jane Yung-jen Hsu)
dc.contributor.author: Lee-Heng Ma [en]
dc.contributor.author: 麻立恒 [zh_TW]
dc.date.accessioned: 2021-05-16T16:18:21Z
dc.date.available: 2014-09-12
dc.date.available: 2021-05-16T16:18:21Z
dc.date.copyright: 2013-09-12
dc.date.issued: 2013
dc.date.submitted: 2013-08-15
dc.identifier.citation:
[1] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT 1998). ACM, 1998.
[2] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. H. Jr., and T. M. Mitchell. Toward an architecture for never-ending language learning. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010). AAAI, 2010.
[3] A. Carlson, J. Betteridge, R. C. Wang, E. R. H. Jr., and T. M. Mitchell. Coupled semi-supervised learning for information extraction. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010). WSDM, 2010.
[4] J. R. Curran, T. Murphy, and B. Scholz. Minimising semantic drift with mutual exclusion bootstrapping. In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACL 2007), 2007.
[5] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI 2004). USENIX Association, 2004.
[6] O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Methods for domain-independent information extraction from the web: An experimental comparison. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI 2004). AAAI, 2004.
[7] Wikimedia Foundation. Wikipedia, the free encyclopedia, 2013. http://www.wikipedia.org/.
[8] Wikimedia Foundation. Wiktionary, the free dictionary, 2013. http://www.wiktionary.org.
[9] Miniwatts Marketing Group. Internet world stats, 2012. http://www.internetworldstats.com/stats.htm.
[10] Google Inc. Compact language detector library (CLD), 2013. http://code.google.com/p/chromium-compact-language-detector/.
[11] D. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38:33–38, 1995.
[12] H. Liu and P. Singh. ConceptNet: A practical commonsense reasoning toolkit. BT Technology Journal, 22:211–226, 2004.
[13] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38:39–41, 1995.
[14] H. Nakamura. RadixTree: naive implementation of radix tree for Ruby, 2013. https://github.com/nahi/radixtree.
[15] Free University of Berlin, the University of Leipzig, and OpenLink Software. DBpedia, 2013. http://dbpedia.org/.
[16] Department of Computer and Information Science, University of Pennsylvania. Chinese language processing at Penn, 2013. http://www.cis.upenn.edu/~chinese/.
[17] University of Washington's Turing Center. ReVerb: Open information extraction software, 2013. http://reverb.cs.washington.edu/.
[18] The ICU Project Management Committee (PMC). ICU - International Components for Unicode, 2013. http://site.icu-project.org/.
[19] The Lemur Project. The ClueWeb09 dataset, 2013. http://lemurproject.org/clueweb09/.
[20] Metaweb Technologies. Freebase, 2013. http://www.freebase.com/.
[21] Stanford University. The Stanford Natural Language Processing Group, 2013. http://www-nlp.stanford.edu/.
[22] L. von Ahn, M. Kedia, and M. Blum. Verbosity: a game for collecting common-sense facts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (SIGCHI 2006). ACM, 2006.
[23] R. C. Wang and W. W. Cohen. Language-independent set expansion of named entities using the web. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2007). ICDM, 2007.
[24] 郭家寶 (BYVoid). Open Chinese Convert (OpenCC) 開放中文轉換, 2013. http://code.google.com/p/opencc/.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5919
dc.description.abstract: A rich knowledge base is of great help to artificially intelligent systems, yet building a complete knowledge base requires enormous human effort and time. In the field of automatic knowledge collection and extraction, Never Ending Language Learning (NELL) has provided an excellent demonstration, but its ability to process Chinese is limited. This thesis proposes an automatic Chinese knowledge extraction system. We observe that in Chinese sentences, nouns of the same category often co-occur with certain specific verbs; we use these verbs to build patterns that find more nouns of the same category. We combine this with NELL's cross-language knowledge collection system to improve overall accuracy. Finally, experiments show that our system can support large-scale automatic Chinese knowledge collection. [zh_TW]
dc.description.abstract: Robust intelligent applications benefit from rich knowledge bases, but building a rich and complete knowledge base is a time-consuming and labor-intensive task. Never Ending Language Learning (NELL) is a great demonstration of large-scale automatic knowledge extraction, but unfortunately some components in NELL are not suitable for dealing with Chinese. This thesis presents a Coupled Chinese Pattern Learner (CCPL), which extracts knowledge through textual patterns based on relationships between nouns and verbs in Chinese sentences. We also implement a Coupled Set Expander for Any Language (CSEAL) to collaborate with CCPL. The experiments show that our system is capable of large-scale learning and preserves high accuracy in automatic extraction of Chinese knowledge. [en]
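The abstracts describe verb-based pattern bootstrapping: starting from seed nouns of a category, the system learns verbs that co-occur with those seeds and uses them as patterns to promote new nouns of the same category. A minimal sketch of that loop, using a hypothetical toy (noun, verb) co-occurrence corpus and helper names of my own choosing; this is not the thesis's actual CCPL implementation:

```python
from collections import Counter

# Toy corpus: (noun, verb) co-occurrence pairs extracted from sentences
# (hypothetical data for illustration only).
corpus = [
    ("蘋果", "吃"), ("麵包", "吃"), ("蛋糕", "吃"),
    ("蘋果", "買"), ("汽車", "買"), ("汽車", "開"),
]

def learn_patterns(seeds, corpus, min_support=2):
    """Return verbs that co-occur with at least `min_support` seed nouns."""
    verb_counts = Counter(v for n, v in corpus if n in seeds)
    return {v for v, c in verb_counts.items() if c >= min_support}

def expand_instances(patterns, corpus):
    """Return nouns that co-occur with any learned verb pattern."""
    return {n for n, v in corpus if v in patterns}

seeds = {"蘋果", "麵包"}                       # seed instances of "food"
patterns = learn_patterns(seeds, corpus)       # verbs shared by the seeds
candidates = expand_instances(patterns, corpus) - seeds
print(patterns, candidates)                    # {'吃'} {'蛋糕'}
```

In the thesis, coupled constraints across categories are what keep this loop from drifting semantically; the sketch above shows only the bare bootstrap step of one category.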
dc.description.provenance: Made available in DSpace on 2021-05-16T16:18:21Z (GMT). No. of bitstreams: 1. ntu-102-R99944037-1.pdf: 2543840 bytes, checksum: 9c80234a7718923c0f2874b016ec820b (MD5). Previous issue date: 2013 [en]
dc.description.tableofcontents:
1 Introduction 1
1.1 Motivation 2
1.2 Problem Definition 3
1.2.1 Assumption 3
1.2.2 Knowledge Collection Problem 3
1.3 Proposed Solution 4
2 Related Work 7
2.1 Commonsense Knowledge Collection 8
2.2 Text Mining from the Web 9
2.2.1 KnowItAll 9
2.2.2 Set Expander for Any Language 10
2.2.3 Never Ending Language Learner 10
3 Framework 15
3.1 Central Ideas from NELL 16
3.1.1 Never Ending Learning 16
3.1.2 Collaboration of Learners 16
3.2 ChNELL 16
3.2.1 System Behavior 17
3.2.2 System Architecture 17
4 Coupled Chinese Pattern Learner 19
4.1 Bootstrapped Learning 20
4.1.1 Semantic Drift 21
4.1.2 Coupled Constraints 22
4.2 Concepts Extraction 25
4.2.1 Difficulties of Chinese Concept Learning 26
4.2.2 Valid Instance and Pattern 27
4.3 Concepts Selection 31
4.3.1 Filtering 31
4.3.2 Ranking 34
4.3.3 Promotion 36
4.4 CCPL Algorithm 36
5 Coupled Set Expander for Any Language 39
5.1 SEAL 40
5.1.1 Radix Tree 43
5.1.2 Ranking of SEAL 43
5.2 CSEAL 44
5.2.1 Relation Extraction 44
5.2.2 More Constraints 45
5.2.3 Ranking Candidates 45
5.2.4 Querying Search Engine 46
5.3 CSEAL Algorithm 46
5.4 ChNELL Algorithm 46
5.5 Experimental Evaluation 48
5.5.1 Ontology 48
5.5.2 Corpus 48
5.5.3 Configuration 51
5.5.4 Result 52
5.5.5 Discussion 55
6 Scalability 59
6.1 Introduction to ClueWeb 60
6.2 Parallelization 61
6.2.1 MapReduce 61
6.2.2 Multi-Level MapReduce 62
6.3 Experimental Evaluation 63
6.3.1 Ontology, Corpus and Configuration 63
6.3.2 Result 64
6.3.3 Discussion 66
7 Conclusion and Future Work 69
7.1 Conclusion 70
7.2 Future Work 70
Appendix A: Ontology 77
Appendix B: Result from ChNELL 83
Appendix C: Chinese-English Mapping Table 89
dc.language.iso: en
dc.subject: 半監督式學習 (Semi-Supervised Learning) [zh_TW]
dc.subject: 知識擷取 (Knowledge Extraction) [zh_TW]
dc.subject: 文字探勘 (Text Mining) [zh_TW]
dc.subject: 機器學習 (Machine Learning) [zh_TW]
dc.subject: Text Mining [en]
dc.subject: Semi-Supervised Learning [en]
dc.subject: Machine Learning [en]
dc.subject: Knowledge Extraction [en]
dc.title: 多條件耦合之半監督式學習於中文知識擷取之研究 [zh_TW]
dc.title: Coupled Semi-Supervised Learning for Chinese Knowledge Extraction [en]
dc.type: Thesis
dc.date.schoolyear: 101-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 陳信希 (Hsin-Hsi Chen), 劉昭麟 (Chao-Lin Liu), 張嘉惠 (Chia-Hui Chang), 蔡宗翰 (Richard Tzong-Han Tsai)
dc.subject.keyword: 知識擷取 (Knowledge Extraction), 文字探勘 (Text Mining), 機器學習 (Machine Learning), 半監督式學習 (Semi-Supervised Learning) [zh_TW]
dc.subject.keyword: Knowledge Extraction, Text Mining, Machine Learning, Semi-Supervised Learning [en]
dc.relation.page: 90
dc.rights.note: Authorized for release (open access worldwide)
dc.date.accepted: 2013-08-15
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
ntu-102-1.pdf (2.48 MB, Adobe PDF)

