Chinese Relation Patterns Mining with High Coverage for Knowledge Base Acceleration and Completion

Sheng-Lun Wei; 魏聖倫

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48917

Title:	高覆蓋率中文關連樣式探勘以加速及完備知識圖譜之建立 (Chinese Relation Patterns Mining with High Coverage for Knowledge Base Acceleration and Completion)
Author:	Sheng-Lun Wei (魏聖倫)
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	knowledge base, knowledge base acceleration, knowledge base completion, relation extraction, relation pattern, data mining
Publication Year:	2016
Degree:	Master's
Abstract:	In recent years, with the rapid development of the Internet, people can obtain large amounts of information through many channels, such as online news, social networks, blogs, and forums. People produce a great deal of information on the Web every day, and some of it, once collected, organized, and summarized, is worth storing and reusing. A knowledge base is one common way to store such useful information, usually in a structured form so that the knowledge is easier to use later. However, most knowledge bases are compiled by human editors, and the volume of new information far exceeds what volunteer editors can handle, so there is a noticeable delay between the time an event occurs and the time it is added to a knowledge base. Accelerating knowledge base construction is therefore an important problem. Relation patterns are commonly used for knowledge base acceleration, but few relation pattern resources are available for languages other than English.

This study proposes a workflow for building a high-coverage Chinese relation pattern repository for knowledge base acceleration, knowledge base completion, and knowledge base applications. Using the entity properties of the DBpedia knowledge base as the basis, we mine the corresponding Chinese relation patterns for each property. We describe the implementation details of each step, which fall into three parts: corpus pre-processing, instance retrieval, and relation pattern extraction. We also discuss problems that may arise in the process, their impact, and how to address them. Finally, we use human annotators to evaluate the Chinese relation patterns and discuss how different factors affect pattern quality.

Previously, researchers could draw on English relation pattern repositories, while work in other languages was held back by the lack of comparably complete resources. With the high-coverage Chinese relation pattern repository produced by this study, related work developed for English can be transferred to Chinese, so that knowledge base research can develop beyond English. Although the proposed method targets Chinese relation patterns, we believe that languages with characteristics similar to Chinese, such as Japanese and Korean, could also try this method to build their own relation pattern repositories.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48917
DOI:	10.6342/NTU201603377
Full-text License:	Paid license
Appears in Departments:	Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in this item:

File	Size	Format
ntu-105-1.pdf (not currently authorized for public access)	1.34 MB	Adobe PDF

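Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.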