Extract Metadata From Literature

Zong-Xun Yang; 楊宗勳

Title:	Extract Metadata From Literature
Author:	Zong-Xun Yang 楊宗勳
Advisor:	翁昭旼
Keywords:	metadata, word clustering, hidden Markov model
Year:	2004
Degree:	Master's
Abstract:	With the rapid growth of the Internet, electronic literature spreads quickly and is easy both to publish and to obtain, so the volume of literature has grown enormously. Most of it, however, is scattered across the vast web, which makes finding relevant papers time-consuming and laborious. A system that organizes interrelated literature on the web would let users find relevant references with ease. This thesis focuses on the header and reference sections of papers, because these two parts carry most of a paper's basic bibliographic information, such as title, authors, publisher, and publication date. This information is well suited to organizing literature: it allows papers to be viewed, classified, clustered, and searched along many different dimensions. The first step in organizing literature is therefore to extract its metadata. This research converts unstructured literature data into structured data and assigns meaning to it. The work proceeds in three stages. First, the features of each token are analyzed, and tokens are clustered according to those features. Second, a machine-learning algorithm segments the clustered tokens and assigns each segment the appropriate metadata field. Finally, the resulting structured data are stored in a database for later use.
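The three-stage pipeline in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the thesis's implementation. The feature classes, the three metadata states (AUTHOR, TITLE, DATE), the hand-set HMM probabilities, and the `metadata` table schema are all hypothetical; a real system would estimate the probabilities from labelled references. Grouping tokens by identical feature class stands in for the word-clustering step, and each token's cluster id serves as the observation for a hidden Markov model decoded with the Viterbi algorithm (the HMM choice follows the record's keywords).

```python
import math
import re
import sqlite3
from collections import defaultdict
from itertools import groupby

# --- Stage 1: token features and clusters (hypothetical feature set) ---
def token_class(tok: str) -> str:
    """Map a token to a coarse feature class; the class acts as its cluster id."""
    if re.fullmatch(r"(19|20)\d\d", tok):
        return "YEAR"
    if re.fullmatch(r"[A-Z]\.?", tok):
        return "INITIAL"
    if re.fullmatch(r"[^\w\s]", tok):
        return "PUNCT"
    if tok[:1].isupper():
        return "CAP"
    return "LOWER"

def cluster_tokens(tokens):
    """Group tokens that share the same feature class."""
    clusters = defaultdict(list)
    for tok in tokens:
        clusters[token_class(tok)].append(tok)
    return dict(clusters)

# --- Stage 2: HMM labelling with Viterbi (toy, hand-set probabilities) ---
STATES = ("AUTHOR", "TITLE", "DATE")
START = {"AUTHOR": 0.8, "TITLE": 0.15, "DATE": 0.05}
TRANS = {
    "AUTHOR": {"AUTHOR": 0.6, "TITLE": 0.35, "DATE": 0.05},
    "TITLE": {"AUTHOR": 0.05, "TITLE": 0.8, "DATE": 0.15},
    "DATE": {"AUTHOR": 0.05, "TITLE": 0.05, "DATE": 0.9},
}
EMIT = {
    "AUTHOR": {"INITIAL": 0.4, "CAP": 0.3, "PUNCT": 0.28, "LOWER": 0.01, "YEAR": 0.01},
    "TITLE": {"INITIAL": 0.01, "CAP": 0.3, "PUNCT": 0.08, "LOWER": 0.6, "YEAR": 0.01},
    "DATE": {"INITIAL": 0.01, "CAP": 0.01, "PUNCT": 0.17, "LOWER": 0.01, "YEAR": 0.8},
}

def viterbi(obs, states, start, trans, emit):
    """Return the most probable state sequence for obs, computed in log space."""
    if not obs:
        return []
    score = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for reaching s at time t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans[prev][s])
                           + math.log(emit[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def segment(tokens, labels):
    """Merge runs of equally labelled tokens into fields, dropping punctuation."""
    fields = {}
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        words = [tok for tok, _ in run if token_class(tok) != "PUNCT"]
        if words:
            fields.setdefault(label, " ".join(words))  # keep the first run per label
    return fields

# --- Stage 3: store the structured record (hypothetical schema) ---
def store(conn, record):
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (author TEXT, title TEXT, date TEXT)")
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?)",
                 (record.get("AUTHOR"), record.get("TITLE"), record.get("DATE")))
    conn.commit()

def extract(reference, conn):
    tokens = re.findall(r"\w+|[^\w\s]", reference)
    labels = viterbi([token_class(t) for t in tokens], STATES, START, TRANS, EMIT)
    record = segment(tokens, labels)
    store(conn, record)
    return record

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(extract("Yang Z. Extract metadata from literature. 2004", conn))
    # → {'AUTHOR': 'Yang Z', 'TITLE': 'Extract metadata from literature', 'DATE': '2004'}
```

With these toy parameters, the capitalized first title word ("Extract") is still labelled TITLE because the lowercase words that follow make the AUTHOR→TITLE transition at that point the more probable path. Different, or learned, probabilities could shift such boundaries.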
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/39433
Full-text license:	Paid license
Appears in Collections:	Institute of Biomedical Engineering

Files in this item:

File	Size	Format
ntu-93-1.pdf (restricted access)	401.22 kB	Adobe PDF