Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40801
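Title: | Automated Classification of Taiwanese Land Deeds (Chinese title: 台灣古契書自動分類與依分類定義契書角色, literally "Automated Classification of Taiwanese Land Deeds and Category-Based Definition of Roles in Deeds") |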
Author: | Chia-Ching Lu (盧家慶) |
Advisor: | Jieh Hsiang (項潔) |
Keywords: | Taiwan history, land deeds, category, metadata, digital archives |
Publication year: | 2008 |
Degree: | Master's |
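Abstract: | Taiwanese land deeds are a first-hand record of everyday life in civil society and among the most important primary sources for the study of Taiwanese history. Collecting and digitally archiving them not only preserves the documents but also lets us use them to understand the transfer of land rights and the history of land development in Qing-era Taiwan.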
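The Taiwan History Digital Library (THDL), a full-text digital library built jointly by the Digital Archives and Automated Reasoning Laboratory of the Graduate Institute of Computer Science and Information Engineering at National Taiwan University and the NTU Library, currently holds the full text of 21,399 land deeds digitized by the National Taichung Library and the National Taiwan University Library; 21,121 of these deeds have metadata. The deeds come from several collections, including previously published land deeds, the Taiwan Governor-General's Office Archives (臺灣總督府檔案), the Anli community (岸裡大社) documents, the Zheng family of Beimen, Hsinchu (新竹北門鄭家), the Taipei City Archives Committee (北市文獻會), and NTU's land deeds from southern Taiwan. A collection this large needs a good classification scheme so that users can quickly grasp its overall contents and use the deeds effectively by category.
This study attempts to classify all the land-deed collections automatically and consistently, using the metadata that each digitizing institution has already produced. Each collection's metadata contains only a field describing the nature of a deed, not a precise classification field, and the collections describe that nature according to inconsistent standards. We first settled on an initial classification scheme after consulting the schemes that experts have proposed for land deeds. We then identified, in each collection's metadata, the field that corresponds to a "deed type" classification and, combining it with the title of each deed, automatically mapped every deed to one category of the scheme. Finally, for selected categories, we reassigned consistent roles to the parties named in the deeds.
Applying this automatic classification method, together with role assignment for selected categories, to the 21,121 deeds in THDL that have metadata, we classified 20,698 deeds successfully; the remaining 423 had to be classified manually. We also found that two categories, rent in grain (租穀) and the official transaction certificate (契尾), could be added to the original 14. Role assignment produced poor results, so a better approach is still needed, for example one that combines the metadata with the full text of the deeds.
Before the Japanese modernized land administration during their occupation of Taiwan (1895 to 1945), handwritten land deeds were the only proof of a land sale or lease. Land deeds are thus an important source of primary documents for studying Taiwanese society before 1895. In collaboration with the National Taiwan University Library, the Digital Archives Laboratory of NTU's Department of Computer Science built the Taiwan History Digital Library (THDL), a full-text digital library of primary historical documents that includes, among other things, 21,399 land deeds in searchable full text. We believe it is the largest database of its kind. To give users a better understanding of its contents and make the deeds easier to use, this thesis attempts to categorize the collection. The difficulty is that the land deeds in THDL come from different sources. Although most of them (21,121) have metadata, that metadata was produced by different people following different standards, so the deeds cannot easily be classified from the descriptions it provides. We first studied existing classification schemes and chose the one that seemed most suitable for our purpose, which divides land deeds into 14 categories.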
(To simplify the task, we considered only the deeds with metadata.) We then designed an algorithm that takes each collection and reclassifies its contents according to the 14 categories. Our method successfully classified 20,698 of the land deeds; the remaining 423 required examination by experts. We also discovered that two more categories could be added to better capture the nature of the land deeds: zugu (租穀), rental charges paid in rice, and qiwei (契尾), the official certificate for a land transaction. |
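The abstract describes the core classification step only in outline: find the field in each collection's metadata that corresponds to the deed type, combine it with the deed's title, map each deed to one category, and leave the rest for manual review. A minimal sketch of that kind of rule-based mapping, assuming simple keyword matching, follows. The `RULES` table, its category labels, and the `Deed`, `classify`, and `classify_all` names are hypothetical illustrations, not the thesis's actual 14 categories or algorithm.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL keyword rules for illustration only: the abstract does not
# list the thesis's 14 categories or its mapping rules. Each pair maps a
# keyword that may appear in a deed's "nature" metadata field or title
# to a target category label.
RULES: list[tuple[str, str]] = [
    ("杜賣", "outright sale"),
    ("典", "dian (conditional sale)"),
    ("胎借", "loan secured by land"),
    ("給墾", "reclamation grant"),
    ("鬮分", "family property division"),
]


@dataclass
class Deed:
    deed_id: str
    nature: Optional[str]  # the collection's free-text "deed nature" field, if present
    title: str


def classify(deed: Deed) -> Optional[str]:
    """Return one category, or None if the deed should go to manual review."""
    # Assumed ordering: check the metadata field first, then the title.
    for text in (deed.nature, deed.title):
        if not text:
            continue
        matches = {category for keyword, category in RULES if keyword in text}
        if len(matches) == 1:
            return matches.pop()
        # No match or conflicting matches: try the next source of evidence.
    return None


def classify_all(deeds: list[Deed]) -> tuple[dict[str, str], list[str]]:
    """Split deeds into classified ones and IDs left for manual review."""
    classified: dict[str, str] = {}
    manual: list[str] = []
    for deed in deeds:
        category = classify(deed)
        if category is None:
            manual.append(deed.deed_id)
        else:
            classified[deed.deed_id] = category
    return classified, manual


deeds = [
    Deed("A", "杜賣盡根契", "立杜賣盡根田契字"),
    Deed("B", None, "立胎借銀字"),
    Deed("C", "典契", "立典契字"),
    Deed("D", "其他", "立合約字"),
]
classified, manual = classify_all(deeds)
print(classified)  # {'A': 'outright sale', 'B': 'loan secured by land', 'C': 'dian (conditional sale)'}
print(manual)      # ['D']
```

Deed D has no recognizable keyword in either field, so it lands in the manual-review list, which plays the role of the deeds the abstract says had to be classified by hand. Trying the metadata field before the title is an assumption of this sketch; the abstract says only that the two were combined.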
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40801 |
Full-text license: | Paid license |
Appears in: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format |
---|---|---|
ntu-97-1.pdf (not currently licensed for public access) | 5.93 MB | Adobe PDF |
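Unless their copyright terms are specifically stated otherwise, all documents in this system are protected by copyright, with all rights reserved.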