機器輔助控制詞彙索引之研究 (A Study on Machine-Aided Controlled Vocabulary Indexing)

伍健廷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76346

Full metadata record

DC Field	Value	Language
dc.contributor.author	伍健廷	zh_TW
dc.date.accessioned	2021-07-01T08:20:31Z	-
dc.date.available	2021-07-01T08:20:31Z	-
dc.date.issued	1998
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76346	-
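dc.description.abstract	Building on word-frequency statistics, this thesis uses a large number of documents that were manually indexed with a controlled vocabulary, together with the semantic information the controlled vocabulary provides, to design an automatic indexing model that makes controlled vocabulary indexing easy to automate. After simple training, the model establishes associations between controlled vocabulary terms and natural language words, and the associations are stored in the model as indexing features. When a new document arrives, matching its words against the indexing features yields controlled vocabulary index terms automatically. Within the model, a new term-significance formula, TF × OSDF × CSIDF, corrects a weakness of the traditional TF × IDF: its inability to separate subject-specific words within a collection of documents on closely related subjects. Without adding training documents, the formula merges and splits the same training documents to compute document frequencies for different purposes, so that subject-specific words, which contribute significantly to subject identification, are separated from common words and domain-specific words. The experiment covered 100 MeSH headings and used the abstracts and titles of 60,400 documents for training and testing; the results show that the model performs very well. For abstracts, indexing precision and indexing recall can both exceed 90% at the same time; for titles, indexing recall holds at 70% when indexing precision is required to be 90%. Statistical analysis of the experimental data further shows that the model is also applicable to controlled vocabularies that have a thesaurus structure. The large lists of suggested controlled vocabulary terms that the model generates can ease the problem of indexing consistency and save time and effort. With the model's assistance, more controlled vocabulary index terms can be assigned to each document, which addresses a known shortcoming of traditional controlled vocabulary indexing: because too few terms are assigned, retrieval precision is high but recall falls short of natural language indexing.	zh_TW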
dc.description.abstract	Based on word-frequency statistics and supported by the semantic information in controlled vocabularies, this thesis proposes a new model for automatic controlled vocabulary indexing. By training on manually indexed sample documents, the model constructs associations between a given controlled vocabulary term and a set of natural language words, and these associations are then converted into indexing features. Matching the indexing features against the words in a document achieves automatic controlled vocabulary indexing. In the proposed model, a new term-significance formula, TF × OSDF × CSIDF, amends a flaw of TF × IDF: subject-specific words, which are highly beneficial to subject identification, cannot be distinguished from other words in a document collection on the same or a closely related subject. Without requiring additional training documents, the formula recombines the same training documents to obtain different document frequencies for different purposes, separating subject-specific words from common words and domain-specific words. In an experiment involving 100 MeSH subject headings and 60,400 abstracts and titles, the model performs well: on abstracts, indexing precision and recall both exceed 90%, and on titles, indexing recall stays at 70% when indexing precision reaches 90%. Further analysis indicates that the proposed model is also usable with controlled vocabularies that have a thesaurus structure. Consulting the many candidate controlled vocabulary index terms that the model generates could alleviate the problem of indexer consistency. In addition, the time and cost saved should directly raise the quality and quantity of controlled vocabulary index terms and, indirectly, improve retrieval performance.	en
dc.description.provenance	Made available in DSpace on 2021-07-01T08:20:31Z (GMT). No. of bitstreams: 0 Previous issue date: 1998	en
dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction: Section 1 Problem Statement, Section 2 Research Purpose, Section 3 Research Hypotheses, Section 4 Scope and Limitations, Section 5 Definition of Terms, Notes; Chapter 2 Literature Review: Section 1 Subject Indexing, Section 2 Automatic Indexing, Notes; Chapter 3 Research Design: Section 1 Experimental Model Design, Section 2 Indexing Model, Notes; Chapter 4 Experimental Procedure and Methods: Section 1 Overview of the Experimental Database and Controlled Vocabulary Headings, Section 2 Experimental Procedure, Notes; Chapter 5 Evaluation and Discussion of Results: Section 1 Comparison and Discussion of Experimental Results, Section 2 Further Discussion of Issues, Section 3 Comparison with Related Research, Notes; Chapter 6 Conclusions and Suggestions: Section 1 Conclusions, Section 2 Contributions of This Thesis, Section 3 Suggestions for Further Research; Bibliography: I. Chinese Sources, II. English Sources; Appendix 1: Sample MEDLINE Record; Appendix 2: MeSH Headings Tested in the Experiment; Appendix 3: Experimental Headings and Average Document Indexing Scores; Appendix 4: Summary of Pearson Product-Moment Correlation Tests; Appendix 5: English Section
dc.language.iso	zh-TW
dc.title	機器輔助控制詞彙索引之研究	zh_TW
dc.title	A Study on Machine-Aided Controlled Vocabulary Indexing	en
dc.date.schoolyear	86-2
dc.description.degree	Master's
dc.relation.page	67
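dc.rights.note	Not authorized
dc.contributor.author-dept	College of Liberal Arts	zh_TW
dc.contributor.author-dept	Graduate Institute of Library Science	zh_TW
Appears in Collections:	Department of Library and Information Science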
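
The abstract contrasts its TF × OSDF × CSIDF weighting with conventional TF × IDF and reports results as indexing precision and recall. This record does not define OSDF or CSIDF, so the following Python sketch covers only the conventional TF × IDF baseline and the two evaluation measures. The function names, the log(N/df) form of IDF, and the toy documents are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF x IDF.

    TF is the raw count of the term in the document; IDF is log(N / df),
    where N is the number of documents and df the number containing the term.
    (Assumed textbook form; the thesis may use a different variant.)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        term_freq = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


def indexing_precision_recall(suggested, assigned):
    """Compare suggested headings with the headings a human indexer assigned."""
    suggested, assigned = set(suggested), set(assigned)
    correct = len(suggested & assigned)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(assigned) if assigned else 0.0
    return precision, recall


# Toy data, invented for illustration only.
docs = [
    ["patient", "asthma", "therapy", "asthma"],
    ["patient", "diabetes", "insulin", "therapy"],
    ["patient", "asthma", "inhaler"],
]
weights = tf_idf(docs)
precision, recall = indexing_precision_recall(
    suggested=["Asthma", "Insulin"], assigned=["Asthma", "Lung Diseases"]
)
```

In this toy collection `patient` occurs in every document, so its weight is zero. A term that names the shared subject of a narrowly focused collection would be discounted in the same way, which appears to be the limitation the abstract attributes to TF × IDF.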

Files in this item:

There are no files associated with this item.

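Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.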