Organization of Term Associations through a Combination of Text Classification and Clustering

Tsung-Pei Chou; 周宗霈

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28123

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	簡立峰(Lee-Feng Chien)
dc.contributor.author	Tsung-Pei Chou	en
dc.contributor.author	周宗霈	zh_TW
dc.date.accessioned	2021-06-13T00:01:21Z	-
dc.date.available	2008-08-03
dc.date.copyright	2007-08-03
dc.date.issued	2007
dc.date.submitted	2007-07-30
dc.identifier.citation	References: [1] AltaVista Search Engine (2007). http://www.altavista.com/ [2] Anick, P. (2003). Using terminological feedback for web search refinement: a log-based study. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, 88-95. [3] Begelman, G., Keller, P., & Smadja, F. (2006). Automated tag clustering: Improving search and exploration in the tag space. In Collaborative Web Tagging Workshop, 15th International World Wide Web Conference. [4] Beitzel, S., Jensen, E., Lewis, D., Chowdhury, A., Kolcz, A., & Frieder, O. (2005). Improving Automatic Query Classification via Semi-supervised Learning. Proceedings of the Fifth IEEE International Conference on Data Mining, 42-49. [5] Boley, D., Gini, M., Gross, R., Han, E., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., & Moore, J. (1999). Partitioning-based clustering for web document categorization. Decision Support Systems, 27(3), 329-341. [6] Chuang, S. L., & Chien, L. F. (2004). A Practical Web-based Approach to Generating Topic Hierarchy for Text Segments. Proceedings of the thirteenth ACM international conference on Information and knowledge management, 127-136. [7] Cui, H., Wen, J. R., Nie, J. Y., & Ma, W. Y. (2002). Probabilistic query expansion using query logs. In Proceedings of International World Wide Web Conference, 325-332. [8] Cutting, D. R., Karger, D. R., Pedersen, J. O., & Tukey, J. W. (1992). Scatter/gather: A cluster-based approach to browsing large document collections. Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, 318-329. [9] Google Search Engine (2007). http://www.google.com [10] MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, 281-297. Berkeley: University of California Press. [11] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264-323. [12] Jansen, B. J., Spink, A., Bateman, J., & Saracevic, T. (1998). Real Life Information Retrieval: A Study of User Queries on the Web. ACM SIGIR Forum, 32(1), 5-17. [13] Kraft, R., & Zien, J. (2004). Mining anchor text for query refinement. Proceedings of the 13th international conference on World Wide Web, 666-674. [14] Larsen, B., & Aone, C. (1999). Fast and effective text mining using linear-time document clustering. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, 16-22. [15] Luhn, H. P. (1957). A statistical approach to the mechanized encoding and searching of literary information. IBM Journal of Research and Development, 2(2), 309-317. [16] Open Directory Project (2007). http://dmoz.org/ [17] Pu, H. T., Chuang, S. L., & Yang, C. (2002). Subject categorization of query terms for exploring web users’ search interests. Journal of the American Society for Information Science and Technology, 53(8), 617-630. [18] Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM Press and Addison Wesley. [19] Salton, G., Yang, C. S., & Wong, A. (1975). A vector space model for automatic indexing. Communications of the ACM, 18, 613-620. [20] Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513-523. [21] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34, 1–47. [22] Shen, D., Sun, J. T., Yang, Q., & Chen, Z. (2006). Building bridges for web query classification. In Proceedings of the international ACM SIGIR conference on Research and development in information retrieval, 131-138. [23] Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999). Analysis of a very large web search engine query log. ACM SIGIR Forum, 33(1), 6-12. [24] Spärck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11-21. [25] Taiwan Stock Exchange Corporation (2007). http://www.tse.com.tw/ [26] Tseng, Y. H. (1998). Multilingual keyword extraction for term suggestion. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, 377-378. [27] Voorhees, E. M. (1986). Implementing agglomerative hierarchical clustering algorithms for use in document retrieval. Information Processing and Management, 22, 465-476. [28] Wen, J. R., Nie, J. Y., & Zhang, H. J. (2001). Clustering user queries of a search engine. In Proceedings of the 10th International World Wide Web Conference, 162–168. [29] WordNet (2007). http://wordnet.princeton.edu/ [30] Wen, J. R., Nie, J. Y., & Zhang, H. J. (2002). Query clustering using user logs. ACM Transactions on Information Systems, 20(1), 59-81. [31] Yahoo! Directory (2007). http://dir.yahoo.com/ [32] Yahoo Search Engine (2007). http://search.yahoo.com/
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28123	-
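dc.description.abstract	Terms are words distilled from sentences and articles. They are the basic units in which information is presented and often serve as guides to concepts; organizing terms helps users understand a topic and quickly grasp the important information. When the sources of terms and the needs of users are not fixed, traditional term organization methods cannot directly satisfy users: clustering results lack a semantic explanation, while classification methods demand a great deal of manual effort. In this thesis, we propose a term organization approach that combines classification and clustering. We use clustering to discover the important term topics, helping users quickly grasp the key concepts across the whole set of terms. Based on these topics, users can decide on the term classes and extract training data from the clustering results, and finally all terms are organized by classification. Clustering and classification form an iteratively alternating process that can continually accept user feedback and keep revising the organization. This greatly simplifies the process of organizing terms and can take the preferences of different users into account, organizing terms according to user-defined classes. The results of our preliminary experiments show that the proposed method yields a better organization.	zh_TW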
dc.description.abstract	Terms are short, meaningful word strings extracted from sentences and articles; they serve as basic units of information and as guides to concepts. Organizing terms helps users understand a topic and grasp its key points quickly. When the sources of terms and the needs of users vary, the conventional methods, clustering and classification, cannot satisfy users directly: clustering results lack a semantic explanation, and classification requires substantial manual work. In this thesis, we develop an approach that combines clustering and classification for term organization and provides a more comprehensible overview of the terms. We use clustering to extract the main topics, and users then decide the target classes from the clustering results. Finally, all terms are classified into the classes they belong to. Clustering and classification alternate iteratively to achieve better performance.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T00:01:21Z (GMT). No. of bitstreams: 1 ntu-96-R94725028-1.pdf: 639625 bytes, checksum: fdf0e3ff94d5bf184ceb2569bb514034 (MD5) Previous issue date: 2007	en
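dc.description.tableofcontents	Contents: Acknowledgements II; Abstract (Chinese) III; Thesis Abstract IV; Table of Contents V; List of Tables VII; List of Figures VIII; Chapter 1 Introduction 1; Section 1 Research Background 1; Section 2 Research Motivation and Objectives 3; Section 3 Thesis Organization 5; Chapter 2 Literature Review 6; Section 1 Vector Space Model 6; 2.1.1 Document Similarity 7; Section 2 Clustering Techniques 9; 2.2.1 Hierarchical Clustering Algorithms 9; 2.2.2 Partitional Clustering Algorithms 11; Section 3 Classification Techniques 12; 2.3.1 Naïve Bayes 13; 2.3.2 K-Nearest Neighbors 14; Section 4 Related Work on Term Organization 15; 2.4.1 Term Research 15; 2.4.2 Term Organization Methods Based on Clustering and Classification 16; Chapter 3 Problem and Research Method 18; Section 1 Problem Statement 18; Section 2 Research Framework and Method 19; Section 3 System Implementation 21; 3.3.1 System Architecture 21; 3.3.2 Term Feature Extraction 21; 3.3.3 Term Organization 23; Section 4 System Characteristics 28; Chapter 4 Experimental Results 29; Section 1 Experimental Procedure 29; Section 2 Term Clustering Experiments 30; 4.2.1 Experimental Setup 30; 4.2.2 Evaluation Method 30; 4.2.3 Term Clustering Performance 31; 4.2.4 Analysis and Discussion 35; Section 3 Class Labeling Experiments 36; 4.3.1 Classification Performance Using Class Names 36; 4.3.2 Classification Performance Using Terms Within Classes 37; Section 4 Term Organization Experiments Combining Clustering and Classification 37; 4.4.1 Experimental Setup 37; 4.4.2 Experimental Results 38; 4.4.3 Analysis and Discussion 43; Chapter 5 Conclusions and Future Work 44; Section 1 Conclusions 44; Section 2 Future Work 44; References 46; Appendix 1 Academia Sinica Balanced Corpus Part-of-Speech Tag Set 50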
dc.language.iso	zh-TW
dc.subject	分群技術	zh_TW
dc.subject	全球資訊網	zh_TW
dc.subject	分類技術	zh_TW
dc.subject	術語組織	zh_TW
dc.subject	Clustering	en
dc.subject	World Wide Web	en
dc.subject	Classification	en
dc.subject	Term Organizing	en
dc.title	結合文件分類及分群之術語組織技術	zh_TW
dc.title	Organization of Term Associations through a Combination of Text Classification and Clustering	en
dc.type	Thesis
dc.date.schoolyear	95-2
dc.description.degree	Master's
dc.contributor.coadvisor	王柏堯(Bow-Yaw Wang)
dc.contributor.oralexamcommittee	王新民(Hsin-Min Wang),陳光華(Kuang-hua Chen),莊裕澤(Yuh-Jzer Joung)
dc.subject.keyword	術語組織, 分群技術, 分類技術, 全球資訊網	zh_TW
dc.subject.keyword	Term Organizing, Clustering, Classification, World Wide Web	en
dc.relation.page	51
dc.rights.note	Paid authorization
dc.date.accepted	2007-07-31
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
ntu-96-1.pdf (not authorized for public access)	624.63 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.