Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28123

Full metadata record

| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 簡立峰(Lee-Feng Chien) | |
| dc.contributor.author | Tsung-Pei Chou | en |
| dc.contributor.author | 周宗霈 | zh_TW |
| dc.date.accessioned | 2021-06-13T00:01:21Z | - |
| dc.date.available | 2008-08-03 | |
| dc.date.copyright | 2007-08-03 | |
| dc.date.issued | 2007 | |
| dc.date.submitted | 2007-07-30 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28123 | - |
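| dc.description.abstract | Terms are the vocabulary distilled from sentences and articles. They are the basic unit for presenting information and often serve as pointers to concepts. Organizing terms helps users understand a topic and quickly grasp the important information. When the sources of terms and the needs of users are not fixed, conventional term organization methods struggle to meet user needs directly: clustering results lack a semantic explanation, while classification requires substantial manual effort.
In this thesis, we propose a term organization approach that combines classification and clustering. We use clustering to discover the important term topics, helping users quickly grasp the key concepts across the whole set of terms. Based on these topics, users can decide on term categories and extract training data from the clustering results; finally, all terms are organized by classification. Clustering and classification form an iterative, alternating process that can continually accept user feedback and keep refining the organization. This approach greatly simplifies the term organization process and accommodates the preferences of different users by organizing terms according to user-defined categories. Results from our preliminary experiments show that the proposed method produces better organization results. | zh_TW |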
| dc.description.abstract | Terms, short and meaningful word strings extracted from sentences and articles, can serve as the basic unit of information and as a guide to concepts. Organizing terms helps users understand topics and therefore grasp the key points quickly. When the sources of terms and the requests of users vary, the conventional methods, clustering and classification, cannot satisfy users: clustered results lack comprehensible explanations, and classification requires much manual work.
In this thesis, we develop an approach that combines clustering and classification for term organization, providing a more comprehensive overview of the terms. We use clustering to extract the main topics, and the user can then decide the target classes from the clustering results. Finally, all terms are classified into the classes to which they belong. Clustering and classification are applied iteratively to achieve better performance. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T00:01:21Z (GMT). No. of bitstreams: 1 ntu-96-R94725028-1.pdf: 639625 bytes, checksum: fdf0e3ff94d5bf184ceb2569bb514034 (MD5) Previous issue date: 2007 | en |
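| dc.description.tableofcontents | Table of Contents
Front matter: Acknowledgements; Abstract (Chinese); Thesis Abstract; Table of Contents; List of Tables; List of Figures
Chapter 1 Introduction: Section 1 Research Background; Section 2 Research Motivation and Objectives; Section 3 Thesis Organization
Chapter 2 Literature Review: Section 1 Vector Space Model (2.1.1 Document Similarity); Section 2 Clustering Techniques (2.2.1 Hierarchical Clustering Algorithms; 2.2.2 Partitional Clustering Algorithms); Section 3 Classification Techniques (2.3.1 Naïve Bayes; 2.3.2 K-Nearest Neighbors); Section 4 Related Work on Term Organization (2.4.1 Term Research; 2.4.2 Term Organization Methods Based on Clustering and Classification)
Chapter 3 Problem and Research Method: Section 1 Problem Statement; Section 2 Research Framework and Method; Section 3 System Implementation (3.3.1 System Architecture; 3.3.2 Term Feature Extraction; 3.3.3 Term Organization); Section 4 System Characteristics
Chapter 4 Experimental Results: Section 1 Experimental Procedure; Section 2 Term Clustering Experiments (4.2.1 Experimental Setup; 4.2.2 Evaluation Method; 4.2.3 Term Clustering Performance; 4.2.4 Analysis and Discussion); Section 3 Category Labeling Experiments (4.3.1 Performance of Classification Using Category Names; 4.3.2 Performance of Classification Using Terms within Categories); Section 4 Term Organization Experiments Combining Clustering and Classification (4.4.1 Experimental Setup; 4.4.2 Experimental Results; 4.4.3 Analysis and Discussion)
Chapter 5 Conclusions and Future Work: Section 1 Conclusions; Section 2 Future Work
Back matter: References; Appendix 1 Academia Sinica Balanced Corpus Part-of-Speech Tag Set | |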
| dc.language.iso | zh-TW | |
| dc.subject | Clustering techniques | zh_TW |
| dc.subject | World Wide Web | zh_TW |
| dc.subject | Classification techniques | zh_TW |
| dc.subject | Term organization | zh_TW |
| dc.subject | Clustering | en |
| dc.subject | World Wide Web | en |
| dc.subject | Classification | en |
| dc.subject | Term Organizing | en |
| dc.title | 結合文件分類及分群之術語組織技術 | zh_TW |
| dc.title | Organization of Term Associations through a Combination of Text Classification and Clustering | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 95-2 | |
| dc.description.degree | Master's | |
| dc.contributor.coadvisor | 王柏堯(Bow-Yaw Wang) | |
| dc.contributor.oralexamcommittee | 王新民(Hsin-Min Wang),陳光華(Kuang-hua Chen),莊裕澤(Yuh-Jzer Joung) | |
| dc.subject.keyword | Term organization, Clustering techniques, Classification techniques, World Wide Web | zh_TW |
| dc.subject.keyword | Term Organizing, Clustering, Classification, World Wide Web | en |
| dc.relation.page | 51 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2007-07-31 | |
| dc.contributor.author-college | College of Management | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Information Management | zh_TW |
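| Appears in collections: | Department of Information Management | |

Files in this item:

| File | Size | Format | |
|---|---|---|---|
| ntu-96-1.pdf (not authorized for public access) | 624.63 kB | Adobe PDF | |

All items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.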
