Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43883

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳信希(Hsin-Hsi Chen) | |
| dc.contributor.author | Ming-Shun Lin | en |
| dc.contributor.author | 林敏順 | zh_TW |
| dc.date.accessioned | 2021-06-15T02:31:34Z | - |
| dc.date.available | 2009-08-18 | |
| dc.date.copyright | 2009-08-18 | |
| dc.date.issued | 2009 | |
| dc.date.submitted | 2009-08-15 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43883 | - |
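| dc.description.abstract | In natural language processing, resources are the most important ingredient of statistical computation. Many ready-made corpora and validated language models are now readily available, yet corpus-based studies always face the question of whether the corpus reflects the relatedness of the newest terms. Because language is alive and new words and terms are created every day, determining the relatedness of new words and terms is an important research issue.
In this thesis we define a novel web-based relatedness measure that treats web pages as a corpus, and we examine how different web domains affect the method. The dependency score of two terms is obtained from the web content and the frequency information of the two terms; their actual relatedness score is obtained by passing the dependency score through a transfer function. Four transfer functions are proposed: the Poisson, log-concave, power-concave and Gompertz functions. In the experiments we test the four models on three well-known test sets and compare them in detail with the results of other research groups. In previous research, person names have not been regarded as an important corpus resource. We use the dependency score to decide whether two person names are related, and for this task we propose three strategies (direct association, association matrix and scalar association matrix) to verify that our relatedness measure is reasonable. We use the relatedness measure to build a social network, label the category of each pair in the network with a Markov chain random process, and try to extract keywords from web pages to describe their relationships. We also apply the relatedness measure to query suggestion. Unlike traditional query suggestion, which is based on query logs, our suggestions are extracted directly from web pages. In the experiments, the proposed method shows a high level of agreement. | zh_TW |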
| dc.description.abstract | In statistical natural language processing, the resources used to compute the statistics are indispensable. Many kinds of corpora have been made available, and many language models have been tested experimentally. One major issue behind corpus-based approaches is whether the corpora adopted reflect up-to-date usage. Languages are alive: new terms and phrases enter daily use all the time, so capturing new usages is an important research topic.
This thesis defines a novel web-based relatedness measure and explores snippets from various web domains as corpora. The mutual dependency score between two objects is calculated from the content information and the frequency information of the two objects. The relatedness score of the two objects is then obtained by passing the dependency score through a transfer function. Four transfer functions are considered, based on the Poisson, log-concave, power-concave and Gompertz functions. The four transfer functions are evaluated on three well-known benchmark datasets: WordSimilarity-353, Miller-Charles and Rubenstein-Goodenough. Named entities are common foci of searchers. We apply the dependency score to evaluate the association of named entities using three strategies: direct association, association matrix and scalar association matrix. Modeling and naming general entity-entity relationships is a challenge in constructing social networks. Given a seed person name, we use the Google search engine, an NER (Named Entity Recognizer) parser, and the web-based relatedness measure to construct an evolving social network. For each entity pair in the network, we apply a Markov chain random process to extract potential categories defined in the ODP. Moreover, to label their relationships, we try to combine the tf×idf scores of noun phrases extracted from snippets with the rank scores of the categories. Unlike traditional query suggestion, which draws on query logs, we extract suggestion terms from snippets and apply our relatedness measures to them. The suggestions extracted with the proposed relatedness measures show a high level of agreement on relatedness. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T02:31:34Z (GMT). No. of bitstreams: 1 ntu-98-D91922022-1.pdf: 1820763 bytes, checksum: f4d7303d52494bb7f5ccdd082f253482 (MD5) Previous issue date: 2009 | en |
| dc.description.tableofcontents | Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 A Web-based Metric by Conditional Query
1.3 Community Chain Detection
1.4 Query Suggestion
1.5 The Goal of the Study
Chapter 2 Approaches to the Relatedness Measure
2.1 Three Benchmark Datasets
2.2 Thesaurus-based Approaches
2.3 Corpus-based Approaches
2.4 Wikipedia-based Approaches
Chapter 3 A Web-based Relatedness Measure by Conditional Query
3.1 Introduction
3.2 Frequency Information by Conditional Query
3.3 Association of Common Words
3.4 Association of Named Entities
3.5 Applications
3.5.1 An NE Ontology Generation Engine
3.5.2 Disambiguation Using Association of Named Entities
3.6 Discussion
Chapter 4 The Enhanced Relatedness Measure and Its Application on Community Chain Detection and Query Suggestion
4.1 Introduction
4.2 Web-based Relatedness Measure
4.2.1 Mutual dependency between objects
4.2.2 Transfer Functions with the Dependency Score
4.3 Bag of Words Model
4.4 The Computation of Word Level Relatedness
4.4.1 The Results of the Bag of Words Model
4.4.2 Optimal Strategy Parameters
4.4.3 Experiment Results
4.4.4 Comparisons with the Related Works and Summaries
4.5 Community Chain Detection
4.5.1 Link Detection Test
4.5.2 Community Chain Clustering
4.5.3 Experiments of the Community Chain Detection
4.5.4 Experiments of Community Chain Clustering
4.6 Application: Query Suggestion
4.7 Discussion
Chapter 5 Labeling Categories and Relationships in an Evolving Social Network
5.1 Introduction
5.2 Evolving Social Network
5.3 Building a Directed Graph by the ODP Resource
5.3.1 Cue Patterns
5.3.2 Building a Directed Graph
5.4 Extracted Critical Nodes from a Directed Graph
5.5 Experiments
5.5.1 Evolving Social Networks
5.5.2 Extracting Potential Categories
5.5.3 Labeling Relationships
5.6 Discussion
Chapter 6 Conclusions and Future Work
6.1 Achievements
6.2 Future Work
References
Appendixes
Appendix A. 100 Transliteration Names
Appendix B. The TC-353 Testing dataset
Appendix C. Some Examples of the Query Suggestion | |
| dc.language.iso | en | |
| dc.subject | 相關性量測 | zh_TW |
| dc.subject | 演化中的社群網路 | zh_TW |
| dc.subject | 關係標記 | zh_TW |
| dc.subject | 類別標記 | zh_TW |
| dc.subject | 查詢詞推薦 | zh_TW |
| dc.subject | 社群偵測 | zh_TW |
| dc.subject | Query Suggestion | en |
| dc.subject | Relatedness Measure | en |
| dc.subject | Evolving Social Network | en |
| dc.subject | Relationships Labeling | en |
| dc.subject | Category Labeling | en |
| dc.subject | Community Chain Detection | en |
| dc.title | 以網際網路語料為基礎之相關性量測研究及其在社群偵測與查詢詞推薦之應用 | zh_TW |
| dc.title | A Study on Web-based Relatedness Measure and Its Applications on Community Chain Detection and Query Suggestion | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 97-2 | |
| dc.description.degree | Doctoral | |
| dc.contributor.oralexamcommittee | 傅楸善(Chiou-Shann Fuh),張俊盛(Jason-S. Chang),陳克健(Keh-Jiann Chen),梁婷(Tyne Liang),吳宗憲(Chung-Hsien Wu),曾元顯(Yuen-Hsien Tseng) | |
| dc.subject.keyword | 相關性量測,社群偵測,查詢詞推薦,類別標記,關係標記,演化中的社群網路 | zh_TW |
| dc.subject.keyword | Relatedness Measure, Community Chain Detection, Query Suggestion, Category Labeling, Relationships Labeling, Evolving Social Network | en |
| dc.relation.page | 117 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2009-08-17 | |
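| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in collections: | Department of Computer Science and Information Engineering | |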
Files in this item:
| File | Size | Format | |
|---|---|---|---|
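| ntu-98-1.pdf (not authorized for public access) | 1.78 MB | Adobe PDF | |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated.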
