An Aggregation Framework for Word Similarity Measurement

Chen-Lun Huang; 黃振綸

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68008

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李允中
dc.contributor.author	Chen-Lun Huang	en
dc.contributor.author	黃振綸	zh_TW
dc.date.accessioned	2021-06-17T02:11:10Z	-
dc.date.available	2018-02-26
dc.date.copyright	2018-02-26
dc.date.issued	2017
dc.date.submitted	2018-01-18
dc.identifier.citation	[1] OWLS-TC. http://projects.semwebcentral.org/projects/owls-tc/. [2] PetScan. https://petscan.wmflabs.org/. [3] XML WSDL. http://www.w3schools.com/xml/xml_wsdl.asp. [4] D. M. Blei. Probabilistic topic models. Commun. ACM, 55(4):77–84, Apr. 2012. [5] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003. [6] G. Bouma. Normalized (pointwise) mutual information in collocation extraction. unknown. [7] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’99, pages 50–57, New York, NY, USA, 1999. ACM. [8] T. Landauer, P. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25:259–284, 1998. [9] J. Lee, S.-P. Ma, and K.-H. Hsu. Service discovery through elasticity-based graph matching. IEEE Transactions on Software Engineering, X, 2014. [10] Z. Liu, Y. Zhang, E. Y. Chang, and M. Sun. PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans. Intell. Syst. Technol., 2(3):26:1–26:18, May 2011. [11] O. Medelyan, E. Frank, and I. H. Witten. Human-competitive tagging using automatic keyphrase extraction. In Internat. Conference of Empirical Methods in Natural Language Processing, EMNLP-2009, 2009. [12] O. Medelyan, I. H. Witten, and D. Milne. Topic indexing with Wikipedia. In Proceedings of the Wikipedia and AI workshop at AAAI-08. AAAI, 2008. [13] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013. [14] T. Mikolov, S. W.-t. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics, May 2013. [15] G. A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, Nov. 1995. [16] S. Rose, D. Engel, N. Cramer, and W. Cowley. Automatic keyword extraction from individual documents. John Wiley and Sons, Ltd, 2010. [17] H. M. Wallach, D. M. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1973–1981. Curran Associates, Inc., 2009. [18] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. Kea: Practical automatic keyphrase extraction. In Proceedings of the Fourth ACM Conference on Digital Libraries, DL ’99, pages 254–255, New York, NY, USA, 1999. ACM.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68008	-
dc.description.abstract	近年來，網路服務的數量與日俱增，將網路服務組合起來的研究也隨之變得重要，而在組合服務的過程中，服務的配對扮演著不可或缺的角色。為了要找出最適合並能夠滿足需求的服務，在服務文件中的資訊必須被完整地取出，並將之放入良好的結構之中藉此以量化兩個服務之間的差異，以此來改善服務配對的結果。而為了要將服務的差異量化，文字之間的語意關係必須被納入考量。我們使用了一些機器學習的演算法並將其訓練在大量的資料上以期盼能夠得到更精準的文字的語意以及字與字之間的關係。在這篇論文中，我們提出了一個整合不同語意關係的架構，此架構是用來找出服務之間的特色並使用語意關係來幫助配對服務。	zh_TW
dc.description.abstract	In recent years, as the number of web services has grown, mechanisms for composing services have become more important. Service matching plays an indispensable role in service composition. To find the most suitable service that satisfies the requirements, the information in a service's document must be extracted completely and placed in a structure that describes it well. Once the information is converted into such a structure, the difference between two services can be quantified, and the comparison results help match services. To quantify the difference, word semantics must be taken into consideration. We use several machine learning algorithms, trained on a large data set, in the hope of capturing more precise word semantics and relations between words. In this thesis, we propose a framework to aggregate different semantic measures for data extracted from WSDL. This framework is designed to identify the features of a service and to aggregate several measures to compare two services by their word semantics and structure.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T02:11:10Z (GMT). No. of bitstreams: 1 ntu-106-R04922101-1.pdf: 6464556 bytes, checksum: 1649fb8cf8af57d5aa118e394ae5dd30 (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	Acknowledgement i; 摘要 ii; Abstract iii; List of Figures viii; List of Tables x; Chapter 1 Introduction 1; Chapter 2 Related Work 3; 2.1 Semantic System 3; 2.2 Keyphrase Extraction 5; 2.3 Service Matching 5; Chapter 3 Semantic System 6; 3.1 Word Co-occurrence Measurement 7; 3.2 Knowledge-based Representation 7; 3.3 Vector Representation 8; 3.3.1 Latent Semantic Allocation 9; 3.3.2 Probabilistic Latent Semantic Allocation 11; 3.3.3 Latent Dirichlet Allocation 13; 3.3.4 Word2vec 17; 3.3.5 Model Comparison 22; Chapter 4 Corpus Preparing 23; Chapter 5 Graph Matching Service Discovery 25; 5.1 WSDL Introduction 25; 5.2 Test Dataset: OWLS-TC4 26; 5.3 Graph Matching Service Discovery 26; Chapter 6 Aggregation Framework 28; 6.1 Corpus and Models 28; 6.2 Semantic Notions 29; 6.3 Steps in Aggregation 30; 6.4 Experiment Results 36; 6.5 An Example 38; 6.6 Evaluation 40; Chapter 7 Conclusion 41; Bibliography 42
dc.language.iso	en
dc.title	整合多種字詞相似度的方法	zh_TW
dc.title	An Aggregation Framework for Word Similarity Measurement	en
dc.type	Thesis
dc.date.schoolyear	106-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	蘇木春,蔣偉寧,徐國勛,馬尚彬
dc.subject.keyword	網路服務,服務組合,資訊擷取,服務配對,字詞語意	zh_TW
dc.subject.keyword	Web Service,Service Composition,Information Retrieval,Service Matching,Word Semantics	en
dc.relation.page	43
dc.identifier.doi	10.6342/NTU201800093
dc.rights.note	Paid license
dc.date.accepted	2018-01-19
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-106-1.pdf (not currently authorized for public access)	6.31 MB	Adobe PDF

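Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.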