Improving Web Service Matchmaking with a Vector Combination Method

Che-An Lee; 李哲安

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70852

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李允中
dc.contributor.author	Che-An Lee	en
dc.contributor.author	李哲安	zh_TW
dc.date.accessioned	2021-06-17T04:41:01Z	-
dc.date.available	2019-08-08
dc.date.copyright	2018-08-08
dc.date.issued	2018
dc.date.submitted	2018-08-06
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70852	-
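dc.description.abstract	In text-based web service matchmaking, web services are represented as plain-text descriptions, and text is used both to represent services and to match them, so the accuracy of text matching strongly affects matchmaking performance. In this research we improve matchmaking accuracy through the following four steps: 1. extract keywords from web service description documents and convert them into vector representations using a pre-trained word vector model; 2. extract word relations from reference data; 3. use the word relations for vector combination to remedy deficiencies in the pre-trained word vectors; 4. compute the cosine similarity of the keyword word vectors to obtain the similarity between web services. In the experiments we evaluate the proposed method on the web service matchmaking benchmark OWLS-TC V4 and use hypothesis testing to compare it with the existing matchmaking approach iSeM; in this comparison our method (MAP=0.9242) outperforms iSeM (MAP=0.8529).	zh_TW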
dc.description.abstract	In text-based service matchmaking, a web service is treated as plain text and term tokens serve as the internal representation used to match services, so the accuracy of text comparison affects matchmaking performance. In this research, we improve matchmaking performance through the following four steps: 1. extract keywords from WSDL and convert them into vector representations with a pre-trained word vector model; 2. extract word relations from reference data; 3. use the word relations for vector combination to improve the quality of the pre-trained word vectors; and 4. calculate the cosine similarity between keyword word vectors to obtain the similarity of two web services. We evaluate the approach on the OWLS-TC V4 service matchmaking benchmark and use hypothesis testing to compare it with the iSeM approach. The results show that our approach (MAP=0.9242) outperforms iSeM (MAP=0.8529).	en
dc.description.provenance	Made available in DSpace on 2021-06-17T04:41:01Z (GMT). No. of bitstreams: 1 ntu-107-R05922096-1.pdf: 2665381 bytes, checksum: 2a35258a791315336eb56dd4025269fa (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Acknowledgements; Abstract (Chinese); Abstract; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work: 2.1 Corpus, 2.2 Word Representations, 2.3 Vector Combination, 2.4 Reference Data, 2.5 Clustering, 2.6 Service Matchmaking; Chapter 3 Vector Combination: 3.1 Entity Linking, 3.2 Vector Combination; Chapter 4 Service Matchmaker: 4.1 Keyword Extractor, 4.2 Vector Combiner, 4.3 Similarity Calculator; Chapter 5 Experiments: 5.1 Evaluation Benchmark, 5.2 Word Representation and Relational Information, 5.3 Experiment Results, 5.4 Performance Analysis, 5.5 Discussion; Chapter 6 Conclusion; Bibliography; Appendix A Service Matchmaking Example. List of Figures: 2.1 Dependency Graph, 2.2 Word2Vec Model, 2.3 LDA Model, 2.4 WordNet Structure, 3.1 Entity Linking Concept, 3.2 Entity Linker Module, 3.3 Clustering Process, 3.4 Wikipedia Information, 3.5 Vector Combiner Module, 3.6 Combiner Pipeline, 4.1 Service Matchmaker, 4.2 Keyword Extractor, 4.3 Keyword Extraction Example, 4.4 Vector Combiner, 4.5 Vector Combination Example, 4.6 Similarity Calculator, 4.7 Similarity Calculation Example, 5.1 Top-K Precision, 5.2 Top-K Recall, 5.3 R-Precision, 5.4 Average Precision, A.1 Request WSDL, A.2 Candidate WSDL, A.3 Convert Word Vectors, A.4 Calculate Service Similarity. List of Tables: 2.1 Compare Different Vector Combination, 5.1 Single Relation Results 1, 5.2 Single Relation Results 2, 5.3 Combine Relations Results, 5.4 Normality Test, 5.5 Friedman Test
dc.language.iso	en
dc.subject	Web Service	zh_TW
dc.subject	Service Matchmaking	zh_TW
dc.subject	Word Relation	zh_TW
dc.subject	Vector Combination	zh_TW
dc.subject	Word Vector	zh_TW
dc.subject	Vector Combination	en
dc.subject	Web Service	en
dc.subject	Word Vector	en
dc.subject	Word Relation	en
dc.subject	Service Matchmaking	en
dc.title	Improving Web Service Matchmaking with a Vector Combination Method	zh_TW
dc.title	Web Services Matchmaking with Vectors Combination	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	施吉昇,蘇木春,鄭有進,馬尚彬
dc.subject.keyword	Web Service, Service Matchmaking, Word Relation, Word Vector, Vector Combination	zh_TW
dc.subject.keyword	Web Service, Service Matchmaking, Word Relation, Word Vector, Vector Combination	en
dc.relation.page	54
dc.identifier.doi	10.6342/NTU201802568
dc.rights.note	Paid license
dc.date.accepted	2018-08-06
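dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW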
Appears in Collections:	Department of Computer Science and Information Engineering
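
The four steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the record does not say how related-word vectors are combined or how keyword vectors are aggregated per service, so the sketch assumes a weighted average controlled by a hypothetical `alpha` parameter and a plain mean over keyword vectors. `TOY_VECTORS` and `TOY_RELATIONS` are made-up stand-ins for a pre-trained word vector model and for relations extracted from reference data.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combine(word: str, vectors: Dict[str, Vector],
            relations: Dict[str, List[str]], alpha: float = 0.5) -> Vector:
    """Blend a word's vector with the mean vector of its related words.

    ASSUMPTION: a weighted average is used here only for illustration;
    the thesis may combine vectors differently.
    """
    base = vectors[word]
    related = [vectors[w] for w in relations.get(word, []) if w in vectors]
    if not related:
        return list(base)
    neighbour = mean_vector(related)
    return [alpha * b + (1.0 - alpha) * n for b, n in zip(base, neighbour)]


def service_vector(keywords: List[str], vectors: Dict[str, Vector],
                   relations: Dict[str, List[str]], alpha: float = 0.5) -> Vector:
    """Represent a service as the mean of its combined keyword vectors.

    ASSUMPTION: averaging is one simple aggregation choice, not the
    documented one. Keywords missing from the model are skipped.
    """
    known = [combine(k, vectors, relations, alpha) for k in keywords if k in vectors]
    if not known:
        raise ValueError("no keyword has a word vector")
    return mean_vector(known)


def service_similarity(request: List[str], candidate: List[str],
                       vectors: Dict[str, Vector],
                       relations: Dict[str, List[str]], alpha: float = 0.5) -> float:
    """Cosine similarity between the request and candidate service vectors."""
    return cosine_similarity(
        service_vector(request, vectors, relations, alpha),
        service_vector(candidate, vectors, relations, alpha),
    )


# Hypothetical toy data: 3-dimensional "pre-trained" vectors and synonym relations.
TOY_VECTORS: Dict[str, Vector] = {
    "car": [1.0, 0.0, 0.0],
    "automobile": [0.0, 1.0, 0.0],
    "price": [0.0, 0.0, 1.0],
}
TOY_RELATIONS: Dict[str, List[str]] = {"car": ["automobile"], "automobile": ["car"]}

if __name__ == "__main__":
    request, candidate = ["car", "price"], ["automobile", "price"]
    print(service_similarity(request, candidate, TOY_VECTORS, {}))
    print(service_similarity(request, candidate, TOY_VECTORS, TOY_RELATIONS))
```

With this toy data, the request and candidate score about 0.5 when no relations are used and about 1.0 once the synonym relation pulls "car" and "automobile" together, which is the kind of effect the vector combination step is meant to produce.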

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	2.6 MB	Adobe PDF


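Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.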