於社群網路中之高效能鏈結預測與群組查詢

Yi-Ling Chen; 陳怡伶

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2602

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	陳銘憲
dc.contributor.author	Yi-Ling Chen	en
dc.contributor.author	陳怡伶	zh_TW
dc.date.accessioned	2021-05-13T06:42:53Z	-
dc.date.available	2019-03-01
dc.date.available	2021-05-13T06:42:53Z	-
dc.date.copyright	2017-02-21
dc.date.issued	2017
dc.date.submitted	2017-02-13
dc.identifier.citation	[1] L. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25:211–230, 2001. [2] A. An, M. Kargar, and M. ZiHayat. Finding affordable and collaborative teams from a network of experts. In SIAM Int’l Conf. Data Mining (SDM), 2013. [3] L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In Proc. 12th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2006. [4] B. Balasundaram, S. Butenko, and I. V. Hicks. Clique relaxations in social network analysis: The maximum k-plex problem. Operations Research, 59(1):133–142, 2011. [5] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999. [6] D. Berlowitz, S. Cohen, and B. Kimelfeld. Efficient enumeration of maximal k-plexes. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2015. [7] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. 7th Int’l Conf. World Wide Web (WWW), 1998. [8] D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SIAM Int’l Conf. Data Mining (SDM), 2004. [9] V. Chaoji, S. Ranu, R. Rastogi, and R. Bhatt. Recommendations to boost content spread in social networks. In Proc. 21st Int’l Conf. World Wide Web (WWW), 2012. [10] J. Chen, W. Geyer, C. Dugan, M. J. Muller, and I. Guy. Make new friends, but keep the old: recommending people on social networking sites. In Proc. 27th ACM SIGCHI Int’l Conf. Human Factors in Computing Systems (CHI), 2009. [11] A. Epasto, S. Lattanzi, and M. Sozio. Efficient densest subgraph computation in evolving graphs. In Proc. 24th Int’l Conf. World Wide Web (WWW), 2015. [12] U. Feige, G. Kortsarz, and D. Peleg. The dense k-subgraph problem. Algorithmica, 29(3):410–421, 2001. [13] A. Gubichev, S. J. Bedathur, S. Seufert, and G. Weikum. Fast and accurate estimation of shortest paths in large graphs. In Proc. 19th ACM Int’l Conf. Information and Knowledge Management (CIKM), 2010. [14] B. He, Y. Ding, J. Tang, V. Reguramalingam, and J. Bollen. Mining diversity subgraph in multidisciplinary scientific collaboration networks: A meso perspective. J. Informetrics, 7(1):117–128, 2013. [15] B. Hendrickson. Chaco. In Encyclopedia of Parallel Computing, pages 248–249. Springer, 2011. [16] D. Hennessey, D. Brooks, A. Fridman, and D. E. Breen. A simplification algorithm for visualizing the structure of complex graphs. In Proc. 12th Int’l Conf. Information Visualization (ICIV), 2008. [17] X. Hu. Using rough sets theory and database operations to construct a good ensemble of classifiers for data mining applications. In Proc. 1st IEEE Int’l Conf. Data Mining (ICDM), 2001. [18] X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2014. [19] G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In Proc. 8th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2002. [20] E. Kanoulas, B. Carterette, P. D. Clough, and M. Sanderson. Evaluating multi-query sessions. In Proc. 34th Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2011. [21] G. Karypis. Metis and parmetis. In Encyclopedia of Parallel Computing, pages 1117–1124. Springer, 2011. [22] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953. [23] L. I. Kuncheva and C. J. Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51(2):181–207, 2003. [24] T. Lappas, K. Liu, and E. Terzi. Finding a team of experts in social networks. In Proc. 15th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2009. [25] J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proc. 12th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2006. [26] C. Li, P. Resnick, and Q. Mei. Multiple queries as bandit arms. In Proc. 25th ACM Int’l Conf. Information and Knowledge Management (CIKM), 2016. [27] K. Li, W. Lu, S. Bhagat, L. V. S. Lakshmanan, and C. Yu. On social event organization. In Proc. 20th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2014. [28] Y. Li, D. Wu, J. Xu, B. Choi, and W. Su. Spatial-aware interest group queries in location-based social networks. In Proc. 21st ACM Int’l Conf. Information and Knowledge Management (CIKM), 2012. [29] D. Liben-Nowell and J. M. Kleinberg. The link-prediction problem for social networks. J. Association for Information Science and Technology, 58(7):1019–1031, 2007. [30] R. Lichtenwalter and N. V. Chawla. Lpmade: Link prediction made easy. J. Machine Learning Research, 12:2489–2492, 2011. [31] R. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In Proc. 16th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2010. [32] K. Madduri and D. A. Bader. Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis. In Proc. 23rd IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS), 2009. [33] M. Mathioudakis, F. Bonchi, C. Castillo, A. Gionis, and A. Ukkonen. Sparsification of influence networks. In Proc. 17th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2011. [34] A. Maunz, C. Helma, and S. Kramer. Efficient mining for structurally diverse subgraph patterns in large molecular databases. Machine Learning, 83(2):193–218, 2011. [35] J. J. McAuley and J. Leskovec. Discovering social circles in ego networks. ACM Trans. on Knowledge Discovery from Data, 8(1):4:1–4:28, 2014. [36] S. Milgram. The small world problem. Psychology Today, 2:60–67, 1967. [37] K. Mouratidis, J. Li, Y. Tang, and N. Mamoulis. Joint search by social and spatial proximity. IEEE Trans. Knowledge Data Eng., 27(3):781–793, 2015. [38] M. E. J. Newman. The structure of scientific collaboration networks. Proc. of the Nat’l Academy of Sciences USA, 98(2):404–409, 2001. [39] L. Qin, R. Li, L. Chang, and C. Zhang. Locally densest subgraph discovery. In Proc. 21st ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2015. [40] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993. [41] S. S. Rangapuram, T. Bühler, and M. Hein. Towards realistic team formation in social networks based on densest subgraphs. In Proc. 22nd Int’l Conf. World Wide Web (WWW), 2013. [42] L. Rokach. Ensemble-based classifiers. Artificial Intelligence Rev., 33(1-2):1–39, 2010. [43] R. A. Rossi, D. F. Gleich, A. H. Gebremedhin, and M. M. A. Patwary. Fast maximum clique algorithms for large graphs. In Proc. 23rd Int’l Conf. World Wide Web (WWW), 2014. [44] S. B. Roy, L. V. S. Lakshmanan, and R. Liu. From group recommendations to group formation. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2015. [45] N. Ruan, R. Jin, and Y. Huang. Distance preserving graph simplification. In Proc. 11th IEEE Int’l Conf. Data Mining (ICDM), 2011. [46] G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill Book Company, 1984. [47] V. Satuluri, S. Parthasarathy, and Y. Ruan. Local graph sparsification for scalable clustering. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2011. [48] R. E. Schapire, Y. Freund, P. Barlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proc. 14th Int’l Conf. Machine Learning (ICML), 1997. [49] D. Shin, S. Si, and I. S. Dhillon. Multi-scale link prediction. In Proc. 21st ACM Int’l Conf. Information and Knowledge Management (CIKM), 2012. [50] M. Sloan and J. Wang. Dynamical information retrieval modelling: a portfolio-armed bandit machine approach. In Proc. 21st Int’l Conf. World Wide Web (WWW), 2012. [51] H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In Proc. 9th ACM SIGCOMM Internet Measurement Conference (IMC), 2009. [52] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In Proc. 16th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2010. [53] I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In Proc. 18th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2012. [54] H. Toivonen, S. Mahler, and F. Zhou. A framework for path-oriented network simplification. In Proc. 9th Int’l Conf. Advances in Intelligent Data Analysis (IDA), 2010. [55] C. E. Tsourakakis. The k-clique densest subgraph problem. In Proc. 24th Int’l Conf. World Wide Web (WWW), 2015. [56] K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connect. Sci., 8(3):385–404, 1996. [57] T. T. Vu, D. Song, A. Willis, S. N. Tran, and J. Li. Improving search personalisation with dynamic group formation. In Proc. 37th Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2014. [58] H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. 9th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2003. [59] J. Xiang, C. Guo, and A. Aboulnaga. Scalable maximum clique computation using mapreduce. In Proc. 29th Int’l Conf. Data Eng. (ICDE), 2013. [60] D. Yang, C. Shen, W. Lee, and M. Chen. On socio-spatial group query for location-based social networks. In Proc. 18th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2012. [61] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015. [62] S. Yang, X. Yan, B. Zong, and A. Khan. Towards effective partition management for large graphs. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2012. [63] Y. Yang, D. Yan, H. Wu, J. Cheng, S. Zhou, and J. C. S. Lui. Diversified temporal subgraph pattern mining. In Proc. 22nd ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2016. [64] Z. Yang, A. W. Fu, and R. Liu. Diversified top-k subgraph querying in a large graph. In Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD), 2016. [65] M. Yuan, L. Chen, P. S. Yu, and T. Yu. Protecting sensitive labels in social network data anonymization. IEEE Trans. Knowledge Data Eng., 25(3):633–647, 2013.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2602	-
dc.description.abstract	As the development and popularization of social networking websites, many recommendation systems tend to leverage the information in social networks to provide helpful suggestions for users, and a great deal of research studies on social network analysis are thereby motivated. Recently, the sizes of social networks have been increasing rapidly, and this growth results in a significant increase in the computational cost of the sophisticated recommendations. The huge size and complexity of social networks create a considerable burden for recommendation systems while processing the information from social networks to provide suggestions. Therefore, in this dissertation, we study three important recommendation problems in social networks and aim to improve their efficiency. First, we focus on the relationship between two users and study the link prediction problem in large-scale networks. During the link prediction, numerous feature values need to be calculated and then combined to make recommendations, and the computational cost grows quickly as the network size becomes larger. Some previous studies involving network processing attempt to lower the computational cost by reducing the network size via sparsification. However, sparsification might remove important information and hurt the prediction accuracy. To address this issue, we propose a framework called Diverse Ensemble of Drastic Sparsification (DEDS), which constructs ensemble classifiers with good accuracy while keeping the prediction time short. DEDS includes various sparsification methods that are designed to preserve different measures of a network. Therefore, DEDS can generate sparsified networks with significant structural differences and increase the diversity of the ensemble classifier, which is key to improving prediction performance. Second, we extend the scope from the relationship between two users to the relationship among a group of users, and study the social group query problem with its applications in activity planning. Considering social links among all users to recommend a mutually acquainted group of attendees for an activity is an NP-hard problem. In addition to finding a group of attendees familiar with each other, selecting an activity period available to all is also essential for activity planning. Therefore, we need to further consider the available time of users, which makes the problem even harder due to the complexity of social connectivity and the diversity of user schedules. In this dissertation, we propose the Social-Temporal Group Query (STGQ) to find suitable time and attendees with minimum total social distance. We design two algorithms, SGSelect and STGSelect, which include various effective pruning strategies to substantially reduce running time. Experimental results indicate that SGSelect and STGSelect are significantly more efficient than baseline approaches. We also conduct a user study to compare the proposed approach with manual activity coordination. The results show that our approach obtains higher quality solutions with less coordination effort, thereby increasing users' willingness to organize activities. Finally, we study the consecutive group query problem to support a sequence of recommendations. When planning an activity, it is difficult for a user to specify all the conditions right at once to find the perfect group of attendees and time. Fortunately, with the aforementioned social-temporal group query, it is easy for the user to tune the parameters to find alternative recommendations. As users may iteratively adjust query parameters to fine tune the results, we further propose Consecutive Social Group Query (CSGQ) to support such needs. Envisaging that exploiting the intermediate solutions of previous queries may improve processing of the succeeding queries, we design a new tree structure, namely, Accumulative Search Tree, which caches the intermediate solutions of historical queries in a compact form for reuse. To facilitate efficient lookup, we further propose a new index structure, called Social Boundary, which effectively indexes the intermediate solutions required for processing each CSGQ with specified parameters. According to the experimental results, with the caching mechanisms, processing time of consecutive queries can be further reduced considerably.	en
dc.description.provenance	Made available in DSpace on 2021-05-13T06:42:53Z (GMT). No. of bitstreams: 1 ntu-106-F96921038-1.pdf: 7265580 bytes, checksum: 01a581a941319eb0cb677b4086b03c14 (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	1 Introduction 1 1.1 Motivation and Overview of the Dissertation . . . . . . . . . . . . . . . 1 1.1.1 Efficient Link Prediction in Large-Scale Networks . . . . . . . . 3 1.1.2 Efficient Social-Temporal Group Query . . . . . . . . . . . . . . 4 1.1.3 Efficient Consecutive Group Query Processing . . . . . . . . . . 4 1.2 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . 5 2 Ensemble of Diverse Sparsifications for Link Prediction in Large-Scale Networks 7 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Framework Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.1 Proximity Measures in Link Prediction . . . . . . . . . . . . . . 12 2.3.2 Supervised Framework for Link Prediction . . . . . . . . . . . . 13 2.3.3 Data and Evaluation Metrics . . . . . . . . . . . . . . . . . . . . 15 2.4 Diverse Ensemble of Drastic Sparsification . . . . . . . . . . . . . . . . 15 2.4.1 Diverse Sparsification Methods . . . . . . . . . . . . . . . . . . 16 2.4.2 Ensemble with Diversity . . . . . . . . . . . . . . . . . . . . . . 19 2.4.3 Feature Subset Selection . . . . . . . . . . . . . . . . . . . . . . 21 2.5 Strategies for Ensemble Generation . . . . . . . . . . . . . . . . . . . . 22 2.5.1 Accuracy-Based Weight Setting . . . . . . . . . . . . . . . . . . 23 2.5.2 Ensemble Size Augmentation . . . . . . . . . . . . . . . . . . . 26 2.6 Analysis on Accuracy and Efficiency . . . . . . . . . . . . . . . . . . . . 27 2.6.1 Prediction Accuracy . . . . . . . . . . . . . . . . . . . . . . . . 28 2.6.2 Computational Efficiency . . . . . . . . . . . . . . . . . . . . . 30 2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3 Efficient Social-Temporal Group Query with Acquaintance Constraint 33 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.3 Social Group Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.3.2 Algorithm Design . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.4 Social-Temporal Group Query . . . . . . . . . . . . . . . . . . . . . . . 52 3.4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.4.2 Algorithm Design . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.5.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.5.2 Performance Analysis of SGQ and STGQ . . . . . . . . . . . . . 59 3.5.3 User Study of Manual Activity Coordination . . . . . . . . . . . 66 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4 Efficient Processing of Consecutive Group Queries for Social Activity Planning 69 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.3 Consecutive Social Group Query . . . . . . . . . . . . . . . . . . . . . . 72 4.3.1 Accumulative Search Tree and Social Boundary . . . . . . . . . 73 4.3.2 Solution Acquisition Using AST and SB . . . . . . . . . . . . . 76 4.4 Index Construction and Maintenance . . . . . . . . . . . . . . . . . . . . 78 4.4.1 Node Indexing of AST Using SB . . . . . . . . . . . . . . . . . 78 4.4.2 Updating of AST and SB . . . . . . . . . . . . . . . . . . . . . . 87 4.5 Solution Optimality and Extensions . . . . . . . . . . . . . . . . . . . . 89 4.5.1 Solution Optimality . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.5.2 Extensions in Temporal Dimension . . . . . . . . . . . . . . . . 93 4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.6.1 Performance Analysis of CSGQ . . . . . . . . . . . . . . . . . . 95 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5 Conclusion and Future Work 99 Bibliography 101 Appendices 109 A Detailed Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 A.1 Proof of Theorem 4.5.1 . . . . . . . . . . . . . . . . . . . . . . . 109 B Pseudo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
dc.language.iso	en
dc.title	於社群網路中之高效能鏈結預測與群組查詢	zh_TW
dc.title	Efficient Link Prediction and Group Query in Social Networks	en
dc.type	Thesis
dc.date.schoolyear	105-1
dc.description.degree	博士
dc.contributor.oralexamcommittee	陳良弼,曾新穆,楊得年,沈之涯
dc.subject.keyword	演算法設計與分析,社群網路,鏈結預測,網路稀疏化,查詢處理,索引結構,	zh_TW
dc.subject.keyword	Algorithm Design and Analysis,Social Networks,Link Prediction,Network Sparsification,Query Processing,Index Structure,	en
dc.relation.page	115
dc.identifier.doi	10.6342/NTU201700431
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2017-02-13
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電機工程學研究所	zh_TW
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-106-1.pdf	7.1 MB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。