基於圖論之交易、地理與社群資料探勘技術

Chih-Hua Tai; 戴志華

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38061

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	陳銘憲(Ming-Syan Chen)
dc.contributor.author	Chih-Hua Tai	en
dc.contributor.author	戴志華	zh_TW
dc.date.accessioned	2021-06-13T15:59:26Z	-
dc.date.available	2013-08-11
dc.date.copyright	2011-08-11
dc.date.issued	2011
dc.date.submitted	2011-08-10
dc.identifier.citation	[1] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of ICDE, 1995. [2] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. of ACM SIGMOD, 2000. [3] C. D. Ahrens. Meteorology Today, 9th ed., Brooks-Cole, 2009. [4] L. Backstrom, C. Dwork, and J. M. Kleinberg. Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proc. of WWW, 2007. [5] L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proc. of KDD, 2006. [6] A. Ben-Hur, D. Horn, H. Siegelmann, and V. Vapnik. Support vector clustering. J. Machine Learning Research, 2, 2001. [7] R. Buyya, C. S. Yeo, and S. Venugopal. Market-oriented cloud computing: Vision, hype, and reality for delivering it services as computing utilities, in. In Proc. of CSSE, pages 10–1016, 2008. [8] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan. Time bounds for selection. J. Comput. System Sci., 7, 1973. [9] L. Cao, P. S. Yu, C. Zhang, and H. Zhang. Data Mining for Business Applications. Springer, 2008. [10] D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In Proc. of SDM, 2004. [11] M. Chein and M.-L. Mugnier. Graph-based Knowledge Representation: Computational Foundations of Conceptual Graphs (Advanced Information and Knowledge Processing), Springer, 2008. [12] Chein, M., Mugnier, M.-L. Graph-based Knowledge Representation: Computational Foundations of Conceptual Graphs, Springer, 2009. [13] M.-S. Chen, J. Han, and P. S. Yu. Data mining: An overview from database perspective. IEEE TKDE, 8(6), 1996. [14] J. Cheng, A. W. Fu, and J. Liu, K-isomorphism: privacy preserving network publication against structural attacks. In Proc. of SIGMOD, 2010. [15] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E., 70(6), 2004. [16] Graham Cormode, Divesh Srivastava, Ting Yu, and Qing Zhang. Anonymizing Bipartite Graph Data using Safe Groupings. In Proc. of VLDB, 2008. [17] D.J. Cook and L.B. Holder, Mining Graph Data, John Wiley & Sons, 2007. [18] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. of KDD, 1996. [19] V. Estivill-Castro and I. Lee. Autoclust+: Automatic clustering of point-data sets in the presence of obstacles. In Proc. of TSDM, 2000. [20] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In Proc. of ACM SIGKDD, 2002. [21] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy. Advances in Knowledge Discovery and Data Mining. Cambridge, Mass: MIT Press, 1996. [22] S. Fortunato. Community detection in graphs. Physics Reports, 2010. [23] T. Gaertner. A survey of kernels for structured data. SIGKDD Explorations, 5(1), 2003. [24] B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. AAAI FS., 2006. [25] W. Geamsakul, T. Matsuda, T. Yoshida, H. Motoda, and T. Washio. Classifier construction by graph-based induction for graph-structured data. In Proc. of PAKDD, 2003. [26] F. Giannotti, L. V. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang. Privacy-preserving mining of association rules from outsourced transaction databases. In SPCC Workshop, 2010. [27] J. Gonzalez, L. B. Holder, and D. J. Cook, Graph-Based Relational Concept Learning, Proceedings of the International Machine Learning Conference, 2002. [28] S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. of SIGMOD, 1998. [29] S. Guha, K. Shim, and R. Rastogi. ROCK: A robust clustering algorithm for categorical attributes. In Proc. of ICDE, 1999. [30] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proc. of VLDB, 1995. [31] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000. [32] M. Hay, G. Miklau, D. Jensen, D. F. Towsley, and P. Weis. Resisting structural re-identification in anonymized social networks. In Proc. of VLDB, 2008. [33] H. He, H. Wang, J. Yang, P. S. Yu: BLINKS: ranked keyword searches on graphs. In Proc. of SIGMOD, 2007. [34] D. Hush and C. Scovel. Polynomial-time decomposition algorithms for support vector machines. In Machine Learning, 2003. [35] A. Inokuchi, T.Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50, 2003. [36] I. Jonyer, L. Holder, and D. J. Cook, MDL-Based Context-Free Graph Grammar Induction, Proceedings of the Sixteenth International Conference of the Florida AI Research Society, May 2003. [37] M. Kamber and J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann; 2 edition, 2005. [38] M. Kantarcioglu, R. Nix, and J. Vaidya. An efficient approximate protocol for privacy-preserving association rule mining. In Proc. of PAKDD, 2009. [39] D. Aloise, A. Deshpande, P. Hansen, P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 2009. [40] B. King. Step-wise clustering procedures. J. Am. Statistical Assoc., 69, 1967. [41] R. Kondor and J. LaRerty. DiRusion kernels on graphs and other discrete input space. In Proc. of ICML, 2002. [42] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proc. of SIGKDD, 2006. [43] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, 2001. [44] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In Proc. of WWW, 2008. [45] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proc. of ICDE, 2007. [46] C.-R. Lin, K.-H. Liu, and M.-S. Chen. Dual clustering: Integrating data clustering over optimization and constraint domains. IEEE TKDE, 17(5), 2005. [47] K. Liu, K. Das, T. Grandison, and H. Kargupta. Privacy-preserving data analysis on graphs and social networks. In H. Kargupta, J. Han, P. Yu, R. Motwani, and V. Kumar, editors, Next Generation Data Mining. CRC Press, 2008. [48] K. Liu and E. Terzi. Towards identity anonymization on graphs. In Proc. of ACM SIGMOD, 2008. [49] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM TKDD, 1(1), 2007. [50] J. B. MacQueen. Some Methods for classification and Analysis of Multivariate Observations. In Proc. of Berkeley Symposium on Mathematical Statistics and Probability, 1967. [51] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. [52] T. Mielik¨ainen. Privacy problems with anonymized transaction databases. In Proc. of Discovery Science, 2004. [53] I. Molloy, N. Li, and T. Li. On the (in)security and (im)practicality of outsourcing precise association rule mining. In Proc. of ICDM, 2009. [54] A. Nanopoulos, Y. Theodoridis, and Y. Manolopoulos. C2p: Clustering based on closest pairs. In Proc. of VLDB, 2001. [55] R. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. In Proc. of VLDB, 1994. [56] S. Nijssen and J. Kok. Faster association rules for multiple relations. In Proc. of IJCAI, 2001. [57] J. Pei and B. Zhou. Preserving privacy in social networks against neighborhood attacks. In Proc. of ICDE, 2008. [58] L. Qiu, Y. Li, and X. Wu. Preserving privacy in association rule mining with bloom filters. J. Intell. Inf. Syst, 29(3):253–278, 2007. [59] L. D. Raedt and S. Kramer. The levelwise version space algorithm and its application to molecular fragment finding. In Proc. of IJCAI, 2001. [60] P. Samarati and L. Sweeney. Generalizing data to provide anonymity when disclosing information. In Proc. of ACM PODS, 1998. [61] G. Sheikholeslami, S. Chatterjee, and A. Zhang. Wavecluster: A wavelet based clustering approach for spatial data in very large database. VLDBJ, 8(3/4), 1999. [62] R. Spillman, M. Janssen, B. Nelson, and M. Kepner. Use of a genetic algorithm in the cryptanalysis of simple substitution ciphers. Cryptologia, XVII(1):31–44, 1993. [63] R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. of VLDB, 1995. [64] X. Sun and P. S. Yu. A border-based approach for hiding sensitive frequent itemsets. In Proc. of ICDM, 2005. [65] L. Sweeney. k-anonymity: A model for protecting privacy. IJUFKS, 10(5), 2002. [66] C.-H. Tai, B.-R. Dai and M.-S. Chen. Incremental Clustering in Geography and Optimization Spaces. In Proc. of PAKDD, 2007. [67] C.-H. Tai, D.-N. Yang, L.-T. Lin, and M.-S. Chen. Recommending Personalized Scenic Itinerary with Geo-Tagged Photos. In Proc. of ICME, 2008. [68] C.-H. Tai, P. S. Yu, and M.-S. Chen. k-Support Anonymity based on Pseudo Taxonomy for Outsourcing of Frequent Itemset Mining. In Proc. of KDD, 2010. [69] C.-H. Tai, P. S. Yu, and M.-S. Chen. Structural Diversity for Privacy in Publishing Social Networks. In Proc. of SDM, 2011. [70] C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen. Privacy-Preserving Social Network Publication Against Friendship Attacks. In Proc. of KDD, 2011. [71] A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Spatial clustering in the presence of obstacles. In Proc. of ICDE, 2001. [72] J. M.Wallace and P. V. Hobbs. Atmospheric Science: An Introductory Survey, 2nd Ed., Academic Press, 2006. [73] W. Wang, J. Yang, R. Muntz. STING: A Statistical Information grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997. [74] J. T. L. Wang, M. J. Zaki, H. T. T. Toivonen, and D. Shasha. Data Mining in Bioinformatics. Springer-Verlag London, UK, 2005. [75] T. Washio, H. Motoda. State of the art of graph-based data mining. SIGKDD Explorations 5(1), 2003. [76] W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis. Security in outsourcing of association rule mining. In Proc. of VLDB, 2007. [77] X. Wu, X. Ying, K. Liu, and L. Chen. A Survey of Privacy- Preservation of Graphs and Social Networks. Springer US, 2010. [78] X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In Proc. of ICDM, 2002. [79] X. Ying and X. Wu. Randomizing social networks: a spectrum preserving approach. In Proc. of SDM, 2008. [80] C. You, L. Holder and D. Cook, Learning Patterns in the Dynamics of Biological Networks, In Proc. of SIGKDD, June 2009. [81] C. You, L. Holder and D. Cook, Substructure Analysis of Metabolic Pathways by Graph-based Relational Learning, in A. Sidhu, T. Dillon, and E. Chang (Editors), Biomedical Data and Applications, Springer, September 2009. [82] O. R. Zaiane, A. Foss, C. H. Lee, and W. Wang. On data clustering analysis: Scalability, constraints, and validation. In Proc. of PAKDD, 2002. [83] M. Zaki. Effciently mining frequent trees in a forest. In Proc. of KDD, 2002. [84] J. Zhang, W. Hsu, and M. L. Lee. Clustering in dynamic spatial database. Journal of Intelligent Information System, 24(1), 2005. [85] T. Zhang, R. Ramakrishnan, M. Livny. BIRCH: An Efficient Data. Clustering Method for Very Large. Databases. In Proc. of SIGMOD, 1996. [86] L. Zhang and W. Zhang. Edge anonymity in social network graphs. In Proc. of CSE, 2009. [87] P. Zhao and J. Han. On Graph Query Optimization in Large Networks. In Proc. of VLDB, 2009. [88] P. Zhao, J. X. Yu, P. S. Yu: Graph Indexing: Tree Delta >= Graph. In Proc. of VLDB, 2007. [89] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. In Proc. of PinKDD, 2007. [90] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks. In Proc. of ICDE, 2008. [91] B. Zhou, J. Pei, and W. Luk. A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explorations, 10(2), 2008. [92] L. Zou, L. Chen, and M. T. O¨ zsu. K-automorphism: a general framework for privacy preserving network publication. In Proc. of VLDB, 2009. [93] F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng. Mining colossal frequent patterns by core pattern fusion. In Proc. of ICDE, 2007.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38061	-
dc.description.abstract	資料探勘是一門與資料應用方式相關的技術領域，在過去十年此領域已受到相當的矚目。在此領域興起之初，各種技術發展主要以處理集合式資料以及序列式資料為主，應用範圍包含市場交易、電腦網路訊息、生物序列處理……等。然而，在更多實際的應用範圍裡，資料是與結構化資訊息息相關，顯示出研發新技術需求的急迫性，因此，基於圖論的資料探勘技術之發展顯得越來越重要。基於圖論之資料探勘技術其應用範圍相當廣泛，在本論文中，我們將依序各別探討其在三種複雜度之資料應用的實際問題。首先，由於雲端計算的興起，那些缺乏資料探勘專業技術的人們如今也能靠著外包服務享受資料探勘所帶來的好處，然而，任何外包服務也將伴隨著資料隱私安全性的疑慮。在第二章，我們探討如何保護頻繁項集挖掘外包服務中的資料隱私安全。此議題主要有兩個挑戰：如何用合理的開銷保護資料隱私安全，以及如何對抗具有相關背景知識者之攻擊。為了解決這些挑戰，我們提出了k-support anonymity隱私保護概念與k-support anonymization演算法。此演算法藉由建立假的分類樹隱藏敏感物項，通過利用只有在葉級層的物項需要出現在交易資料中的特性，在保護資料隱私的同時亦能限制儲存開銷。接下來，我們注意到由感應器收集到的資料包含地理位置屬性和訊息屬性，一般空間叢集因為只考慮地理屬性，只能找出資料點密集區，因此不適用於規劃訊息相似的空間區塊。所以，我們在第三章探討訊息數據的空間叢集問題。此議題的挑戰在於地理位置屬性和訊息屬性代表著不同的概念，在叢集的過程中不適合用同一種方式處理。為因應此挑戰，我們提出BiAgree演算法，此演算法藉由建立一圖形結構，將訊息屬性和地理位置屬性分別整合在圖形結構的節點與邊上。之後，只要將圖形結構分割成訊息一致的連接組件，就可以不受資料點密度的影響，規劃出訊息相似的空間區塊。此外，靠著維護此圖形結構，BiAgree演算法也具有即時計算能力，以獲得高質量或高效率的空間叢集結果。最後，有鑑於社群資料及其在各業應用與服務的快速增長，社群資料中用戶者的隱私問題也逐漸受到關注。近幾年已有些研究被提出來解決用戶個人資料、用戶身分鑑定、用戶關係鑑定……等隱私議題，然而，相對於社群資料中富含的大量訊息，在被公開分享的社群資料中，用戶隱私議題尚未被充分解決。在第四章，我們探討一新的隱私議題──群體關係鑑定。一個人的群體關係資訊會自然隱含在社群資料之中，指示此人所參與的團體以及和他人的連繫關係，也可能代表著一些敏感的個人隱私，如在網路上的政治活動、疾病支持群組、朋友群……等等。為保護這種資訊，我們提出k-structural diversity隱私模型，並設計整數規劃方法尋找最加解決方案。此外，考量社群資料的龐大，我們也從不同的觀點，提出三個具擴展性的經驗演算法。	zh_TW
dc.description.abstract	Data Mining is a data-and-application dependant technique, and has received significant attentions in the last decade. In the past years, various techniques have been developed to deal with set or sequence data in business marketing, computer networks, bioinformatics, to name a few. Many real applications, however, have called for the need of new techniques to tackle data with structural information, i.e., graphs. Graph-based data mining, which discovers novel knowledge in graph-represented data, is thus becoming more and more important. In this dissertation, motivated by the fact that graph-based data mining is still in its fancy compared to the wide applications, we attempt to address the use of graph-based data mining in realistic problems with three kinds of data complexity, respectively. First, due to the rise of cloud computing, people who lack of expertise in data mining and/or computational resources now can also take advantages from data mining by outsourcing their mining tasks. However, for any outsourcing service, privacy is a major concern. In Chapter 2, we study the problem of privacy protection in outsourcing frequent itemset mining. This problem has two challenges. One is on how to protect sensitive information, including the raw data and the frequent itemsets, with reasonable overhead and preserve the precise mining results. The other is how to protect against an attacker with related background knowledge such as item support information. To overcome these challenges, we propose k-support anonymity and develop a novel encryption approach that constructs a pseudo taxonomy tree to hide sensitive items. By leveraging the property that only the items at the leaf level of the taxonomy need to be appear at the transactions, the storage overhead is limited while the privacy protection is conformed. Second, note that data collected by sensors can consist of not only geographic attributes but also informative attributes. Since the spatial-alone clustering approaches consider only the geographic attributes to identify spatial clusters at data-dense regions, it is infeasible to obtain spatial clusters with informatively similar data points from such data by the spatial-alone clustering approaches. Therefore, we address the informative spatial data clustering (ISDC) problem in Chapter 3. One of the main challenges in this problem is that geographic and informative attributes represent different concepts and should not be tackled in the same way in clustering. To overcome this challenge, we proposed Algorithm BiAgree that introduces a graph structure, named NeiGraph, to integrate informative attributes and geographic attributes in vertices and edges, respectively. Afterward, Algorithm BiAgree is able to identify informatively similar regions regardless of the data density by partitioning NeiGraph into informative-consistent connected components. In addition, by maintaining NeiGraph, Algorithm BiAgree also provides the online computing capability to acquire the solutions with high quality and smaller computation time respectively. Finally, as the rapid growth in the number of services and applications leverage social network data, there is increasing concern about privacy issues in published social networks. Recently several studies have addressed the privacy issues on vertex/edge attributes, vertex identity, link disclosure, and so on. However, compared to the rich information inherent in graph data, the privacy issues in publications of social networks have not been fully solved. In Chapter 4, we address a new privacy issue, referred to as the community identification. The community identity of an individual is a kind of structural information that indicates the neighborhood or connections of the individual. The community identity could also represent the personal privacy information sensitive to the public, such as on-line political activity group, on-line disease support group information, or friend group in a social network. To protect such information, therefore, we propose a new privacy model, named k-structural diversity, and develop an Integer Programming formulation to find the optimal solutions to k-SDA. Moreover, we devise three scalable heuristics to solve the large instances of k-SDA with different perspectives.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T15:59:26Z (GMT). No. of bitstreams: 1 ntu-100-F92921029-1.pdf: 2242264 bytes, checksum: 0e3b9a2d914d41e0e0bf1aa7da32bafc (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Overview of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Support Anonymity Based on Pseudo Taxonomy for Outsourcing of Frequent Itemset Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.2 Geographic Space Clustering Based on Informative Sensor Data . . . . . . . . 3 1.2.3 Structural Diversity for Privacy in Publishing Social Networks . . . . . . . . . 4 1.3 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Support Anonymity Based on Pseudo Taxonomy for Outsourcing of Frequent Itemset Mining 6 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.3 A Naive Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.4 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Encryption & Decryption Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.2 Generalization of the Mining Task . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3.3 k-Support Anonymity with Taxonomy Tree . . . . . . . . . . . . . . . . . . . 18 2.3.4 Decryption of the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.5 Design Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.4 Performance Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4.2 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.3 Cost of Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3 Geographic Space Clustering Based on Informative Sensor Data 36 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Algorithm BiAgree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3.2 The Binding Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.3.3 The Aggregation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4 Incremental Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.4.2 The BiAgree-IQual algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.4.3 The BiAgree-IEff algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.4.4 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.5 Performance Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.5.2 Spatial Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.5.3 Purity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.5.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.5.5 Incremental Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4 Structural Diversity for Privacy in Publishing Social Networks 65 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.3 Integer Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.1 Formulation with Adding Edges . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.2 Formulation with Adding Edges & Splitting Vertices . . . . . . . . . . . . . . 73 4.4 Scalable Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.4.1 Algorithm EdgeConnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4.2 Algorithm CreateBySplit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4.3 Algorithm MergeBySplit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.5.2 Privacy Violation in Real Social Networks . . . . . . . . . . . . . . . . . . . . 86 4.5.3 Anonymization Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5 Conclusion 95 Bibliography 97
dc.language.iso	en
dc.subject	隱私保護	zh_TW
dc.subject	基於圖論之資料探勘	zh_TW
dc.subject	匿名	zh_TW
dc.subject	分群	zh_TW
dc.title	基於圖論之交易、地理與社群資料探勘技術	zh_TW
dc.title	Graph-based Data Mining for Transactional,Spatial and Social-networking Data	en
dc.type	Thesis
dc.date.schoolyear	99-2
dc.description.degree	博士
dc.contributor.oralexamcommittee	李素瑛(Suh-Yin Lee),陳良弼(Arbee L.P. Chen),李官陵(Guan-Ling Lee),曾新穆(Vincent S. Tseng),戴碧如(Bi-Ru Dai),吳尚鴻(Brandon Shan-Hung Wu)
dc.subject.keyword	基於圖論之資料探勘,隱私保護,匿名,分群,	zh_TW
dc.subject.keyword	Graph-based Data Mining,Privacy-preserving,Anonymity,Grouping,	en
dc.relation.page	101
dc.rights.note	有償授權
dc.date.accepted	2011-08-10
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電機工程學研究所	zh_TW
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-100-1.pdf 未授權公開取用	2.19 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。