架構於雲端之平行化密度分群演算法 (CDBSCAN: Cloud Based DBSCAN Clustering Algorithm)

Tze-Yu Chen; 陳則諭

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28796

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳銘憲(Ming-Syan Chen)
dc.contributor.author	Tze-Yu Chen	en
dc.contributor.author	陳則諭	zh_TW
dc.date.accessioned	2021-06-13T00:23:12Z	-
dc.date.available	2013-08-08
dc.date.copyright	2011-08-08
dc.date.issued	2011
dc.date.submitted	2011-08-04
dc.identifier.citation	[1] D. Arlia and M. Coppola, “Experiments in parallel clustering with dbscan”, In Sakellariou, R., Keane, J.A., Gurd, J.R., Freeman, L. (eds.) Euro-Par 2001. LNCS, vol. 2150, pp. 326-331. Springer, Heidelberg, 2001. [2] E. Januzaj, H.-P. Kriegel, and M. Pfeifle, “Scalable density-based distributed clustering”, In Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 231-244. Springer, Heidelberg, 2004. [3] Hadoop. “http://hadoop.apache.org/” [4] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters”, In Proc. of the 6th OSDI Symp., 2004. [5] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise”, In Proc. of the 2nd KDD Conf., pp. 226-231, 1996. [6] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: Distributed data-parallel programs from sequential building blocks”, In Proc. of the 2007 EuroSys Conf., pp. 59-72, 2007. [7] M. Stonebraker, J. Frew, K. Gardels, and J. Meredith, “The SEQUOIA 2000 Benchmark”, Proc. ACM SIGMOD Int. Conf. on Management of Data, Washington, DC, pp. 2-11, 1993. [8] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An Efficient and Robust Access Method for Points and Rectangles”, Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'90), Atlantic City, NJ, ACM Press, New York, pp. 322-331, 1990. [9] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System”, In Proc. of 19th ACM Symposium on Operating Systems Principles, Lake George, NY, 2003. [10] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: A New Data Clustering Algorithm and Its Applications”, Kluwer Academic Publishers, pp. 1-40, 1998. [11] V. Gaede and O. Gunther, “Multidimensional access methods”, ACM Comput. Surv. 30(2), 170-231, 1998. [12] X. Xu, J. Jager, and H.-P. Kriegel, “A fast parallel clustering algorithm for large spatial databases”, Data Min. Knowl. Discov. 3(3), 263-290, 1999. [13] Y.-C. Kwon, D. Nunley, J. P. Gardner, M. Balazinska, B. Howe, and S. Loebman, “Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster”, SSDBM 10.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28796	-
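dc.description.abstract	DBSCAN is a well-known density-based clustering algorithm whose distinguishing feature is that it can find clusters of arbitrary shape in a noisy environment. However, as data sets grow ever larger, DBSCAN cannot process such large volumes of data efficiently, because the performance of a single computer is hard to improve. Cloud computing has recently matured and can help address the inefficiency of traditional algorithms on large data. In this thesis we propose the CDBSCAN algorithm, where CDBSCAN stands for cloud-based DBSCAN; it is a distributed version of DBSCAN implemented on a cloud platform, Hadoop. We use Map/Reduce jobs to cluster the data in each partition and integrate these individual clustering results into the final clustering result. Our experiments show that CDBSCAN is a highly parallel algorithm that needs only one Map/Reduce job and achieves nearly linear scalability.	zh_TW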
dc.description.abstract	DBSCAN is a well-known density-based clustering algorithm that can identify clusters of arbitrary shape in a noisy space. However, as data sets grow larger, DBSCAN cannot process the data efficiently because a single machine is difficult to scale up. Cloud computing has recently matured and can help us address this scalability issue. In this thesis, we propose CDBSCAN (cloud-based DBSCAN), a distributed version of DBSCAN implemented on the Hadoop platform. We use Map/Reduce jobs to cluster the partitioned data set and merge the individual clustering results. The experimental evaluations show that CDBSCAN is a highly parallel algorithm that requires only one Map/Reduce job and achieves near-linear scalability.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T00:23:12Z (GMT). No. of bitstreams: 1 ntu-100-R98921044-1.pdf: 628604 bytes, checksum: b9b3a8b3c9cd5321b470a0ec5aa6a264 (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	Oral Examination Committee Certification #; Acknowledgement i; Chinese Abstract ii; ABSTRACT iii; CONTENTS iv; LIST OF FIGURES v; Chapter 1 Introduction 1; Chapter 2 The Algorithm DBSCAN 5; Chapter 3 CDBSCAN: A Density-Based Clustering Algorithm on Cloud 7; 3.1 Data Partition 10; 3.2 Local Clustering 15; 3.3 Overlap Analysis 16; 3.4 Cluster Merging 20; Chapter 4 Experimental Evaluation 21; 4.1 Experiment Setup 21; 4.2 Experimental Results 22; 4.2.1 Size-up 23; 4.2.2 Scale-up 25; 4.2.3 Discussion on Distributed Scheme 26; Chapter 5 Related Work 28; Chapter 6 Conclusion 33; REFERENCE 34
dc.language.iso	en
dc.title	架構於雲端之平行化密度分群演算法	zh_TW
dc.title	CDBSCAN: Cloud Based DBSCAN Clustering Algorithm	en
dc.type	Thesis
dc.date.schoolyear	99-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	呂俊賢(Chun-Shien Lu),林永松(Yeong-Sung Lin),吳尚鴻(Shan-Hung Wu)
dc.subject.keyword	Clustering Algorithms,Parallel Algorithms,Distributed Algorithms,Cloud Computing,Hadoop	zh_TW
dc.subject.keyword	Clustering Algorithms,Parallel Algorithms,Distributed Algorithms,Cloud Computing,Hadoop	en
dc.relation.page	35
dc.rights.note	Paid authorization
dc.date.accepted	2011-08-05
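dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering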
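
The abstract in this record describes clustering each data partition separately and then merging the individual results. The following is a minimal single-process Python sketch of that general idea, not the thesis's Hadoop implementation. Everything specific in it is an assumption made for illustration: the function names (`region_query`, `dbscan`, `partitioned_dbscan`), the vertical-strip partitioning with an `eps`-wide overlap, and the union-find merge through shared core points.

```python
from collections import defaultdict

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core): label -1 is noise, core is a set of indices."""
    labels, core, cid = [None] * len(points), set(), 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = -1                      # noise for now; may become a border point
            continue
        core.add(i)
        labels[i] = cid
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid                 # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts:          # only core points expand the cluster
                core.add(j)
                queue.extend(j_nbrs)
        cid += 1
    return labels, core

def partitioned_dbscan(points, eps, min_pts, width):
    """Hypothetical partition / local-cluster / merge scheme; assumes width > eps."""
    # Partition: each point goes to its home strip and to any adjacent strip
    # whose boundary lies within eps, so home points see their full neighborhood.
    strips, home = defaultdict(list), {}
    for i, (x, _) in enumerate(points):
        k = int(x // width)
        home[i] = k
        strips[k].append(i)
        if x - k * width <= eps:
            strips[k - 1].append(i)
        if (k + 1) * width - x <= eps:
            strips[k + 1].append(i)
    # Local clustering: run DBSCAN independently on every strip.
    member, true_core = defaultdict(list), set()
    for k, idx in strips.items():
        labels, core = dbscan([points[i] for i in idx], eps, min_pts)
        for pos, i in enumerate(idx):
            if labels[pos] != -1:
                member[i].append((k, labels[pos]))
            if pos in core and home[i] == k:    # core status is exact only at home
                true_core.add(i)
    # Merge: local clusters that share a true core point are the same cluster.
    parent = {}
    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    for i in true_core:
        first = find(member[i][0])
        for other in member[i][1:]:
            parent[find(other)] = first
    # Relabel: map merged clusters to consecutive ids; unclaimed points are noise.
    ids, result = {}, []
    for i in range(len(points)):
        if not member[i]:
            result.append(-1)
        else:
            result.append(ids.setdefault(find(member[i][0]), len(ids)))
    return result
```

For example, `partitioned_dbscan(points, eps=0.6, min_pts=3, width=4.0)` clusters a list of `(x, y)` tuples using strips 4 units wide. In a Map/Reduce setting the partitioning step would plausibly correspond to the map phase and the per-strip clustering to the reduce phase, but that mapping is a guess rather than something this record states.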

Files in This Item:

File	Size	Format
ntu-100-1.pdf (not currently authorized for public access)	613.87 kB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.