Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 電機工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57088
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor廖婉君(Wan-Jiun Liao)
dc.contributor.authorYu-Sheng Chenen
dc.contributor.author陳宇陞zh_TW
dc.date.accessioned2021-06-16T06:34:39Z-
dc.date.available2019-09-04
dc.date.copyright2014-09-04
dc.date.issued2014
dc.date.submitted2014-08-04
dc.identifier.citation1. Lloyd, S., Least squares quantization in PCM. Information Theory, IEEE Transactions on, 1982. 28(2): p. 129-137.
2. Arthur, D. and S. Vassilvitskii. k-means++: The advantages of careful seeding. in Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007. Society for Industrial and Applied Mathematics.
3. Bahmani, B., et al., Scalable k-means++. Proceedings of the VLDB Endowment, 2012. 5(7): p. 622-633.
4. Kanungo, T., et al., An efficient k-means clustering algorithm: Analysis and implementation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2002. 24(7): p. 881-892.
5. Elkan, C. Using the triangle inequality to accelerate k-means. in ICML. 2003.
6. Dhillon, I.S. and D.S. Modha, Concept decompositions for large sparse text data using clustering. Machine learning, 2001. 42(1-2): p. 143-175.
7. Cordeiro de Amorim, R. and B. Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recognition, 2012. 45(3): p. 1061-1075.
8. Kaufman, L. and P. Rousseeuw, Clustering by means of medoids. 1987, Statistical Data Analysis Based on the L1 Norm and Related Methods North-Holland.
9. Sibson, R., SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 1973. 16(1): p. 30-34.
10. Defays, D., An efficient algorithm for a complete link method. The Computer Journal, 1977. 20(4): p. 364-366.
11. Girvan, M. and M.E. Newman, Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 2002. 99(12): p. 7821-7826.
12. Newman, M.E. and M. Girvan, Finding and evaluating community structure in networks. Physical review E, 2004. 69(2): p. 026113.
13. Blondel, V.D., et al., Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008. 2008(10): p. P10008.
14. Zachary, W.W., An information flow model for conflict and fission in small groups. Journal of anthropological research, 1977: p. 452-473.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57088-
dc.description.abstract分群問題在許多領域都受到廣泛的應用,包括機器學習、資料探勘、圖像分析及生物信息…等。分群是把資料中的個體透過特定的方法分成不同的組別,使在同一個群集中的個體都具有相似的特性,而在度量空間中我們探討的是更短的空間距離。我們審視了前人提出的眾多演算法,諸如:K-means、K-medoids或其他階層式的分群演算法,我們可以發現這些演算法對群集的定義都是概念式的敘述或操作型的定義,而無法以數學方式精確的定義並描繪出一個群集應該具有的特性。
  在這篇論文中,我們從「相對」的角度出發,定義了點和點以及集合和集合之間的相對距離以及凝聚力。利用集合間的凝聚力我們定義了群集是一群和自己凝聚力強大的點所形成的集合。我們提出一個基於凝聚力的階層式演算法。與其他階層式演算法不同的是我們提出的演算法有一個明確的中止標準,且其輸出的集合皆是我們定義的群集。此外,我們定義了點和集合之間的三角距離,並提出另一個演算法,可以將資料分成固定K個集合的演算法─K-sets 演算法,透過後面的實驗可以推論輸出的K個集合有極大的可能會是我們所定義的群集。我們更進一步證明若我們放寬對距離的限制(如:允許非對稱與非正值的距離),我們所提出的定義和演算法仍然適用。最後透過實驗結果,我們可以發現現實中的群集確實符合我們對其的定義,且我們提出的兩個演算法可以正確地將資料中的群集分類。
zh_TW
dc.description.abstractClustering analysis is widely used in so many fields, such as machine learning, data mining, image analysis and bioinformatics. Clustering is the task of grouping a set of objects in a specific way that objects in the same cluster have similar characteristic. Here, we consider the shorter distance measure in the metric spaces as the similar characteristic. We review some known clustering algorithms, such as K-means, K-medoids, and hierarchical clustering algorithms. We find out that all of these algorithms only give cluster a conceptual description or an operational definition. However, they cannot use a mathematical point of view to describe what a cluster looks like.
  In this work, we start from the point of view of “relative” and define the relative distance and cohesion measure between any two points or two sets. Making use of the cohesion measure, we define a cluster is a set of data points which is cohesive to itself. Then, we propose a hierarchical algorithm based on the cohesion measure. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a stop criterion and it stops with a partition of clusters. In addition, we define the triangular distance between a point and a set, and propose another clustering algorithm that can separate the dataset into fixed-K sets. We will show that all the K sets may probably be clusters in the experiment. We further proof that our definitions and algorithms will still hold if we release some constraints of the distance measure. Finally, we can show that clusters in reality really meet the definition of ours, and the two proposed algorithms can correctly separate the dataset into several clusters.
en
dc.description.provenanceMade available in DSpace on 2021-06-16T06:34:39Z (GMT). No. of bitstreams: 1
ntu-103-R01921042-1.pdf: 1936667 bytes, checksum: 6fb0a6057b4231e3119d4383fc7b069e (MD5)
Previous issue date: 2014
en
dc.description.tableofcontents口試委員會審定書 #
誌謝 i
中文摘要 iii
ABSTRACT v
CONTENTS vii
LIST OF FIGURES ix
LIST OF TABLES xi
Chapter 1 Introduction 1
1.1 Clustering in Metric Spaces 1
1.2 Review of Known Clustering Algorithm 2
1.2.1 K-means Algorithm 2
1.2.2 K-medoids Algorithm 3
1.2.3 Hierarchical Clustering Algorithm 4
1.3 Motivation 4
1.4 Thesis Organization 5
Chapter 2 Clusters in Metric Spaces 7
2.1 Relative Distance and Cohesion Measure 7
2.2 Clusters 17
2.3 Hierarchical Agglomerative Clustering Algorithm 21
Chapter 3 Triangular Distance and the K-sets Algorithm 25
3.1 Triangular Distance 25
3.2 The K-sets Algorithm 29
3.3 The Sparse K-sets Algorithm 32
Chapter 4 Extension 37
4.1 The Constraint of Distance Measure 37
Chapter 5 Experimental Result 55
5.1 Evaluation of Cluster Definition 55
5.1.1 Zachary’s Karate Club Network 55
5.1.2 American College Football Network 56
5.2 Evaluation of Hierarchical Agglomerative Clustering Algorithm 59
5.2.1 Computer Generated Data 59
5.2.2 Zachary’s Karate Club Network 60
5.3 Evaluation of the K-sets Algorithm 62
5.3.1 Computer Generated Data 62
5.3.2 Two Rings 63
5.3.3 Clusters Verification 65
Chapter 6 Conclusion and Future Work 67
6.1 Conclusion 67
6.2 Future Work 67
REFERENCE 69
dc.language.isoen
dc.subject階層式演算法zh_TW
dc.subject凝聚力zh_TW
dc.subject距離zh_TW
dc.subject分群zh_TW
dc.subject分區式演算法zh_TW
dc.subjectclusteringen
dc.subjectdistance measureen
dc.subjectcohesion measureen
dc.subjecthierarchical algorithmen
dc.subjectpartitional algorithmen
dc.title度量空間分群問題的數學理論zh_TW
dc.titleA Mathematical Theory for Clustering in Metric Spacesen
dc.typeThesis
dc.date.schoolyear102-2
dc.description.degree碩士
dc.contributor.oralexamcommittee張正尚(Cheng-Shang Chang),陳銘憲(Ming-Syan Chen),林軒田(Hsuan-Tien Lin),陳和麟(Ho-Lin Chen)
dc.subject.keyword分群,距離,凝聚力,階層式演算法,分區式演算法,zh_TW
dc.subject.keywordclustering,distance measure,cohesion measure,hierarchical algorithm,partitional algorithm,en
dc.relation.page70
dc.rights.note有償授權
dc.date.accepted2014-08-04
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept電機工程學研究所zh_TW
顯示於系所單位:電機工程學系

文件中的檔案:
檔案 大小格式 
ntu-103-1.pdf
  未授權公開取用
1.89 MBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved