請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70293
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 王奕翔 | |
dc.contributor.author | Chung-Yi Lin | en |
dc.contributor.author | 林宗毅 | zh_TW |
dc.date.accessioned | 2021-06-17T04:25:23Z | - |
dc.date.available | 2019-08-15 | |
dc.date.copyright | 2018-08-15 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-15 | |
dc.identifier.citation | [1] E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(177):1 – 86, 2017.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016. [3] E. Abbe and C. Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 670–688, Oct 2015. [4] E. Abbe and C. Sandon. Recovering communities in the general stochastic block model without knowing the parameters. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 676–684. Curran Associates, Inc., 2015. [5] E. Abbe and C. Sandon. Achieving the ks threshold in the general stochastic block model with linearized acyclic belief propagation. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1334–1342. Curran Associates, Inc., 2016. [6] E. Abbe and C. Sandon. Proof of the achievability conjectures for the general stochastic block model. Communications on Pure and Applied Mathematics, 71(7):1334–1406, 2017. [7] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. Proceedings of International Conference on Machine Learning (ICML), pages 17–24, 2006. [8] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 838–845 vol. 2, 6 2005. [9] K. Ahn, K. Lee, and C. Suh. Community recovery in hypergraphs. In Allerton Conference on Communication, Control and Computing. UIUC, 2016. [10] N. Alon and J. H. Spencer. The probabilistic method. John Wiley & Sons, 2004. [11] P. Andritsos, P. Tsaparas, R. J. Miller, and K. C. Sevcik. Limbo: Scalable clustering of categorical data. In International Conference on Extending Database Technology, pages 123–146. Springer, 2004. [12] M. C. Angelini, F. Caltagirone, F. Krzakala, and L. Zdeborová. Spectral detection on sparse hypergraphs. In Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on, pages 66–73. IEEE, 2015. [13] C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725, Jul 2000. [14] C. Bordenave, M. Lelarge, and L. Massoulié. Nonbacktracking spectrum of random graphs: Community detection and nonregular ramanujan graphs. Ann. Probab., 46(1):1–71, Jan 2018. [15] S. R. Buló and M. Pelillo. A game-theoretic approach to hypergraph clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6):1312–1327, 6 2013. [16] S. Chatterjee. Matrix estimation by universal singular value thresholding. Ann. Statist., 43(1):177–214, Feb 2015. [17] K. Chen and J. Lei. Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113(521):241–251, 2018. [18] Y. Chen and C. Suh. Spectral MLE: top-k rank aggregation from pairwise comparisons. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pages 371–380, 2015. [19] I. Chien, C.-Y. Lin, and I.-H. Wang. Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. In A. Storkey and F. Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 871–879, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018. PMLR. [20] I. E. Chien, C. Lin, and I. Wang. On the minimax misclassification ratio of hypergraph community detection. CoRR, abs/1802.00926, 2018. [21] P. Chin, A. Rao, and V. Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In COLT, pages 391–423, 2015. [22] F. Chung and M. Radcliffe. On the spectra of general random graphs. Electr. J. Comb., 18, Oct 2011. [23] A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116–140, March 2001. [24] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970. [25] O. Duchenne, F. Bach, I. S. Kweon, and J. Ponce. A tensor-based algorithm for high-order graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2383–2395, 12 2011. [26] S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75 – 174, 2010. [27] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou. Achieving optimal misclassification proportion in stochastic block models. Journal of Machine Learning Research, 18(60):1–45, 2017. [28] D. Ghoshdastidar and A. Dukkipati. Consistency of spectral partitioning of uniform hypergraphs under planted partition model. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pages 397–405, 2014. [29] D. Ghoshdastidar and A. Dukkipati. A provable generalized tensor spectral method for uniform hypergraph partitioning. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pages 400–409, 2015. [30] D. Ghoshdastidar and A. Dukkipati. Consistency of spectral hypergraph partitioning under planted partition model. Ann. Statist., 45(1):289–315, 02 2017. [31] B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. IEEE Transactions on Information Theory, 63(8):4729–4745, Aug 2017. [32] J. M. Hofman and C. H. Wiggins. Bayesian approach to network modularity. Phys. Rev. Lett., 100:258701, Jun 2008. [33] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983. [34] B. P. J. and S. Purnamrita. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1):253–273, 2016. [35] V. Jog and P. Loh. Information-theoretic bounds for exact recovery in weighted stochastic block models using the renyi divergence. CoRR, abs/1509.06418, 2015. [36] H. Kesten and B. P. Stigum. A limit theorem for multidimensional galton-watson processes. Ann. Math. Statist., 37(5):1211–1223, Oct 1966. [37] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013. [38] P. Latouche, E. Birmelé, and C. Ambroise. Variational bayesian inference and complexity control for stochastic block models. Statistical Modelling, 12(1):93–115, 2012. [39] C. M. Le and E. Levina. Estimating the number of communities in networks by spectral methods. CoRR, abs/1507.00827, 2015. [40] J. Lei. A goodness-of-fit test for stochastic block models. Ann. Statist., 44(1):401–424, Feb 2016. [41] J. Lei, A. Rinaldo, et al. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015. [42] P. Li, H. Dau, G. Puleo, and O. Milenkovic. Motif clustering and overlapping clustering for social network analysis. arXiv preprint arXiv:1612.00895, 2016. [43] M. Lichman. UCI machine learning repository, 2013. [44] C.-Y. Lin, I. E. Chien, and I.-H. Wang. On the fundamental statistical limit of community detection in random hypergraphs. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2178–2182, Jun 2017. [45] A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: Theory of Majorization and its Applications, volume 143. Springer, second edition, 2011. [46] M. Mezard and A. Montanari. Reconstruction on trees and spin glass transition. Journal of Statistical Physics, 124(6), Dec 2006. [47] M. A. Riolo, G. T. Cantwell, G. Reinert, and M. E. J. Newman. Efficient method for estimating the number of communities in a network. Phys. Rev. E, 96:032310, Sep 2017. [48] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1997. [49] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist., 39(4):1878–1915, Oct 2011. [50] L. Rozovsky. A lower bound of large-deviation probabilities for the sample mean under the cramér condition. Journal of Mathematical Sciences, 118(6):5624–5634, 2003. [51] A. Saade, F. Krzakala, and L. Zdeborová. Spectral clustering of graphs with the bethe hessian. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’14, pages 406–414, Cambridge, MA, USA, 2014. MIT Press. [52] G. Stewart and J. guang Sun. Matrix Perturbation Theory. Computer science and scientific computing. Academic Press, 1990. [53] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007. [54] Y. X. R. Wang and P. J. Bickel. Likelihood-based model selection for stochastic block models. Ann. Statist., 45(2):500–528, Apr 2017. [55] B. Yan, P. Sarkar, and X. Cheng. Provable estimation of the number of blocks in block models. In A. Storkey and F. Perez-Cruz, editors, Proceedings of the TwentyFirst International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1185–1194, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018. PMLR. [56] S.-Y. Yun and A. Proutiere. Optimal cluster recovery in the labeled stochastic block model. ArXiv e-prints, October 2015. [57] A. Y. Zhang and H. H. Zhou. Minimax rates of community detection in stochastic block models. Ann. Statist., 44(5):2252–2280, 10 2016. [58] C. Zhang, S. Hu, Z. G. Tang, and T.-H. H. Chan. Re-revisiting learning on hypergraphs: Confidence interval and subgradient method. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 4026–4034, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. [59] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In NIPS, volume 19, pages 1633–1640, 2006. [60] J. Y. Zien, M. D. F. Schlag, and P. K. Chan. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(9):1389–1399, 9 1999. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70293 | - |
dc.description.abstract | 此論文旨在探究隨機區塊模型 (SBM) 中的叢集偵測問題。上半部討論叢集偵測問題從一般圖至超圖中的延伸。我們提出一更一般化之超圖生成模型 (d-hSBM),且完成其中漸進極小化極大錯誤率之刻畫。上界部分,於數量級為 3 時首先由最大似然估計元達成;其後,我們提出一有效率之二步驟演算法,並證明其可於任意數量級之隨機超圖模型中,有此極小化極大最佳錯誤率之表現。下界部分之匹配,則是經由指認一包含最主要錯誤事件之子參數空間所達成。此論文之下半部份考慮如何估計叢集個數此一分群問題中之子問題。作為一初步理論嘗試,我們證明了最大似然估計元之統計一致性;所得之上界結果也間接暗示了此叢集個數估計問題允許一數量級上更稀疏之隨機圖。此外,我們亦提出一基於頻譜間隔之 EigenGap 演算法,並展示其統計一致性。於生成資料點與實際資料集上的實驗結果在在印證了我們的理論發現。 | zh_TW |
dc.description.abstract | The problem of community detection in the Stochastic Block Model SBM is considered. The first half of this thesis is devoted to the community detection problem extended from graphs to hypergraphs. We propose a more general hypergraph generative model termed d-hSBM, and characterize the asymptotic misclassification ratio in the minimax sense under it. Achievability part is settled first information-theoretically with the Maximum Likelihood Estimator (MLE) under 3-hSBM and then computation-efficientlly with a two-step algorithm for any order d-hSBM. The converse lower bound is set by finding a smaller parameter space which contains the most dominant error events. The second half of this thesis considers the problem of estimating the number of communities itself apart from the clustering task. As an attempt to characterize the fundamental limit in such formulation, we demonstrate that the MLE, which is optimal under a Bayesian perspective, is consistent, whose form might further endorse a sparser connectivity level. In addition, an efficient spectral method EigenGap is proposed along with a theoretical guarantee. Experimental results on both synthetic data and real-world data consolidate our theoretical finding. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T04:25:23Z (GMT). No. of bitstreams: 1 ntu-107-R05942127-1.pdf: 2086638 bytes, checksum: 89b9401b9334f30706542085be6f0fac (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | 誌謝 (i)
摘要 (ii) Abstract (iii) I Community Detection in Hypergraphs (1) 1 Introduction (2) 2 Problem Formulation (7) 2.1 Community Relations (7) 2.2 Random Hypergraph Model: d-hSBM (9) 2.3 Performance Measure (10) 3 Minimax Risk: From SBM to 3-hSBM (12) 3.1 Improvements with Hyperedge Observation (13) 3.2 Minimax Upper Bound with MLE (13) 4 Minimax Risk: General d-hSBM (17) 4.1 Prior Works (17) 4.2 Main Contribution (18) 4.3 Minimax Risk Exponent in Term of the Order d (20) 4.4 Implications to Exact Recovery (21) 4.5 Comparison with [27] (22) 5 Two-Step Algorithm (24) 5.1 Refinement Scheme (24) 5.2 Spectral Initialization (25) 5.3 Time Complexity (28) 6 Theoretical Guarantees (30) 6.1 Refinement Step (32) 6.2 Spectral Clustering (35) 6.2.1 Comparison with [30] (35) 6.2.2 Proof of Theorem 6.0.3 (36) 6.3 Minimax Lower Bound (39) 7 Experiments (42) 7.1 Synthetic Data (42) 7.2 Real-World Data (43) 7.2.1 Categorical Data (44) 7.2.2 Numerical Data (44) 8 Discussions (47) II Estimating the Number of Communities (49) 9 Introduction (50) 9.1 Related Work (50) 9.2 This Work (51) 10 Problem Formulation (53) 10.1 The Stochastic Block Model (53) 10.2 Parameter Space (53) 10.3 Performance Metric (55) 11 Consistency of the ML Estimator (56) 11.1 Proof of Theorem 11.0.1 (57) 12 EigenGap Algorithm (60) 12.1 Spectral Method (60) 12.2 The EigenGap Algorithm (61) 12.2.1 Time Complexity (61) 12.3 Performance Guarantee (62) 13 Analysis of EigenGap (63) 13.1 Spectrum In Expectation (63) 13.1.1 Warm-Up (63) 13.1.2 More General Setting (64) 13.2 Accounting for Randomness (65) 13.2.1 Consistency of EigenGap (66) 13.2.2 Semi-Homogeneous Case (68) 14 Experiments (70) 15 Conclusion (74) A Auxiliary Results in Section 3.2 (76) A.1 Proof of Claim 3.2.1 (76) A.2 Proof of Claim 3.2.2 (76) A.3 Proof of Lemma 3.2.1 (77) A.4 Proof of Lemma 3.2.2 (78) A.4.1 Case 1 (79) A.4.2 Case 2 (80) A.5 Proof of Lemma 3.2.5 (84) A.5.1 Case 1 (85) A.5.2 Case 2 (86) A.5.3 Case 3 (87) B Proof of Lemma 6.1.1 (89) C Proof of Lemma 6.1.2 (92) D Proof of Lemma 6.2.1 (95) E Proof of Lemma 6.0.1 (100) F Proof of Lemma 6.3.2 (103) G Proof of Lemma 6.3.3 (106) H Proof of Auxiliary Results for MLE (109) H.1 Cardinality Bound (109) H.2 Chernoff Bound (110) H.3 Edge Disagreements (111) H.4 Divergence Estimates (111) I Proof of Auxiliary Results for EigenGap (113) I.1 Spectrum Under SBM1 (114) I.2 Spectrum Under Semi-Homogeneous Case (115) I.3 Concentration of Spectrum (120) Bibliography (124) | |
dc.language.iso | en | |
dc.title | 統計觀點下的叢集偵測問題 | zh_TW |
dc.title | Community Detection: A Statistical Approach | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 謝宏昀,張正尚 | |
dc.subject.keyword | 分類,叢集偵測,隨機區塊模型,超圖,極小化極大錯誤率,瑞尼發散項,二步驟演算法,區塊個數估計,圖拉普拉斯矩陣,克-史門檻,頻譜演算法, | zh_TW |
dc.subject.keyword | clustering,community detection,stochastic block model,hypergraph,minimax risk,Renyi divergence,two-step algorithms,block-number estimation,graph Laplacian,Kesten-Stigum threshold,spectral algorithms, | en |
dc.relation.page | 128 | |
dc.identifier.doi | 10.6342/NTU201803335 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2018-08-15 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
顯示於系所單位: | 電信工程學研究所 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-107-1.pdf 目前未授權公開取用 | 2.04 MB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。