A General Semi-parametric Clusterwise-index Distribution Model for Unsupervised Learning

鄧仁傑; Jen-Chieh Teng

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102193

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	江金倉	zh_TW
dc.contributor.advisor	Chin-Tsang Chiang	en
dc.contributor.author	鄧仁傑	zh_TW
dc.contributor.author	Jen-Chieh Teng	en
dc.date.accessioned	2026-04-08T16:11:18Z	-
dc.date.available	2026-04-09	-
dc.date.copyright	2026-04-08	-
dc.date.issued	2025	-
dc.date.submitted	2026-03-30	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102193	-
dc.description.abstract	本研究介紹了一種新穎的半母數依群組指標分配模型，旨在揭示潛在分群對反應變數與感興趣的共變數之間關係的影響。透過應用充分降維來考慮共變數對分群變數的影響，我們開發了一種獨特的方法來估計模型參數。我們的方法首先將偽積分最小平方和與分離懲罰項或成對融合懲罰項結合，以分割觀測值，並在一系列調校參數下估計群組指標係數。隨後，所得的分割估計式將用於估計分群歸屬模型。基於估計出的依群組指標分配模型與分群歸屬模型，第二階段估計會建構出最佳分類規則，同時疊代更新分割與模型參數的估計式。此方法的一項關鍵創新，是發展出用於決定分群數量的半母數資訊準則。與已知分群下的分類與估計一致，估計出的分群結構具備一致性與最佳性，且模型參數估計式具有神諭估計的性質。為了實作第一階段估計，我們改良了交替方向乘子法以提升數值收斂性，並結合凸函數差規劃來處理分離懲罰項。我們利用來自單一指標分配模型的殘差過程進行初始分群，並透過重新分類系統來優化分群識別，為熱啟動初始值提供啟發式解。考量到第一階段估計的計算複雜度，我們提出了一種實用的替代方案，即直接應用改良後的估計程序，來更新從啟發式解法中獲得的估計式，儘管這可能僅能部分保證在分割觀測值時的統計一致性。最後，透過模擬研究與實證資料分析的廣泛驗證，證實了所提方法的穩健性與有效性。	zh_TW
dc.description.abstract	This study introduces a novel semi-parametric clusterwise-index distribution model to uncover the impact of latent clusters on the relationship between a response variable and the covariates of interest. By applying sufficient dimension reduction to account for the influence of covariates on the cluster variable, we develop a distinctive method to estimate the model parameters. Our method begins by integrating a pseudo sum of integrated squares with a separation or pairwise fusion penalty to partition observations and estimate the cluster index coefficients across a range of tuning parameters. The resulting partition estimator is subsequently used to estimate the cluster membership model. Based on the estimated clusterwise-index distribution and cluster membership models, the second-phase estimation constructs an optimal classification rule while iteratively updating the partition and model parameter estimators. A key innovation of this method is the development of semi-parametric information criteria for determining the number of clusters. In line with classification and estimation under known clusters, the estimated cluster structure attains consistency and optimality, and the model parameter estimators possess the oracle property. To implement the first-phase estimation, we refine the alternating direction method of multipliers to enhance numerical convergence and incorporate the difference of convex functions programming to address the separation penalty. Residual processes from a single-index distribution model are leveraged for initial clustering, and a reclassification system refines cluster identification, providing a heuristic solution for warm-start initial values. Given the computational complexity of the first-phase estimation, we propose a practical alternative by directly applying the refined estimation procedure to update the estimators obtained from the heuristic solution method, which may only partially guarantee statistical consistency in partitioning observations. Extensive validation through simulation studies and empirical data analyses confirms the robustness and effectiveness of the proposed methodology.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-08T16:11:18Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2026-04-08T16:11:18Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Abstract (Chinese), I; Abstract, II; Contents, III; 1. Introduction, 1; 2. General Background, 10; 2.1 Proposed Models and Their Derivatives, 10; 2.2 Alternative Representations of CID Model, 14; 2.3 Estimation for the CID Model with Supervised Data, 16; 2.4 Estimation for the Cluster Membership Model with Supervised Data, 22; 3. Oracle Estimation and Its Refinement, 27; 3.1 Estimation for the CID Model, 28; 3.2 Estimation for the Cluster Membership Model, 33; 3.3 Optimal Classification and Refined Estimation, 36; 3.4 Semi-parametric Information Criteria, 39; 4. Estimation Implementation, 44; 4.1 A Heuristic Solution Model, 45; 4.2 PLISSP Estimation Implementation, 48; 4.3 PLISFP Estimation Implementation, 54; 4.4 Computational Procedure, 58; 5. Monte Carlo Simulations, 61; 5.1 Simulated Settings and Evaluation Metrics, 61; 5.2 Assessment of Estimators - Scenarios I, 66; 5.3 Assessment of Estimators - Scenarios II, 70; 5.4 Assessment of Estimators - Scenarios III, 85; 6. Applications, 90; 6.1 Application in Heart Disease Research, 91; 6.2 Application in Pima-Indian Diabetes Research, 97; 6.3 Application in AIDS Clinical Trials Group Study 175, 109; 7. Discussion, 115; Bibliography, 119; Appendix, 131	-
dc.language.iso	en	-
dc.subject	交替方向乘子法	-
dc.subject	凸函數差規劃	-
dc.subject	群集索引分布	-
dc.subject	分群歸屬	-
dc.subject	最佳分類	-
dc.subject	神諭估計	-
dc.subject	成對融合懲罰項	-
dc.subject	分割	-
dc.subject	偽積分最小平方和	-
dc.subject	半參數資訊準則	-
dc.subject	分離懲罰項	-
dc.subject	個體指標分配模型	-
dc.subject	充分降維	-
dc.subject	Alternating direction method of multipliers	-
dc.subject	Difference of convex functions programming	-
dc.subject	Clusterwise-index distribution	-
dc.subject	Cluster membership	-
dc.subject	Heuristic solution	-
dc.subject	Optimal classification	-
dc.subject	Oracle estimation	-
dc.subject	Pairwise fusion penalty	-
dc.subject	Partition	-
dc.subject	Pseudo sum of integrated least squares	-
dc.subject	Semiparametric information criterion	-
dc.subject	Separation penalty	-
dc.subject	Subjectwise-index distribution model	-
dc.subject	Sufficient dimension reduction	-
dc.title	用於無監督學習的通用半參數群集索引分布模型研究	zh_TW
dc.title	A General Semi-parametric Clusterwise-index Distribution Model for Unsupervised Learning	en
dc.type	Thesis	-
dc.date.schoolyear	114-2	-
dc.description.degree	Doctoral	-
dc.contributor.coadvisor	黃名鉞	zh_TW
dc.contributor.coadvisor	Ming-Yueh Huang	en
dc.contributor.oralexamcommittee	廖振鐸;黃冠華;黃禮珊;張中;蔡欣甫	zh_TW
dc.contributor.oralexamcommittee	Chen-Tuo Liao;Guan-Hua Huang;Li-Shan Huang;Chung Chang;Shin-Fu Tsai	en
dc.subject.keyword	交替方向乘子法,凸函數差規劃,群集索引分布,分群歸屬,最佳分類,神諭估計,成對融合懲罰項,分割,偽積分最小平方和,半參數資訊準則,分離懲罰項,個體指標分配模型,充分降維	zh_TW
dc.subject.keyword	Alternating direction method of multipliers, Difference of convex functions programming, Clusterwise-index distribution, Cluster membership, Heuristic solution, Optimal classification, Oracle estimation, Pairwise fusion penalty, Partition, Pseudo sum of integrated least squares, Semiparametric information criterion, Separation penalty, Subjectwise-index distribution model, Sufficient dimension reduction	en
dc.relation.page	178	-
dc.identifier.doi	10.6342/NTU202600892	-
dc.rights.note	Not authorized	-
dc.date.accepted	2026-03-30	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Data Science Degree Program	-
dc.date.embargo-lift	N/A	-
Appears in Collections:	Data Science Degree Program

Files in This Item:

File	Size	Format
ntu-114-2.pdf (not authorized for public access)	2.97 MB	Adobe PDF

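Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.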