A General Semi-parametric Clusterwise-index Distribution Model for Unsupervised Learning

Jen-Chieh Teng (鄧仁傑)

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102193

Title:	A General Semi-parametric Clusterwise-index Distribution Model for Unsupervised Learning
Author:	Jen-Chieh Teng (鄧仁傑)
Advisor:	Chin-Tsang Chiang (江金倉)
Co-advisor:	Ming-Yueh Huang (黃名鉞)
Keywords:	Alternating direction method of multipliers, Difference of convex functions programming, Clusterwise-index distribution, Cluster membership, Heuristic solution, Optimal classification, Oracle estimation, Pairwise fusion penalty, Partition, Pseudo sum of integrated least squares, Semiparametric information criterion, Separation penalty, Subjectwise-index distribution model, Sufficient dimension reduction
Year of publication:	2025
Degree:	Doctoral
Abstract:	This study introduces a novel semi-parametric clusterwise-index distribution model to uncover how latent clusters affect the relationship between a response variable and the covariates of interest. Sufficient dimension reduction is applied to account for the influence of the covariates on the cluster variable, and a two-phase method is developed to estimate the model parameters. The first phase combines a pseudo sum of integrated least squares with a separation penalty or a pairwise fusion penalty to partition the observations and estimate the cluster index coefficients over a range of tuning parameters. The resulting partition estimator is then used to estimate the cluster membership model. Based on the estimated clusterwise-index distribution and cluster membership models, the second phase constructs an optimal classification rule while iteratively updating the partition and model parameter estimators. A key innovation of the method is a set of semi-parametric information criteria for determining the number of clusters. In line with classification and estimation under known clusters, the estimated cluster structure attains consistency and optimality, and the model parameter estimators possess the oracle property.
To implement the first-phase estimation, we refine the alternating direction method of multipliers (ADMM) to improve numerical convergence and combine it with difference-of-convex-functions programming to handle the separation penalty. Residual processes from a single-index distribution model are used for initial clustering, and a reclassification system refines cluster identification, providing a heuristic solution that serves as a warm-start initial value. Because the first-phase estimation is computationally demanding, we also propose a practical alternative: the refined estimation procedure is applied directly to update the estimators obtained from the heuristic solution, although this may only partially guarantee statistical consistency in partitioning the observations. Extensive simulation studies and empirical data analyses confirm the robustness and effectiveness of the proposed methodology.
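As a rough illustration of how a pairwise fusion penalty can partition observations when solved with ADMM, the following Python sketch may help. It is not the thesis's estimator. It assumes a plain squared-error loss on scalar observations in place of the pseudo sum of integrated least squares, an unweighted L1 fusion penalty, and the textbook scaled-form ADMM rather than the refined variant described in the abstract. The function names `soft_threshold` and `fusion_admm`, and the parameters `lam`, `rho`, and `n_iter`, are hypothetical.

```python
import itertools

import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fusion_admm(y, lam, rho=1.0, n_iter=2000):
    """Minimise 0.5 * sum_i (y_i - mu_i)^2 + lam * sum_{i<j} |mu_i - mu_j|.

    Illustrative only: each observation gets its own centre mu_i, and the
    pairwise penalty pulls centres together so that equal centres define
    clusters. Uses scaled-form ADMM with the split delta = D @ mu.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    pairs = list(itertools.combinations(range(n), 2))

    # D maps mu to the vector of all pairwise differences mu_i - mu_j.
    D = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        D[row, i], D[row, j] = 1.0, -1.0

    A = np.eye(n) + rho * D.T @ D      # system matrix of the mu-update
    delta = np.zeros(len(pairs))       # auxiliary pairwise differences
    u = np.zeros(len(pairs))           # scaled dual variables

    for _ in range(n_iter):
        mu = np.linalg.solve(A, y + rho * D.T @ (delta - u))
        delta = soft_threshold(D @ mu + u, lam / rho)
        u = u + D @ mu - delta
    return mu


if __name__ == "__main__":
    y = [0.0, 0.1, 5.0, 5.1]
    for lam in (0.0, 0.5, 10.0):
        print(lam, np.round(fusion_admm(y, lam), 3))
```

In this toy setting, `lam = 0` leaves every observation as its own cluster and a large `lam` fuses all centres to the sample mean. At `lam = 0.5` the exact minimiser is (1.05, 1.05, 4.05, 4.05), which the iterates should approach: two clusters are recovered, but their centres are shrunk toward each other. This shrinkage is typical of a plain L1 fusion penalty and is one reason a range of tuning parameters has to be examined.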
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102193
DOI:	10.6342/NTU202600892
Full-text authorization:	Not authorized
Electronic full-text release date:	N/A
Appears in collections:	Data Science Degree Program

Files in this item:

File	Size	Format
ntu-114-2.pdf (not authorized for public access)	2.97 MB	Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
