Information Criterion-Based Rank Estimation in High-Dimensional Factor Models

森元俊成; Toshinari Morimoto

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101813

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳素雲	zh_TW
dc.contributor.advisor	Su-Yun Huang	en
dc.contributor.author	森元俊成	zh_TW
dc.contributor.author	Toshinari Morimoto	en
dc.date.accessioned	2026-03-04T16:46:36Z	-
dc.date.available	2026-03-05	-
dc.date.copyright	2026-03-04	-
dc.date.issued	2026	-
dc.date.submitted	2026-02-04	-
dc.identifier.citation	S. C. Ahn and A. R. Horenstein. Eigenvalue ratio test for the number of factors. Econometrica, 81(3):1203–1227, 2013. doi: 10.3982/ECTA8968. J. Bai and S. Ng. Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221, 2002. ISSN 00129682, 14680262. Z. Bai and J. W. Silverstein. Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics. Springer, 2010. ISBN 978-1-4419-0660-1. doi: 10.1007/978-1-4419-0661-8. Z. Bai, K. P. Choi, and Y. Fujikoshi. Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis. The Annals of Statistics, 46(3):1050–1076, 2018. ISSN 00905364, 21688966. A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(58):1705–1749, 2005. URL http://jmlr.org/papers/v6/banerjee05b.html. E. Dobriban and A. B. Owen. Deterministic parallel analysis: an improved method for selecting factors and principal components. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(1):163–183, 2019. doi: 10.1111/rssb.12301. J. Fan, Y. Liao, and M. Mincheva. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(4):603–680, 2013. ISSN 1369-7412. doi: 10.1111/rssb.12016. URL https://doi.org/10.1111/rssb.12016. J. Fan, J. Guo, and S. Zheng. Estimating number of factors by adjusted eigenvalues thresholding. Journal of the American Statistical Association, 117(538):852–861, 2022. doi: 10.1080/01621459.2020.1825448. J. L. Horn. A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185, 1965. ISSN 1860-0980. doi: 10.1007/BF02289447. J. Hu, W. Li, Z. Liu, and W. Zhou. High-dimensional covariance matrices in elliptical distributions with application to spherical test. The Annals of Statistics, 47(1):527–555, 2019. doi: 10.1214/18-AOS1699. URL https://doi.org/10.1214/18-AOS1699. H. Hung, S.-Y. Huang, and C.-K. Ing. A generalized information criterion for high dimensional PCA rank selection. Statistical Papers, 63(4):1295–1321, 2022. Z. T. Ke, Y. Ma, and X. Lin. Estimation of the number of spiked eigenvalues in a covariance matrix by bulk eigenvalue matching analysis. Journal of the American Statistical Association, 118(541):374–392, 2023. S. Konishi and G. Kitagawa. Generalised information criteria in model selection. Biometrika, 83(4):875–890, 1996. ISSN 00063444. T. Morimoto, H. Hung, and S.-Y. Huang. A unified selection consistency theorem for information criterion-based rank estimators in factor analysis. Journal of Multivariate Analysis, 211:105498, 2026. ISSN 0047-259X. doi: 10.1016/j.jmva.2025.105498. URL https://www.sciencedirect.com/science/article/pii/S0047259X25000934. A. Onatski. Testing hypotheses about the number of factors in large factor models. Econometrica, 77(5):1447–1479, 2009. doi: 10.3982/ECTA6964. A. Onatski. Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4):1004–1016, 2010. C. Spearman. “General intelligence,” objectively determined and measured. American Journal of Psychology, 15:201–293, 1904. J. Yao, S. Zheng, and Z. Bai. Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2015. doi: 10.1017/CBO9781107588080.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101813	-
dc.description.abstract	本研究探討因子模型中因子數估計的問題。由於模型具有多重解釋性，因子數並不存在嚴格且唯一的定義。然而，在樣本數n與維度p同時發散的漸近情境下，透過隨機矩陣理論（RMT），可以在未假設特定因子模型的情況下，將因子數以「秩」的形式，重新給予更嚴謹且抽象化的定義。本研究以秩估計為核心目標，特別關注基於資訊準則的秩估計量之理論性質。本研究的主要貢獻有二。第一，在RMT的假設下，我們針對包括AIC、BIC、GIC，以及PC類與IC類等資訊準則秩估計量，建立了統一選擇一致性定理。既有研究已討論過AIC、BIC與GIC達成選擇一致性的充要條件，亦即間隙條件，但我們進一步推導了PC類與IC類共六種估計量的間隙條件，並將其與既有結果整合，得到統一選擇一致性定理。該條件由最小信號特徵值強度與最大可容許雜訊特徵值強度之間的關係式所構成，揭示了各估計量在信號偵測能力與雜訊穩健性之間的權衡關係。我們發現，這種權衡關係在有限樣本情境下，構成了基於資訊準則的秩估計之實質性限制。第二，基於此，作為第二項貢獻，本研究提出了一種新的資訊準則，稱為「擴展廣義資訊準則（eGIC）」。eGIC的推導動機源自近年關於橢圓分佈族之隨機矩陣理論的發展，透過在GIC的推導過程中引入橢圓分佈作為工作分佈而得。所得之eGIC具有根據資料生成分佈的尾部厚度來縮放懲罰項的結構。此一設計使得eGIC不僅依賴譜資訊，亦能反映分佈的幾何特徵。透過數值模擬實驗，我們證明了eGIC與傳統GIC一樣，能捕捉到微弱的訊號特徵值，並大幅提升了雜訊穩健性，從而驗證了其在有限樣本下克服既有方法限制的有效性。	zh_TW
dc.description.abstract	This study investigates the problem of estimating the number of factors in factor models. Because a factor model admits multiple interpretations, a strict and unique definition of the number of factors generally does not exist. However, in the asymptotic regime where both the sample size n and the dimension p diverge, Random Matrix Theory (RMT) allows the number of factors to be rigorously reformulated as the "rank," without assuming a specific factor model. We focus on rank estimation, and specifically on the theoretical properties of rank estimators based on information criteria. The study makes two main contributions. First, under RMT assumptions, we establish a unified selection consistency theorem for information criterion-based rank estimators, including AIC, BIC, and GIC, as well as the PC-type and IC-type criteria. Existing literature has discussed the necessary and sufficient conditions, namely the "gap conditions," for AIC, BIC, and GIC to achieve selection consistency. We further derive the gap conditions for six rank estimators based on the PC-type and IC-type criteria. By integrating these with existing results, we obtain a unified selection consistency theorem. The gap conditions specify the minimum signal eigenvalue strength and the maximum tolerable noise eigenvalue strength, revealing a trade-off between sensitivity to signal eigenvalues and robustness against noise eigenvalues. We find that this trade-off is a substantial limitation for information criterion-based rank estimators in practical finite-sample settings. Second, based on these findings, we propose a new information criterion, the "extended Generalized Information Criterion (eGIC)." The derivation of eGIC is motivated by recent developments in RMT for elliptical distribution families: eGIC is constructed by introducing elliptical distributions as the working distribution in the derivation of GIC. The resulting eGIC scales the penalty term according to the tail heaviness of the data-generating distribution. This design enables eGIC to reflect not only spectral information but also the geometric characteristics of the data-generating distribution. Through numerical simulations, we demonstrate that eGIC, like the standard GIC, is capable of detecting weak signal eigenvalues while significantly improving noise robustness. This validates its effectiveness in overcoming the limitations that existing methods encounter in finite-sample scenarios.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-04T16:46:36Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2026-03-04T16:46:36Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee i Acknowledgements iii 摘要 v Abstract vii Contents ix List of Figures xiii List of Tables xv Chapter 1 Introduction 1 1.1 Background 1 1.2 Factor analysis and its inference problem 1 1.3 Purpose and contributions 3 1.4 Organization 5 1.5 Terms and notation 7 Chapter 2 RMT Fundamentals and the Concept of Rank 9 2.1 Review of RMT 9 2.2 Definition of rank 12 2.3 Relationship between the rank and the number of factors 13 Chapter 3 Review of Information Criterion-Based Estimators 17 3.1 AIC, BIC and GIC 17 3.2 Bai and Ng's criteria 20 Chapter 4 Unified Selection Consistency Theorem 23 Chapter 5 Numerical Studies 27 5.1 Study I: Numerical verification of the theorem 27 5.1.1 Experimental settings 27 5.1.2 Gap conditions 29 5.1.3 Results 36 5.2 Study II: Exploration of characteristics in information criterion-based estimators 40 5.2.1 Scenario 1: Model with high rank r0 42 5.2.2 Scenario 2: Model with a heavy-tailed distribution 43 5.2.3 Scenario 3: Model with a sparse population covariance matrix 44 5.2.4 Scenario 4: Model with extremely large λ1 45 5.2.5 Scenario 5: Power spiked model 45 5.2.6 Summary and motivation for eGIC 46 Chapter 6 Extended GIC 49 6.1 Introduction of extended GIC 49 6.2 Interpretation of the extended GIC 52 6.3 Estimation of η 53 Chapter 7 Extended Unified Selection Consistency Theorem 55 7.1 Review of elliptical RMT 55 7.2 Extended unified selection consistency 57 Chapter 8 Extended Numerical Studies 59 8.1 Study III: Numerical verification of the theorem under (E1)-(E3) 59 8.1.1 Simulation settings 59 8.1.2 Gap conditions 61 8.1.3 Results 66 8.2 Study IV: Demonstration of advantages of eGIC 69 Chapter 9 Conclusion 73 References 77 Appendix A - Proof: Unified Selection Consistency Theorem 81 A.1 Lemmas related to Theorem 1 81 A.2 Proof of Theorem 1 82 A.2.1 Gap conditions for PC3 82 A.2.2 Gap conditions for IC3 84 Appendix B - Supplementary Tables on Gap Condition Violations in Study I 89 B.1 Overview 89 B.2 Impact of gap condition violations on estimation bias 89 B.3 Numerical verification 90 Appendix C - Derivation of Extended GIC 93 C.1 Derivation of M-estimators for the simple spiked model 93 C.1.1 Preliminaries 93 C.1.2 Derivation 94 C.2 Influence functions of M-estimator functionals 98 C.2.1 Mean vector 98 C.2.2 Covariance matrix 99 C.2.3 Leading eigenvalues 99 C.2.4 Tail eigenvalues 103 C.3 Proof of Theorem 2 104 C.3.1 Mean vector 105 C.3.1.1 Derivative of log-likelihood 105 C.3.1.2 Expectation 106 C.3.2 Leading eigenvectors 106 C.3.2.1 Derivatives on the Stiefel manifold 106 C.3.2.2 Expectation 108 C.3.3 Leading eigenvalues 110 C.3.4 Tail eigenvalues 112 Appendix D - Proof: Extended Unified Selection Consistency Theorem 115 D.1 Lemmas related to Theorem 3 115 D.2 Proof of Theorem 3 119 D.2.1 As necessary conditions 119 D.2.2 As sufficient conditions 120 D.2.2.1 Step 1. First gap condition 120 D.2.2.2 Step 2. Second gap condition 120 Appendix E - Rank Estimation Methods in Categories 2 and 3 121 E.1 Category 2 121 E.2 Category 3 123	-
dc.language.iso	en	-
dc.subject	因子模型	-
dc.subject	秩估計	-
dc.subject	資訊量準則	-
dc.subject	隨機矩陣理論	-
dc.subject	橢圓分布	-
dc.subject	factor model	-
dc.subject	rank estimation	-
dc.subject	information criterion	-
dc.subject	random matrix theory	-
dc.subject	elliptical distribution	-
dc.title	高維因子模型中基於資訊準則的秩估計方法	zh_TW
dc.title	Information Criterion-Based Rank Estimation in High-Dimensional Factor Models	en
dc.type	Thesis	-
dc.date.schoolyear	114-1	-
dc.description.degree	Doctoral	-
dc.contributor.coadvisor	王偉仲	zh_TW
dc.contributor.coadvisor	Weichung Wang	en
dc.contributor.oralexamcommittee	洪弘;陳定立;王紹宣;銀慶剛	zh_TW
dc.contributor.oralexamcommittee	Hung Hung;Ting-Li Chen;Shao-Hsuan Wang;Ching-Kang Ing	en
dc.subject.keyword	因子模型,秩估計,資訊量準則,隨機矩陣理論,橢圓分布	zh_TW
dc.subject.keyword	factor model,rank estimation,information criterion,random matrix theory,elliptical distribution	en
dc.relation.page	124	-
dc.identifier.doi	10.6342/NTU202600445	-
dc.rights.note	Authorization granted (access limited to campus)	-
dc.date.accepted	2026-02-06	-
dc.contributor.author-college	College of Science	-
dc.contributor.author-dept	Department of Mathematics	-
dc.date.embargo-lift	2026-03-05	-
Appears in Collections:	Department of Mathematics

Files in This Item:

File	Size	Format
ntu-114-1.pdf (access limited to NTU IP range)	2.63 MB	Adobe PDF
