A Study on High-dimensional Equivalence Test of Means (高維度平均值對等性檢定之研究)

Chen-Hao Chiu; 邱振豪

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50850

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉仁沛
dc.contributor.author	Chen-Hao Chiu	en
dc.contributor.author	邱振豪	zh_TW
dc.date.accessioned	2021-06-15T13:02:11Z	-
dc.date.available	2019-07-26
dc.date.copyright	2016-07-26
dc.date.issued	2016
dc.date.submitted	2016-07-10
dc.identifier.citation	Altman D., Bland J. M. (1995). Absence of evidence is not evidence of absence. British Medical Journal; 311:485
Altman D., Bland J. M. (2004). Confidence intervals illuminate absence of evidence. British Medical Journal; 328:1016-1017
Anderson T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd Edition. New York: Wiley; 156-163
Bai Z., Saranadasa H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica; 6:311-329
Cai T., Liu W., Luo X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association; 106:574-607
Cai T., Liu W., Xia Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B; 76(2):349-372
Chen S. X., Qin Y. L. (2010). A two sample test for high dimensional data with applications to gene-set testing. The Annals of Statistics; 38:808-835
Chow S. C., Liu J. P. (2010). Statistical Assessment of Biosimilar Products. Journal of Biopharmaceutical Statistics; 20:10-30
European Food Safety Authority (EFSA) (2010). EFSA GMO Panel Opinion on Statistical Considerations for the Safety Evaluation of GMOs. European Food Safety Authority, Parma, Italy
FDA (2003). Guidance for Industry: Bioavailability and Bioequivalence Studies for Orally Administered Drug Products: General Considerations. Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Rockville, MD
Febrero-Bande M., Oviedo de la Fuente M. (2012). Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software; 51:1-28
Feng S., Liang Q., Kinser R. D. (2006). Testing equivalence between two laboratories or two methods using paired-sample analysis and interval hypothesis testing. Analytical and Bioanalytical Chemistry; 385:975-981
Gantz J., Reinsel D. (2012). The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. Published by International Data Corporation, sponsored by EMC Corporation
Gregory K. B., Carroll R. J., Baladandayuthapani V., Lahiri S. N. (2014). A two-sample test for equality of means in high dimension. Journal of the American Statistical Association; 110:837-849
Hauck W. W., Anderson S. (1984). A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Pharmacodynamics; 12:83-91
Little T. A. (2015). Equivalence testing for comparability. BioPharm International; 28(2):45-48
Ruiz-Meana M., Garcia-Dorado D., Pina P., Inserte J., Agullo L., Soler-Soler J. (2003). Cariporide preserves mitochondrial proton gradient and delays ATP depletion in cardiomyocytes during ischemic conditions. American Journal of Physiology - Heart and Circulatory Physiology; 285:H999-H1006
Schuirmann D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics; 15:657-680
Srivastava M. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society; 37:53-86
Srivastava M. S., Kubokawa T. (2013). Tests for multivariate analysis of variance in high dimension under non-normality. Journal of Multivariate Analysis; 115:204-216
Trifu M. R., Ivan M. L. (2014). Big data: present and future. Database Systems Journal; 5(1):32-41
Walker E., Nowacki A. S. (2011). Understanding Equivalence and Noninferiority Testing. Journal of General Internal Medicine; 26(2):192-196
Wu Y., Genton M. C., Stefanski L. A. (2006). A multivariate two-sample test for small sample size and missing data. Biometrics; 62:877-885
Incite-Group (2014). Big Data Infographics of World Popular Social Services. Published on Internet: http://www.incite-group.com/data-and-insights/social-mining-part-1-how-big-data-transforming-customer-insights. Accessed date: March 26, 2016
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50850	-
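dc.description.abstract	Traditionally, researchers carry out multivariate two-sample testing with Hotelling’s T² test. One characteristic of high-dimensional data, however, is that the number of variables (the dimension) is usually much larger than the sample size. The inverse of the sample covariance matrix needed for the test statistic then cannot be computed, so Hotelling’s T² test cannot be used in high dimensions, and related statistical inference becomes harder. Many remedies have been proposed over the years. For example, Cai, et al. (2014) proposed a method for testing whether two mean vectors are equal in high dimensions under the assumption of homogeneous covariance matrices, and proved that the approximate distribution of the test statistic, as the number of variables tends to infinity, is a Type I extreme-value distribution. The purpose of equivalence assessment is to evaluate whether the difference between the mean vectors of two populations falls within the equivalence limits set by the researcher. In fields such as biosimilar products and genetically modified (GMO) products, an equivalence assessment may involve many variables but relatively few observations, yet no statistical test for high-dimensional equivalence assessment is currently available. We therefore propose two methods in this study. The first is built on the intersection-union principle and performs an equivalence t-test on every variable. The second applies the work of Cai, et al. (2014) and is named the maximal Z2 high-dimensional equivalence test; it comprises two procedures, whose forms and results differ slightly depending on whether the covariance matrix of the two populations is known. In this study we present the results of large-scale simulation studies and provide real high-dimensional data examples to demonstrate the application of the proposed methods. The simulation results show that the first method, the intersection-union method, requires every variable to support a claim of equivalence and is therefore so conservative that it can almost never reject the null hypothesis and claim overall equivalence. The second method, the maximal Z2 equivalence test, looks only at the maximum and is therefore less strict than the first. In terms of simulated Type I error rate, both procedures stay close to the required significance level, and in terms of simulated power both procedures provide sufficiently large power. This shows that the maximal Z2 equivalence test not only controls the Type I error rate but also detects equivalence when it truly holds.	zh_TW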
dc.description.abstract	Hotelling’s T² test is a canonical multivariate two-sample test. However, in high-dimensional data the number of variables is typically greater than the sample size, so the inverse of the sample covariance matrix required by the test cannot be obtained, and Hotelling’s T² test becomes inapplicable. This property has made inference difficult, and several researchers have proposed improved methods. For example, Cai, et al. (2014) proposed a high-dimensional test of means under the assumption of a common covariance matrix and proved that, as the number of variables tends to infinity, the distribution of the test statistic converges to a Type I extreme-value distribution. Equivalence testing evaluates whether the mean difference between two populations falls within the equivalence limits specified by the researcher. When equivalence testing is applied to the evaluation of biosimilar products, GMOs, etc., the number of variables may be large while the sample size is relatively small. However, previous studies on high-dimensional equivalence tests of means are scarce. In this study, we propose two equivalence tests of means. The first method is based on the intersection-union approach and performs an individual equivalence t-test on every variable. The second method, the maximal Z2 test, is derived from the study by Cai, et al. (2014) and is based on the maximum over the variables. The maximal Z2 test comprises two procedures, one for when the common covariance matrix is known and one for when it is not; the steps and test statistics of the two procedures differ slightly. We present the results of our simulation studies and demonstrate the methods on real high-dimensional examples. Our simulation studies suggest that the intersection-union method is too conservative to reject the null hypothesis and claim overall equivalence, because it requires every variable to be equivalent. On the other hand, the maximal Z2 method is expected to be more liberal because it tests only the maximal variable. Both procedures of the maximal Z2 method adequately control the empirical size at the significance level, and both provide sufficient empirical power. The simulation studies indicate that the maximal Z2 test is a viable approach that not only controls the Type I error rate but also correctly detects true equivalence.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T13:02:11Z (GMT). No. of bitstreams: 1 ntu-105-R03621203-1.pdf: 2094917 bytes, checksum: 1bafd60fe11d293ee607e90196eecabb (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Table of Contents
Oral Examination Committee Certification ii
Acknowledgements iii
Chinese Abstract iv
Abstract vi
CONTENTS viii
LIST OF FIGURES x
LIST OF TABLES xi
Chapter 1 Introduction 1
1.1 Background of Equivalence Test and Big Data 1
1.2 Equivalence of “Large-p-Small-n” Datasets 4
1.3 Objectives 7
Chapter 2 Literature Review 11
2.1 Hypotheses for Difference Testing 11
2.2 High Dimensional Approaches 14
Chapter 3 Proposed Methods 23
3.1 Traditional Intersection-Union Approach 23
3.2 Maximal Z2 Approach 26
Chapter 4 Numeric Examples 33
Chapter 5 Simulation Studies 41
5.1 Empirical Size 44
5.2 Empirical Power 46
Chapter 6 Discussion and Conclusion 64
References 67
Appendix A. Descriptive statistics of the two groups and p-values of intersection-union method 69
Appendix B. R codes for Test Functions 80
Appendix C. R codes for Empirical Size 83
Appendix D. R codes for Empirical Power 85
dc.language.iso	en
dc.title	高維度平均值對等性檢定之研究	zh_TW
dc.title	A Study on High-dimensional Equivalence Test of Means	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	季瑋珠,林志榮
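dc.subject.keyword	對等性檢定 (equivalence test), 極值分佈 (extreme value distribution), 兩樣本檢定 (two-sample test), 高維度資料 (high-dimensional data), 雙單尾檢定 (two one-sided test)	zh_TW
dc.subject.keyword	Equivalence Test, Extreme value distribution, Two-sample test, High dimensional data, Two one-sided test	en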
dc.relation.page	86
dc.identifier.doi	10.6342/NTU201600679
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2016-07-11
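dc.contributor.author-college	College of Bioresources and Agriculture (生物資源暨農學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Agronomy (農藝學研究所)	zh_TW
Appears in collections:	Department of Agronomy (農藝學系)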
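
The intersection-union approach described in the abstract can be illustrated with a minimal sketch. It is not the thesis's R code (Appendix B is not reproduced in this record). The function names, the single shared equivalence margin `delta`, and the data layout are hypothetical, and a large-sample normal quantile stands in for the t quantile so that only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_rejects(x, y, delta, alpha=0.05):
    """Two one-sided tests for a single variable (normal approximation).

    H0: |mu_x - mu_y| >= delta   versus   H1: |mu_x - mu_y| < delta.
    Returns True when both one-sided tests reject, i.e. equivalence is claimed.
    Assumption: a normal quantile replaces the t quantile used by a t-based TOST.
    """
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        # Degenerate case with no sampling variability in either group.
        return abs(diff) < delta
    z_crit = NormalDist().inv_cdf(1 - alpha)
    lower_rejects = (diff + delta) / se > z_crit   # rejects mu_x - mu_y <= -delta
    upper_rejects = (diff - delta) / se < -z_crit  # rejects mu_x - mu_y >= +delta
    return lower_rejects and upper_rejects


def intersection_union_equivalence(group1, group2, delta, alpha=0.05):
    """Claim overall equivalence only if every variable passes its own TOST.

    group1 and group2 are lists of observations; each observation is a list
    holding one value per variable (hypothetical layout for this sketch).
    """
    n_vars = len(group1[0])
    return all(
        tost_rejects([row[j] for row in group1],
                     [row[j] for row in group2],
                     delta, alpha)
        for j in range(n_vars)
    )
```

Because overall equivalence requires every per-variable test to reject, a single variable that fails blocks the overall claim, which is consistent with the conservativeness the abstract reports as the number of variables grows.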

Files in this item:

File	Size	Format
ntu-105-1.pdf (not currently authorized for public access)	2.05 MB	Adobe PDF

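Unless their copyright terms are specifically indicated otherwise, all items in the system are protected by copyright, with all rights reserved.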