Three-Stage Model Selection for Detecting Interaction Effects in High-Dimensional Data

Tzu-Ting Huang; 黃子庭

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16242

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張淑惠
dc.contributor.author	Tzu-Ting Huang	en
dc.contributor.author	黃子庭	zh_TW
dc.date.accessioned	2021-06-07T18:06:24Z	-
dc.date.copyright	2012-09-17
dc.date.issued	2012
dc.date.submitted	2012-07-24
dc.identifier.citation	1. Basu S, Pan W, Shen X, Oetting WS. (2011). Multilocus association testing with penalized regression. Genet Epidemiol 35: 1–11. 2. Bateson W. (1909). Mendel’s Principles of Heredity. Cambridge University Press. 3. Casella, G., and Berger, R. L. (2002). Statistical Inference. Thomson Learning. 4. Combarros O, Cortina-Borja M, Smith AD, Lehmann DJ. (2009). Epistasis in sporadic Alzheimer’s disease. Neurobiol Aging 30: 1333–1349. 5. Czepiel, S. A. Maximum likelihood estimation of logistic regression models: theory and implementation. 6. D'Angelo GM, Rao DC, Gu CC. (2009). Combining least absolute shrinkage and selection operator (LASSO) and principal-components analysis for detection of gene-gene interactions in genome-wide association studies. BMC Proc 3: S62. 7. Dobson, A. J., and Barnett, A. G. (2002). Introduction to Generalized Linear Models. 8. Friedman J., Hastie T., Tibshirani R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1). 9. L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. (2008). Dimensionality reduction: A comparative review. Online Preprint. 10. Lobo, I. (2008). Epistasis: gene interaction and the phenotypic expression of complex diseases like Alzheimer’s. Nat. Educ. 1(1). 11. Miller, A. J. (2002). Subset selection in regression. Chapman & Hall/CRC. 12. Moore, J. H. (2003). The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Human Heredity, 56(1-3), 73-82. doi: 10.1159/000073735. 13. Newton JL, Harney SMJ, Wordsworth BP and Brown MA. (2004). A review of the MHC genetics of rheumatoid arthritis. Genes Immun 5: 151–157. 14. Rao, C. R. (1947). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57. 15. Sharma, Subhash. (1996). Applied Multivariate Techniques. John Wiley & Sons. 16. S. Sartori (2011). Penalized Regression: bootstrap confidence intervals and variable selection for high dimensional data sets. See Chapter 3, Section 3.6. 17. Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society Series B-Methodological, 58(1), 267-288. 18. Witten, D. M. & Tibshirani, R. (2010). Survival analysis with high-dimensional covariates. Stat. Methods Med. Res. 19, 29–51. 19. 戴政、江淑瓊 (2002). 生物醫學統計概論 [Introduction to Biomedical Statistics]. 20. 陳順宇 (2005). 多變量分析 [Multivariate Analysis]. See Chapter 6, Principal Component Analysis. 21. 林建甫 (2009). 存活分析 [Survival Analysis]. Chapter 3, Section 2. 22. 高瞻自然科學教學資源平台 [High Scope Natural Science Teaching Resource Platform]: 上行作用.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16242	-
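dc.description.abstract	Over the past decade or so, rapid advances in biomedical technology for genetic research have produced large amounts of data in which the number of variables far exceeds the number of observations, that is, each observed sample carries many covariates. We call such data high-dimensional data; real examples include gene expression data and single nucleotide polymorphisms. With high-dimensional gene expression data and disease occurrence as the response variable, standard statistical methods cannot be used to determine which genes have a statistically significant association with the disease, or proximity to which gene leads to disease onset. The dimension of the data must first be reduced before subsequent statistical analysis can proceed. In recent years, the effect of interactions between genes on the chromosome on phenotypes or disease occurrence has been a popular topic in statistical genetics. Many genetic studies show that most complex diseases are not caused by a single gene but are jointly influenced by interactions among more than one gene. To address this problem, this study proposes a strategy for interaction effect selection and model building. After a genome-wide association scan identifies genes potentially related to the disease, PCA is used to reduce the high-dimensional data to low-dimensional data, grouping genes with similar properties into a principal component and yielding a number of independent principal components. Because the model must include pairwise interaction effects, a large number of parameters need to be estimated. Our aim is to reduce the number of interaction terms effectively: we first test every interaction effect one at a time with a one-parameter score test, then combine the PCA principal components as main effects with the interaction effects found statistically significant by the score test, and finally use LASSO for variable selection and model building. Simulation studies show that this strategy reduces the computation time for fitting the LASSO model and raises the accuracy of variable selection.	zh_TW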
dc.description.abstract	Owing to breakthroughs in biomedical technology, many studies have produced data in which the number of variables far exceeds the number of observations. Such data are called high-dimensional data; examples include gene expression profiles and single nucleotide polymorphisms. Consider high-dimensional gene expression data with binary disease status as the outcome variable of interest. The goal is to study how the gene expression profiles are associated with disease status. Standard statistical approaches cannot be applied directly to such data because of the curse of dimensionality; typically, the dimension of the original data must be reduced before subsequent statistical analysis. In recent years, the effect of interactions between genes on the chromosome on phenotype or disease has become an active topic in statistical genetics. Many genetic studies have shown that complex diseases are caused not by a single gene alone but by the combined effect of interactions among more than one gene. In this study, our aim is to detect gene interactions that are associated with complex disease. For the analysis of high-dimensional data, the first step is usually to use PCA to reduce the dimension and then to take the selected principal components as the main effects in the model. We propose an effective selection strategy for potential interactions following this first step. Specifically, in the second step we use a one-parameter score test to test the interactions one by one. The final step is to perform LASSO on the main effects and interactions selected in the first and second steps to obtain the final model. Our limited simulation studies showed that the proposed strategy, which uses the one-parameter score test to select interactions, can reduce the computation time of LASSO and raise the rate of correctly selecting the true variables in the model.	en
dc.description.provenance	Made available in DSpace on 2021-06-07T18:06:24Z (GMT). No. of bitstreams: 1 ntu-101-R99849030-1.pdf: 718961 bytes, checksum: 4354012c2cc20bea1184d319243b7e09 (MD5) Previous issue date: 2012	en
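dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract; Table of Contents; List of Figures and Tables; Chapter 1 Introduction; Section 1 Research Background; Section 2 Research Objectives; Chapter 2 Literature Review; Section 1 Review of Interaction Selection Strategies; Section 2 Review of Methods; Chapter 3 Methods; Section 1 Three-Stage Model Selection of Interactions; Chapter 4 Simulation Study; Section 1 Generation of Simulated Data; Section 2 Simulation Results; Chapter 5 Results and Discussion; References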
dc.language.iso	zh-TW
dc.subject	分數檢定	zh_TW
dc.subject	高維度資料	zh_TW
dc.subject	交互作用	zh_TW
dc.subject	最小絕對值壓縮和選取	zh_TW
dc.subject	主成份分析	zh_TW
dc.subject	Least absolute shrinkage and selection operator	en
dc.subject	Score test	en
dc.subject	High-dimensional data	en
dc.subject	Interaction effect	en
dc.subject	Principal component analysis	en
dc.title	高維度資料下三階段交互作用選取與模式建立策略	zh_TW
dc.title	Three-Stage Model Selection for Detecting Interaction Effects in High-Dimensional Data	en
dc.type	Thesis
dc.date.schoolyear	100-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳秀熙,鄭明燕,戴政
dc.subject.keyword	高維度資料, 交互作用, 最小絕對值壓縮和選取, 主成份分析, 分數檢定	zh_TW
dc.subject.keyword	High-dimensional data, Interaction effect, Least absolute shrinkage and selection operator, Principal component analysis, Score test	en
dc.relation.page	35
dc.rights.note	Not authorized
dc.date.accepted	2012-07-24
dc.contributor.author-college	College of Public Health	zh_TW
dc.contributor.author-dept	Institute of Epidemiology and Preventive Medicine	zh_TW
Appears in Collections:	Institute of Epidemiology and Preventive Medicine
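
The three stages described in the abstract (PCA for the main effects, a one-parameter score test applied to each pairwise interaction, then LASSO on the retained terms) can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it assumes a binary outcome with a logistic model, uses only NumPy, and replaces the coordinate-descent LASSO of the cited glmnet paper with a simple proximal-gradient solver. All function names, the toy simulation, and the defaults `k`, `alpha`, and `lam` are hypothetical choices made for the example.

```python
import math
from itertools import combinations
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pca_scores(X, k):
    """Stage 1: standardized scores on the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * s[:k]
    return Z / Z.std(axis=0)

def fit_logistic(X0, y, n_iter=25):
    """Unpenalized logistic fit by Newton-Raphson; X0 must contain an intercept column."""
    beta = np.zeros(X0.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X0 @ beta)
        W = p * (1 - p)
        beta += np.linalg.solve(X0.T @ (X0 * W[:, None]), X0.T @ (y - p))
    return beta

def score_pvalue(X0, p0, y, z):
    """Stage 2: one-parameter score test for adding column z to the null model.
    p0 holds the fitted probabilities of the null (main-effects-only) model."""
    W = p0 * (1 - p0)
    u = z @ (y - p0)                                   # score for the new term
    a = X0.T @ (W * z)
    info = X0.T @ (X0 * W[:, None])
    v = z @ (W * z) - a @ np.linalg.solve(info, a)     # adjusted information
    stat = u * u / v
    return math.erfc(math.sqrt(stat / 2.0))            # chi-square(1) upper tail

def lasso_logistic(X, y, lam, n_iter=3000):
    """Stage 3: L1-penalized logistic regression by proximal gradient descent.
    The intercept is not penalized."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(X1, 2) ** 2        # 1 / Lipschitz bound
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        r = sigmoid(b0 + X @ beta) - y
        b0 -= step * r.mean()
        beta = beta - step * (X.T @ r) / n
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return b0, beta

def three_stage(X, y, k=3, alpha=0.05, lam=0.02):
    """Run the three stages; returns the retained interaction pairs and LASSO coefficients."""
    Z = pca_scores(X, k)
    X0 = np.column_stack([np.ones(len(y)), Z])
    p0 = sigmoid(X0 @ fit_logistic(X0, y))
    kept = [(i, j) for i, j in combinations(range(k), 2)
            if score_pvalue(X0, p0, y, Z[:, i] * Z[:, j]) < alpha]
    D = np.hstack([Z] + [(Z[:, i] * Z[:, j])[:, None] for i, j in kept])
    D = (D - D.mean(axis=0)) / D.std(axis=0)           # standardize before LASSO
    _, beta = lasso_logistic(D, y, lam)
    return kept, beta

def simulate(n=500, p=50, seed=0):
    """Hypothetical toy data: the outcome depends on a PC1 x PC2 interaction only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p)) * np.linspace(3.0, 0.5, p)
    Z = pca_scores(X, 2)
    y = (rng.random(n) < sigmoid(1.5 * Z[:, 0] * Z[:, 1])).astype(float)
    return X, y
```

Calling `kept, beta = three_stage(*simulate())` returns the interaction pairs that pass the score test and the LASSO coefficients, ordered as the `k` main effects followed by the retained interactions. Screening before LASSO keeps the penalized design matrix small, which is the source of the computational saving the abstract reports; the significance level and penalty used here are placeholders, not values from the thesis.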

Files in This Item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	702.11 kB	Adobe PDF

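Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.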