Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16242
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 張淑惠
dc.contributor.author: Tzu-Ting Huang [en]
dc.contributor.author: 黃子庭 [zh_TW]
dc.date.accessioned: 2021-06-07T18:06:24Z
dc.date.copyright: 2012-09-17
dc.date.issued: 2012
dc.date.submitted: 2012-07-24
dc.identifier.citation:
1. Basu S, Pan W, Shen X, Oetting WS. (2011). Multilocus association testing with penalized regression. Genet Epidemiol 35: 1–11.
2. Bateson W. (1909): Mendel’s Principles of Heredity. Cambridge University Press.
3. Casella, G., and Berger, R. L. (2002). Statistical Inference. Thomson Learning.
4. Combarros O, Cortina-Borja M, Smith AD, Lehmann DJ. (2009). Epistasis in sporadic Alzheimer’s disease. Neurobiol Aging 30, 1333-1349.
5. Czepiel, S. A., Maximum likelihood estimation of logistic regression models: theory and implementation.
6. D'Angelo GM, Rao DC, Gu CC. (2009). Combining least absolute shrinkage and selection operator (LASSO) and principal-components analysis for detection of gene-gene interactions in genome-wide association studies. BMC Proc 3:S62.
7. Dobson, A. J., and Barnett, A. G. (2002). Introduction to Generalized Linear Models.
8. Friedman J., Hastie T., Tibshirani R. (2010): Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1).
9. L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. (2008). Dimensionality reduction: A comparative review. Online Preprint.
10. Lobo, I. (2008). Epistasis: gene interaction and the phenotypic expression of complex diseases like Alzheimer’s. Nat. Educ. 1(1).
11. Miller, A. J. (2002). Subset selection in regression: Chapman & Hall/CRC.
12. Moore, J. H. (2003). The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Human Heredity, 56(1-3), 73-82. doi: 10.1159/000073735.
13. Newton JL, Harney SMJ, Wordsworth BP and Brown MA. (2004): A review of the MHC genetics of rheumatoid arthritis. Genes Immun, 5:151–157.
14. Rao, C. R. (1947). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57.
15. Sharma, Subhash. (1996). Applied Multivariate Techniques, John Wiley & Sons.
16. S. Sartori (2011). Penalized Regression: bootstrap confidence intervals and variable selection for high dimensional data sets. See Chapter 3 - Section 3.6.
17. Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society Series B-Methodological, 58(1), 267-288.
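18. Witten, D. M. & Tibshirani, R. (2010). Survival analysis with high-dimensional covariates. Stat. Methods Med. Res. 19, 29–51.
19. 戴政、江淑瓊 (2002). Introduction to Biomedical Statistics (生物醫學統計概論).
20. 陳順宇 (2005). Multivariate Analysis (多變量分析). See Chapter 6, Principal Component Analysis.
21. 林建甫 (2009). Survival Analysis (存活分析). Chapter 3, Section 2.
22. High Scope Natural Science Teaching Resource Platform (高瞻自然科學教學資源平台): 上行作用.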
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16242
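dc.description.abstract: Over the past decade or so, rapid advances in biomedical technology for genetic research have produced data in which the number of variables far exceeds the number of observations; that is, each observed sample carries many covariates. We call such data high-dimensional data; real examples include gene expression data and single nucleotide polymorphisms. With high-dimensional gene expression data and disease occurrence as the response variable, standard statistical methods cannot be used to determine which genes have a statistically significant association with the disease, or proximity to which gene leads to disease occurrence. The high-dimensional data must first undergo statistical processing such as dimension reduction before subsequent statistical analysis can proceed. In recent years, the effect of interactions between genes on the chromosome on phenotype or disease occurrence has been a popular topic in statistical genetics. Many genetic studies show that most complex diseases are not caused by a single gene but are jointly influenced by interactions among more than one gene. To address this problem, this study proposes a strategy for selecting interaction effects and building models. After a genome-wide association scan identifies genes possibly related to the disease, PCA is used to reduce the high-dimensional data to low-dimensional data, grouping genes with similar properties into one principal component and producing several independent principal components in total. Because pairwise interaction effects are added to the model, a large number of parameters must be estimated. Our aim is to reduce the number of interactions effectively: we first test every interaction effect one by one with a one-parameter score test, then take the PCA principal components as main effects together with the interaction effects found statistically significant by the one-parameter score test, and finally use LASSO for variable selection and model building. Simulation studies show that this strategy reduces the computation time for fitting the LASSO model and raises the rate of correctly selected variables. [translated from zh_TW]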
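The three stages named in this abstract can be written out as a short model sketch. The notation below ($Z_j$ for principal-component scores, $\gamma_{jl}$ for interaction coefficients, $\mathcal{S}$ for the retained interactions, $\lambda$ for the penalty) is assumed for illustration and is not taken from the thesis; in particular, the record does not say whether the main effects are penalized in the final LASSO fit.

```latex
% Logistic model with k principal components as main effects and all
% pairwise interactions, i.e. k(k-1)/2 candidate interaction terms
% (for example, k = 10 gives 45 candidates):
\operatorname{logit} P(Y = 1 \mid Z)
  = \beta_0 + \sum_{j=1}^{k} \beta_j Z_j
  + \sum_{1 \le j < l \le k} \gamma_{jl} Z_j Z_l

% Stage 2: one-parameter score test of H_0 : \gamma_{jl} = 0, evaluated at
% the fit \hat\mu_i of the main-effects-only model; V_{jl} is the variance
% of U_{jl} under H_0, and the chi-square law holds asymptotically:
S_{jl} = \frac{U_{jl}^{2}}{V_{jl}} \sim \chi^{2}_{1},
\qquad
U_{jl} = \sum_{i=1}^{n} z_{ij} z_{il} \, (y_i - \hat\mu_i)

% Stage 3: L1-penalized (LASSO) logistic log-likelihood \ell over the main
% effects and the set \mathcal{S} of interactions retained in stage 2:
\min_{\beta, \gamma} \; -\ell(\beta, \gamma)
  + \lambda \Bigl( \sum_{j=1}^{k} |\beta_j|
  + \sum_{(j,l) \in \mathcal{S}} |\gamma_{jl}| \Bigr)
```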
dc.description.abstract: Owing to breakthroughs in biomedical technology, many studies have produced data in which the number of variables far exceeds the number of observations. Such data are called high-dimensional data; examples include gene expression profiles and single nucleotide polymorphisms. Consider high-dimensional gene expression profiles with binary disease status as the outcome variable of interest. The goal is to study how the gene expression profiles are associated with disease status. Standard statistical approaches cannot be applied directly to such data because of the curse of dimensionality; typically, the dimension of the original data must be reduced before subsequent statistical analysis. In recent years, how interactions between genes on the chromosome affect phenotype or disease has become a topic of interest in statistical genetics. Many genetic studies have shown that complex diseases are caused not only by a single dominant gene but also by the combined effect of interactions among more than one gene. In this study, our aim is to detect the gene interactions that are associated with complex disease. For the analysis of high-dimensional data, the first step is usually to use PCA to reduce the dimension and then to select the principal components as the main effects in the model. We propose an effective strategy for selecting potential interactions after this first step. Specifically, in the second step we use a one-parameter score test to test the interactions one by one. The final step is to perform LASSO on the main effects and interactions selected in the first and second steps to obtain the final model. Our limited simulation studies showed that the proposed strategy, which uses the one-parameter score test to select interactions, can reduce the computation time of LASSO and raise the rate of correctly selecting the true variables in the model. [en]
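The second step of the strategy, screening pairwise interactions one at a time with a one-parameter score test, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the function names (`solve`, `fit_null`, `score_test`, `screen_interactions`), the 0.05 threshold, and the choice of a logistic null model containing an intercept plus all principal components are hypothetical, and the PCA and LASSO stages are left out.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fitted_probs(X, beta):
    """Logistic fitted probabilities for design rows X and coefficients beta."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]


def information(X, w):
    """Fisher information X' W X for logistic weights w."""
    p = len(X[0])
    return [[sum(r[j] * r[k] * wi for r, wi in zip(X, w)) for k in range(p)] for j in range(p)]


def fit_null(X, y, iters=25):
    """Newton-Raphson fit of the main-effects-only model; returns fitted probabilities."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        mu = fitted_probs(X, beta)
        grad = [sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu)) for j in range(len(beta))]
        w = [mi * (1.0 - mi) for mi in mu]
        step = solve(information(X, w), grad)
        beta = [b + s for b, s in zip(beta, step)]
    return fitted_probs(X, beta)


def score_test(X, mu, y, z):
    """Score statistic and chi-square(1) p-value for adding the single column z."""
    w = [mi * (1.0 - mi) for mi in mu]
    u = sum(zi * (yi - mi) for zi, yi, mi in zip(z, y, mu))
    b = [sum(r[j] * wi * zi for r, wi, zi in zip(X, w, z)) for j in range(len(X[0]))]
    c = solve(information(X, w), b)
    # Variance of the score after adjusting for the terms already in the null model.
    v = sum(wi * zi * zi for wi, zi in zip(w, z)) - sum(bj * cj for bj, cj in zip(b, c))
    if v <= 1e-12:  # z carries no information beyond the null model
        return 0.0, 1.0
    stat = u * u / v
    # For one degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def screen_interactions(pcs, y, alpha=0.05):
    """Test every pairwise product of component scores; keep pairs with p < alpha."""
    n = len(y)
    X = [[1.0] + [float(pc[i]) for pc in pcs] for i in range(n)]
    mu = fit_null(X, y)  # the null model is fitted once and reused for every test
    kept = []
    for j, l in combinations(range(len(pcs)), 2):
        z = [pcs[j][i] * pcs[l][i] for i in range(n)]
        stat, p = score_test(X, mu, y, z)
        if p < alpha:
            kept.append(((j, l), p))
    return kept
```

Because the score test needs only the null fit, each candidate interaction costs one pass over the data rather than a full refit, which is consistent with the reduction in computation time reported in the abstract. In a complete pipeline, the retained pairs would be added to the principal-component columns and passed to an L1-penalized logistic fit, for instance a coordinate-descent solver of the kind described by Friedman, Hastie and Tibshirani (2010) in the citation list.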
dc.description.provenance: Made available in DSpace on 2021-06-07T18:06:24Z (GMT). No. of bitstreams: 1. ntu-101-R99849030-1.pdf: 718961 bytes, checksum: 4354012c2cc20bea1184d319243b7e09 (MD5). Previous issue date: 2012 [en]
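dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
Abstract (English) v
Table of Contents vii
List of Figures and Tables viii
Chapter 1 Introduction 1
Section 1 Research Background 1
Section 2 Research Objectives 4
Chapter 2 Literature Review 5
Section 1 Review of Interaction Selection Strategies 5
Section 2 Review of Methods 7
Chapter 3 Methods 15
Section 1 Three-Stage Model Selection of Interactions 15
Chapter 4 Simulation Study 22
Section 1 Generation of Simulated Data 22
Section 2 Simulation Results 27
Chapter 5 Results and Discussion 32
References 33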
dc.language.iso: zh-TW
dc.subject: 分數檢定 (score test) [zh_TW]
dc.subject: 高維度資料 (high-dimensional data) [zh_TW]
dc.subject: 交互作用 (interaction effect) [zh_TW]
dc.subject: 最小絕對值壓縮和選取 (least absolute shrinkage and selection operator) [zh_TW]
dc.subject: 主成份分析 (principal component analysis) [zh_TW]
dc.subject: Least absolute shrinkage and selection operator [en]
dc.subject: Score test [en]
dc.subject: High-dimensional data [en]
dc.subject: Interaction effect [en]
dc.subject: Principal component analysis [en]
dc.title: 高維度資料下三階段交互作用選取與模式建立策略 [zh_TW]
dc.title: Three-Stage Model Selection for Detecting Interaction Effects in High-Dimensional Data [en]
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 陳秀熙, 鄭明燕, 戴政
dc.subject.keyword: 高維度資料, 交互作用, 最小絕對值壓縮和選取, 主成份分析, 分數檢定 [zh_TW]
dc.subject.keyword: High-dimensional data, Interaction effect, Least absolute shrinkage and selection operator, Principal component analysis, Score test [en]
dc.relation.page: 35
dc.rights.note: Not authorized (未授權)
dc.date.accepted: 2012-07-24
dc.contributor.author-college: College of Public Health (公共衛生學院) [zh_TW]
dc.contributor.author-dept: Institute of Epidemiology and Preventive Medicine (流行病學與預防醫學研究所) [zh_TW]
Appears in Collections: Institute of Epidemiology and Preventive Medicine (流行病學與預防醫學研究所)

Files in This Item:
File: ntu-101-1.pdf (Restricted Access)
Size: 702.11 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
