Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65811
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 洪弘 (Hung Hung)
dc.contributor.author | Jia-Rou Liu | en
dc.contributor.author | 劉佳柔 | zh_TW
dc.date.accessioned | 2021-06-17T00:12:38Z | -
dc.date.available | 2017-09-17
dc.date.copyright | 2012-09-17
dc.date.issued | 2012
dc.date.submitted | 2012-07-11
dc.identifier.citation | [1] Alvo, M., Liu, Z., Williams, A., and Yauk, C. (2010). Testing for mean and correlation changes in microarray experiments: an application for pathway analysis. BMC Bioinformatics, 11, 60.
[2] Avvakumov, N., and Cote, J. (2007). The MYST family of histone acetyltransferases and their intimate links to cancer. Oncogene, 26, 5395-5407.
[3] Barrett, J. C., Fry, B., Maller, J., and Daly, M. J. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263-265.
[4] Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
[5] Breitling, R., Armengaud, P., Amtmann, A., and Herzyk, P. (2004). Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 573, 83-92.
[6] Chang, F., and Chen, J.-C. (2010). An adaptive multiple feature subset method for feature ranking and selection. In Proceedings of the 2010 International Conference on Technologies and Applications of Artificial Intelligence, 255-262: IEEE Computer Society.
[7] Chu, T. T., Liu, Y., and Kemether, E. (2009). Thalamic transcriptome screening in three psychiatric states. Journal of Human Genetics, 54, 665-675.
[8] Cook, R. D., and Yin, X. (2001). Special Invited Paper: Dimension Reduction and Visualization in Discriminant Analysis (with discussion). Australian and New Zealand Journal of Statistics, 43, 147-199.
[9] Cunningham, P. (2008). Dimension reduction. In Machine Learning Techniques for Multimedia, 91-112.
[10] DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680-686.
[11] Fujii, T., Uchiyama, H., Yamamoto, N., et al. (2011). Possible association of the semaphorin 3D gene (SEMA3D) with schizophrenia. Journal of Psychiatric Research, 45, 47-53.
[12] Guyon, I., and Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157-1182.
[13] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441, 498-520.
[14] John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proceedings of the 11th International Conference on Machine Learning, 121-129: San Francisco.
[15] Kohavi, R., and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273-324.
[16] Kuo, P.-H., Liu, J. R., Lu, M. K., Lu, R. B., and Hung, H. (2011). A genome-wide association study of bipolar disorder using DNA pooling. Asian Journal of Psychiatry, 4 Supplement 1, S38.
[17] Manolio, T. A., Rodriguez, L. L., Brooks, L., et al. (2007). New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nature Genetics, 39, 1045-1051.
[18] Mexal, S., Frank, M., Berger, R., et al. (2005). Differential modulation of gene expression in the NMDA postsynaptic density of schizophrenic and control smokers. Molecular Brain Research, 139, 317-332.
[19] Purcell, S., Neale, B., Todd-Brown, K., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81, 559-575.
[20] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
[21] Saeys, Y., Inza, I., and Larranaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23, 2507-2517.
[22] Sullivan, P. F., de Geus, E. J. C., Willemsen, G., et al. (2009). Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Molecular Psychiatry, 14, 359-375.
[23] Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98, 5116-5121.
[24] Viding, E., Hanscombe, K. B., Curtis, C. J. C., Davis, O. S. P., Meaburn, E. L., and Plomin, R. (2010). In search of genes associated with risk for psychopathic tendencies in children: a two-stage genome-wide association study of pooled DNA. Journal of Child Psychology and Psychiatry, 51, 780-788.
[25] Zhan, L., Kerr, J., Lafuente, M. J., et al. (2011). Altered expression and coregulation of dopamine signalling genes in schizophrenia and bipolar disorder. Neuropathology and Applied Neurobiology, 37, 206-219.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65811 | -
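dc.description.abstract | Research technologies have advanced rapidly in recent years, and researchers can increasingly obtain datasets containing thousands or even tens of thousands of variables. By comparison, the sample size becomes very small. When the number of variables far exceeds the number of samples, the t-statistic conventionally used to detect differences between two groups is poorly suited because its variance estimates are unstable. The area under the ROC curve (AUC) is also used to detect two-group differences and is a nonparametric method less restricted by distributional assumptions. However, tied values occur so often that ranking and selecting variables by AUC becomes difficult. To balance power and robustness, it is more appropriate to change the conventional way of assigning ranks and redefine them as ranks among the different variables within the same sample. In this study, we propose a re-rank approach that builds on the "rank-over-variable" concept and combines it with two techniques, "random subset" and "re-rank". It helps researchers effectively select variables that differ between two groups when the number of variables far exceeds the number of samples. To evaluate the method, we conducted simulations based on the GAIN-MDD dataset. Compared with the t-statistic and AUC, the proposed re-rank approach detects the variables that truly differ between the two groups more effectively and is less affected by small sample sizes and experimental error. Finally, we applied the new method to a pooling-based genome-wide scan and identified genes possibly associated with bipolar disorder for further investigation. | zh_TW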
dc.description.abstract | Increasingly, researchers encounter data with an extremely large number of variables but a relatively small sample size. In this setting, the widely used two-sample t-test is ill-suited to detecting differences between two populations because its variance estimates are unstable. Its nonparametric counterpart, the AUC, suffers from frequent tied values and also fails. To improve detection power while retaining robustness, the idea of "rank-over-variable" is better suited to analyzing large-p-small-n datasets. In this study, we propose a robust re-rank approach that overcomes these difficulties and reduces the influence of the enormous number of features in the large-p-small-n setting. Specifically, we compute a rank-based statistic for each feature based on the concept of "rank-over-variable". The "random subset" and "re-rank" techniques are then applied iteratively to rank the features. The top features in the resulting ranked list are selected for further study. To evaluate the proposed re-rank approach, we conduct several simulation studies based on the GAIN-MDD dataset. Compared with the t-statistic and AUC, our re-rank approach identifies more of the predefined truly relevant SNPs and is robust to the number of pools and to pooling error. We also present a real-data analysis exploring markers associated with bipolar disorder. | en
dc.description.provenance | Made available in DSpace on 2021-06-17T00:12:38Z (GMT). No. of bitstreams: 1
ntu-101-R99849024-1.pdf: 2840311 bytes, checksum: 2fe249cab5296a4b374a8d1d0e0770b7 (MD5)
Previous issue date: 2012 | en
dc.description.tableofcontents |
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Inference Procedure
2.1 Re-Rank Approach
2.2 Prior screening for re-rank approach
2.3 Selection of M1
3 Numerical Analysis
3.1 Simulation studies using GAIN-MDD dataset
3.2 Bipolar dataset
4 Discussion
Bibliography
A The top 100 SNPs from Stage-1
B Matlab Code
dc.language.iso | en
dc.subject | filter method | zh_TW
dc.subject | random subset | zh_TW
dc.subject | dimension reduction analysis | zh_TW
dc.subject | feature selection | zh_TW
dc.subject | large-p-small-n | zh_TW
dc.subject | rank-over-variable | zh_TW
dc.subject | random subset | en
dc.subject | dimension reduction | en
dc.subject | feature selection | en
dc.subject | filter method | en
dc.subject | rank-over-variable | en
dc.subject | large-p-small-n | en
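dc.title | Detecting Differential Expression with a Robust Re-Rank Approach and Its Application to Analyzing Genome-Wide Scan Data from Pooled Samples | zh_TW
dc.title | A Robust Re-Rank Approach with Application to Pooling-Based GWA Study Data | en
dc.type | Thesis
dc.date.schoolyear | 100-2 (ROC academic year 100, second semester)
dc.description.degree | Master's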
dc.contributor.oralexamcommittee | 李文宗 (Wen-Chung Lee), 蕭朱杏 (Chuhsing Kate Hsiao), 郭柏秀 (Po-Hsiu Kuo)
dc.subject.keyword | large-p-small-n, dimension reduction analysis, feature selection, filter method, rank-over-variable, random subset | zh_TW
dc.subject.keyword | large-p-small-n, dimension reduction, feature selection, filter method, rank-over-variable, random subset | en
dc.relation.page | 44
dc.rights.note | Paid license
dc.date.accepted | 2012-07-11
dc.contributor.author-college | College of Public Health | zh_TW
dc.contributor.author-dept | Institute of Epidemiology and Preventive Medicine | zh_TW
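Both abstracts note that the AUC breaks down in large-p-small-n data because of tied values. The Python sketch below is an illustration and not code from the thesis. It shows the mechanism with 3 case and 3 control samples, sizes chosen hypothetically for the example. Under these sizes, the empirical AUC (the Mann-Whitney form, with ties counted as one half) can only take the values k/9 for k = 0, ..., 9. As a result, 10,000 simulated null features collapse onto at most 10 distinct scores and cannot be strictly ranked. The function names `auc` and `distinct_auc_values` are illustrative.

```python
import random
from itertools import product


def auc(cases, controls):
    """Empirical AUC: P(case > control) + 0.5 * P(case == control) over all pairs."""
    score = 0.0
    for x, y in product(cases, controls):
        if x > y:
            score += 1.0
        elif x == y:
            score += 0.5
    return score / (len(cases) * len(controls))


def distinct_auc_values(n_features, n_case, n_control, seed=0):
    """Simulate null features (no group difference) and collect their distinct AUCs."""
    rng = random.Random(seed)
    values = set()
    for _ in range(n_features):
        cases = [rng.gauss(0.0, 1.0) for _ in range(n_case)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        values.add(auc(cases, controls))
    return values


# Hypothetical setting: 10,000 features but only 3 cases and 3 controls.
vals = distinct_auc_values(n_features=10_000, n_case=3, n_control=3)
# With 3 x 3 = 9 pairs and no ties in continuous data, every AUC is a multiple
# of 1/9, so at most 10 distinct values exist across all 10,000 features.
print(len(vals), "distinct AUC values")
```

Ranking features by this score therefore leaves large blocks of exact ties, which is the difficulty the abstracts attribute to AUC-based selection.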
Appears in Collections: Institute of Epidemiology and Preventive Medicine
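The abstracts describe the re-rank approach only at a high level. It computes a rank-based statistic per feature using "rank-over-variable" (ranks taken across the variables within each sample), then applies iterative "random subset" and "re-rank" steps. The exact statistic, subset size, and number of iterations are defined in the thesis itself (Section 2.1, Re-Rank Approach) and are not given in this record. The Python sketch below is therefore one hypothetical reading, not the author's Matlab implementation.

It assumes the following:
- The per-feature statistic is the difference in mean within-sample rank between cases and controls.
- Each iteration draws a random subset of features and recomputes that statistic using ranks within the subset only.
- The subset is then re-ranked by absolute statistic.
- The final list orders features by average re-rank position.

All function and parameter names (`rank_over_variable_stat`, `re_rank`, `subset_size`, `n_iter`, `simulate`) are illustrative.

```python
import random
from statistics import mean


def within_sample_ranks(row, cols):
    """Rank the values of features `cols` within one sample (1 = smallest, ties averaged)."""
    ordered = sorted((row[j], j) for j in cols)
    ranks = {}
    i = 0
    while i < len(ordered):
        k = i
        while k + 1 < len(ordered) and ordered[k + 1][0] == ordered[i][0]:
            k += 1
        average_rank = (i + k) / 2 + 1  # positions i..k hold ranks i+1..k+1
        for t in range(i, k + 1):
            ranks[ordered[t][1]] = average_rank
        i = k + 1
    return ranks


def rank_over_variable_stat(X, labels, cols):
    """Hypothetical rank-over-variable statistic: for each feature in `cols`,
    mean within-sample rank among cases minus mean within-sample rank among controls."""
    rank_rows = [within_sample_ranks(row, cols) for row in X]
    stats = {}
    for j in cols:
        case_ranks = [r[j] for r, y in zip(rank_rows, labels) if y == 1]
        control_ranks = [r[j] for r, y in zip(rank_rows, labels) if y == 0]
        stats[j] = mean(case_ranks) - mean(control_ranks)
    return stats


def re_rank(X, labels, subset_size, n_iter, seed=0):
    """Hypothetical random-subset / re-rank loop. Returns feature indices ordered
    from most to least different, by average re-rank position across iterations."""
    rng = random.Random(seed)
    p = len(X[0])
    position_sum = [0.0] * p
    times_drawn = [0] * p
    for _ in range(n_iter):
        cols = rng.sample(range(p), subset_size)  # random subset of features
        stats = rank_over_variable_stat(X, labels, cols)
        # Re-rank: order the subset by |statistic|, largest difference first.
        ordered = sorted(cols, key=lambda c: -abs(stats[c]))
        for position, j in enumerate(ordered, start=1):
            position_sum[j] += position
            times_drawn[j] += 1
    # Features never drawn are pushed to the end of the list.
    average = [position_sum[j] / times_drawn[j] if times_drawn[j] else float("inf")
               for j in range(p)]
    return sorted(range(p), key=lambda j: average[j])


def simulate(n_case=10, n_control=10, p=200, n_true=5, shift=3.0, seed=1):
    """Toy data: features 0..n_true-1 are shifted up in cases and down in controls."""
    rng = random.Random(seed)
    X, labels = [], []
    for y, n in ((1, n_case), (0, n_control)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            for j in range(n_true):
                row[j] += shift if y == 1 else -shift
            X.append(row)
            labels.append(y)
    return X, labels


X, labels = simulate()
ranking = re_rank(X, labels, subset_size=50, n_iter=50)
print("top 5 features:", ranking[:5])
```

Because the ranks are taken across variables within each sample, a feature's score rarely ties with another's even when there are very few samples. Averaging over random subsets keeps any single draw of competing features from dominating the final order.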

Files in This Item:
File | Size | Format
ntu-101-1.pdf (not authorized for public access) | 2.77 MB | Adobe PDF