選擇標籤單核苷酸多型性的改良演算法

Chia-Jung Chang; 張家榮

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34098

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	趙坤茂
dc.contributor.author	Chia-Jung Chang	en
dc.contributor.author	張家榮	zh_TW
dc.date.accessioned	2021-06-13T05:54:15Z	-
dc.date.available	2007-07-20
dc.date.copyright	2006-07-20
dc.date.issued	2005
dc.date.submitted	2006-07-02
dc.identifier.citation	[1] Ao, S.I., Yip, K., Ng, M., Cheung, D., Fong, P.Y., Melhado, I., and Sham, P.C. (2005) CLUSTAG: Hierarchical Clustering and Graph Methods for Selecting Tag SNPs , Bioinformatics, 21(8), 1735-1736. [2] Bafna, V., Halld´orsson, B.V., Schwartz, R., Clark, A.G., Istrail, S. (2003) Haplotypes and Informative SNP Selection Algorithms: Don’t Block Out Information, RECOMB Berlin, Germany, 19-27. [3] Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A. (2004) Selecting a Maximally Informative Set of Single-Nucleotide Polymorphisms for Association Analyses Using Linkage Disequilibrium, Am. J. Hum. Genet., 74, 106-120. [4] Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2001) Introduction to Algorithms, The MIT Press. [5] Crawfod, D.C. and Nickerson, D.A. (2005) Annu. Rev. Med, 56, 303-320. [6] Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. (2001) High-resolution Haplotype Structure in the Human Genome, Nat. Genet., 29(2), 229-232. [7] de Bakker, P.I.W., Yelensky, R., Pe’er I., Gabriel, S.B., Daly, M.J., and Altshuler D. (2005) Efficiency and Power in Genetic Association Studies, Nat. Genet., 37(11), 1217-1223. [8] Garey, M.R. and Johnson, D.S. (1979) Computers and Intractability : A Guide to the Theory of NP-completeness, WH Freeman and Company. [9] The Genome Sequencing Project (2001) Initial sequencing and analysis of human genome, Nature, 409, 860-921. [10] Halperin, E., Kimmel, G., and Shamir, R. (2005) Tag SNP selection in genotype data for maximizing SNP prediction accuracy, Bioinformatics., 21, 195-203 [11] The International HapMap Consortium (2005) A Haplotype Map of the Human Genome, Nature., 437, 1299-1320. [12] He, J., Westbrooks, K. and Zelikovsky, A. 2005 Linear Reduction Methods for Tag SNP Selection, International Journal on Bioinformatics Research and Applications, 1(3), 249-260. [13] Helmuth, L. (2001) Genome Research: Map of the Human Genome 3.0., Science, 293(5530), 583-585. [14] Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., and Cox, D.R. (2005) Whole-Genome Patterns of Common DNA Variation in Three Human Populations, Science, 307, 1072-1079. [15] Huang, Y.T., Zhang, K., Chen, T., and Chao, K.M. (2005) Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping, BMC Bioinformatics, 6(263). [16] Hudson, R.R. (2002) Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation, Bioinformatics, 18, 337-338. [17] Lewin, B. (2004) Genes VIII, Pearson Prentice-Hall. [18] Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., Nguyen, B.T., Norris, M.C., Sheehan, J.B., Shen, N., Stern, D., Stokowski, R.P., Thomas, D.J., Trulson, M.O., Vyas, K.R., Frazer, K.A., Fodor, S.P., and Cox, D.R. (2001) Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21, Science, 294, 1719-1723. [19] Qin, Z.S., Gopalakrishnan, S., Abecasis G.R. 2006 An Efficient Comprehensive Search Algorithm for TagSNP Selection Using Linkage Disequilibrium Criteria, Bioinformatics, 22, 220-225. [20] Stephens, M., and Donnelly, P. (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73, 1162-1169. [21] Stephens, M., and Scheet, P. (2005) Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics, 76, 449-462. [22] Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., et al: (2001) The sequence of the human genome, Science, 291(5507), 1304-1351. [23] Zhang, K., Deng, M., Chen, T., Waterman, M.S., and Sun, F. (2002) A Dynamic Programming Algorithm for Haplotype Partitioning, Proc. Nat. Acad. Sci., 99(11), 7335-7339. [24] Zhang, K., Sun, F., Waterman, M.S., and Chen, T. (2003) Haplotype Block Partition with Limited Resources and Applications to Human Chromosome 21 Haplotype Data, Am. J. Hum. Genet., 73, 63-73. [25] Zhang, K., Qin, Z.S., Liu, J.S., Chen, T., Waterman, M.S., and Sun, F. (2004a) Haplotype Block Partition and Tag SNP Selection Using Genotype Data and Their Applications to Association Studies, Genome Research, 14, 908-916. [26] Zhang, P., Sheng, H. and Uehara, R. (2004b) A double classification tree search algorithm for index SNP selection, BMC Bioinformatics, 5(89). [27] Zhao, W., Fanning, M.L., and Lane, T. (2005) Efficient RNAi-Based Gene Family Knockdown via Set Cover Optimization, Artif. Intell. Med., 35(1-2), 61-73.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34098	-
dc.description.abstract	近期研究顯示，觀察人類族群的連鎖不平衡（Linkage Disequilibrium ; LD）形式可發現其形成類似區塊的結構。整條染色體可被切割成高連鎖不平衡區和低連鎖不平衡區互為間隔。其中高連鎖不平衡區被稱為單體型區塊(haplotype block)。在單體型區塊中，單體型的樣式數量有限。因此，只需要少量的單核苷酸多型性（Single Nucleotide Polymorphism ;SNP）就足以辨別出各種單體型樣式。而這些少量的SNPs稱為標籤SNPs 為了尋找最少量的標籤SNPs，我們提出一個結合分支設限演算法(branch-and-bound)和貪婪演算法 (greedy algorithm) 的方法。該方法探索更大的解空間，以得到比傳統的貪婪演算法更好的解。它還允許使用者在效率和最佳解之間做取捨。這個演算法經由我們實做已經在各種模擬和生物的數據做測試。實驗結果指出，比起之前的方法，我們的方法能找到更少量的標籤SNPs。這個方法還可以相當普遍地被應用在其他貪婪演算法可解的問題上面。另外，藉由結合一條染色體上任兩兩SNPs的關連性的資料，我們可以減少標籤 SNPs的數量。某些標籤SNPs 和其他標籤SNPs擁有完全的關連性。如此，可從完全關連性推斷其值的標籤SNPs可以從原先找到的標籤SNPs中刪除。依這個觀念，我們提出了兩個方法以減少標籤SNPs的數量。第一個方法是對現有的演算法所找到的標籤SNPs做後製處理。第二個方法則是一開始找標籤SNPs的時候就考慮SNPs間的關連性。實驗結果顯示，兩種方法都可以減少標籤SNPs的數量而不會損害原有標籤SNPs所包含的資訊。	zh_TW
dc.description.abstract	Recent studies have shown that the patterns of Linkage Disequilibrium (LD) observed in the human population reveal a block-like structure. The entire chromosome can be partitioned into high LD regions interspersed by low LD regions. The high LD regions are usually called “haplotype blocks”. Within a haplotype block, there are only few haplotype patterns and only a small subset of SNPs (called tag SNPs) are sufficient to distinguish these patterns. To solve the problem of finding tag SNPs, we propose a hybrid method that combines the ideas of the branch-and-bound method and the greedy algorithm. This method explores larger solution space to obtain a better solution than a traditional greedy algorithm. It also allows the user to adjust the efficiency of the program and quality of solutions. This algorithm has been implemented and tested on a variety of simulated and biological data. The experimental results indicate that our program can find better solutions than previous methods. This approach is quite general since it can be used to adapt other greedy algorithms to solving their corresponding problems. In addition, we can reduce the number of tag SNPs even more by considering the extent of linkage disequilibrium in the human genome. We show that the extent of LD can be also used to boost the heavy computation of computation of pairwise LD by giving a faster algorithm. We propose two methods of which the first is a posterior approach based one existing algorithms and the second identifies tag SNPs by considering the correlation between SNPs from the beginning. The experimental results show that our methods can reduce the number of tags SNPs in comparison with previous methods and the efficiency is significantly improved.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T05:54:15Z (GMT). No. of bitstreams: 1 ntu-94-R93922040-1.pdf: 726341 bytes, checksum: b54dd9bac06c5d823cf9658b3f06ebf9 (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	Title ii Tabel of Contents ii Acknowledgements iv Abstract in Chinese v Abstract vi 1 Introduction 1 2 A Greedier Approach for Finding Tag SNPs 4 2.1 The Problem of Minimizing the Number of Tag SNPs . . . . . . . . . . . 5 2.2 Greedy-Partition-Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Improvement of Efficiency and Solutions . . . . . . . . . . . . . . . . . . 10 2.3.1 Improvement of Efficiency . . . . . . . . . . . . . . . . . . . . . . 10 2.3.2 Improvement of Solutions . . . . . . . . . . . . . . . . . . . . . . 10 2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.4.1 Experimental Results on Simulated Data . . . . . . . . . . . . . . 11 2.4.2 Experimental Results on Biological Data . . . . . . . . . . . . . . 13 2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.5.1 Tradeoff between Efficiency and Solutions of GPT . . . . . . . . . 17 2.5.2 Comparison with MLR-Tagging . . . . . . . . . . . . . . . . . . . 19 3 Algorithms for Reducing the Number of Tag SNPs by the Perfect Proxy Set 21 3.1 Minimizing the Number of LD Bins . . . . . . . . . . . . . . . . . . . . . 22 3.2 The Comparison between the Block-Based Method and the LD-Based Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Methods for Reducing the Number of Tag SNPs by the Perfect Proxy Set 26 3.3.1 The First Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.3.2 The Second Method . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.5 Reducing the Number of Global Tag SNPs by Relaxing the Threshold . . 34 4 Conclusion 39
dc.language.iso	en
dc.subject	單核&#33527	zh_TW
dc.subject	單體型	zh_TW
dc.subject	連鎖不平衡	zh_TW
dc.subject	酸多型性	zh_TW
dc.subject	Haplotype	en
dc.subject	Single Nucleotide Polymorphism	en
dc.subject	Linkage Disequilibrium	en
dc.title	選擇標籤單核苷酸多型性的改良演算法	zh_TW
dc.title	Improved Algorithms for the Selection of Tag SNPs	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	姜濤,張雅惠
dc.subject.keyword	單核&#33527,酸多型性,連鎖不平衡,單體型,	zh_TW
dc.subject.keyword	Single Nucleotide Polymorphism,Linkage Disequilibrium,Haplotype,	en
dc.relation.page	43
dc.rights.note	有償授權
dc.date.accepted	2006-07-03
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-94-1.pdf 未授權公開取用	709.32 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。