Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 資訊工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34098
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor趙坤茂
dc.contributor.authorChia-Jung Changen
dc.contributor.author張家榮zh_TW
dc.date.accessioned2021-06-13T05:54:15Z-
dc.date.available2007-07-20
dc.date.copyright2006-07-20
dc.date.issued2005
dc.date.submitted2006-07-02
dc.identifier.citation[1] Ao, S.I., Yip, K., Ng, M., Cheung, D., Fong, P.Y., Melhado, I., and Sham, P.C.
(2005) CLUSTAG: Hierarchical Clustering and Graph Methods for Selecting Tag
SNPs , Bioinformatics, 21(8), 1735-1736.
[2] Bafna, V., Halld´orsson, B.V., Schwartz, R., Clark, A.G., Istrail, S. (2003) Haplotypes
and Informative SNP Selection Algorithms: Don’t Block Out Information,
RECOMB Berlin, Germany, 19-27.
[3] Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A.
(2004) Selecting a Maximally Informative Set of Single-Nucleotide Polymorphisms
for Association Analyses Using Linkage Disequilibrium, Am. J. Hum. Genet., 74,
106-120.
[4] Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2001) Introduction to
Algorithms, The MIT Press.
[5] Crawfod, D.C. and Nickerson, D.A. (2005) Annu. Rev. Med, 56, 303-320.
[6] Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. (2001)
High-resolution Haplotype Structure in the Human Genome, Nat. Genet., 29(2),
229-232.
[7] de Bakker, P.I.W., Yelensky, R., Pe’er I., Gabriel, S.B., Daly, M.J., and Altshuler D.
(2005) Efficiency and Power in Genetic Association Studies, Nat. Genet., 37(11),
1217-1223.
[8] Garey, M.R. and Johnson, D.S. (1979) Computers and Intractability : A Guide to
the Theory of NP-completeness, WH Freeman and Company.
[9] The Genome Sequencing Project (2001) Initial sequencing and analysis of human
genome, Nature, 409, 860-921.
[10] Halperin, E., Kimmel, G., and Shamir, R. (2005) Tag SNP selection in genotype
data for maximizing SNP prediction accuracy, Bioinformatics., 21, 195-203
[11] The International HapMap Consortium (2005) A Haplotype Map of the Human
Genome, Nature., 437, 1299-1320.
[12] He, J., Westbrooks, K. and Zelikovsky, A. 2005 Linear Reduction Methods for Tag
SNP Selection, International Journal on Bioinformatics Research and Applications,
1(3), 249-260.
[13] Helmuth, L. (2001) Genome Research: Map of the Human Genome 3.0., Science,
293(5530), 583-585.
[14] Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G.,
Frazer, K.A., and Cox, D.R. (2005) Whole-Genome Patterns of Common DNA
Variation in Three Human Populations, Science, 307, 1072-1079.
[15] Huang, Y.T., Zhang, K., Chen, T., and Chao, K.M. (2005) Selecting Additional Tag
SNPs for Tolerating Missing Data in Genotyping, BMC Bioinformatics, 6(263).
[16] Hudson, R.R. (2002) Generating Samples under a Wright-Fisher Neutral Model of
Genetic Variation, Bioinformatics, 18, 337-338.
[17] Lewin, B. (2004) Genes VIII, Pearson Prentice-Hall.
[18] Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R.,
Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., Nguyen, B.T., Norris,
M.C., Sheehan, J.B., Shen, N., Stern, D., Stokowski, R.P., Thomas, D.J., Trulson,
M.O., Vyas, K.R., Frazer, K.A., Fodor, S.P., and Cox, D.R. (2001) Blocks
of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human
Chromosome 21, Science, 294, 1719-1723.
[19] Qin, Z.S., Gopalakrishnan, S., Abecasis G.R. 2006 An Efficient Comprehensive
Search Algorithm for TagSNP Selection Using Linkage Disequilibrium Criteria,
Bioinformatics, 22, 220-225.
[20] Stephens, M., and Donnelly, P. (2003) A comparison of Bayesian methods for haplotype
reconstruction from population genotype data. American Journal of Human
Genetics, 73, 1162-1169.
[21] Stephens, M., and Scheet, P. (2005) Accounting for Decay of Linkage Disequilibrium
in Haplotype Inference and Missing-Data Imputation. American Journal of Human
Genetics, 76, 449-462.
[22] Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., et al: (2001) The
sequence of the human genome, Science, 291(5507), 1304-1351.
[23] Zhang, K., Deng, M., Chen, T., Waterman, M.S., and Sun, F. (2002) A Dynamic
Programming Algorithm for Haplotype Partitioning, Proc. Nat. Acad. Sci., 99(11),
7335-7339.
[24] Zhang, K., Sun, F., Waterman, M.S., and Chen, T. (2003) Haplotype Block Partition
with Limited Resources and Applications to Human Chromosome 21 Haplotype
Data, Am. J. Hum. Genet., 73, 63-73.
[25] Zhang, K., Qin, Z.S., Liu, J.S., Chen, T., Waterman, M.S., and Sun, F. (2004a)
Haplotype Block Partition and Tag SNP Selection Using Genotype Data and Their
Applications to Association Studies, Genome Research, 14, 908-916.
[26] Zhang, P., Sheng, H. and Uehara, R. (2004b) A double classification tree search
algorithm for index SNP selection, BMC Bioinformatics, 5(89).
[27] Zhao, W., Fanning, M.L., and Lane, T. (2005) Efficient RNAi-Based Gene Family
Knockdown via Set Cover Optimization, Artif. Intell. Med., 35(1-2), 61-73.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34098-
dc.description.abstract近期研究顯示,觀察人類族群的連鎖不平衡(Linkage Disequilibrium ; LD)形式可發現其形成類似區塊的結構。整條染色體可被切割成高連鎖不平衡區和低連鎖不平衡區互為間隔。 其中高連鎖不平衡區被稱為單體型區塊(haplotype block)。在單體型區塊中,單體型的樣式數量有限。因此,只需要少量的單核苷酸多型性(Single Nucleotide Polymorphism ;SNP)就足以辨別出各種單體型樣式。而這些少量的SNPs稱為標籤SNPs
為了尋找最少量的標籤SNPs,我們提出一個結合分支設限演算法(branch-and-bound)和貪婪演算法 (greedy algorithm) 的方法。該方法探索更大的解空間,以得到比傳統的貪婪演算法更好的解。它還允許使用者在效率和最佳解之間做取捨。這個演算法經由我們實做已經在各種模擬和生物的數據做測試。實驗結果指出, 比起之前的方法, 我們的方法能找到更少量的標籤SNPs。這個方法還可以相當普遍地被應用在其他貪婪演算法可解的問題上面。
另外,藉由結合一條染色體上任兩兩SNPs的關連性的資料,我們可以減少標籤 SNPs的數量。某些標籤SNPs 和其他標籤SNPs擁有完全的關連性。如此,可從完全關連性推斷其值的標籤SNPs可以從原先找到的標籤SNPs中刪除。依這個觀念, 我們提出了兩個方法以減少標籤SNPs的數量。 第一個方法是對現有的演算法所找到的標籤SNPs做後製處理。第二個方法則是一開始找標籤SNPs的時候就考慮SNPs間的關連性。實驗結果顯示,兩種方法都可以減少標籤SNPs的數量而不會損害原有標籤SNPs所包含的資訊。
zh_TW
dc.description.abstractRecent studies have shown that the patterns of Linkage Disequilibrium (LD) observed
in the human population reveal a block-like structure. The entire chromosome can be
partitioned into high LD regions interspersed by low LD regions. The high LD regions
are usually called “haplotype blocks”. Within a haplotype block, there are only few
haplotype patterns and only a small subset of SNPs (called tag SNPs) are sufficient to
distinguish these patterns.
To solve the problem of finding tag SNPs, we propose a hybrid method that combines
the ideas of the branch-and-bound method and the greedy algorithm. This method
explores larger solution space to obtain a better solution than a traditional greedy algorithm.
It also allows the user to adjust the efficiency of the program and quality of
solutions. This algorithm has been implemented and tested on a variety of simulated
and biological data. The experimental results indicate that our program can find better
solutions than previous methods. This approach is quite general since it can be used to
adapt other greedy algorithms to solving their corresponding problems.
In addition, we can reduce the number of tag SNPs even more by considering the
extent of linkage disequilibrium in the human genome. We show that the extent of LD
can be also used to boost the heavy computation of computation of pairwise LD by giving
a faster algorithm. We propose two methods of which the first is a posterior approach
based one existing algorithms and the second identifies tag SNPs by considering the
correlation between SNPs from the beginning. The experimental results show that our
methods can reduce the number of tags SNPs in comparison with previous methods and the efficiency is significantly improved.
en
dc.description.provenanceMade available in DSpace on 2021-06-13T05:54:15Z (GMT). No. of bitstreams: 1
ntu-94-R93922040-1.pdf: 726341 bytes, checksum: b54dd9bac06c5d823cf9658b3f06ebf9 (MD5)
Previous issue date: 2005
en
dc.description.tableofcontentsTitle ii
Tabel of Contents ii
Acknowledgements iv
Abstract in Chinese v
Abstract vi
1 Introduction 1
2 A Greedier Approach for Finding Tag SNPs 4
2.1 The Problem of Minimizing the Number of Tag SNPs . . . . . . . . . . . 5
2.2 Greedy-Partition-Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Improvement of Efficiency and Solutions . . . . . . . . . . . . . . . . . . 10
2.3.1 Improvement of Efficiency . . . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Improvement of Solutions . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 Experimental Results on Simulated Data . . . . . . . . . . . . . . 11
2.4.2 Experimental Results on Biological Data . . . . . . . . . . . . . . 13
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Tradeoff between Efficiency and Solutions of GPT . . . . . . . . . 17
2.5.2 Comparison with MLR-Tagging . . . . . . . . . . . . . . . . . . . 19
3 Algorithms for Reducing the Number of Tag SNPs by the Perfect Proxy
Set 21
3.1 Minimizing the Number of LD Bins . . . . . . . . . . . . . . . . . . . . . 22
3.2 The Comparison between the Block-Based Method and the LD-Based
Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Methods for Reducing the Number of Tag SNPs by the Perfect Proxy Set 26
3.3.1 The First Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.2 The Second Method . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Reducing the Number of Global Tag SNPs by Relaxing the Threshold . . 34
4 Conclusion 39
dc.language.isoen
dc.subject單核&#33527zh_TW
dc.subject單體型zh_TW
dc.subject連鎖不平衡zh_TW
dc.subject酸多型性zh_TW
dc.subjectHaplotypeen
dc.subjectSingle Nucleotide Polymorphismen
dc.subjectLinkage Disequilibriumen
dc.title選擇標籤單核苷酸多型性的改良演算法zh_TW
dc.titleImproved Algorithms for the Selection of Tag SNPsen
dc.typeThesis
dc.date.schoolyear94-2
dc.description.degree碩士
dc.contributor.oralexamcommittee姜濤,張雅惠
dc.subject.keyword單核&#33527,酸多型性,連鎖不平衡,單體型,zh_TW
dc.subject.keywordSingle Nucleotide Polymorphism,Linkage Disequilibrium,Haplotype,en
dc.relation.page43
dc.rights.note有償授權
dc.date.accepted2006-07-03
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept資訊工程學研究所zh_TW
顯示於系所單位:資訊工程學系

文件中的檔案:
檔案 大小格式 
ntu-94-1.pdf
  未授權公開取用
709.32 kBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved