Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9380

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 歐陽彥正 | |
| dc.contributor.author | Chih-Wei Lin | en |
| dc.contributor.author | 林志瑋 | zh_TW |
| dc.date.accessioned | 2021-05-20T20:19:57Z | - |
| dc.date.available | 2011-08-18 | |
| dc.date.available | 2021-05-20T20:19:57Z | - |
| dc.date.copyright | 2011-08-18 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-08-11 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9380 | - |
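| dc.description.abstract | Proteins that bind specific DNA sequences play an important role in gene regulation. Predicting the target sequences of these DNA-binding proteins computationally, or designing biological experiments to find them, helps us understand how gene regulation proceeds and explains how sequence variation in the genome disrupts normal gene expression. The position frequency matrix is the model most commonly used to describe these target sequences. For most species, only a small fraction of transcription factors have so far had such a model determined by biological experiments. Because biological experiments often carry high financial and labor costs, accurately predicting position frequency matrices by computational methods, and thereby accelerating progress in this field, has long been a topic of great interest to bioinformaticians. This thesis addresses the problem by proposing a new method that predicts the target sequences of DNA-binding proteins using protein-DNA complex structures and a knowledgebase that records the binding preferences between different amino acids and nucleotides. Given a protein sequence, an appropriate template complex structure is first selected, and the position frequency matrix is then predicted from that template and the knowledgebase.
This thesis uses two datasets to evaluate the new method. Compared with other methods that use tertiary structure, the proposed method achieves the same predictive performance. Compared with another prediction model that is likewise based on sequence information and is trained on known position frequency matrices, the proposed method performs slightly worse. Because existing sequence-based prediction methods each have their own limitations, the proposed method can still support related research: it predicts target sequences for protein sequences whose homologous sequences already have protein-DNA complex structures, and these predictions will benefit related studies. | zh_TW |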
| dc.description.abstract | Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps explain how genes are regulated in cells and how genetic variations disrupt normal gene expression. Position frequency matrices (PFMs) are among the most widely used models for representing such target sequences. However, for most species, only a small fraction of transcription factors (TFs) have experimentally determined PFMs to date. Because biological experiments are usually costly and time-consuming, computational methods with satisfactory accuracy are strongly desired to speed up progress. Here, a new method is proposed to predict the quantitative specificities of protein-DNA interactions, based on existing protein-DNA complex structures and a knowledgebase of contact preferences between amino acids and nucleotides. Given a query protein sequence, a protein-DNA complex structure of a homologous protein is selected as a template, and the PFM is predicted from the selected template together with the knowledgebase.
The proposed method is evaluated on two datasets and compared with existing computational methods. It predicts as well as the structure-based methods it is compared with. Compared with a sequence-based method trained on experimentally determined PFMs, however, it performs slightly worse. Even so, the proposed method retains its value, since different predictors have their own advantages and limitations. In summary, a DNA-binding protein’s binding preference can be predicted from its primary structure using the complexes of its homologues. This facilitates related studies, because target sequences can now be predicted for proteins without a solved structure. | en |
| dc.description.provenance | Made available in DSpace on 2021-05-20T20:19:57Z (GMT). No. of bitstreams: 1 ntu-100-R98922116-1.pdf: 1183043 bytes, checksum: 7deeca6f3404da88d30da7e82c45f63a (MD5) Previous issue date: 2011 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; Tables; Figures; Chapter 1 Introduction; Chapter 2 Literature Review; 2.1 Algorithms for predicting protein-DNA binding specificities; 2.1.1 Predicting the binding preference of DNA-binding proteins; 2.1.2 Predicting PFM by homologues’ annotated PFMs; 2.1.3 Predicting PFM based on structural model and potential functions; 2.2 Algorithms of sequence alignment; 2.2.1 BLAST; 2.3 Algorithms of structure alignment; 2.3.1 TM-align; Chapter 3 Methods; 3.1 Materials; 3.1.1 Collection of protein-DNA complex structures; 3.1.2 Collection of PFMs; 3.1.3 Relating PFMs to protein-DNA complex structures; 3.2 Building the knowledgebase; 3.3 Prediction framework; 3.3.1 Template selection and contact residue substitution; 3.3.2 Building the predicted PFM by DNA sequence in the template; 3.3.3 Refining the PFM by knowledgebase; Chapter 4 Results; 4.1 Measuring performance; 4.2 Validation sets; 4.2.1 Training data of SABINE; 4.2.2 Protein-DNA complexes with annotated PFMs; 4.3 Performance; 4.3.1 Training data of SABINE; 4.3.2 Protein-DNA complexes with annotated PFMs; 4.4 Evaluating SABINE; 4.5 Discussion; 4.5.1 Differences between DNA sequences in protein-DNA complex structures and their annotated PFMs; 4.5.2 The effect of different contact distance cut-off; 4.5.3 How to select a template; 4.5.4 Similar protein sequences bind similar DNA sequences; 4.5.5 Using the number of contact atoms of contact residues; 4.5.6 The frequency of amino acids and nucleotides; Chapter 5 Conclusions; REFERENCE | |
| dc.language.iso | en | |
| dc.title | 從一級結構預測DNA結合蛋白之標的序列 | zh_TW |
| dc.title | Predicting target sequences of DNA-binding proteins based on primary structure | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 99-2 | |
| dc.description.degree | Master's | |
| dc.contributor.coadvisor | 陳倩瑜 | |
| dc.contributor.oralexamcommittee | 黃乾綱,張天豪 | |
| dc.subject.keyword | DNA結合蛋白 | zh_TW |
| dc.subject.keyword | DNA-binding protein | en |
| dc.relation.page | 41 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2011-08-11 | |
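| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in collections: | Department of Computer Science and Information Engineering | |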
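Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-100-1.pdf | 1.16 MB | Adobe PDF | View/Open |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.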
