Analysis of protein sequence, structure and DNA binding motifs by incorporating ChIP-Seq and protein structure data

Wen-Ting Wang; 王文廷

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳倩瑜(Chien-Yu Chen)
dc.contributor.author	Wen-Ting Wang	en
dc.contributor.author	王文廷	zh_TW
dc.date.accessioned	2021-06-08T03:28:34Z	-
dc.date.copyright	2019-08-20
dc.date.issued	2019
dc.date.submitted	2019-08-19
dc.identifier.citation	1. Lin, C.-K. and C.-Y. Chen, PiDNA: predicting protein–DNA interactions with structural models. Nucleic Acids Research, 2013. 41(W1): p. W523-W530. 2. Bailey, T.L., et al., MEME Suite: tools for motif discovery and searching. Nucleic Acids Research, 2009. 37(suppl_2): p. W202-W208. 3. Matys, V., et al., TRANSFAC®: transcriptional regulation, from patterns to profiles. Nucleic Acids Research, 2003. 31(1): p. 374-378. 4. Alipanahi, B., et al., Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 2015. 33: p. 831. 5. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 2004. 306(5696): p. 636. 6. Park, P.J., ChIP–seq: advantages and challenges of a maturing technology. Nature Reviews Genetics, 2009. 10: p. 669. 7. Bernstein, F.C., et al., The protein data bank: A computer-based archival file for macromolecular structures. Journal of Molecular Biology, 1977. 112(3): p. 535-542. 8. McGinnis, S. and T.L. Madden, BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research, 2004. 32(suppl_2): p. W20-W25. 9. Zhang, Y. and J. Skolnick, TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 2005. 33(7): p. 2302-2309. 10. Crick, F., Central Dogma of Molecular Biology. Nature, 1970. 227(5258): p. 561-563. 11. Gonzalez, D.H., Introduction to transcription factor structure and function, in Plant Transcription Factors. 2016, Elsevier. p. 3-11. 12. Hollenhorst, P.C., L.P. McIntosh, and B.J. Graves, Genomic and Biochemical Insights into the Specificity of ETS Transcription Factors. Annual Review of Biochemistry, 2011. 80(1): p. 437-471. 13. Hsu, C.-M., C.-Y. Chen, and B.-J. Liu, WildSpan: mining structured motifs from protein sequences. Algorithms for Molecular Biology, 2011. 6(1): p. 6. 14. Lis, M. and D. Walther, The orientation of transcription factor binding site motifs in gene promoter regions: does it matter? BMC Genomics, 2016. 17(1): p. 185. 15. Bank, P.D., Protein data bank. Nature New Biol, 1971. 233: p. 223. 16. Pietrokovski, S., Searching Databases of Conserved Sequence Regions by Aligning Protein Multiple-Alignments. Nucleic Acids Research, 1996. 24(19): p. 3836-3845. 17. Wang, T. and G.D. Stormo, Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics, 2003. 19(18): p. 2369-2380. 18. Schones, D.E., P. Sumazin, and M.Q. Zhang, Similarity of position frequency matrices for transcription factor binding sites. Bioinformatics, 2004. 21(3): p. 307-313. 19. Gupta, S., et al., Quantifying similarity between motifs. Genome Biology, 2007. 8(2): p. R24. 20. Skolnick, J., J.S. Fetrow, and A. Kolinski, Structural genomics and its importance for gene function analysis. Nature Biotechnology, 2000. 18(3): p. 283. 21. Baker, D. and A. Sali, Protein structure prediction and structural genomics. Science, 2001. 294(5540): p. 93-96. 22. Zhang, Y. and J. Skolnick, Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics, 2004. 57(4): p. 702-710. 23. Levitt, M. and M. Gerstein, A unified statistical framework for sequence comparison and structure comparison. Proceedings of the National Academy of Sciences, 1998. 95(11): p. 5913-5920. 24. Gordân, R., et al., Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 2013. 3(4): p. 1093-1104. 25. R Development Core Team, R: A language and environment for statistical computing. 2011, R Foundation for Statistical Computing: Vienna, Austria. 26. Xu, J. and Y. Zhang, How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics, 2010. 26(7): p. 889-895.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200	-
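dc.description.abstract	The central dogma of molecular biology states, in outline, that deoxyribonucleic acid (DNA) makes ribonucleic acid (RNA) and RNA makes protein. Proteins in turn assist both processes. The binding of transcription factors to DNA is a key step in gene regulation and thereby controls the different behaviors of cells, so which transcription factor binding sites a given transcription factor binds is the central question. In recent years, the growing volume of structural data on protein-DNA complexes has provided much information about DNA-protein interactions; observation shows, however, that these interactions are not a simple one-to-one relationship between bases and residues, and changes in three-dimensional geometry must also be considered. PiDNA, a tool previously published by our laboratory, predicts DNA binding motifs from protein-DNA complex structures in the Protein Data Bank (PDB), linking structure and sequence. Meanwhile, the rapid development of machine learning and the complexity of bioinformatics problems have drawn the interest of information scientists, leading to a series of deep learning applications in bioinformatics. Among them, DeepBind uses a convolutional neural network (CNN) to predict the binding of a single transcription factor to DNA sequences; its accuracy exceeds that of earlier prediction tools, and its success shows that deep learning can capture binding sequence features. This study takes chromatin immunoprecipitation sequencing (ChIP-Seq) data from the ENCODE database as the DNA sequence input and uses protein sequence and protein-DNA complex structure data collected from the PDB to extract protein-DNA binding motifs and structural similarity. It further analyzes, within the same Pfam family, the relationships among transcription factor DNA binding domain sequences, transcription factor binding motifs, and transcription factor structures, and uses them to test whether DeepBind can better distinguish the similarities and differences between the binding sets of ChIP-Seq data.	zh_TW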
dc.description.abstract	The binding of transcription factors to DNA is a key step in gene regulation, so identifying which binding sites a given transcription factor binds is the central question. In recent years, the growing body of structural data on protein-DNA complexes has provided information about the interaction between DNA and protein. PiDNA, previously developed by our lab, uses the structures of protein-DNA complexes in the PDB (Protein Data Bank) to predict binding motifs. Meanwhile, with advances in deep learning, information scientists have applied it to many problems in bioinformatics. DeepBind used a CNN (convolutional neural network) to demonstrate that DNA sequences carry binding characteristics that specific proteins can recognize, and its success revealed the value of deep learning in characterizing binding sequences. In this study, ChIP-Seq (Chromatin Immunoprecipitation Sequencing) data from the ENCODE database were collected as the DNA sequence input, and protein sequence and structural data collected from the PDB were used to capture the DNA binding sequence characteristics of proteins in the same family. The study further analyzed the relationships among DNA binding domain sequences, transcription factor binding sequence characteristics, and transcription factor structures for several Pfam families, showing the importance of combining deep learning with protein-DNA complex structures for this computational biology problem. In this way, it tested whether DeepBind can distinguish the differences between the binding sets of ChIP-Seq data.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T03:28:34Z (GMT). No. of bitstreams: 1 ntu-108-R06631015-1.pdf: 2672703 bytes, checksum: d440673082e1795b88cb79a2a1f7707d (MD5) Previous issue date: 2019	en
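dc.description.tableofcontents	Acknowledgements; Abstract (Chinese); Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Background; Chapter 2 Literature Review; 2.1 Central Dogma of Molecular Biology; 2.2 Transcription Factor; 2.2.1 DNA Binding Domain (DBD); 2.2.2 Transcription Factor Binding Site; 2.3 Chromatin Immunoprecipitation Sequencing (ChIP-Seq); 2.4 Databases; 2.4.1 ENCODE Database; 2.4.2 TRANSFAC Database; 2.4.3 PDB Database; 2.5 Binding Motif Extraction and Comparison; 2.5.1 DeepBind; 2.6 Protein Structure Comparison; Chapter 3 Methods; 3.1 Binding Motif Analysis; 3.2 Main Experimental Workflow; 3.2.1 Database Search; 3.2.2 Data Preprocessing; 3.2.3 Transcription Factor Binding Site Similarity Comparison; 3.2.4 Protein Structure Similarity Comparison; 3.2.5 DBD Sequence Similarity Comparison; 3.2.6 Correlation Analysis and Summary of Results; 3.3 Analysis Datasets; Chapter 4 Results and Discussion; 4.1 Binding Motifs; 4.2 Analysis of Binding Motif Similarity and Structural Similarity; Chapter 5 Conclusion; References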
dc.language.iso	zh-TW
dc.subject	深度學習	zh_TW
dc.subject	轉錄因子DNA結合域	zh_TW
dc.subject	ChIP-Seq	zh_TW
dc.subject	蛋白質結構相似度	zh_TW
dc.subject	結合序列相似度	zh_TW
dc.subject	deep learning	en
dc.subject	DNA binding domain	en
dc.subject	ChIP-Seq	en
dc.subject	protein structure similarity	en
dc.subject	binding sequence similarity	en
dc.title	結合ChIP-Seq和蛋白質結構分析蛋白質序列、結構和DNA結合序列特徵之相關性	zh_TW
dc.title	Analysis of protein sequence, structure and DNA binding motifs by incorporating ChIP-Seq and protein structure data	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master
dc.contributor.oralexamcommittee	歐陽彥正(Yan-Jheng Ou Yang),吳君泰(June-Tai Wu)
dc.subject.keyword	轉錄因子DNA結合域,ChIP-Seq,蛋白質結構相似度,結合序列相似度,深度學習	zh_TW
dc.subject.keyword	DNA binding domain,ChIP-Seq,protein structure similarity,binding sequence similarity,deep learning	en
dc.relation.page	37
dc.identifier.doi	10.6342/NTU201903464
dc.rights.note	Not authorized
dc.date.accepted	2019-08-19
dc.contributor.author-college	生物資源暨農學院	zh_TW
dc.contributor.author-dept	生物產業機電工程學研究所	zh_TW
Appears in Collections:	生物機電工程學系

Files in This Item:

File	Size	Format
ntu-108-1.pdf Restricted Access	2.61 MB	Adobe PDF
