Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200
Full metadata record
DC field: value [language]
dc.contributor.advisor: 陳倩瑜 (Chien-Yu Chen)
dc.contributor.author: Wen-Ting Wang [en]
dc.contributor.author: 王文廷 [zh_TW]
dc.date.accessioned: 2021-06-08T03:28:34Z
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200
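dc.description.abstract: The central dogma of molecular biology states, in brief, that deoxyribonucleic acid (DNA) is transcribed into ribonucleic acid (RNA) and RNA is translated into protein. Proteins in turn assist both processes. The binding of transcription factors to DNA is a key step in gene regulation and thereby controls how cells behave, so the central question is which binding sites a given transcription factor binds. In recent years the number of protein-DNA complex structures has grown steadily, providing a great deal of information about DNA-protein interactions. Observation of these structures shows, however, that the interaction is not a simple one-to-one relationship between bases and residues; changes in three-dimensional geometry must also be considered. PiDNA, a tool previously published by our laboratory, predicts DNA binding motifs from the protein-DNA complex structures in the PDB (Protein Data Bank) and thus links structure to sequence. Meanwhile, with the rapid development of machine learning and the interest of information scientists in the complexity of bioinformatics, a series of deep learning applications in bioinformatics has emerged. Among them, DeepBind uses a convolutional neural network (CNN) to predict the binding of a single transcription factor to DNA sequences, and its prediction accuracy exceeds that of earlier tools. The success of DeepBind shows that deep learning can capture binding sequence features.
This study uses chromatin immunoprecipitation sequencing (ChIP-Seq) data from the ENCODE database as the DNA sequence input, together with protein-DNA complex structures collected from the PDB, to extract binding sequence features and structural similarity. It then analyzes the relationship among the DNA binding domain sequences, binding motifs, and structures of transcription factors within the same Pfam family, and thereby tests whether DeepBind can better distinguish the similarities and differences between the binding sets in ChIP-Seq data.
[translated from zh_TW]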
dc.description.abstract: The binding of transcription factors to DNA is a central process in gene regulation, so the key question is which binding sites a given transcription factor binds. In recent years, the growing body of structural data on protein-DNA complexes has provided information about the interactions between DNA and proteins. PiDNA, previously developed by our lab, uses the structures of protein-DNA complexes in the PDB (Protein Data Bank) to predict binding motifs. Meanwhile, with advances in deep learning, information scientists have applied it to many problems in bioinformatics. DeepBind used a CNN (convolutional neural network) to demonstrate that DNA sequences carry binding characteristics that specific proteins can recognize. The success of DeepBind revealed the value of deep learning in characterizing binding sequences.
In this study, ChIP-Seq (chromatin immunoprecipitation sequencing) data from the ENCODE database were collected as the DNA sequence input, and protein sequence and structural data collected from the PDB were used to capture the DNA binding sequence characteristics of proteins in the same family. The study further analyzed the relationship among DNA binding domain sequences, transcription factor binding sequence characteristics, and transcription factor structures for several Pfam families, showing the importance of combining deep learning with protein-DNA complex structures for this computational biology problem. In this way, it tested whether DeepBind can distinguish the differences between the binding sets of ChIP-Seq data.
[en]
dc.description.provenance: Made available in DSpace on 2021-06-08T03:28:34Z (GMT). No. of bitstreams: 1
ntu-108-R06631015-1.pdf: 2672703 bytes, checksum: d440673082e1795b88cb79a2a1f7707d (MD5)
Previous issue date: 2019
[en]
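dc.description.tableofcontents:
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Background
Chapter 2 Literature Review
2.1 Central Dogma of Molecular Biology
2.2 Transcription Factor
2.2.1 DNA Binding Domain (DBD)
2.2.2 Transcription Factor Binding Site
2.3 Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
2.4 Databases
2.4.1 ENCODE Database
2.4.2 TRANSFAC Database
2.4.3 PDB Database
2.5 Binding Motif Extraction and Comparison
2.5.1 DeepBind
2.6 Protein Structure Comparison
Chapter 3 Methods
3.1 Binding Motif Analysis
3.2 Main Experimental Workflow
3.2.1 Database Search
3.2.2 Data Preprocessing
3.2.3 Transcription Factor Binding Site Similarity Comparison
3.2.4 Protein Structure Similarity Comparison
3.2.5 DBD Sequence Similarity Comparison
3.2.6 Correlation Analysis and Summary of Results
3.3 Analysis Datasets
Chapter 4 Results and Discussion
4.1 Binding Motifs
4.2 Analysis of Binding Motif Similarity and Structural Similarity
Chapter 5 Conclusion
References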
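dc.language.iso: zh-TW
dc.subject: deep learning [translated from zh_TW]
dc.subject: transcription factor DNA binding domain [translated from zh_TW]
dc.subject: ChIP-Seq [zh_TW]
dc.subject: protein structure similarity [translated from zh_TW]
dc.subject: binding sequence similarity [translated from zh_TW]
dc.subject: deep learning [en]
dc.subject: DNA binding domain [en]
dc.subject: ChIP-Seq [en]
dc.subject: protein structure similarity [en]
dc.subject: binding sequence similarity [en]
dc.title: Analyzing the correlation among protein sequence, structure, and DNA binding motifs by combining ChIP-Seq and protein structure [translated from zh_TW]
dc.title: Analysis of protein sequence, structure and DNA binding motifs by incorporating ChIP-Seq and protein structure data [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 歐陽彥正 (Yan-Jheng Ou Yang), 吳君泰 (June-Tai Wu)
dc.subject.keyword: transcription factor DNA binding domain, ChIP-Seq, protein structure similarity, binding sequence similarity, deep learning [translated from zh_TW]
dc.subject.keyword: DNA binding domain, ChIP-Seq, protein structure similarity, binding sequence similarity, deep learning [en]
dc.relation.page: 37
dc.identifier.doi: 10.6342/NTU201903464
dc.rights.note: Not authorized
dc.date.accepted: 2019-08-19
dc.contributor.author-college: College of Bioresources and Agriculture [translated from zh_TW]
dc.contributor.author-dept: Graduate Institute of Bio-Industrial Mechatronics Engineering [translated from zh_TW]
Appears in collections: Department of Biomechatronics Engineering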

Files in this item:
File: ntu-108-1.pdf (not authorized for public access)
Size: 2.61 MB
Format: Adobe PDF