Analysis of protein sequence, structure and DNA binding motifs by incorporating ChIP-Seq and protein structure data

Wen-Ting Wang; 王文廷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200

Title:	Analysis of protein sequence, structure and DNA binding motifs by incorporating ChIP-Seq and protein structure data (original title: 結合ChIP-Seq和蛋白質結構分析蛋白質序列、結構和DNA結合序列特徵之相關性)
Author:	Wen-Ting Wang (王文廷)
Advisor:	Chien-Yu Chen (陳倩瑜)
Keywords:	transcription factor DNA binding domain, ChIP-Seq, protein structure similarity, binding sequence similarity, deep learning
Publication year:	2019
Degree:	Master's
Abstract:	The central dogma of molecular biology states, in essence, that deoxyribonucleic acid (DNA) makes ribonucleic acid (RNA) and RNA makes protein. Proteins in turn assist both of these processes. The binding of transcription factors to DNA is a main step in gene regulation and thereby controls the differing behavior of cells, so the central question is which binding sites a given transcription factor binds. In recent years, structural data on protein-DNA complexes have grown steadily, providing much information about the interactions between DNA and proteins. Observation shows, however, that these interactions are not a simple one-to-one relationship between bases and residues; changes in three-dimensional geometry must also be taken into account. PiDNA, a tool previously published by our laboratory, predicts DNA binding motifs from the protein-DNA complex structures in the Protein Data Bank (PDB), linking structure to sequence. Meanwhile, with the rapid development of machine learning and the interest that the complexity of bioinformatics has drawn from information scientists, a series of deep learning applications has emerged in the field. Among them, DeepBind uses a convolutional neural network (CNN) to predict the binding of a single transcription factor to DNA sequences, with accuracy exceeding that of earlier prediction tools; its success demonstrated that deep learning can capture the features of binding sequences. In this study, chromatin immunoprecipitation sequencing (ChIP-Seq) data from the ENCODE database were used as the DNA sequence input, together with protein sequence and protein-DNA complex structure data collected from the PDB, to capture the binding sequence features and the degree of structural similarity between proteins and DNA. The study further analyzed the relationships among transcription factor DNA binding domain sequences, transcription factor binding sequence features, and transcription factor structures within the same Pfam family, for several Pfam families, and used these relationships to test whether DeepBind can better distinguish the similarities and differences between the binding sets of ChIP-Seq data. The analysis points to the value of combining deep learning with protein-DNA complex structures for this computational biology problem.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21200
DOI:	10.6342/NTU201903464
Full-text authorization:	Not authorized
Appears in collections:	Department of Bio-Industrial Mechatronics Engineering

Files in this item:

File	Size	Format
ntu-108-1.pdf (currently not authorized for public access)	2.61 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.