Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70498
Title: | 利用深度學習建構K562全基因組轉錄因子結合位特徵 Constructing K562 transcriptional binding profiles using deep learning |
Authors: | Rou-An Shen 沈柔安 |
Advisor: | 陳倩瑜 |
Keyword: | Transcription factor binding site, Binding site motif, Deep learning, Convolutional neural networks |
Publication Year: | 2018 |
Degree: | Master's |
Abstract: | Gene regulation, the set of mechanisms that control gene expression in an organism, has broad biological significance and is an important research area in genetics and molecular genetics. Differences in gene regulation give rise to the differences between species and between individuals of the same species. Among its many components, the interaction between transcription factors and their binding sites, which activates or represses the expression of nearby genes, has attracted the most attention. This thesis uses a deep learning framework to learn the features of transcription factor binding sites in a specific cell type, to predict those sites and evaluate the predictions with the aim of improving prediction accuracy, and to build a database system that is easy to query.
The analysis uses chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE database, with the K562 cell line selected for training and prediction. ChIP-seq identifies where a particular protein binds and, through that, the genes it may regulate, so it gives the approximate locations of transcription factors on human DNA. From these data, several convolutional neural network models were built for K562. These models are used to predict cell-type-specific transcription factor binding sites accurately and to estimate the effect of sequence variants on transcription factor binding affinity.
The database spares users the time needed to scan the whole genome with transcription factor binding motifs: given a chromosomal position, it returns the transcription factors that may affect gene regulation there, which can serve as an important step in genetic testing for specific diseases. The results are intended as a basis for future bioinformatics research on, and applications of, transcription factor binding sites. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70498 |
DOI: | 10.6342/NTU201802430 |
Fulltext Rights: | Paid license |
Appears in Collections: | 生物機電工程學系 (Department of Biomechatronics Engineering) |
Files in This Item:
File | Access | Size | Format |
---|---|---|---|
ntu-107-1.pdf | Restricted Access | 6.49 MB | Adobe PDF |