Predicting cell type-specific enhancer activities by cross-cell type modeling with deep learning

Yi-An Tung; 童翊安

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15348

Title:	Predicting cell type-specific enhancer activities by cross-cell type modeling with deep learning
Author:	Yi-An Tung (童翊安)
Advisor:	Chien-Yu Chen (陳倩瑜)
Co-advisor:	Yen-Jen Oyang (歐陽彥正)
Keywords:	machine learning, deep learning, enhancer, gene regulatory network, cross-cell type prediction, enhancer prediction model, accuEnhancer
Year of publication:	2020
Degree:	Doctoral (Ph.D.)
Abstract:	Enhancers are a class of regulatory elements that have been shown to act as key components that assist promoters in modulating gene expression in living cells. At present, the number of enhancers in the human genome, as well as their activities in different cell types, remains largely unknown. Previous studies have shown that enhancer activities are associated with functional data such as histone modifications, sequence features, and chromatin structure and accessibility. This study used DNase data, together with other histone modification data, to build a deep learning model that is trained to predict H3K27ac peaks, which are taken as the locations of active enhancers in the selected cell types. This thesis also proposed joint training on multiple cell types to boost model performance. The analyses first demonstrated the feasibility of using the proposed deep learning model, accuEnhancer, with a complete feature set to predict within-cell type enhancer activities, achieving an accuracy of 0.97 and an F1 score of 0.9.
To further predict cell type-specific enhancers by cross-cell type modeling, we integrated training data from different cell types. As the model combined training data from more cell types, the F1 score for predicting an independent cell type increased from 0.3 to 0.80. These results demonstrate that by incorporating more datasets across cell types, deep neural networks can capture complex regulatory patterns and deliver better performance. Lastly, this study tested the effectiveness of the model in predicting experimentally validated enhancers from the VISTA database. The results indicated that accuEnhancer outperforms previous methods in predicting these experimentally validated enhancers, and the thesis accordingly discusses the feasibility of cross-cell type, and even cross-species, prediction.
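The abstract evaluates cross-cell type prediction by pooling training data from several cell types and scoring an independent cell type with accuracy and F1. The following Python sketch illustrates only that evaluation protocol, under stated assumptions: the data layout, the cell type names (`cell_A`, `cell_B`, `cell_C`), the helper functions, and the threshold "model" are all hypothetical stand-ins and not accuEnhancer's code. The threshold classifier merely takes the place of the deep network so the example runs with the standard library alone.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: cell type name -> list of (feature_vector, label) pairs.
# Label 1 is assumed to mean "region overlaps an H3K27ac peak" (active enhancer).
Example = Tuple[List[float], int]
Dataset = Dict[str, List[Example]]


def leave_one_cell_type_out(data: Dataset, held_out: str) -> Tuple[List[Example], List[Example]]:
    """Pool all other cell types for training and keep one cell type unseen for testing."""
    train = [ex for cell, examples in data.items() if cell != held_out for ex in examples]
    test = list(data[held_out])
    return train, test


def fit_threshold(train: List[Example]) -> float:
    """Placeholder 'model': midpoint between mean signals of positive and negative examples."""
    pos = [sum(x) / len(x) for x, y in train if y == 1]
    neg = [sum(x) / len(x) for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold: float, examples: List[Example]) -> List[int]:
    """Call a region active when its mean signal exceeds the threshold."""
    return [int(sum(x) / len(x) > threshold) for x, _ in examples]


def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Binary accuracy and F1, treating 1 (active enhancer) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy signals for three hypothetical cell types.
    toy: Dataset = {
        "cell_A": [([0.9, 0.8], 1), ([0.1, 0.2], 0)],
        "cell_B": [([0.7, 0.9], 1), ([0.3, 0.1], 0)],
        "cell_C": [([0.6, 0.7], 1), ([0.3, 0.4], 0)],
    }
    train, test = leave_one_cell_type_out(toy, held_out="cell_C")
    threshold = fit_threshold(train)
    labels = [y for _, y in test]
    acc, f1 = accuracy_and_f1(labels, predict(threshold, test))
    print(f"trained on {len(train)} regions, held-out accuracy={acc:.2f}, F1={f1:.2f}")
```

In the setting described by the abstract, each feature vector would presumably hold DNase and histone-modification signals for a genomic region, and the classifier would be a trained deep network. Adding cell types to the training pool corresponds to growing the dictionary passed to `leave_one_cell_type_out`. F1 is reported alongside accuracy because it is computed only from the positive (enhancer) calls, so it is not inflated by correctly rejected background regions.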
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15348
DOI:	10.6342/NTU202001200
Full-text license:	Not authorized
Department:	Genome and Systems Biology Degree Program

Files in this item:

File	Access	Size	Format
U0001-2906202023270600.pdf	Not authorized for public access	12.39 MB	Adobe PDF


Unless their copyright terms are otherwise specified, all documents in this system are protected by copyright, with all rights reserved.
