Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7838
Title: | Incorporating sequence motifs to improve accuracy of predicting transcription factor binding sites using ChIP-seq data |
Author: | Ping-Cheng Wu 吳秉承 |
Advisor: | 陳倩瑜 |
Keywords: | transcription factor, transcription factor binding site, motif discovery, chromatin immunoprecipitation sequencing, binding site features |
Publication year: | 2016 |
Degree: | Master's |
Abstract: | Transcription factors (TFs) regulate gene expression in living organisms and influence multiple biological processes. Chromatin immunoprecipitation sequencing (ChIP-seq) is a technology that has been widely used to find the transcription factor binding sites (TFBSs) of a specific TF, and hence the approximate locations at which it binds, among the DNA sequences of a genome. However, the accuracy of the TFBSs identified by ChIP-seq has not been systematically evaluated. This thesis therefore used the known TFBSs provided by the TRANSFAC database to validate the TFBSs identified by ChIP-seq alone at multiple false discovery rate (FDR) cutoffs. In addition, a method incorporating de novo motif discovery was proposed to improve the accuracy of the predicted TFBSs. ChIP-seq data from different cell lines were collected from the ENCODE database, and TFs with data in multiple cell lines were selected for comparison. In general, across multiple cell lines, ~60% of the peak regions identified by ChIP-seq alone with a strict FDR cutoff (FDR = 0) contained at least one TFBS of the specific TF. With the proposed method, prediction accuracy was better than with ChIP-seq alone, although the degree of improvement depended on the FDR cutoff used and the motifs discovered. In conclusion, this thesis identified the accuracy problem of the ChIP-seq platform through large-scale observation of the data and addressed it by proposing a method that incorporates de novo motif discovery.
The observed results can serve as an important foundation for developing bioinformatics tools for TFBS prediction in the future. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7838 |
DOI: | 10.6342/NTU201603094 |
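Full-text authorization: | Authorized (publicly available worldwide) |
Appears in collections: | Department of Biomechatronics Engineering (生物機電工程學系) |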
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf | 1.93 MB | Adobe PDF | View/Open |
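Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.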