Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70498
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳倩瑜 | |
dc.contributor.author | Rou-An Shen | en |
dc.contributor.author | 沈柔安 | zh_TW |
dc.date.accessioned | 2021-06-17T04:29:32Z | - |
dc.date.available | 2023-08-14 | |
dc.date.copyright | 2018-08-14 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-13 | |
dc.identifier.citation | [1] Ministry of Health and Welfare. 2017. Available at: https://dep.mohw.gov.tw/DOS/np-1714-113.html
[2] Taiwan Cancer Registry. 2018. Available at: http://tcr.cph.ntu.edu.tw
[3] Babak Alipanahi, Andrew Delong, Matthew T. Weirauch, and Brendan J. Frey. Nature Biotechnology, 2015. 33: p. 831-838.
[4] Crick, F.H., J.S. Griffith, and L.E. Orgel, Codes without commas. Proceedings of the National Academy of Sciences of the United States of America, 1957. 43(5): p. 416.
[5] Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-563.
[6] The musings and ravings of a computational biologist about science, computers, music and, you know, stuff. 01 April 2011. Available at: http://bytesizebio.net/2011/04/01/reverse-translation-discovered/
[7] The GTEx Consortium, The Genotype-Tissue Expression (GTEx) project. NIH Public Access, 2013. 45(6): p. 580-585.
[8] Winter, G.D. 2011. Three Waves Of Innovation In Vertebrate Evolution. Available at: http://www.science20.com/curious_cub
[9] McCulloch, W.S. and W. Pitts, A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 1943. 5(4): p. 115-133.
[10] Rosenblatt, F., The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 1958. 65(6): p. 386.
[11] 蘇紹安, 2003, "A study on applying back-propagation neural networks to typhoon wave forecasting", Master's thesis, Department of Engineering Science and Ocean Engineering, National Taiwan University: Taipei.
[12] Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning representations by back-propagating errors. Cognitive Modeling, 1998. 5(3): p. 1.
[13] Hubel, D.H. and T.N. Wiesel, Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 1968. 195(1): p. 215-241.
[14] Fukushima, K., Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition. Biological Cybernetics, 1980. 36(4): p. 193-202.
[15] LeCun, Y., et al., Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. 86(11): p. 2278-2324.
[16] Ioffe, S. and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[17] Institute, N.H.G.R. 2011. ENCODE: Encyclopedia of DNA Elements - ENCODE. 12 October 2015. Available at: https://www.encodeproject.org/
[18] Quinlan, A.R. and N. Kindlon. 2009. The BEDTools suite. Available at: http://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70498 | - |
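dc.description.abstract | Research on gene regulation has broad biological significance and is an important field within genetics and molecular genetics. Gene regulation, the mechanism that controls gene expression within an organism, gives rise to the differences between species and between individuals of the same species. Gene regulation has many components; among them, the question that attracts the most attention is how interactions between transcription factors and transcription factor binding sites activate or repress the expression of nearby genes. This thesis uses a deep learning architecture to learn features of, predict, and evaluate the binding sites of transcription factors in a specific cell type, improving prediction accuracy, and builds a database system that is easy to query.
This thesis analyzes chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE database and selects the K562 cell line for learning and prediction. ChIP-seq is a technique for locating a specific protein and the genes it regulates, and it gives an approximate picture of where transcription factors sit on human DNA fragments. Using the ChIP-seq data, this thesis builds multiple deep learning convolutional neural network prediction models for the K562 cell line to predict cell-type-specific transcription factor binding sites accurately, and applies the models to predict the effect of sequence variation on transcription factor binding affinity. The database saves users the time of scanning the whole genome with transcription factor binding motifs: entering a position on a specific chromosome returns the transcription factors that may affect gene regulation, which can serve as an important step in genetic testing for specific diseases. The results of this thesis will serve as an important foundation for future bioinformatics research on and applications of transcription factor binding sites. | zh_TW |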
dc.description.abstract | The study of gene regulation has broad biological significance and is an important research topic in genetics and molecular biology. Gene regulation, the mechanism that controls gene expression in organisms, gives rise to differences between species and between individuals of the same species. Among the many kinds of gene regulation activity, much research focuses on how the interaction between transcription factors and their binding sites activates or inhibits nearby genes. This thesis uses a deep learning framework to learn features of transcription factor binding sites in a specific cell type, to predict and evaluate those sites, and to improve prediction accuracy. A database was also built on this model for easy access to the data.
This thesis analyzes chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE database and selects the K562 cell line for learning and prediction. ChIP-seq finds the binding sites of a particular protein on human DNA fragments, from which the protein's role in gene regulation can be studied. Using the ChIP-seq data and multiple deep learning convolutional neural network prediction models built for the K562 cell line, this study accurately predicts cell-type-specific transcription factor binding sites and the effect of sequence variation on transcription factor binding affinity. The database saves users the time of comparing transcription factor binding features against the whole genome: a user can enter a position on a specific chromosome to query the transcription factors that may affect gene regulation. Such queries can serve as an important step in genetic testing for specific diseases, and the results of this thesis will serve as an important basis for future bioinformatics research on and applications of transcription factor binding sites. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T04:29:32Z (GMT). No. of bitstreams: 1 ntu-107-R05631002-1.pdf: 6647787 bytes, checksum: a00fed4cde46fea5641cfd3ffa716326 (MD5) Previous issue date: 2018 | en |
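dc.description.tableofcontents | Acknowledgements ii
Chinese Abstract iii
English Abstract iv
Table of Contents vi
List of Tables x
Chapter 1 Research Objectives 1
Chapter 2 Literature Review 3
2.1 Central dogma of molecular biology 3
2.2 Chromatin immunoprecipitation sequencing (ChIP-sequencing) 3
2.3 Transcription factor 4
2.3.1 Transcription factor binding site 4
2.4 Genotype-Tissue Expression (GTEx) 5
2.5 Cell strain and cell line 6
2.5.1 K562 cell line information 7
2.6 Neural network 7
2.6.1 Convolutional neural network 10
2.6.2 Convolution layer 11
2.6.3 Pooling layer 13
2.6.4 Fully connected layer 13
2.6.5 Batch normalization 14
Chapter 3 Methods 15
3.1 ENCODE database 15
3.2 Experimental workflow 16
3.2.1 K562 transcription factor data collection 17
3.2.2 Data preprocessing 18
3.2.3 Deep learning model training 21
3.2.4 Result analysis 24
3.2.5 Database construction 28
Chapter 4 Results and Discussion 30
4.1 Overall experimental results 30
4.2 Performance comparison with the DeepBind model 33
4.3 eQTL data analysis results 36
4.4 Effect of the number of peak sequences on the classifier 41
4.5 Effect of convolution layers on the transcription factor binding classifier 43
4.6 Effect of batch normalization on the transcription factor binding classifier 45
Chapter 5 Conclusion 47
References 48 | |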
dc.language.iso | zh-TW | |
dc.title | 利用深度學習建構K562全基因組轉錄因子結合位特徵 | zh_TW |
dc.title | Constructing K562 transcriptional binding profiles using deep learning | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 陳沛隆,吳君泰,黃乾綱 | |
dc.subject.keyword | Transcription factor binding site, Binding site features, Deep learning, Convolutional neural network | zh_TW |
dc.subject.keyword | Transcription factor binding site, Binding site motif, Deep learning, Convolutional neural networks | en |
dc.relation.page | 49 | |
dc.identifier.doi | 10.6342/NTU201802430 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2018-08-13 | |
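dc.contributor.author-college | College of Bioresources and Agriculture | zh_TW |
dc.contributor.author-dept | Graduate Institute of Bio-Industrial Mechatronics Engineering | zh_TW |
Appears in collections: | Department of Biomechatronics Engineering |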
Files in this item:
File | Size | Format | |
---|---|---|---|
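ntu-107-1.pdf (not currently authorized for public access) | 6.49 MB | Adobe PDF |
Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.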