Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67491
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 楊安綏 (An-Suei Yang)
dc.contributor.author | Pei-Hsuan Chen | en
dc.contributor.author | 陳佩萱 | zh_TW
dc.date.accessioned | 2021-06-17T01:34:32Z
dc.date.available | 2018-07-31
dc.date.copyright | 2017-08-29
dc.date.issued | 2017
dc.date.submitted | 2017-08-01
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67491
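dc.description.abstract | With the development of high-throughput technologies and the growing number of sequence variants produced by various sequencing projects, how to apply computational methods to help interpret these variants has become a research topic of wide interest. Most existing methods use sequence-based or structure-based information to assess the effects of these variants, and try to explain their impact in terms of disruption of protein function or disease pathogenicity. In this study, we integrate sequence-based information with ISMBLab functional site prediction to identify deleterious amino acid substitutions that disrupt protein function. We collected 8,884 protein variants from the training set of the VIPUR prediction tool to build an SVM classifier; each of these variants has clear experimental evidence of whether it disrupts protein function. The results show that our classifier can be generalized to predictions for other species, with better values for the areas under both the ROC and PR curves. In comparison with other methods, our classifier achieves a Matthews correlation coefficient of 0.405 on the human variant test set. In summary, we propose a method that incorporates structure-based functional site prediction to predict the effects of amino acid substitutions on protein function, and we show that features derived from ISMBLab functional site prediction are helpful for predicting protein variants. | zh_TW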
dc.description.abstract | As high-throughput techniques advance and different sequencing projects generate massive amounts of sequence variation data, applying computational methods to annotate these variations has become an issue of concern. Existing methods exploit sequence-based or structure-based information to interpret the effects of variations, and most of them correlate those effects with the functional disruption of a protein or with disease pathogenicity. Here we present a method that integrates sequence-based information and ISMBLab functional site prediction to identify deleterious amino acid substitutions that disrupt protein function. In this work, we collect 8,884 protein variants from the VIPUR training set, each with clear experimental evidence on the disruption of protein function, to train an SVM classifier. The results show that our classifier can generalize to other organisms, with better values of the area under the ROC and PR curves. In comparison with other methods, our classifier achieves a Matthews correlation coefficient of 0.405 on the human variant testing set. In summary, we provide a method that incorporates structure-based functional site prediction to predict the effects of amino acid substitutions on protein function, and show that features derived from ISMBLab functional site prediction are useful for predicting the effects of protein variations. | en
dc.description.provenance | Made available in DSpace on 2021-06-17T01:34:32Z (GMT). No. of bitstreams: 1. ntu-106-R04B48002-1.pdf: 2568087 bytes, checksum: 2e427ce63e7716b7459418b7d5049d04 (MD5). Previous issue date: 2017 | en
dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
ABSTRACT iv
LIST OF FIGURES vii
LIST OF TABLES viii
CHAPTER 1 Introduction 1
1.1 Annotation of genetic variations 1
1.2 Researches using machine learning algorithms 2
1.3 Research hypothesis 4
CHAPTER 2 Materials and Methods 6
2.1 Benchmark dataset 6
2.2 Characterize protein variants 7
2.2.1 Sequence-based features 7
2.2.2 Structure-based features 9
2.3 Training a classifier by support vector machine (SVM) 10
2.4 Independent set for comparisons 11
2.5 Feature selection by recursive feature elimination 12
2.6 Prediction capacity 12
CHAPTER 3 Results 15
3.1 Correlations between features and the label 15
3.2 Performance evaluation of features 16
3.3 Comparison ISMBLab* to other classifiers 19
3.4 Visualization of data distribution 21
3.5 Feature selection by recursive feature elimination 26
3.6 Detailed consequence and annotation in human variants 31
CHAPTER 4 Discussions 36
CHAPTER 5 Conclusions 39
CHAPTER 6 References 41
dc.language.iso | en
dc.subject | 胺基酸取代 (amino acid substitution) | zh_TW
dc.subject | 註解變異 (annotating variation) | zh_TW
dc.subject | 支持向量機 (support vector machine) | zh_TW
dc.subject | 機器學習 (machine learning) | zh_TW
dc.subject | 有害變異 (deleterious variation) | zh_TW
dc.subject | 蛋白質變異 (protein variation) | zh_TW
dc.subject | support vector machine | en
dc.subject | amino acid substitution | en
dc.subject | protein variation | en
dc.subject | deleterious variation | en
dc.subject | machine learning | en
dc.subject | annotating variation | en
dc.title | 結合以結構為基礎的功能區域預測方法預測基因體中蛋白質轉譯區段的有害變異 | zh_TW
dc.title | Incorporating structure-based functional site prediction in predicting deleterious protein coding region variation in human genome. | en
dc.type | Thesis
dc.date.schoolyear | 105-2
dc.description.degree | Master's (碩士)
dc.contributor.oralexamcommittee | 蔡懷寬 (Huai-Kuang Tsai), 陳倩瑜 (Chien-Yu Chen), 許世宜 (Sheh-Yi Sheu)
dc.subject.keyword | 註解變異 (annotating variation), 胺基酸取代 (amino acid substitution), 蛋白質變異 (protein variation), 有害變異 (deleterious variation), 機器學習 (machine learning), 支持向量機 (support vector machine) | zh_TW
dc.subject.keyword | annotating variation, amino acid substitution, protein variation, deleterious variation, machine learning, support vector machine | en
dc.relation.page | 43
dc.identifier.doi | 10.6342/NTU201702281
dc.rights.note | Paid license (有償授權)
dc.date.accepted | 2017-08-02
dc.contributor.author-college | College of Life Science (生命科學院) | zh_TW
dc.contributor.author-dept | Genome and Systems Biology Degree Program (基因體與系統生物學學位學程) | zh_TW
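
Note on the evaluation metric: the abstract reports a Matthews correlation coefficient (MCC) of 0.405 on the human variant testing set. As a minimal sketch of how that metric is defined, the function below computes MCC from binary confusion-matrix counts. The function name and the example counts are hypothetical illustrations and are not taken from the thesis; returning 0.0 when a marginal total is zero is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (assumed convention),
    because the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only; not results from the thesis.
example = matthews_corrcoef(tp=70, fp=30, tn=60, fn=40)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it uses all four cells of the confusion matrix, which is why it is often reported for datasets with unequal class sizes.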
Appears in collections: Genome and Systems Biology Degree Program (基因體與系統生物學學位學程)

Files in this item:
File | Size | Format
ntu-106-1.pdf (not authorized for public access) | 2.51 MB | Adobe PDF
Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.