Prediction of Enzyme Catalytic Sites by Sequential Pattern Mining

Ting-Ying Chien; 簡廷因

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41987

Full metadata record

DC field	Value	Language
dc.contributor.advisor	歐陽彥正(Yen-Jen Oyang)
dc.contributor.author	Ting-Ying Chien	en
dc.contributor.author	簡廷因	zh_TW
dc.date.accessioned	2021-06-15T00:40:40Z	-
dc.date.available	2008-09-02
dc.date.copyright	2008-09-02
dc.date.issued	2008
dc.date.submitted	2008-08-26
dc.identifier.citation	1. Friedberg, I. (2006) Automated protein function prediction - the genomic challenge. Briefings in Bioinformatics, 7, 225-242. 2. Chandonia, J.M. and Brenner, S.E. (2006) The impact of structural genomics: Expectations and outcomes. Science, 311, 347-351. 3. Watson, J.D., Laskowski, R.A. and Thornton, J.M. (2005) Predicting protein function from sequence and structural data. Current Opinion in Structural Biology, 15, 275-284. 4. George, R.A., Spriggs, R.V., Bartlett, G.J., Gutteridge, A., MacArthur, M.W., Porter, C.T., Al-Lazikani, B., Thornton, J.M. and Swindells, M.B. (2005) Effective function annotation through catalytic residue conservation. Proceedings of the National Academy of Sciences of the United States of America, 102, 12299-12304. 5. Tian, W.D., Arakaki, A.K. and Skolnick, J. (2004) EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. Nucleic Acids Research, 32, 6226-6239. 6. Kasuya, A. and Thornton, J.M. (1999) Three-dimensional structure analysis of PROSITE patterns. Journal of Molecular Biology, 286, 1673-1691. 7. Torrance, J.W., Bartlett, G.J., Porter, C.T. and Thornton, J.M. (2005) Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. Journal of Molecular Biology, 347, 565-581. 8. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M. and Sigrist, C.J.A. (2006) The PROSITE database. Nucleic Acids Research, 34, D227-D230. 9. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 31, 365-370. 10. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research, 28, 235-242.
11. Porter, C.T., Bartlett, G.J. and Thornton, J.M. (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Research, 32, D129-D133. 12. Sheu, S.H., Lancia, D.R., Clodfelter, K.H., Landon, M.R. and Vajda, S. (2005) PRECISE: a database of predicted and consensus interaction sites in enzymes. Nucleic Acids Research, 33, D206-D211. 13. Meng, E.C., Polacco, B.J. and Babbitt, P.C. (2004) Superfamily active site templates. Proteins: Structure, Function, and Bioinformatics, 55, 962-976. 14. Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory. New York. 15. Nielsen, M.A. and Chuang, I.L. (2000) Quantum Computation and Quantum Information. UK. 16. Kullback, S. and Leibler, R.A. (1951) On Information and Sufficiency. The Annals of Mathematical Statistics, 22, 79-86. 17. Capra, J.A. and Singh, M. (2007) Predicting functionally important residues from sequence conservation. Bioinformatics, 23, 1875-1882. 18. Mirny, L.A. and Shakhnovich, E.I. (1999) Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. Journal of Molecular Biology, 291, 177-196. 19. Henikoff, S. and Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 89, 10915-10919. 20. Hsu, C.-M. (2007) Yuan Ze University, Taoyuan. 21. Wei, Y., Ko, J., Murga, L.F. and Ondrechen, M.J. (2007) Selective prediction of interaction sites in protein structures with THEMATICS. BMC Bioinformatics, 8. 22. Ondrechen, M.J., Clifton, J.G. and Ringe, D. (2001) THEMATICS: a simple computational predictor of enzyme function from structure. Proceedings of the National Academy of Sciences of the United States of America, 98, 12473-12478. 23. Kaplan, W. and Littlejohn, T.G. (2001) Swiss-PDB Viewer (Deep View). Briefings in Bioinformatics, 2, 195-197. 24. Ren, P. and Ponder, J.W. (2003) Polarizable Atomic Multipole Water Model for Molecular Mechanics Simulation. Journal of Physical Chemistry B, 107, 5933-5947.
25. Jorgensen, W.L., Chandrasekhar, J. and Madura, J.D. (1983) Comparison of simple potential functions for simulating liquid water. Journal of Chemical Physics, 79. 26. Madura, J.D., Briggs, J.M., Wade, R.C., Davis, M.E., Luty, B.A., Ilin, A., Antosiewicz, J., Gilson, M.K., Bagheri, B., Scott, L.R. et al. (1995) Simulations with the University of Houston Brownian Dynamics program. Computer Physics Communications, 91, 57-95. 27. Gilson, M.K. (1993) Multiple-site titration and molecular modeling: two rapid methods for computing energies and forces for ionizable groups in proteins. Proteins, 15, 266-282. 28. Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences. Journal of Molecular Biology, 147, 195-197. 29. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402. 30. Higgins, D.G. and Sharp, P.M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237-244. 31. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948. 32. Bartlett, G.J., Porter, C.T., Borkakoti, N. and Thornton, J.M. (2002) Analysis of catalytic residues in enzyme active sites. Journal of Molecular Biology, 324, 105-121.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41987	-
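dc.description.abstract	Large-scale automatic (non-manual) annotation of protein function and sequence signatures remains a major challenge in the post-genomic era. In this thesis, we design a method that uses protein sequence signatures to predict the catalytic sites of enzyme sequences. Our method generates sequence signatures by motif mining. Each signature contains several important residue blocks, also called conserved segments. These conserved segments often co-occur in homologous sequences and have been carefully preserved during evolution, which indicates that they are important. According to biological experiments, the catalytic residues of an enzyme are usually scattered across different regions of the protein sequence; therefore, to predict catalytic residues comprehensively, the generated signatures must also span different regions of the sequence. We collect catalytic residue annotations from the Catalytic Site Atlas (CSA) database to evaluate the performance of the proposed method. The results show that our method identifies catalytic sites and catalytic residues better than the patterns in the PROSITE database. The method is implemented as the E1DS web server (http://e1ds.csbb.ntu.edu.tw/), which currently contains 5421 sequence signatures covering a total of 932 four-digit EC numbers. On average, for predicting catalytic sites, E1DS achieves a correct rate of 35.5% and a success rate of 49.6%, whereas PROSITE achieves 18.9% and 33.7%, respectively, so E1DS outperforms PROSITE on both measures. For predicting catalytic residues, E1DS has a sensitivity of 30.0%, better than that of PROSITE (16.2%), but its specificity (96.7%) is lower than that of PROSITE (98.6%).	zh_TW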
dc.description.abstract	Large-scale automatic annotation of protein sequences remains challenging in the post-genomic era. This thesis aims to predict the catalytic sites of enzyme sequences based on a repository of protein signatures. The employed sequence signatures are derived from a motif-based method. The blocks of a signature, also called conserved regions, are composed of the key residues found among homologues. These blocks are conserved during evolution because of their importance to protein function. Biological experiments reveal that an enzyme catalytic site is usually formed by residues that are widely separated in the sequence. To predict catalytic sites comprehensively, the employed signatures should therefore contain residues that are widely scattered along the sequence. For this purpose, we employ WildSpan, a recently developed pattern mining algorithm, to generate enzyme sequence signatures. WildSpan is designed to discover sequence motifs that span a large number of unimportant positions. To measure the performance of our method, we collect the annotated catalytic sites of 831 enzymes from the Catalytic Site Atlas (CSA). The results show that our method identifies catalytic sites and catalytic residues more effectively than the patterns derived from the PROSITE database. The proposed method has been implemented as a web server named E1DS (http://e1ds.csbb.ntu.edu.tw/). E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. On average, on the task of predicting catalytic sites, E1DS achieves a 'correct' rate of 35.5% and a 'success' rate of 49.6%, while the 'correct' and 'success' rates of PROSITE patterns are 18.9% and 33.7%, respectively. On the task of predicting catalytic residues, the sensitivity of E1DS is 30.0%, better than that of PROSITE (16.2%), though the specificity of E1DS (96.7%) is slightly lower than that of PROSITE (98.6%).	en
dc.description.provenance	Made available in DSpace on 2021-06-15T00:40:40Z (GMT). No. of bitstreams: 1 ntu-97-R95922108-1.pdf: 1043043 bytes, checksum: 089ea5f2e9d9f16bcf2d78cdd3ee36be (MD5) Previous issue date: 2008	en
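dc.description.tableofcontents	Acknowledgements; Chinese Abstract; English Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work: 2.1 Prediction of Functional Residues, 2.2 Sequence Alignment Algorithms; Chapter 3 Methods: 3.0 Overview, 3.1 Data Collection, 3.2 Signature Construction, 3.3 Signature Evaluation, 3.4 Prediction Method; Chapter 4 Experiments: 4.1 Catalytic Residue Dataset, 4.2 Performance Evaluation; Chapter 5 Web Server: 5.1 Home Page, 5.2 Results Page, 5.3 Error Messages; Chapter 6 Conclusion; References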
dc.language.iso	zh-TW
dc.subject	Enzyme function	zh_TW
dc.subject	Protein sequence mining	zh_TW
dc.subject	Catalytic site	zh_TW
dc.subject	Sequence signature	zh_TW
dc.subject	EC number	zh_TW
dc.subject	Catalytic site	en
dc.subject	Enzyme function	en
dc.subject	EC number	en
dc.subject	Signature	en
dc.subject	Sequential pattern mining	en
dc.title	Prediction of enzyme catalytic sites by sequential pattern mining	zh_TW
dc.title	Prediction of enzyme catalytic sites by sequential pattern mining	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳倩瑜(Chien-Yu Chen),張天豪(Tien-Hao Chang)
dc.subject.keyword	Protein sequence mining, Catalytic site, Sequence signature, EC number, Enzyme function	zh_TW
dc.subject.keyword	Sequential pattern mining, Catalytic site, Signature, EC number, Enzyme function	en
dc.relation.page	39
dc.rights.note	Licensed for a fee
dc.date.accepted	2008-08-27
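dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW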
Appears in collections:	Department of Computer Science and Information Engineering
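The abstract describes predicting catalytic residues by matching signatures whose conserved blocks are separated by long stretches of unimportant positions, and then scoring the predictions with residue-level sensitivity and specificity. The following is a minimal Python sketch of that idea under loudly stated assumptions: a signature is modeled here as a list of regex-style blocks plus (min, max) gap lengths, every residue inside a matched block is treated as a predicted catalytic residue, and the helper names (`signature_to_regex`, `predict_residues`, `sensitivity_specificity`) are hypothetical. It is not WildSpan's or E1DS's actual signature format or scoring scheme, and the 'correct' and 'success' rates are not sketched because the record does not define them.

```python
from __future__ import annotations

import re


def signature_to_regex(blocks: list[str], gaps: list[tuple[int, int]]) -> re.Pattern:
    """Compile conserved blocks and the wildcard gaps between them into one regex.

    Hypothetical signature model: each block is a regex fragment of residue
    classes (it must not contain its own capturing groups), and gaps[i] gives
    the (min, max) number of unconstrained residues between blocks i and i+1.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one (min, max) gap between consecutive blocks")
    parts = [f"({blocks[0]})"]
    for (lo, hi), block in zip(gaps, blocks[1:]):
        # A bounded wildcard .{lo,hi} stands in for the "unimportant" span.
        parts.append(f".{{{lo},{hi}}}({block})")
    return re.compile("".join(parts))


def predict_residues(sequence: str, blocks: list[str], gaps: list[tuple[int, int]]) -> set[int]:
    """Return 0-based positions covered by the blocks of the leftmost match.

    Simplification (assumption): every residue inside a matched block counts
    as a predicted catalytic residue. Returns an empty set if nothing matches.
    """
    match = signature_to_regex(blocks, gaps).search(sequence)
    if match is None:
        return set()
    positions: set[int] = set()
    for group in range(1, len(blocks) + 1):
        positions.update(range(match.start(group), match.end(group)))
    return positions


def sensitivity_specificity(predicted: set[int], actual: set[int], length: int) -> tuple[float, float]:
    """Residue-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp = len(predicted & actual)   # annotated catalytic residues that were predicted
    fn = len(actual - predicted)   # annotated catalytic residues that were missed
    fp = len(predicted - actual)   # predicted residues not annotated as catalytic
    tn = length - tp - fn - fp     # all remaining residues in the sequence
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    seq = "MKTAHGDSLLACGEKQ"  # toy sequence, not real data
    predicted = predict_residues(seq, ["H.D", "C[GA]E"], [(2, 6)])
    print(sorted(predicted))
    print(sensitivity_specificity(predicted, {4, 6, 13}, len(seq)))
```

Running the toy example prints the positions [4, 5, 6, 11, 12, 13]; against the hypothetical annotation {4, 6, 13}, this gives a sensitivity of 1.0 and a specificity of 10/13 ≈ 0.77. Bounded regex gaps keep the sketch short, but backtracking over many wide gaps can become slow, so a dedicated matcher would probably be preferable at database scale.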

Files in this item:

File	Size	Format
ntu-97-1.pdf (not authorized for public access)	1.02 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.