Protein-Protein Interaction Prediction Based on Binding Motif Pair Mining (基於結合模序配對探勘之蛋白質交互作用預測)

Chi-Yuan Yu; 游棨元

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42555

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	歐陽彥正
dc.contributor.author	Chi-Yuan Yu	en
dc.contributor.author	游棨元	zh_TW
dc.date.accessioned	2021-06-15T01:16:10Z	-
dc.date.available	2014-07-30
dc.date.copyright	2009-07-30
dc.date.issued	2009
dc.date.submitted	2009-07-28
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42555	-
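dc.description.abstract	Protein-protein interactions are the basis on which living organisms carry out their functions. Studying them reveals the basic principles of how cells operate, which in turn supports drug development and design and the treatment of disease, so understanding protein-protein interactions is very important in both basic and clinical research. Because verifying protein-protein interactions through biological experiments takes too much time and money, developing computational methods that reduce the resources consumed by such studies is one of the primary tasks of current systems biology research. Among computational methods, motif-based methods use pattern mining algorithms to find binding motifs and then predict protein-protein interactions through pattern matching. Their advantage is that they need only protein sequences for analysis and prediction, and they also reveal the regions where the proteins interact. Most conventional motif-based methods rely on the principle that functional sequences are highly conserved within homologous protein families: they mine patterns from homologous protein sequences in an attempt to uncover binding motifs. However, conserved sequences found this way are not necessarily related to interaction; they may instead serve to maintain structure or other functions. To overcome this problem, researchers have in recent years proposed mining patterns from proteins that share the same interaction mechanism, and have proposed collecting the proteins needed for pattern mining by detecting all-versus-all interaction networks. However, recently published studies have not applied the binding motifs mined with this approach to protein-protein interaction prediction, which would allow an accurate evaluation of this mechanism and of the effect of adopting different pattern mining algorithms. This study proposes predicting protein-protein interactions with an advanced protein sequence pattern mining algorithm combined with all-versus-all interaction networks. The distinguishing feature of this algorithm is that it can mine long patterns composed of several short patterns, which matches well the fact that protein binding interfaces usually consist of many sequence segments. The results of this study show that the choice of pattern mining algorithm has a very significant influence on both binding motif discovery and protein-protein interaction prediction. The advanced algorithm adopted in this study outperforms the other methods in both the correctness of the binding motifs and the accuracy of protein-protein interaction prediction.	zh_TW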
dc.description.abstract	Protein-protein interactions (PPIs) are essential to many biological functions in living organisms. Studying PPIs provides critical clues to how a cell operates and may lead to advanced diagnoses and therapies. Because confirming PPIs with molecular biology experiments requires large amounts of time and resources, computational approaches that predict likely PPIs are of scientific significance for advances in systems biology. One existing approach predicts PPIs from binding motifs extracted by pattern mining algorithms. Motif-based approaches are favored by biologists who want to analyze in depth how the proteins concerned interact, rather than only whether they interact. Motif-based prediction of PPIs falls into two major categories. One category analyzes only the polypeptide sequences, while the other also draws on the tertiary structures of proteins. Because tertiary structures are still available only for certain groups of proteins, sequence-based approaches are more generally applicable. Conventional motif-based approaches extract binding motifs by identifying evolutionarily conserved regions in polypeptide sequences. However, evolutionary conservation is a necessary but not a sufficient condition for the presence of an interaction site: certain regions in a protein chain may be conserved in order to maintain a conformation. Therefore, in recent years, researchers have proposed identifying PPI motifs through analysis of interaction networks. Nevertheless, recent studies have not reported a comprehensive analysis of the quality of the interaction motifs identified, let alone the effects of alternative pattern mining algorithms. The study reported in this thesis follows this recent development and employs a state-of-the-art pattern mining algorithm to deliver superior performance in identifying PPI motifs. The most distinctive feature of this algorithm is its ability to identify patterns composed of several short gapped segments. Experimental results show that the predictor designed in this study outperforms predictors that incorporate other pattern mining algorithms.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T01:16:10Z (GMT). No. of bitstreams: 1 ntu-98-R96945017-1.pdf: 2423131 bytes, checksum: 78de0131154704bd477709f3107604f0 (MD5) Previous issue date: 2009	en
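dc.description.tableofcontents	Contents: Thesis Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Proteins and Their Interactions; 2.2 Protein-Protein Interaction Databases; 2.3 Computational Methods for Predicting Protein-Protein Interactions; 2.3.1 Genomic Methods; 2.3.2 Evolutionary Relationship Methods; 2.3.3 Protein Structure Methods; 2.3.4 Domain-Based Methods; 2.3.5 Protein Primary Structure Methods; 2.3.5.1 Classifier-Based Methods; 2.3.5.2 Motif-Based Methods; Chapter 3 Methods; 3.1 Datasets; 3.2 Proposed Method; 3.2.1 All-versus-All Interaction Network Mining; 3.2.2 Interaction Binding Motif Pair Mining; 3.2.2.1 Wildspan; 3.2.2.2 Protomat; 3.2.2.3 Pratt; 3.2.3 Interacting Protein Pair Matching; Chapter 4 Experimental Results and Discussion; 4.1 Evaluation of the Proposed Method's Protein-Protein Interaction Prediction Performance and Mined Motif Correctness; 4.1.1 Evaluation of Protein-Protein Interaction Prediction Performance; 4.1.2 Analysis of the Correctness of Mined Motif Pairs; 4.2 Comparison of Two Training Set Collection Methods: Homologous Proteins versus All-versus-All Interaction Network Detection; 4.2.1 Differences Between the Protein Groups Collected via Homologous Proteins and via All-versus-All Interaction Network Detection; 4.2.2 Comparison of Protein-Protein Interaction Prediction Performance; 4.2.3 Analysis of the Correctness of Mined Motif Pairs; 4.3 Confidence Analysis of Protein Pairs Predicted to Interact; Chapter 5 Conclusions and Future Work; 5.1 Conclusions; 5.2 Future Work; References. List of Figures: Figure 1 All-versus-all interaction network; Figure 2 Central dogma of molecular biology; Figure 3 Genomic methods; Figure 4 Evolutionary relationship methods; Figure 5 Protein structure methods; Figure 6 Domain-based methods; Figure 7 Protein primary structure methods; Figure 8 Workflow of the proposed method; Figure 9 All-versus-all interaction network mining; Figure 10 Frequent itemset mining algorithm (Apriori); Figure 11 Interaction binding motif pair mining; Figure 12 Patterns produced by Wildspan; Figure 13 Patterns produced by Protomat; Figure 14 Patterns produced by Pratt; Figure 15 Interacting protein pair matching; Figure 16 Relationship between protein pairs and all-versus-all interaction networks; Figure 17 Distribution of distance scores for interacting protein pairs not in the training set; Figure 18 Positions of binding motif pairs on protein complex 1U7F; Figure 19 Positions of motif pairs mined from homologous proteins on protein complex 1U7F; Figure 20 Confidence distribution of Gene Ontology scores. List of Tables: Table 1 The twenty amino acids; Table 2 Biological experimental methods for verifying interactions; Table 3 Protein-protein interaction databases; Table 4 Datasets; Table 5 Training sets produced under different datasets and parameters; Table 6 Protein-protein interaction prediction performance of different pattern mining algorithms; Table 7 Average within-group sequence similarity; Table 8 Protein-protein interaction prediction performance of training sets collected via homologous proteins and via all-versus-all interaction network detection.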
dc.language.iso	zh-TW
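dc.subject	pattern mining (樣式探勘)	zh_TW
dc.subject	all-versus-all interaction network (全對全作用網路)	zh_TW
dc.subject	protein-protein interaction (蛋白質交互作用)	zh_TW
dc.subject	protein sequence (蛋白質序列)	zh_TW
dc.subject	binding motif (結合模序)	zh_TW
dc.subject	pattern mining	en
dc.subject	all-versus-all interaction network	en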
dc.subject	protein sequence	en
dc.subject	binding motif	en
dc.subject	protein-protein interaction	en
dc.title	Protein-Protein Interaction Prediction Based on Binding Motif Pair Mining (基於結合模序配對探勘之蛋白質交互作用預測)	zh_TW
dc.title	Predicting Protein-Protein Interactions with a Network-based Motif Miner	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張天豪,陳倩瑜,黃乾綱,黃奇英
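dc.subject.keyword	protein-protein interaction (蛋白質交互作用), pattern mining (樣式探勘), binding motif (結合模序), protein sequence (蛋白質序列), all-versus-all interaction network (全對全作用網路)	zh_TW
dc.subject.keyword	protein-protein interaction, pattern mining, binding motif, protein sequence, all-versus-all interaction network	en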
dc.relation.page	45
dc.rights.note	Paid license
dc.date.accepted	2009-07-28
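dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics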
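
The abstract describes motif-based prediction as mining binding motif pairs and then matching them against protein sequences. A minimal sketch of the matching step alone might look like the following. It is an illustration under stated assumptions, not the thesis's implementation: motifs are assumed to be written in a simplified PROSITE-like syntax, and the function names `prosite_to_regex` and `predict_interaction`, the example motif pair, and the sequences are all hypothetical. The miners named in the table of contents (Wildspan, Protomat, Pratt) and any scoring used in the thesis are not reproduced here.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a Python regex.

    Assumed syntax (hypothetical, for illustration): elements separated by '-';
    each element is a residue letter, 'x' (any residue), or '[ABC]' (one of),
    optionally followed by a repeat count '(n)' or range '(n,m)'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        core = "." if core == "x" else core  # 'x' is a wildcard position
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


def predict_interaction(seq_a: str, seq_b: str, motif_pairs):
    """Return the motif pairs that support an interaction between two sequences.

    A pair (left, right) supports the interaction when one sequence contains
    'left' and the other contains 'right', in either order.
    """
    hits = []
    for left, right in motif_pairs:
        re_left = re.compile(prosite_to_regex(left))
        re_right = re.compile(prosite_to_regex(right))
        forward = re_left.search(seq_a) and re_right.search(seq_b)
        reverse = re_left.search(seq_b) and re_right.search(seq_a)
        if forward or reverse:
            hits.append((left, right))
    return hits


# Hypothetical motif pair and toy sequences, not data from the thesis.
pairs = [("P-x(2)-P", "W-x(3,5)-Y")]
print(predict_interaction("MKAPLLPRT", "GGWAAAYKK", pairs))
```

A gapped element such as `x(3,5)` is what lets a single pattern span several short conserved segments, which is the property the abstract highlights for the mining algorithm used in the thesis.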

Files in This Item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	2.37 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.