Improving Protein Disorder Prediction by Secondary Structure Information (利用二級結構資訊提昇蛋白質非穩定區段的預測準確度)

Tong-Ming Xu; 許通明

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32676

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	歐陽彥正(Yen-Jen Oyang)
dc.contributor.author	Tong-Ming Xu	en
dc.contributor.author	許通明	zh_TW
dc.date.accessioned	2021-06-13T04:13:19Z	-
dc.date.available	2006-07-29
dc.date.copyright	2006-07-29
dc.date.issued	2006
dc.date.submitted	2006-07-25
dc.identifier.citation	1. Li, X., et al., Predicting Protein Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop Genome Inform, 1999. 10: p. 30-40. 2. Romero, P., et al., Sequence complexity of disordered protein. Proteins, 2001. 42(1): p. 38-48. 3. Dunker, A.K., et al., Protein disorder and the evolution of molecular recognition: theory, predictions and observations. Pac Symp Biocomput, 1998: p. 473-84. 4. Yang, Z.R., et al., RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics, 2005. 21(16): p. 3369-76. 5. Jones, D.T. and J.J. Ward, Prediction of disordered regions in proteins from position specific score matrices. Proteins, 2003. 53 Suppl 6: p. 573-8. 6. Romero, P., Z. Obradovic, and A.K. Dunker, Folding minimal sequences: the lower bound for sequence complexity of globular proteins. FEBS Lett, 1999. 462(3): p. 363-7. 7. Romero P, O.Z., Kissinger C, Villafranca JE, Dunker AK, Identifying disordered regions in proteins from amino acid sequence. Proc. IEEE Int. Conf. Neural Networks, 1997: p. 1:90-95. 8. Shimizu K, H.S., Noguchi T, Muraoka Y., Predicting the protein disordered region using modified position specific scoring matrix. Genome Informatics, 2004: p. P150. 9. Coeytaux, K. and A. Poupon, Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics, 2005. 21(9): p. 1891-900. 10. Dosztanyi, Z., et al., The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol, 2005. 347(4): p. 827-39. 11. Chung-Tsai Su, C.-Y.C., and Yu-Yen Ou, Incorporating Disorder Tendency with Reduced Position-Specific Score Matrices in Protein Disorder Prediction. 2006. 12. Linding, R., et al., GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res, 2003. 31(13): p. 3701-8. 13. Ward, J.J., et al., The DISOPRED server for the prediction of protein disorder. Bioinformatics, 2004. 20(13): p. 2138-9. 14. Ward, J.J., et al., Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol, 2004. 337(3): p. 635-45. 15. Cheng J, S.M., Baldi P, Accurate prediction of protein disordered regions by mining protein structure data. Data Mining and Knowledge Discovery, 2005. 16. Zoran Obradovic, K.P., Slobodan Vucetic, Predrag Radivojac, and A. Keith Dunker, Exploiting Heterogeneous Sequence Properties Improves Prediction of Protein Disorder. PROTEINS: Structure, Function, and Bioinformatics Suppl, 2005: p. 7:176–182. 17. Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42. 18. Li, W., L. Jaroszewski, and A. Godzik, Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 2001. 17(3): p. 282-3. 19. Li, W., L. Jaroszewski, and A. Godzik, Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics, 2002. 18(1): p. 77-82. 20. Vucetic, S., et al., Flavors of protein disorder. Proteins, 2003. 52(4): p. 573-84. 21. Uversky, V.N., J.R. Gillespie, and A.L. Fink, Why are 'natively unfolded' proteins unstructured under physiologic conditions? Proteins, 2000. 41(3): p. 415-27. 22. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997. 25(17): p. 3389-402. 23. Cuff, J.A. and G.J. Barton, Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 1999. 34(4): p. 508-19. 24. YY Ou, QuickRBF http://muse.csie.ntu.edu.tw/~yien/quickrbf/index.php. 2004.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32676	-
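dc.description.abstract	An increasing number of proteins, or certain segments of their sequences, have been found not to form stable structures after folding. Some of these disordered regions have been shown to have specific biological functions. Other disordered regions, which have no function of their own, are spatially flexible and so provide room for folding and wrapping that lets functional segments bind to and act on other targets. Other studies have also found that proteins with disordered regions often form stable structures, and become functionally active, by binding to other interacting molecules. Research on and prediction of protein disordered regions therefore helps the analysis of protein structure and function. In recent years, many disorder prediction methods have used amino acid composition or the biochemical properties of amino acids as features, and many have also tried to incorporate secondary structure information. This thesis proposes several meaningful features derived from secondary structure information, evaluates them experimentally, and discusses the effect of each on disorder prediction performance. It adopts a two-stage approach to predicting protein disordered regions. In the first stage, information from the protein sequence is used as features, and an RBFN (Radial Basis Function Network) predicts the disordered regions. In parallel, a secondary structure prediction tool predicts the protein's secondary structure, which is converted into features that express secondary structure information as distances. In the second stage, the first-stage predictions are combined with the secondary structure information to make the final prediction. The experiments show that the transformed secondary structure information improves prediction accuracy, and that information about the nearest secondary structure element in particular clearly helps in predicting protein disordered regions.	zh_TW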
dc.description.abstract	An increasing number of proteins have been found to contain regions that do not form stable tertiary structures in their native states. Such sequence fragments, which have no propensity to form specific structures, are regarded as “disordered regions”. Some disordered regions have been shown to be functionally significant. Therefore, a reliable predictor for such disordered regions is important for further understanding of protein functions. Most recent studies employ the amino acid composition and/or a number of biochemical properties within a sliding window centered on the target residue as the feature set for predicting protein disorder. In this regard, this thesis conducts a comprehensive study of the performance of a recently proposed feature set that considers both physicochemical properties and amino acid propensity for order/disorder, and demonstrates how a two-stage framework improves the accuracy of the classifier. Furthermore, we propose a novel feature based on protein secondary structures to reduce potential false positives. This thesis tries several ways of extracting information from the local secondary structures. The experimental results reveal that the feature set based on the distance from the target residue to the nearest secondary structure element (SSE) outperforms the others. In particular, employing the proposed feature set in the second stage delivers better accuracy than using it together with the original feature sets.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T04:13:19Z (GMT). No. of bitstreams: 1 ntu-95-R93922126-1.pdf: 638768 bytes, checksum: d9ae04f7f39089f51bfca9928b9ab182 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Chapter 1 Introduction; Chapter 2 Related Work on Protein Disordered Regions; 2.1 Protein Feature Sets in Use; 2.1.1 Amino Acid Properties; 2.1.2 PSSM and Its Applications; 2.1.3 Secondary Structure Information; 2.2 Comparison of Classification Methods; 2.2.1 Neural Network (NN); 2.2.2 Support Vector Machines (SVM); 2.2.3 Radial Basis Function Network (RBFN); 2.3 Protein Disorder Predictors That Use Secondary Structure Information; 2.3.1 GlobPlot; 2.3.2 DISOPRED & DISOPRED2; 2.3.3 DISpro; 2.3.4 VSL2; 2.3.5 DisPSSMP; Chapter 3 PSSM-Based Protein Disorder Prediction with Secondary Structure Information; 3.1 Objectives; 3.2 Datasets; 3.3 Feature Sets Adopted; 3.3.1 PSSM; 3.3.2 PSSMP-4; 3.3.3 Use of Secondary Structure Information; 3.4 Sampling; 3.5 The RBFN Classifier and the QuickRBF Tool; 3.6 Second-Stage Prediction; 3.7 Evaluation Criteria; Chapter 4 Experiments and Discussion; 4.1 Experiment 1: Sampling & PSSM; 4.1.1 Objective; 4.1.2 Results; 4.1.3 Discussion; 4.2 Experiment 2: Use of Secondary Structure; 4.2.1 Objective; 4.2.2 Results; 4.2.3 Discussion; 4.3 Experiment 3: PSSMP-4 + SSE-Distance; 4.3.1 Objective; 4.3.2 Results; 4.3.3 Discussion; 4.4 Experiment 4: Two-Stage Prediction; 4.4.1 Objective; 4.4.2 Results; 4.4.3 Discussion; 4.5 Comparison of Experimental Results; Chapter 5 Conclusions and Future Work; 5.1 Conclusions; 5.2 Future Work; References
dc.language.iso	zh-TW
dc.title	利用二級結構資訊提昇蛋白質非穩定區段的預測準確度	zh_TW
dc.title	Improving Protein Disorder Prediction by Secondary Structure Information	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	趙坤茂(Kun-Mao Chao),黃鎮剛(Jenn-Kang Hwang),陳倩瑜(Chien-Yu Chen)
dc.subject.keyword	protein,disordered region,sequence,secondary structure	zh_TW
dc.subject.keyword	protein,disorder region,SSE,sequence analysis,disorder	en
dc.relation.page	45
dc.rights.note	Paid license
dc.date.accepted	2006-07-25
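dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in collections:	Department of Computer Science and Information Engineering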
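
The abstract reports that the best-performing secondary structure feature is the distance from the target residue to the nearest secondary structure element (SSE). The record does not give the thesis's exact encoding, so the following is only a minimal sketch of one way such a feature could be computed. It assumes a three-state predicted secondary structure string (`H` helix, `E` strand, `C` coil); the function name `sse_distance`, the `sse_states` parameter, and the distance cap `cap` are all hypothetical choices made for illustration.

```python
from typing import List


def sse_distance(ss: str, sse_states: str = "HE", cap: int = 10) -> List[int]:
    """Per-residue distance (in residues) to the nearest SSE residue.

    ss         -- predicted secondary structure, one character per residue
                  (assumed alphabet: H = helix, E = strand, C = coil)
    sse_states -- characters counted as belonging to an SSE (assumption)
    cap        -- upper bound on the distance; also used when the sequence
                  contains no SSE at all (assumption)
    """
    n = len(ss)
    dist = [cap] * n

    # Left-to-right pass: distance to the closest SSE residue at or before i.
    last = None
    for i, state in enumerate(ss):
        if state in sse_states:
            last = i
        if last is not None:
            dist[i] = min(dist[i], i - last)

    # Right-to-left pass: distance to the closest SSE residue at or after i.
    last = None
    for i in range(n - 1, -1, -1):
        if ss[i] in sse_states:
            last = i
        if last is not None:
            dist[i] = min(dist[i], last - i)

    return dist


if __name__ == "__main__":
    print(sse_distance("CCHHCCCE"))
```

Residues inside a helix or strand get distance 0, and residues far from any SSE saturate at `cap`. In a two-stage setup like the one the abstract outlines, a vector of this kind would be combined with the first-stage disorder predictions as input to the second-stage classifier.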

Files in this item:

File	Size	Format
ntu-95-1.pdf (currently not authorized for public access)	623.8 kB	Adobe PDF

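Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.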