Application of alphabet code encoded sequence to protein threading

Yang-Wen Chen; 陳暘文

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31501

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳中明
dc.contributor.author	Yang-Wen Chen	en
dc.contributor.author	陳暘文	zh_TW
dc.date.accessioned	2021-06-13T03:13:52Z	-
dc.date.available	2006-08-09
dc.date.copyright	2006-08-09
dc.date.issued	2006
dc.date.submitted	2006-08-07
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31501	-
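dc.description.abstract	Recently, with the completion of human genome sequencing, the resulting flood of genomic sequence data has brought revolutionary progress to conventional medicine. With the help of computer science, we can not only analyze large amounts of data in a short time but also help ensure the correctness and safety of biomedical research. However, research at the gene level has limited practical value for clinical applications, because the molecules that actually take part in physiological processes are usually the proteins that genes express. Structure and function are generally believed to be closely related: if we clearly understand the shape into which a protein folds, we can roughly determine its function. In the past, protein structures could be determined only experimentally, for example by X-ray diffraction or NMR spectroscopy, but both techniques have technical difficulties and limitations. Protein structure prediction has therefore come to play an increasingly important role in biomedical research. The main difficulty in protein structure prediction is that, when no homologous protein is available, a template backbone is often hard to select. If a sequence of unknown structure has a remote homologue in the protein database with 20% to 30% sequence similarity, sequence-structure alignment (also called threading) can be used to solve the structure prediction problem. Conventional protein threading works at the amino acid sequence level, exploiting the preferences of different amino acids for different spatial structures and environments. However, there is usually only a weak correlation between a protein's primary and tertiary structures, so threading still has many open problems at present. In this study, we propose a protein threading method based on alphabet code encoded sequences. The alphabet codes were established in our earlier research by clustering quadripeptide fragments with similar conserved structures into 30 groups. Because these codes link sequence to structure, they carry more spatial structural meaning than amino acid sequences. We selected 945 fold representatives from the SCOP 1.69 database to build our template library. After selecting suitable training data and building a scoring matrix for each energy term, we can measure how compatible an input alphabet code encoded sequence is with each entry in the template library. To reduce the search space and time complexity, we introduce the idea of finding several fixed positions, which is somewhat like finding corresponding structural motifs between two proteins. Although the preliminary results did not show a breakthrough in performance, we still explored possible applications of alphabet code encoded sequences to protein threading. A comparison with the well-known threading server GenTHREADER suggests that enlarging our template library would improve the results. Results on some test data also indicate that a reliable algorithm should be used to find suitable energy-function weights, and that the variation of the optimal weights across different folds should be examined. Even though this preliminary study hit a bottleneck, applying alphabet code encoded sequences to protein threading is still a new attempt and advance in bioinformatics. We also show that alphabet code encoded sequences, with their unique dual nature (a sequence-level representation that can be searched quickly, together with 3D spatial information), deserve serious attention. We therefore hope that the concept of alphabet code encoded sequences can be applied more broadly in protein structure prediction and structural bioinformatics.	zh_TW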
dc.description.abstract	The Human Genome Project has recently completed the sequencing of the human genome. The resulting volume of genomic sequence data has revolutionized conventional medical science. With the aid of computer science, we can not only analyze large amounts of data but also help ensure the safety and correctness of medical research. However, research at the genomic level may be less practical than research at the protein level for clinical applications, because it is proteins that actually participate in physiological processes. Protein structure is commonly believed to be highly correlated with protein function: if we get a clear picture of how a protein folds, we can possibly determine its function. In the past, protein structures were obtained by X-ray diffraction or nuclear magnetic resonance (NMR) spectroscopy, but both methods have technical limitations. Protein structure prediction has therefore come to play a major role in biomedical science. The major difficulty in protein structure prediction lies in selecting a backbone template, especially when no protein homologous to the query sequence is available. If an unknown sequence has a remote homologue in the protein database with relatively low amino acid sequence similarity (usually 20% to 30%), the sequence-structure alignment method known as threading is widely used. Conventional threading methods are based on amino acid sequences and often exploit the fact that different amino acid types prefer different structural environments and spatial proximities. However, the relationship between protein primary structure and tertiary structure is weak. Accordingly, the challenges of threading research have been only partially met to date.
In this study, we proposed a novel method: protein threading based on alphabet code encoded sequences. Alphabet codes, derived from our previous research, were created by clustering specific conserved quadripeptides into 30 clusters. Because these codes naturally connect sequence to structure, they carry more structural information than amino acid codes. We selected 945 fold representatives from the SCOP 1.69 database as our template library. After randomly choosing training data and creating a scoring matrix for each energy term, we could measure which templates were compatible with an input alphabet code encoded sequence. To reduce the search space and time complexity, we also introduced the idea of finding several fixed positions, which is somewhat like finding common structurally aligned motifs between two proteins. Although our preliminary results did not show convincing performance, we still studied the application of alphabet code encoded sequences to protein threading. Compared with the well-known threading server GenTHREADER, our results were less reliable because of the smaller number of core templates. Results on some test data also suggested that finding an appropriate weight for each energy term is important, and that suitable weights for our energy function may vary across folds. Nevertheless, our method is still a novel attempt at using alphabet code encoded sequences for protein threading. We also illustrated that the unique dual character of alphabet code encoded sequences, combining fast sequence-level searching with rich 3D structural information, deserves considerable attention. We therefore anticipate that the concept of alphabet code encoded sequences will be applied to all aspects of protein structure prediction and even to the field of structural bioinformatics.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T03:13:52Z (GMT). No. of bitstreams: 1 ntu-95-R91548047-1.pdf: 808608 bytes, checksum: 1683a26b35df6288379a530ed3cfc6cb (MD5) Previous issue date: 2006	en
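dc.description.tableofcontents	Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Review of Structural Similarity Research
2.1.1 Workflow of the Previous Study
2.1.2 Results of the Previous Study
2.2 Methods of Protein Structure Prediction
2.2.1 Homology Modeling
2.2.2 Ab Initio Methods
2.3 Threading for Protein Structure Prediction
2.3.1 2-D Threading
2.3.2 3-D Threading
2.4 Applicability, Strengths, and Weaknesses of Conventional Threading
2.4.1 Comparison of Threading Algorithms
2.4.2 Research Plan of This Thesis
Chapter 3 Materials and Methods
3.1 Materials
3.2 Methods
3.2.1 Research Workflow
3.2.2 Constructing the Evaluation Function
3.2.3 Finding the Optimal Threading
3.2.4 Evaluating and Selecting Candidate Templates
Chapter 4 Results and Discussion
4.1 Experimental Materials
4.2 Results
4.2.1 Results on the Core Template Library Test Data
4.2.2 Results on the Superfamily Test Data
4.3 Discussion
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References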
dc.language.iso	zh-TW
dc.subject	Fixed Positions	en
dc.subject	Protein Structure Prediction	en
dc.subject	Threading	en
dc.subject	Template Library	en
dc.subject	Amino Acid Sequence	en
dc.subject	Alphabet Code Encoded Sequence	en
dc.title	區域結構碼序列在蛋白質穿針引線法上的應用	zh_TW
dc.title	Application of alphabet code encoded sequence to protein threading	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	黃乾綱,陳倩瑜
dc.subject.keyword	蛋白質結構預測,穿針引線法,模板資料庫,胺基酸序列,區域結構碼序列,定位點,	zh_TW
dc.subject.keyword	Protein Structure Prediction,Threading,Template Library,Amino Acid Sequence,Alphabet Code Encoded Sequence,Fixed Positions,	en
dc.relation.page	51
dc.rights.note	Paid license
dc.date.accepted	2006-08-08
dc.contributor.author-college	College of Engineering	zh_TW
dc.contributor.author-dept	Institute of Biomedical Engineering	zh_TW
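The abstract describes measuring how compatible an alphabet code encoded query is with each template, using one scoring matrix per energy term combined through weights. The sketch below is a minimal illustration under several assumptions that this record does not confirm: codes are integers 0 to 29, templates are also represented as code sequences, each energy term is a hypothetical 30×30 matrix indexed by (query code, template code), the terms are combined linearly, and the optimal threading is a plain Needleman-Wunsch global alignment with a hypothetical linear gap penalty. Because every term here is position-independent, dynamic programming is exact; threading with pairwise contact terms is NP-hard in general.

```python
from typing import Mapping, Sequence

NUM_CODES = 30  # the thesis clusters quadripeptides into 30 alphabet codes
Matrix = list[list[float]]  # hypothetical per-energy-term score table


def combined_score(q: int, t: int, matrices: Sequence[Matrix], weights: Sequence[float]) -> float:
    """Weighted sum of per-energy-term scores for placing query code q on template code t."""
    return sum(w * m[q][t] for w, m in zip(weights, matrices))


def thread(query: Sequence[int], template: Sequence[int],
           matrices: Sequence[Matrix], weights: Sequence[float],
           gap: float = -1.0) -> float:
    """Global (Needleman-Wunsch) alignment score of query on template; higher is better.

    Each gap position costs `gap`, a hypothetical linear penalty.
    """
    n, m = len(query), len(template)
    # dp[i][j] = best score aligning query[:i] with template[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + combined_score(query[i - 1], template[j - 1], matrices, weights)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


def rank_templates(query: Sequence[int], library: Mapping[str, Sequence[int]],
                   matrices: Sequence[Matrix], weights: Sequence[float],
                   gap: float = -1.0) -> list[str]:
    """Template names sorted from most to least compatible with the query."""
    scores = {name: thread(query, codes, matrices, weights, gap) for name, codes in library.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy single energy term: +2 for identical codes, -1 otherwise (hypothetical values).
    identity = [[2.0 if a == b else -1.0 for b in range(NUM_CODES)] for a in range(NUM_CODES)]
    library = {"fold_a": [1, 5, 5, 9, 12], "fold_b": [20, 21, 22, 23, 24]}
    print(rank_templates([1, 5, 9, 12], library, [identity], [1.0]))
```

In the toy run, the query aligns to `fold_a` with four identities and one gap (score 7.0), while every pairing with `fold_b` is a mismatch (score -5.0), so `fold_a` ranks first. The names `thread`, `rank_templates`, and the toy matrix are illustrative only.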
Appears in Collections:	Institute of Biomedical Engineering
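The abstract mentions reducing the search space by finding several fixed positions between query and template, "to some extent like finding common structurally-aligned motifs between two proteins." One plausible reading, sketched here purely as an assumption, is seed-and-chain anchoring: find exact length-k matches (seeds) of alphabet codes shared by both sequences, then keep the largest set of non-overlapping seeds that appear in the same order in both. The seed length `k=3`, the function names, and the chaining rule are hypothetical choices, not taken from the thesis.

```python
from collections import defaultdict
from typing import Sequence


def find_seeds(query: Sequence[int], template: Sequence[int], k: int = 3) -> list[tuple[int, int]]:
    """All (query_pos, template_pos) pairs where a length-k code word matches exactly."""
    index = defaultdict(list)  # code word -> start positions in the template
    for j in range(len(template) - k + 1):
        index[tuple(template[j:j + k])].append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(tuple(query[i:i + k]), []):
            seeds.append((i, j))
    return seeds


def chain_fixed_positions(seeds: list[tuple[int, int]], k: int = 3) -> list[tuple[int, int]]:
    """Longest chain of seeds that increase in both sequences without overlapping (O(n^2) DP)."""
    seeds = sorted(seeds)
    if not seeds:
        return []
    best = [1] * len(seeds)   # best[x] = length of the longest chain ending at seeds[x]
    prev = [-1] * len(seeds)  # back-pointer to the previous seed in that chain
    for x, (qi, tj) in enumerate(seeds):
        for y in range(x):
            qy, ty = seeds[y]
            # seed y must end before seed x starts, in both query and template
            if qy + k <= qi and ty + k <= tj and best[y] + 1 > best[x]:
                best[x] = best[y] + 1
                prev[x] = y
    x = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while x != -1:
        chain.append(seeds[x])
        x = prev[x]
    return chain[::-1]
```

For example, `find_seeds([1, 2, 3, 9, 9, 4, 5, 6], [7, 1, 2, 3, 8, 4, 5, 6])` yields the seeds `(0, 1)` and `(5, 5)`, and both survive chaining. Once fixed positions are chosen, only the short segments between consecutive anchors need a full alignment, which is where the savings in search space and time would come from.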

Files in This Item:

File	Size	Format
ntu-95-1.pdf (restricted access)	789.66 kB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
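The abstract reports that finding an appropriate weight for each energy term matters and that suitable weights may vary across folds. A simple, reproducible baseline for that search is an exhaustive grid search that maximizes how many test queries rank their native fold first. This sketch assumes, as a simplification, that the per-term scores for each query-template pair are precomputed and fixed; in a full threading pipeline, changing the weights can also change the optimal alignment itself. The data layout and function names are hypothetical.

```python
import itertools
from typing import Mapping, Optional, Sequence

# term_scores[query][template] = per-energy-term scores for that pair (hypothetical precomputed input)
TermScores = Mapping[str, Mapping[str, Sequence[float]]]


def top1_hits(term_scores: TermScores, native: Mapping[str, str], weights: Sequence[float]) -> int:
    """Number of queries whose native template receives the highest weighted score."""
    hits = 0
    for query, per_template in term_scores.items():
        totals = {t: sum(w * s for w, s in zip(weights, scores)) for t, scores in per_template.items()}
        if max(totals, key=totals.get) == native[query]:
            hits += 1
    return hits


def grid_search(term_scores: TermScores, native: Mapping[str, str],
                n_terms: int, grid: Sequence[float]) -> tuple[Optional[tuple[float, ...]], int]:
    """Try every weight vector on the grid; keep the first one with the most top-1 hits."""
    best_w: Optional[tuple[float, ...]] = None
    best_hits = -1
    for weights in itertools.product(grid, repeat=n_terms):
        hits = top1_hits(term_scores, native, weights)
        if hits > best_hits:
            best_w, best_hits = weights, hits
    return best_w, best_hits
```

Running `grid_search` separately on queries grouped by SCOP fold would show whether the best weights shift between folds, as the abstract suggests they may. The cost grows as `len(grid) ** n_terms`, so exhaustive search only suits a handful of energy terms; a more scalable optimizer would be needed beyond that.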