Comparative Genomics in Determining the Evolutionary Trends of Enzymes

Sukanya Manna; 馬舒雅

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/23932

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉長遠(Cheng-Yuan Liou)
dc.contributor.author	Sukanya Manna	en
dc.contributor.author	馬舒雅	zh_TW
dc.date.accessioned	2021-06-08T05:12:42Z	-
dc.date.copyright	2006-07-27
dc.date.issued	2006
dc.date.submitted	2006-07-18
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/23932	-
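dc.description.abstract	This thesis proposes a systematic method for extracting useful information from digital media, drawing on knowledge from both biology and computer science. The work identifies the evolutionary trends of enzyme proteins at the genomic level. In addition, the thesis describes biological findings on how enzymes are related to one another through co-occurrence in digital archives. We developed a pseudo-reverse mechanism to compare the behaviour of enzyme proteins with existing standard concepts. Our study rests on a strong assumption from evolutionary theory about the rates of nucleotide substitution; we want to determine how close the pseudo-reverse approach can come to the correct answer under that assumption. We use a generalized form of the standard Nei and Gojobori model to determine the nucleotide substitutions and the Jukes and Cantor model to compute their rates. We also embed comparative genomics in this model to calculate the lineages of enzyme proteins among species such as human and mouse. The study predicts that enzymes mutate comparatively more slowly than ordinary proteins, and that the divergence time between human and rodent enzymes is almost five times longer, around 400 million years. In the appendix, we describe a study of enzyme co-citation. This involves automatically extracting implicit and explicit biomedical knowledge from the existing digital literature on enzymes. We present results on a small-scale dataset to give an overall idea of how much enzyme data is available in a general digital library such as CiteSeer. We built an enzyme-to-enzyme co-citation network from digital documents for 4950 enzyme pairs. The study focuses on the state of enzyme research in a general database such as CiteSeer, which falls into three basic categories: well established, half-cooked, and unknown. Our goal is simple: to focus on the relation between two enzymes in the same document in order to find biological information. We validated the concept against related references and found that the approach can detect diseases that are caused or cured by certain enzymes. It can even reveal the detailed molecular reactions of enzymes from the literature. This thesis explains these investigations and their methods in detail.	zh_TW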
dc.description.abstract	This dissertation provides a systematic methodology for information retrieval from digital media, incorporating knowledge from both biology and computer science. The work proceeds at the genomic level to find the evolutionary trends of enzyme proteins. In addition, the thesis illustrates some biological findings on how enzymes can be related to each other through co-occurrence in the digital literature. We developed a pseudo-reverse mechanism to compare the behaviour of enzyme proteins with existing standard concepts. Our work is based on a strong assumption from evolutionary theory about the rates of nucleotide substitution; we use this assumption in the pseudo-reverse approach to verify how far it can be justified. We employed a generalized form of the standard model of Nei and Gojobori to determine the nucleotide substitutions, and Jukes and Cantor's model to find their rates. We also embedded comparative genomics in this model to calculate the lineages of these enzyme proteins among species such as human, mouse and rat. From this study we predict that enzymes mutate comparatively more slowly than ordinary proteins, and that the divergence time for these enzymes between human and mouse or rat is almost five times longer, around 400 million years. In the appendix of this thesis, we describe a study of enzyme co-citations. This involves automated extraction of explicit and implicit biomedical knowledge from existing work on enzymes in digital documents. We present the work on a small-scale dataset to give an overall idea of the availability of these enzymes in a generic digital library such as CiteSeer. We created an enzyme-to-enzyme co-citation network from digital documents covering 4950 pairs of enzymes. This study emphasizes three basic statuses of enzyme studies in a generic database such as CiteSeer: some are well established, some are half-cooked, and others are still unknown or unclear. Our goal is simple: to focus on the relation between two enzymes in a document. We validated the concepts of this work against the related references and found that this approach can help detect diseases that are caused or cured by certain enzymes. It can also help uncover the detailed underlying molecular reactions of enzymes from the literature.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T05:12:42Z (GMT). No. of bitstreams: 1 ntu-95-R93922143-1.pdf: 568486 bytes, checksum: a97d3c984bd8729a491fceec4088d2c4 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	1 Introduction; 1.1 Motivation; 1.2 Contribution; 1.3 Biological Background of this Work; 1.3.1 Enzymes and their Roles in Living Body; 1.3.2 Molecular Evolution Concepts; 1.3.3 Relation between DNA, RNA and Proteins; 1.3.4 Mutations; 1.4 Research Background; 1.5 Dissertation Organization; 2 Pseudo-Reverse Approach in Genetic Evolution; 2.1 Comparative Genomics and Evolutionary Studies; 2.2 Nucleotide substitutions; 2.2.1 Nucleotide Substitution Model: Jukes and Cantor's one parameter model; 2.2.2 Number of substitution between two sequences; 2.2.3 Number of substitutions between two non-coding sequences; 2.2.4 Number of substitution between two protein-coding sequences; 2.3 Rates of evolutionary changes; 2.4 Pseudo-Reverse approach; 2.5 Overview of Our Approach; 2.6 Implementation; 2.6.1 Assumptions; 2.6.2 Generalized Algorithm; 2.7 Simulations and results; 2.7.1 Data Source; 2.7.2 Experimental Results and Observation; 3 Conclusion; A Enzyme Co-citation: A Case Study with CiteSeer; A.1 Role of information extraction (IE); A.2 Enzyme-enzyme co-occurrence concept; A.3 Methods and Results; A.3.1 Reason for choosing CiteSeer over other databases; A.3.2 Dataset and co-indexing of enzymes; B Glossary
dc.language.iso	en
dc.title	應用比較基因體學於尋找酵素演化趨勢之研究 (A Study on Applying Comparative Genomics to Find the Evolutionary Trends of Enzymes)	zh_TW
dc.title	Comparative Genomics in Determining the Evolutionary Trends of Enzymes	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	傅心家(Hsin-Chia Fu),呂學一(Hsueh-I Lu),趙坤茂(Kun-Mao Chao),吳建銘(Jiann-Ming Wu)
dc.subject.keyword	比較基因體學 (comparative genomics), 酵素 (enzymes), 演化 (evolution), 核甘酸替代 (nucleotide substitution), 共同引述 (co-citation)	zh_TW
dc.subject.keyword	comparative genomics, enzymes, evolution, nucleotide substitutions, co-citation	en
dc.relation.page	67
dc.rights.note	Not authorized
dc.date.accepted	2006-07-18
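dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering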

Files in This Item:

File	Size	Format
ntu-95-1.pdf (not currently authorized for public access)	555.16 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.