Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35032
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 陳中明
dc.contributor.author: Cheng-Wei Chen [en]
dc.contributor.author: 陳政偉 [zh_TW]
dc.date.accessioned: 2021-06-13T06:39:14Z
dc.date.available: 2005-08-09
dc.date.copyright: 2005-08-09
dc.date.issued: 2005
dc.date.submitted: 2005-08-05
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35032
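dc.description.abstract: At the start of the 21st century, the sequencing of the human genome was completed and large volumes of gene sequence data became available. This allowed conventional medicine to be approached from a bioinformatics perspective, with computer-assisted data analysis making biomedical research more accurate and safer. Research at the gene level, however, is of limited practical use for medical treatment: what actually takes part in biological processes is usually the protein that a gene expresses. A protein's particular conformation gives it its particular function, and structure and function are generally believed to be closely related. Bioinformatics researchers therefore often group proteins with similar structures and study together those structures inferred to have similar functions.
From a biological point of view, many protein structures share specific conserved structures. Related studies indicate that these conserved structures usually carry specific functions, which also explains how different proteins can have the same or similar functions. Working from the idea of conserved structures, we aim to find specific repeated conserved structures in a protein structure database and to characterize the differences and substitution relationships among them. Going further, we can encode these repeated structures and apply the resulting codes to a protein chain in place of the conventional amino acid sequence. We call the new sequence a local structure code sequence. It gives the sequence of a protein chain more structural meaning and strengthens the link between structure and sequence.
In this thesis, we took 1738 protein chains from the protein sub-database PDB-REPRDB at a 15% sequence similarity threshold and cut the chains into equal-length fragments of four residues (4-mers) to form a fragment database of quadripeptides. Using several geometric features of the quadripeptides, we partitioned them into 30 clusters with fuzzy c-means clustering. After this initial clustering, a refinement step produced the final result, and each of the 30 clusters was assigned a code, giving the local structure codes (alphabet codes). To verify the local structure code concept and the clustering results, we carried out two case studies. Case 1 takes two protein chains with low sequence similarity but similar structures. Case 2 takes 16 protein chains from families in the well-known structural classification database SCOP. We use both cases to examine the relationship between local structure codes and structural similarity. The final results show that local structure code sequences can represent structural similarity effectively and correctly. [zh_TW]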
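The fragment step described in this abstract can be sketched in Python. This is a minimal illustration, not the thesis's implementation: it assumes a sliding window that advances one residue at a time, represents each residue by a single C-alpha coordinate, and uses pairwise distances as a stand-in geometric feature, since the abstract does not say which features were used. The function names and the six-point chain are hypothetical.

```python
from itertools import combinations
from math import dist

def overlapping_fragments(coords, size=4):
    """Return every window of `size` consecutive residues, stepping by one."""
    return [coords[i:i + size] for i in range(len(coords) - size + 1)]

def distance_features(fragment):
    """Pairwise distances between the points of one fragment.

    Assumed feature set for illustration only; distances do not change
    when the fragment is rotated or translated.
    """
    return [dist(a, b) for a, b in combinations(fragment, 2)]

# Hypothetical C-alpha trace of a 6-residue chain (made-up coordinates, in angstroms).
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
         (3.8, 3.8, 3.8), (0.0, 3.8, 3.8), (0.0, 0.0, 3.8)]

fragments = overlapping_fragments(chain)
features = [distance_features(f) for f in fragments]
print(len(fragments), len(features[0]))  # 3 fragments, 6 distances each
```

A chain of n residues yields n - 3 overlapping 4-mers, and each 4-mer has six pairwise distances, so every fragment maps to a fixed-length feature vector that a clustering algorithm can consume.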
dc.description.abstract: At the beginning of the 21st century, the Human Genome Project (HGP) completed the sequencing of the human genome. The resulting volume of genomic sequence data has changed how conventional medical science is studied by bringing in the viewpoint of bioinformatics: analyzing these data with the aid of computer science has greatly improved the safety and correctness of medical research. However, research at the genomic level is potentially less practical for clinical applications than research at the protein level, because what actually participates in biological processes is mainly proteins. It is commonly believed that protein structure is highly correlated with protein function. Generally speaking, proteins with similar functions usually have similar structures. Biologists therefore often cluster similar structures together and infer a function from them.
Many protein structures share specific conserved structures, and many studies have shown that these conserved structures carry particular functions. Using the concept of conserved structures, we aim to find repeated conserved structures in a protein structure database and to analyze the substitution relations among them. Furthermore, we can encode these repeated conserved structures. The new codes carry more structural information than amino acid codes. We call them alphabet codes, and they connect sequence to structure.
In this study, we picked 1738 protein chains from the protein structure database PDB-REPRDB. All protein chains were decomposed into 4-mer fragments in an overlapping fashion, and each fragment is called a “quadripeptide”. Using the geometrical properties of these quadripeptides, we grouped them into 30 clusters with the fuzzy c-means clustering algorithm, refined the clustering results, and encoded the clusters as 30 different alphabet codes. Two case studies were carried out to verify the effect of the clustering and the alphabet codes. In Case 1, we picked two protein chains that are similar in structure but different in amino acid sequence. In Case 2, we picked 16 protein chains from a family in the SCOP database. The results suggest that alphabet codes can characterize the structural similarity between two protein chains more effectively and informatively than amino acid codes. [en]
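The clustering and encoding steps named in the abstract can be sketched in Python. The block below is a bare-bones fuzzy c-means using the standard membership and center updates, not the thesis's code: the fuzzifier m = 2, the fixed iteration count, the evenly spaced initial centers, the two-cluster toy data, and all function names are assumptions for illustration. The thesis uses 30 clusters and a refinement step that is not reproduced here.

```python
from math import dist

def memberships_for(points, centers, m):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        inv = [max(dist(p, c), 1e-12) ** -power for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    # Simplified initialization (an assumption): evenly spaced input points.
    step = len(points) // n_clusters
    centers = [list(points[i * step]) for i in range(n_clusters)]
    dim = len(points[0])
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        for i in range(n_clusters):
            # New center: mean of all points weighted by membership ** m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers[i] = [sum(wk * p[j] for wk, p in zip(w, points)) / total
                          for j in range(dim)]
    return centers, memberships_for(points, centers, m)

def encode(memberships, alphabet):
    """Give each point the letter of its highest-membership cluster."""
    return "".join(alphabet[row.index(max(row))] for row in memberships)

# Toy 2-D feature vectors forming two well-separated groups (made-up values).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
codes = encode(memberships, "AB")
print(codes)
```

Unlike k-means, every point keeps a graded membership in every cluster. Taking the highest membership turns those grades into one letter per fragment, which is the form a code sequence needs.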
dc.description.provenance: Made available in DSpace on 2021-06-13T06:39:14Z (GMT). No. of bitstreams: 1
ntu-94-R92548027-1.pdf: 2277119 bytes, checksum: 464cc88e0b977e851c0e596438783e0a (MD5)
Previous issue date: 2005 [en]
dc.description.tableofcontents: Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Literature on Structural Similarity
2.1.1 Geometric hashing
2.1.2 Coordinate root-mean-square deviation (cRMSD)
2.2 Literature on Structure Clustering Algorithms
2.2.1 Expectation-maximization (EM) clustering
2.2.2 Geometric invariant algorithm
2.2.3 Comparison of other related algorithms and clustering studies
2.3 Clustering Algorithms and the Plan of This Thesis
Chapter 3 Materials and Methods
3.1 Materials
3.2 Methods
3.2.1 Research workflow
3.2.2 Feature selection
3.2.3 Choice of clustering method
<3.2.3.1> Hard clustering methods
<3.2.3.2> K-means clustering
<3.2.3.3> Fuzzy clustering methods
<3.2.3.4> Fuzzy c-means clustering (FCM)
3.2.4 Validation
3.2.5 Visualization of clustering results
3.2.6 Refinement
3.2.7 Building the substitution matrix
Chapter 4 Results and Discussion
4.1 Experimental Materials
4.2 Experimental Results
4.2.1 Random sampling of training data
4.2.2 Clustering of test data
4.2.3 Representative cluster centers
4.2.4 Building the substitution matrix
4.3 Discussion of Results
<Case Study 1>
<Case Study 2>
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix
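dc.language.iso: zh-TW
dc.subject: 模糊分群 (fuzzy clustering) [zh_TW]
dc.subject: 蛋白質結構 (protein structure) [zh_TW]
dc.subject: 重複性保留結構 (repeated conserved structures) [zh_TW]
dc.subject: 氨基酸序列 (amino acid sequence) [zh_TW]
dc.subject: 區域結構碼序列 (alphabet codes sequence) [zh_TW]
dc.subject: Fuzzy clustering [en]
dc.subject: Alphabet Codes Sequence [en]
dc.subject: Amino Acid Sequence [en]
dc.subject: Repeated Conserved Structures [en]
dc.subject: Protein Structure [en]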
dc.title: 蛋白質區域保留結構片段之分群編碼研究 [zh_TW]
dc.title: Amino acid Fragment encoding for structural preserving clustering [en]
dc.type: Thesis
dc.date.schoolyear: 93-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 黃鎮剛, 陳倩瑜
dc.subject.keyword: 蛋白質結構 (protein structure), 重複性保留結構 (repeated conserved structures), 氨基酸序列 (amino acid sequence), 區域結構碼序列 (alphabet codes sequence), 模糊分群 (fuzzy clustering) [zh_TW]
dc.subject.keyword: Protein Structure, Repeated Conserved Structures, Amino Acid Sequence, Alphabet Codes Sequence, Fuzzy clustering [en]
dc.relation.page: 77
dc.rights.note: Paid license
dc.date.accepted: 2005-08-08
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 醫學工程學研究所 (Institute of Biomedical Engineering) [zh_TW]
Appears in collections: Institute of Biomedical Engineering

Files in this item:
File: ntu-94-1.pdf (not authorized for public access)
Size: 2.22 MB
Format: Adobe PDF