3D Protein Retrieval Based on Pocket Modeling and Matching (三維蛋白質模型檢索: 基於功能口袋之建構與比對)

Jeng-Sheng Yeh; 葉正聖

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35071

Title:	3D Protein Retrieval Based on Pocket Modeling and Matching (三維蛋白質模型檢索: 基於功能口袋之建構與比對)
Author:	Jeng-Sheng Yeh (葉正聖)
Advisor:	Ming Ouhyoung (歐陽明)
Co-advisor:	Bing-Yu Chen (陳炳宇)
Keywords:	computer graphics, 3D model retrieval, protein retrieval, protein function sites, protein binding pockets, minimal binding surface, bioinformatics, bio graphics, multi-view Zernike moments, spin-images
Year of publication:	2005
Degree:	Doctoral
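Abstract:	CHINESE ABSTRACT This thesis proposes a framework for partial matching of three-dimensional (3D) protein structures. We use this framework to implement a 3D protein retrieval system that screens for similar proteins. For a protein structure of unknown function, the system can find similar protein structures of known function and thereby suggest possible functions or possible drug-binding information. The system can serve as a front-end sieve that filters out most of the unlikely proteins and provides suggestions for subsequent, more precise matching. The system pipeline chains together three stages: binding pocket modeling, matching, and result refinement.
Because binding pockets play an important part in protein interactions, our first step is to locate the concave pockets of a protein. We roll a virtual sphere of a given radius over the entire protein surface. If the number of protein surface atoms contained in the virtual sphere is large enough, the sphere at that surface position is probably buried deep inside the protein, which is precisely a property of a pocket. We then treat the center of that sphere as a candidate for a possible binding pocket.
After the binding pockets are modeled, the next step is matching. We implemented two matching methods: multi-view Zernike moments and spin images. Multi-view Zernike moments compare visual similarity from many different viewing angles to find shapes that are similar overall. Spin images use different surface reference points and their corresponding tangent planes as reference coordinate systems, so they are unaffected by arbitrary spatial rotation and can match small surface regions. Both methods match 3D shapes under different rotations and can help us match 3D protein structures. Thus, to predict the function of a 3D protein structure of unknown function, we find its binding pockets and then use the two matching methods to find similarly shaped proteins in a database of known proteins. Our method can match similar 3D proteins, receptors, or bindable chemical molecules in a reasonable time, so experts in biochemistry and biology can use it as a pre-screening filter, combined with other tools, and use the shape-similarity information for function prediction of unknown proteins and for suggestions of possible functional regions and drug binding.
As for results, first, we have a 3D protein retrieval system covering all of PDB and FSSP. Through an easy-to-use web interface, we match protein structures by multi-view visual similarity. Results are returned over the network in less than three seconds, and the accuracy of each query is about 90%. Second, a more difficult problem is finding possible receptor sites and their corresponding binding chemical molecules (ligands) or inhibitors. Our system has some preliminary results: (a) with more than one hundred different proteins available for matching, the receptor sites of proteins similar to a query receptor site can be found in about 70 minutes; (b) among twenty inhibitors/ligands, possible binding sites can be found in 17 minutes, an average of 50 seconds per pair. The machine used was a Pentium IV 2.4 GHz PC. Regarding accuracy, for comparison (a), among 107 protein candidates, (i) the accuracy reaches 68% when only the top-ranked protein is considered, and (ii) 95% when the top five ranked proteins are considered. Comparison (b) has so far been examined only in small case studies and needs further investigation.
ABSTRACT A framework for matching the partial surface data of three-dimensional (3D) protein structures is proposed. We use this framework to build a retrieval system for the 3D structures of proteins. For a protein of unknown function, the system can suggest possible functions or corresponding binding drugs based on known proteins of similar shape in our database, and it serves as a front-end filter that reduces the search space for more accurate search by other methods. The pipeline of our system has three stages: pocket modeling, matching, and refinement.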
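The pocket-modeling step described above (a probe sphere that counts how many protein atoms it contains) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: atoms are simplified to bare center coordinates, the probe positions are assumed to be supplied by some surface-sampling step, and the names `atoms_in_probe`, `pocket_candidates`, `probe_radius`, and `min_atoms` as well as their default values are hypothetical.

```python
import math

def atoms_in_probe(center, atoms, probe_radius):
    """Count atom centers that lie inside a probe sphere at `center`."""
    return sum(1 for atom in atoms if math.dist(center, atom) <= probe_radius)

def pocket_candidates(probe_centers, atoms, probe_radius=5.0, min_atoms=3):
    """Keep probe centers whose sphere contains at least `min_atoms` atoms.

    probe_centers: probe positions sampled over the protein surface
                   (how they are generated is outside this sketch).
    atoms:         list of (x, y, z) atom centers.
    The radius and the threshold are assumed values, not taken from the thesis.
    """
    return [c for c in probe_centers
            if atoms_in_probe(c, atoms, probe_radius) >= min_atoms]
```

A probe position surrounded by many atoms is kept as a pocket candidate, while a probe far from the protein is discarded; in practice the threshold would have to be tuned to the probe radius.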
First, we extract and model the possible binding pockets of proteins, since binding pockets are the active sites in protein-protein and protein-ligand interactions. We use the “Sphere Coverage” method to find the binding pockets: a virtual sphere is rolled along the solvent-accessible surface, and wherever more than 50 percent of the space inside the sphere is filled by protein atoms, the sphere is likely to lie near a concave part of the protein. After constructing 3D models of the binding pockets, we implemented two matching algorithms: multi-view Zernike moments and spin images. Multi-view Zernike moments match the global shape by visual similarity from many different viewing directions. Spin images find local surface features in a rotation-invariant way with respect to a reference point and its corresponding surface normal. Both rotation-invariant methods help in matching 3D protein data. In our experiments, given an unknown 3D protein, we extract and model its possible binding pockets and then use the two methods to retrieve similar proteins from the database. Because our method can match two 3D proteins, their receptors, and ligands in a reasonably short time, it can serve as a preliminary filter that gives biochemists and biologists useful information for function prediction, in terms of possible functional sites of unknown proteins or suggestions for drug binding. As a first result, a web-based 3D protein retrieval system is available for protein structure data, covering the entire PDB and FSSP databases. In this system, we use a visual-based matching method to compare protein structures from multiple viewpoints. Each query takes less than three seconds, with about 90 percent accuracy on average.
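To illustrate why spin images are rotation invariant, here is a minimal sketch of the standard spin-image construction: each surface point is described by its distance from the line through the reference point along the normal (alpha) and its signed height above the tangent plane (beta), and these pairs are accumulated into a 2D histogram. It assumes a unit-length normal; the function names, `bin_size`, and `size` are hypothetical choices and do not come from the thesis.

```python
import math

def spin_coords(p, n, x):
    """Map point x to (alpha, beta) relative to the oriented point (p, n).

    beta  = signed distance of x from the tangent plane at p
    alpha = distance of x from the line through p along n
    n is assumed to be a unit vector.
    """
    d = [xi - pi for xi, pi in zip(x, p)]
    beta = sum(di * ni for di, ni in zip(d, n))
    squared = sum(di * di for di in d) - beta * beta
    alpha = math.sqrt(max(squared, 0.0))  # guard against rounding below zero
    return alpha, beta

def spin_image(p, n, points, bin_size=1.0, size=8):
    """Accumulate (alpha, beta) pairs into a size x size histogram."""
    image = [[0] * size for _ in range(size)]
    for x in points:
        alpha, beta = spin_coords(p, n, x)
        col = int(alpha // bin_size)
        row = int((beta + size * bin_size / 2) // bin_size)  # center beta = 0
        if 0 <= row < size and 0 <= col < size:
            image[row][col] += 1
    return image
```

Because alpha and beta depend only on the position of a point relative to the normal axis, rotating the surface about that axis leaves the histogram unchanged, so two histograms can be compared directly without first aligning the shapes.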
Second, for the more difficult problem of finding possible receptor sites and their corresponding inhibitors, our system has preliminary results, obtained on a 2.4 GHz Pentium IV PC: (a) within 70 minutes, a query receptor site can be used to retrieve, from 107 different proteins, those that have similar receptor sites; (b) within 17 minutes, a given receptor site used as a query can retrieve, out of 20 candidate inhibitors/ligands, one that may fit into it, where each receptor/ligand pair takes about 50 seconds to compute. The precision for experiment (a), with a database of 107 candidates, is (i) 68% when only the top-ranked result is considered and (ii) 95% for the top five ranked results, that is, the correct answer is among the top five candidates. For experiment (b), only case studies have been done, and formal experiments have yet to be conducted.
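The top-rank and top-five precision figures reported for experiment (a) correspond to a simple hit-rate computation, sketched below. The data layout (one ranked candidate list and one correct answer per query) and the name `top_k_rate` are assumptions made for illustration; the abstract does not describe how the evaluation was implemented.

```python
def top_k_rate(rankings, correct, k):
    """Fraction of queries whose correct answer appears in the top k results.

    rankings: dict mapping a query id to its ranked list of candidate ids
    correct:  dict mapping a query id to the id of the correct candidate
    """
    hits = sum(1 for query, ranked in rankings.items()
               if correct[query] in ranked[:k])
    return hits / len(rankings)
```

With `k=1` this gives the top-rank rate and with `k=5` the top-five rate, so the second value can never be lower than the first.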
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35071
Full-text license:	Paid license
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-94-1.pdf (not authorized for public access)	5.25 MB	Adobe PDF

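Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.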
