Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76837
| Title: | BLAST-Based Protein Sequence Alignment: Hardware Accelerator Design and Automatic Function Annotation Studies |
| Author: | Yu-Cheng Li 李育誠 |
| Advisor: | Yi-Chang Lu 盧奕璋 |
| Keywords: | Basic Local Alignment Search Tool (BLAST), Field Programmable Gate Array (FPGA), hardware accelerator, parallel architecture, protein sequence alignment, Smith–Waterman algorithm, automatic function annotation |
| Publication year: | 2020 |
| Degree: | Doctoral |
| Abstract: | In this study, we design a hardware accelerator for a widely used sequence alignment algorithm, the Basic Local Alignment Search Tool for proteins (BLASTP). The architecture of the proposed accelerator consists of five stages: a new systolic-array-based one-hit finding stage, a novel RAM-REG-based two-hit finding stage, a refined ungapped extension stage, a faster gapped extension stage, and a highly efficient parallel sorter. The system is implemented on an Altera Stratix V FPGA and reaches a processing speed of more than 500 giga cell updates per second (GCUPS). It receives a query sequence, compares it with the sequences in the database, and generates a list sorted in descending order of the similarity scores between the query sequence and the subject sequences. Moreover, it can process query and subject protein sequences of up to 8192 amino acid residues in a single pass. Using data from the National Center for Biotechnology Information (NCBI) database, we show that our hardware achieves a speed-up of more than 3X over the runtime of BLASTP software on an 8-thread Intel Xeon CPU with 144 GB DRAM. In the automatic protein function annotation study, we are interested in the prediction performance of sequence alignment methods used for automatic annotation. Homology-based transfer is frequently used to predict the protein functions of unannotated sequences through similarity analysis between the target and previously annotated sequences. The most direct and accessible homology-based transfer approach is sequence alignment. However, its accuracy varies with the annotation keyword. To assess the reliability of alignment-based prediction, we applied a 10-fold cross-validation test on the SWISS-Prot database. We compared the Matthews correlation coefficient, sensitivity, and precision under different parameter settings of BLASTP and PSI-BLAST. As the results in this study show, keywords in the categories of domain, ligand, molecular function, biological process, cellular component, and post-translational modification (PTM) can be confidently used for protein function prediction, whereas the others are less reliable. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76837 |
| DOI: | 10.6342/NTU202003424 |
| Full-text authorization: | Not authorized |
| Appears in collections: | Graduate Institute of Electronics Engineering |
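
The abstract evaluates alignment-based keyword prediction with the Matthews correlation coefficient, sensitivity, and precision. As a minimal sketch of how these three measures follow from confusion-matrix counts for a single keyword, the Python below uses the standard textbook definitions. The function name `confusion_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions made for illustration; they are not taken from the thesis, and no counts from the thesis are reproduced here.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, mcc) for one keyword.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative,
    and false-negative counts. Undefined ratios (zero denominator) are
    reported as 0.0, which is an illustrative convention only.
    """
    # Sensitivity (recall): fraction of truly annotated proteins recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predicted annotations that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient, ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the function.
    print(confusion_metrics(tp=8, fp=2, tn=85, fn=5))
```

In a 10-fold cross-validation setting such as the one the abstract describes, one plausible use is to accumulate the four counts per keyword across the held-out folds and call the function once per keyword.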
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1408202015085800.pdf Not authorized for public access | 2.32 MB | Adobe PDF |
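Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.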
