Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/718
Title: | Sequence Classification Based on k-mer Frequencies |
Author: | Hung-Yu Chen (陳泓宇) |
Advisor: | Kun-Mao Chao (趙坤茂) |
Keywords: | sequence classification, metagenomics, genomics, k-mer, alignment-free, sequence signature, algorithm |
Publication year: | 2019 |
Degree: | Master's |
Abstract: | Sequence classification is a preliminary step in many studies in computational biology, and a variety of methods have been proposed to solve it. However, as high-throughput sequencing technologies develop, sequencing datasets are growing much larger, and many existing methods can no longer complete the task with the available computational resources in an acceptable time. Algorithms based on k-mers are among these methods: many of them classify sequences quickly and accurately, but they require more memory than a common personal computer provides.
In this thesis, we propose a k-mer-based algorithm whose time complexity is comparable to those of existing methods, while its space usage is improved by avoiding redundancy in how the k-mers are stored. To reduce memory usage further, we propose a partitioning strategy. Besides reducing memory usage, the algorithm under this partitioning strategy is well suited to parallelization, which shortens the computation time. |
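The abstract does not spell out the algorithm, so the following is only a generic, hypothetical sketch of the two ideas it names, not the method proposed in the thesis. It assumes exact matching of k-mers against reference sequences, stores each distinct k-mer once together with the set of reference labels that contain it (one simple way to avoid storing the same k-mer repeatedly), and assigns each read to the label with the most k-mer hits. The partitioning is also an assumption: k-mers are split into `num_partitions` groups by CRC32 hash, and only one group's index is held in memory at a time. All function names (`kmers`, `partition_of`, `build_partition_index`, `classify`) are hypothetical.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterator, Optional, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k window of seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in "ACGT" for c in kmer):
            yield kmer


def partition_of(kmer: str, num_partitions: int) -> int:
    """Map a k-mer to a partition with a deterministic hash (CRC32)."""
    return zlib.crc32(kmer.encode("ascii")) % num_partitions


def build_partition_index(references: Dict[str, str], k: int,
                          part: int, num_partitions: int) -> Dict[str, Set[str]]:
    """Index only the k-mers of one partition; each distinct k-mer is stored once,
    mapped to the set of reference labels that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            if partition_of(kmer, num_partitions) == part:
                index[kmer].add(label)
    return index


def classify(reads: Dict[str, str], references: Dict[str, str], k: int,
             num_partitions: int = 4) -> Dict[str, Optional[str]]:
    """Assign each read to the reference label with the most k-mer hits,
    or None if no k-mer of the read occurs in any reference."""
    votes: Dict[str, Counter] = {rid: Counter() for rid in reads}
    for part in range(num_partitions):
        # Only this partition's index is held in memory during this iteration.
        index = build_partition_index(references, k, part, num_partitions)
        for rid, seq in reads.items():
            for kmer in kmers(seq, k):
                if partition_of(kmer, num_partitions) == part:
                    # .get() does not insert missing keys into the defaultdict.
                    for label in index.get(kmer, ()):
                        votes[rid][label] += 1
        del index  # release this partition before building the next one
    return {rid: (c.most_common(1)[0][0] if c else None)
            for rid, c in votes.items()}
```

For example, `classify({"r1": "CGTACGTA"}, {"A": "ACGTACGTAA", "B": "TTTTGGGGCC"}, k=4)` assigns `r1` to `A`, because all of its 4-mers occur in `A` and none occur in `B`. Because every k-mer falls into exactly one partition, the per-partition vote counts simply add up, so the result does not depend on `num_partitions`. The partitions are also independent of one another, so each could be handled by a separate worker with the vote counters summed afterwards; the cost in this sketch is that the reads are rescanned once per partition in exchange for a smaller peak index.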
URI: | http://tdr.lib.ntu.edu.tw/handle/123456789/718 |
DOI: | 10.6342/NTU201902038 |
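Full-text license: | Authorized for open access worldwide |
Appears in collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |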
Files in this item:
File | Size | Format
---|---|---
ntu-108-1.pdf | 1.99 MB | Adobe PDF
Unless their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.