基於 3D NAND 快閃記憶體近似字串比對架構之儲存內 DNA 序列比對系統

鄭又愷; You-Kai Zheng

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87719

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	楊佳玲	zh_TW
dc.contributor.advisor	Chia-Lin Yang	en
dc.contributor.author	鄭又愷	zh_TW
dc.contributor.author	You-Kai Zheng	en
dc.date.accessioned	2023-07-19T16:05:48Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-07-19	-
dc.date.issued	2023	-
dc.date.submitted	2023-05-05	-
dc.identifier.citation	[1] G. S. Ginsburg and K. A. Phillips, “Precision Medicine: From Science To Value,” Health Affairs (Project Hope), vol. 37, no. 5, pp. 694–701, 2018. [2] G. S. Ginsburg and H. F. Willard, “Genomic and personalized medicine: Foundations and applications,” Translational Research: The Journal of Laboratory and Clinical Medicine, vol. 154, no. 6, pp. 277–287, 2009. [3] C. A. Gilchrist, S. D. Turner, M. F. Riley, W. A. Petri, and E. L. Hewlett, “Whole-genome sequencing in outbreak analysis,” Clinical Microbiology Reviews, vol. 28, no. 3, pp. 541–563, 2015. [4] J. S. Bloom, L. Sathe, C. Munugala, E. M. Jones, M. Gasperini, N. B. Lubock, F. Yarza, E. M. Thompson, K. M. Kovary, J. Park, D. Marquette, S. Kay, M. Lucas, T. Love, A. Sina Booeshaghi, O. F. Brandenberg, L. Guo, J. Boocock, M. Hochman, S. W. Simpkins, I. Lin, N. LaPierre, D. Hong, Y. Zhang, G. Oland, B. J. Choe, S. Chandrasekaran, E. E. Hilt, M. J. Butte, R. Damoiseaux, C. Kravit, A. R. Cooper, Y. Yin, L. Pachter, O. B. Garner, J. Flint, E. Eskin, C. Luo, S. Kosuri, L. Kruglyak, and V. A. Arboleda, “Massively scaled-up testing for SARS-CoV-2 RNA via next-generation sequencing of pooled and barcoded nasal and saliva samples,” Nature Biomedical Engineering, vol. 5, no. 7, pp. 657–665, 2021. [5] J. Gagan and E. M. Van Allen, “Next-generation sequencing to guide cancer therapy,” Genome Medicine, vol. 7, no. 1, p. 80, 2015. [6] H. Ellegren, “Genome sequencing and population genomics in non-model organisms,” Trends in Ecology & Evolution, vol. 29, no. 1, pp. 51–63, 2014. [7] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, and M. A. DePristo, “The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010. [8] R. Poplin, P.-C. Chang, D. Alexander, S. Schwartz, T. Colthurst, A. Ku, D. Newburger, J. Dijamco, N. Nguyen, P. T. Afshar, S. S. Gross, L. Dorfman, C. Y. McLean, and M. A. DePristo, “A universal SNP and small-indel variant caller using deep neural networks,” Nature Biotechnology, vol. 36, no. 10, pp. 983–987, 2018. [9] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, vol. 147, no. 1, pp. 195–197, 1981. [10] M. Alser, J. Lindegger, C. Firtina, N. Almadhoun, H. Mao, G. Singh, J. Gomez-Luna, and O. Mutlu, “Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis,” arXiv, 2022. [Online]. Available: https://arxiv.org/abs/2205.07957. [11] M. Alser, J. Rotman, D. Deshpande, K. Taraszka, H. Shi, P. I. Baykal, H. T. Yang, V. Xue, S. Knyazev, B. D. Singer, B. Balliu, D. Koslicki, P. Skums, A. Zelikovsky, C. Alkan, O. Mutlu, and S. Mangul, “Technology dictates algorithms: Recent developments in read alignment,” Genome Biology, vol. 22, no. 1, p. 249, 2021. [12] H. Li, “Minimap2: Pairwise alignment for nucleotide sequences,” Bioinformatics, vol. 34, no. 18, pp. 3094–3100, 2018. [13] G. Myers, “A fast bit-vector algorithm for approximate string matching based on dynamic programming,” Journal of the ACM, vol. 46, no. 3, pp. 395–415, 1999. [14] H. Li, “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM,” arXiv, 2013. [Online]. Available: https://arxiv.org/abs/ 1303.3997. [15] A. Andoni and K. Onak, “Approximating edit distance in near-linear time,” in ACM Symposium on Theory of Computing, 2009, pp. 199–204. [16] D. Chakraborty, D. Das, E. Goldenberg, M. Koucky, and M. Saks, “Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time,” in 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), 2018, pp. 979–990. [17] J. Daily, “Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments,” BMC Bioinformatics, vol. 17, no. 1, p. 81, 2016. [18] E. J. Houtgast, V. Sima, K. Bertels, and Z. AlArs, “An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM,” ACM SIGARCH Computer Architecture News, vol. 44, no. 4, pp. 38–43, 2017. [19] T. Nishimura, J. L. Bordim, Y. Ito, and K. Nakano, “Accelerating the Smith-Waterman Algorithm Using Bitwise Parallel Bulk Computation Technique on GPU,” in IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017, pp. 932–941. [20] E. J. Houtgast, V.-M. Sima, K. Bertels, and Z. Al-Ars, “Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths,” Computational Biology and Chemistry, vol. 75, pp. 54–64, 2018. [21] H. Cheng, Y. Zhang, and Y. Xu, “BitMapper2: A GPU-Accelerated All-Mapper Based on the Sparse q-Gram Index,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 3, pp. 886–897, 2019. [22] Q. Aguado-Puig, S. Marco-Sola, J. C. Moure, C. Matzoros, D. Castells-Rufas, A. Espinosa, and M. Moreto, “WFA-GPU: Gap-affine pairwise alignment using GPUs,” bioRxiv, 2022. [Online]. Available: https://www.biorxiv.org/ content/10.1101/2022.04.18.488374v1. [23] Y.-T. Chen, J. Cong, J. Lei, and P. Wei, “A Novel High-Throughput Acceleration Engine for Read Alignment,” in IEEE International Symposium on Field-Programmable Custom Computing Machines, 2015, pp. 199–202. [24] X. Fei, Z. Dan, L. Lina, M. Xin, and Z. Chunlei, “FPGASW: Accelerating Large-Scale Smith-Waterman Sequence Alignment Application with Backtracking on FPGA Linear Systolic Array,” Interdisciplinary Sciences, Computational Life Sciences, vol. 10, no. 1, pp. 176–188, 2018. [25] E. Rucci, C. Garcia, G. Botella, A. De Giusti, M. Naiouf, and M. Prieto-Matias, “SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences,” BMC Systems Biology, vol. 12, no. 5, p. 96, 2018. [26] S. Marco-Sola, J. C. Moure, M. Moreto, and A. Espinosa, “Fast gap-affine pairwise alignment using the wavefront algorithm,” Bioinformatics, vol. 37, no. 4, pp. 456–463, 2021. [27] S. S. Banerjee, M. El-Hadedy, J. B. Lim, Z. T. Kalbarczyk, D. Chen, S. Lumetta, and R. K. Iyer, “ASAP: Accelerated Short-Read Alignment on Programmable Hardware,” IEEE Transactions on Computers, vol. 68, no. 3, pp. 331–346, 2019. [28] D. Fujiki, S. Wu, N. Ozog, K. Goliya, D. Blaauw, S. Narayanasamy, and R. Das, “SeedEx: A Genome Sequencing Accelerator for Optimal Alignments in Subminimal Space,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 937–950. [29] Y.-L. Chen, B.-Y. Chang, C.-H. Yang, and T.-D. Chiueh, “A High-Throughput FPGA Accelerator for Short-Read Mapping of the Whole Human Genome,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 6, pp. 1465–1478, 2021. [30] A. Haghi, S. Marco-Sola, L. Alvarez, D. Diamantopoulos, C. Hagleitner, and M. Moreto, “An FPGA Accelerator of the Wavefront Algorithm for Genomics Pairwise Alignment,” in International Conference on Field-Programmable Logic and Applications (FPL), 2021, pp. 151–159. [31] L. Li, J. Lin, and Z. Wang, “PipeBSW: A Two-Stage Pipeline Structure for Banded Smith-Waterman Algorithm on FPGA,” in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2021, pp. 182–187. [32] D. Fujiki, A. Subramaniyan, T. Zhang, Y. Zeng, R. Das, D. Blaauw, and S. Narayanasamy, “GenAx: A Genome Sequencing Accelerator,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2018, pp. 69–82. [33] D. S. Cali, G. S. Kalsi, Z. Bingol, C. Firtina, L. Subramanian, J. S. Kim, R. Ausavarungnirun, M. Alser, J. Gomez-Luna, A. Boroumand, A. Nori, A. Scibisz, S. Subramoney, C. Alkan, S. Ghose, and O. Mutlu, “GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 951–966. [34] R. Kaplan, L. Yavits, and R. Ginosasr, “BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data,” in ACM International Systems and Storage Conference, 2020, pp. 36–48. [35] S. Diab, A. Nassereldine, M. Alser, J. G. Luna, O. Mutlu, and I. E. Hajj, “High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory,” in IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022, pp. 163–163. [36] S. Gupta, M. Imani, B. Khaleghi, V. Kumar, and T. Rosing, “RAPID: A ReRAM Processing in-Memory Architecture for DNA Sequence Alignment,” in IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2019, pp. 1–6. [37] S. Kalikar, C. Jain, V. Md, and S. Misra, “Accelerating long-read analysis on modern CPUs,” bioRxiv, p. 2021.07.21.453294, 2022. [Online]. Available: https: //www.biorxiv.org/content/10.1101/2021.07.21.453294v2. [38] Y. Turakhia, G. Bejerano, and W. J. Dally, “Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly,” in ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2018, pp. 199–213. [39] P. Chen, C. Wang, X. Li, and X. Zhou, “Accelerating the Next Generation Long Read Mapping with the FPGA-Based System,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 5, pp. 840–852, 2014. [40] N. Ahmed, J. Lévy, S. Ren, H. Mushtaq, K. Bertels, and Z. Al-Ars, “GASAL2: A GPU accelerated sequence alignment library for high-throughput NGS data,” BMC Bioinformatics, vol. 20, no. 1, p. 520, 2019. [41] Md. Vasimuddin, S. Misra, H. Li, and S. Aluru, “Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems,” in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019, pp. 314–324. [42] Y. Liu, A. Wirawan, and B. Schmidt, “CUDASW++ 3.0: Accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions,” BMC Bioinformatics, vol. 14, no. 1, p. 117, 2013. [43] Y. Liu and B. Schmidt, “GSWABE: Faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences,” Concurrency and Computation: Practice and Experience, vol. 27, no. 4, pp. 958–972, 2015. [44] R. Wilton, T. Budavari, B. Langmead, S. J. Wheelan, S. L. Salzberg, and A. S. Szalay, “Arioc: High-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space,” PeerJ , vol. 3, e808, 2015. [45] E. F. d. O. Sandes, G. Miranda, X. Martorell, E. Ayguade, G. Teodoro, and A. C. M. Melo, “CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 10, pp. 2838–2850, 2016. [46] H. Xin, J. Greth, J. Emmons, G. Pekhimenko, C. Kingsford, C. Alkan, and O. Mutlu, “Shifted Hamming distance: A fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping,” Bioinformatics, vol. 31, no. 10, pp. 1553–1560, 2015. [47] M. Alser, H. Hassan, A. Kumar, O. Mutlu, and C. Alkan, “Shouji: A fast and efficient pre-alignment filter for sequence alignment,” Bioinformatics, vol. 35, no. 21, pp. 4255–4263, 2019. [48] M. Alser, T. Shahroodi, J. Gómez-Luna, C. Alkan, and O. Mutlu, “SneakySnake: A fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs,” Bioinformatics, vol. 36, no. 22-23, pp. 5282–5290, 2021. [49] H. Xin, D. Lee, F. Hormozdiari, S. Yedkar, O. Mutlu, and C. Alkan, “Accelerating read mapping with FastHASH,” BMC Genomics, vol. 14, no. 1, S13, 2013. [50] M. Alser, O. Mutlu, and C. Alkan, “MAGNET: Understanding and Improving the Accuracy of Genome Pre-Alignment Filtering,” arXiv, 2017. [Online]. Available: https://arxiv.org/abs/1707.01631. [51] J. S. Kim, D. Senol Cali, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, and O. Mutlu, “GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies,” BMC Genomics, vol. 19, no. S2, p. 89, 2018. [52] F. Hameed, A. A. Khan, and J. Castrillon, “ALPHA: A Novel Algorithm-Hardware Co-Design for Accelerating DNA Seed Location Filtering,” IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 3, pp. 1464–1475, 2021. [53] R. Kaplan, L. Yavits, and R. Ginosar, “RASSA: Resistive Prealignment Accelerator for Approximate DNA Long Read Mapping,” IEEE Micro, vol. 39, no. 4, pp. 44–54, 2019. [54] H. Sadasivan, M. Maric, E. Dawson, V. Iyer, J. Israeli, and S. Narayanasamy, “Accelerating Minimap2 for Accurate Long Read Alignment on GPUs,” Journal of Biotechnology and Biomedicine, vol. 06, no. 01, pp. 13–23, 2023. [55] J. L. Pernez, R. L. Uy, K. V. A. Borja, and J. C. G. Maghirang, “QCKer: An x86-AVX/AVX2 Implementation of Q-gram Counting Filter for DNA Sequence Alignment,” in International Conference on Bioinformatics Research and Applications (ICBRA), 2020, pp. 49–54. [56] M. Khalifa, R. Ben-Hur, R. Ronen, O. Leitersdorf, L. Yavits, and S. Kvatinsky, “FiltPIM: In-Memory Filter for DNA Sequencing,” in IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 2021, pp. 1–4. [57] J. C. G. Maghirang, R. Luis Uy, K. V. A. Borja, and J. L. Pernez, “QCKer-FPGA: An FPGA Implementation of q-gram Counting Filter for DNA Sequence Alignment,” in IEEE International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), 2019, pp. 1–6. [58] G. Singh, M. Alser, D. S. Cali, D. Diamantopoulos, J. Gómez-Luna, H. Corporaal, and O. Mutlu, “FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications,” IEEE Micro, vol. 41, no. 4, pp. 39–48, 2021. [59] Z. Bingöl, M. Alser, O. Mutlu, O. Ozturk, and C. Alkan, “GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping,” in IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021, pp. 209–209. [60] M. Alser, H. Hassan, H. Xin, O. Ergin, O. Mutlu, and C. Alkan, “GateKeeper: A new hardware architecture for accelerating pre-alignment in DNA short read mapping,” Bioinformatics, vol. 33, no. 21, pp. 3355–3363, 2017. [61] A. Nag, C. N. Ramachandra, R. Balasubramonian, R. Stutsman, E. Giacomin, H. Kambalasubramanyam, and P.-E. Gaillardon, “GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019, pp. 334–346. [62] N. Mansouri Ghiasi, J. Park, H. Mustafa, J. Kim, A. Olgun, A. Gollwitzer, D. Senol Cali, C. Firtina, H. Mao, N. Almadhoun Alserr, R. Ausavarungnirun, N. Vijaykumar, M. Alser, and O. Mutlu, “GenStore: A high-performance in-storage processing system for genome sequence analysis,” in ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022, pp. 635–654. [63] L. Guo, J. Lau, Z. Ruan, P. Wei, and J. Cong, “Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU,” in IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2019, pp. 127–135. [64] F. Chen, L. Song, H. H. Li, and Y. Chen, “PARC: A Processing-in-CAM Architecture for Genomic Long Read Pairwise Alignment using ReRAM,” in Asia and South Pacific Design Automation Conference (ASP-DAC), 2020, pp. 175–180. [65] C. Alkan, J. M. Kidd, T. Marques-Bonet, G. Aksay, F. Antonacci, F. Hormozdiari, J. O. Kitzman, C. Baker, M. Malig, O. Mutlu, S. C. Sahinalp, R. A. Gibbs, and E. E. Eichler, “Personalized copy number and segmental duplication maps using next-generation sequencing,” Nature Genetics, vol. 41, no. 10, pp. 1061–1067, 2009. [66] P.-H. Tseng, F.-M. Lee, Y.-H. Lin, L.-Y. Chen, Y.-C. Li, H.-W. Hu, Y.-Y. Wang, C.-C. Hsieh, M.-H. Lee, H.-L. Lung, K.-Y. Hsieh, K.-C. Wang, and C.-Y. Lu, “In-Memory-Searching Architecture Based on 3D-NAND Technology with Ultra-high Parallelism,” in IEEE International Electron Devices Meeting (IEDM), 2020, pp. 36.1.1–36.1.4. [67] P.-H. Tseng, F.-M. Lee, Y.-H. Lin, Y.-Y. Wang, M.-H. Lee, K.-Y. Hsieh, K.-C. Wang, and C.-Y. Lu, “A Hybrid In-Memory-Searching and In-Memory-Computing Architecture for NVM Based AI Accelerator,” in Symposium on VLSI Technology, 2021, pp. 1–2. [68] P.-H. Tseng, Y.-H. Lin, F.-M. Lee, T.-C. Bo, Y.-C. Li, M.-H. Lee, K.-Y. Hsieh, K.-C. Wang, and C.-Y. Lu, “In-Memory Approximate Computing Architecture Based on 3D-NAND Flash Memories,” in IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 270–271. [69] P.-H. Tseng, Y.-H. Lin, T.-C. Bo, F.-M. Lee, Y.-Y. Lin, M.-H. Lee, K.-Y. Hsieh, K.-C. Wang, and C.-Y. Lu, “An Analog In-Memory-Search Solution based on 3D-NAND Flash Memory for Brain-Inspired Computing,” in International Electron Devices Meeting (IEDM), 2022, pp. 33.6.1–33.6.4. [70] Y.-H. Lin, P.-H. Tseng, F.-M. Lee, M.-H. Lee, C.-C. Hsieh, D.-Y. Lee, K.-C. Wang, and C.-Y. Lu, “NOR Flash-based Multilevel In-Memory-Searching Architecture for Approximate Computing,” in IEEE International Memory Workshop (IMW), 2022, pp. 1–4. [71] M. Alser, Z. Bingol, D. S. Cali, J. Kim, S. Ghose, C. Alkan, and O. Mutlu, “Accelerating Genome Analysis: A Primer on an Ongoing Journey,” IEEE Micro, vol. 40, no. 5, pp. 65–75, 2020. [72] “Illumina and Gene Sequencing Technology - SBIR Success Story - NCI.” (2022), [Online]. Available: https://sbir.cancer.gov/portfolio/success-stories/illumina. [73] “An Introduction to Next-Generation Sequencing Technology.” (2015), [Online]. Available: https://www.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf. [74] S. R. Head, H. K. Komori, S. A. LaMere, T. Whisenant, F. Van Nieuwerburgh, D. R. Salomon, and P. Ordoukhanian, “Library construction for next-generation sequencing: Overviews and challenges,” BioTechniques, vol. 56, no. 2, pp. 61–77, 2014. [75] M. L. Metzker, “Sequencing technologies —the next generation,” Nature Reviews Genetics, vol. 11, no. 1, pp. 31–46, 2010. [76] “Deep Sequencing.” (2023), [Online]. Available: https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/deep-sequencing.html. [77] V. A. Schneider, T. Graves-Lindsay, K. Howe, N. Bouk, H.-C. Chen, P. A. Kitts, T. D. Murphy, K. D. Pruitt, F. Thibaud-Nissen, D. Albracht, R. S. Fulton, M. Kremitzki, V. Magrini, C. Markovic, S. McGrath, K. M. Steinberg, K. Auger, W. Chow, J. Collins, G. Harden, T. Hubbard, S. Pelan, J. T. Simpson, G. Threadgold, J. Torrance, J. M. Wood, L. Clarke, S. Koren, M. Boitano, P. Peluso, H. Li, C.-S. Chin, A. M. Phillippy, R. Durbin, R. K. Wilson, P. Flicek, E. E. Eichler, and D. M. Church, “Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly,” Genome Research, vol. 27, no. 5, pp. 849–864, 2017. [78] “Samsung ssd pm1735.” (2020), [Online]. Available: https://semiconductor.samsung.com/ssd/enterprise-ssd/pm1733-pm1735/mzplj3t2hbjr-00007. [79] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets, 3rd. Cambridge University Press, 2020. [80] J. S. Kim, C. Firtina, M. B. Cavlak, D. S. Cali, M. Alser, N. Hajinazar, C. Alkan, and O. Mutlu, “AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes,” arXiv, 2022. [Online]. Available: https://arxiv.org/abs/1912.08735. [81] M. M. Khayat, S. M. E. Sahraeian, S. Zarate, A. Carroll, H. Hong, B. Pan, L. Shi, R. A. Gibbs, M. Mohiyuddin, Y. Zheng, and F. J. Sedlazeck, “Hidden biases in germline structural variant detection,” Genome Biology, vol. 22, no. 1, p. 347, 2021. [82] “Samsung V-NAND SSD 860 EVO Data Sheet.” (2018), [Online]. Available: https://semiconductor.samsung.com/resources/data-sheet/Samsung_SSD_860_EVO_Data_Sheet_Rev1.pdf. [83] “Samsung Electronics Begins Mass Production of 8th-Gen Vertical NAND With Industry’s Highest Bit Density.” (2022), [Online]. Available: https://news.samsung.com/global/samsung-electronics-begins-mass-production-of-8th-gen-vertical-nand-with-industrys-highest-bit-density. [84] “232-Layer NAND Flash Memory \| Micron Technology.” (2022), [Online]. Available: https://www.micron.com/products/nand-flash/232-layer-nand. [85] “SK hynix Develops World’s Highest 238-Layer 4D NAND Flash.” (2022), [Online]. Available: https://news.skhynix.com/sk-hynix-develops-worlds-highest-238-layer-4d-nand-flash/. [86] A. Goda, “Recent Progress on 3D NAND Flash Technologies,” Electronics, vol. 10, no. 24, p. 3156, 2021. [87] E. W. Sayers, E. E. Bolton, J. R. Brister, K. Canese, J. Chan, D. C. Comeau, R. Connor, K. Funk, C. Kelly, S. Kim, T. Madej, A. Marchler-Bauer, C. Lanczycki, S. Lathrop, Z. Lu, F. Thibaud-Nissen, T. Murphy, L. Phan, Y. Skripchenko, T. Tse, J. Wang, R. Williams, B. W. Trawick, K. D. Pruitt, and S. T. Sherry, “Database resources of the national center for biotechnology information,” Nucleic Acids Research, vol. 50, no. D1, pp. D20–D26, 2022. [88] “Intel® CoreTM i7-7700 cpu.” (2017), [Online]. Available: https://www.intel.com/content/www/us/en/products/sku/97128/intel-core-i77700-processor-8m-cache-up-to-4-20-ghz/specifications.html. [89] “Intel® CoreTM i9-10900x cpu.” (2019), [Online]. Available: https://www.intel.com/content/www/us/en/products/sku/198019/intel-core-i910900x-xseries-processor-19-25m-cache-3-70-ghz/specifications.html. [90] S. Li, Z. Yang, D. Reddy, A. Srivastava, and B. Jacob, “DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator,” IEEE Computer Architecture Letters, vol. 19, no. 2, pp. 106–109, 2020. [91] D. Gouk, M. Kwon, J. Zhang, S. Koh, W. Choi, N. S. Kim, M. Kandemir, and M. Jung, “Amber: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018, pp. 469–481. [92] R. Edgar, “Syncmers are more sensitive than minimizers for selecting conserved k‐mers in biological sequences,” PeerJ , vol. 9, e10805, 2021. [93] D. Ho, J. Ding, S. Misra, N. Tatbul, V. Nathan, V. Md, and T. Kraska, “LISA: Towards Learned DNA Sequence Search,” arxiv, 2019. [Online]. Available: https://arxiv.org/abs/1910.04728. [94] R. Langarita, A. Armejach, J. Setoain, P. Ibáñez-Marín, J. Alastruey-Benedé, and M. Moretó, “Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 19, no. 1, pp. 355–368, 2022. [95] C. Firtina, J. Park, M. Alser, J. S. Kim, D. S. Cali, T. Shahroodi, N. M. Ghiasi, G. Singh, K. Kanellopoulos, C. Alkan, and O. Mutlu, “BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis,” NAR Genomics and Bioinformatics, vol. 5, no. 1, lqad004, 2023. [Online]. Available: https://arxiv.org/abs/2112.08687. [96] W. Huangfu, S. Li, X. Hu, and Y. Xie, “RADAR: A 3D-ReRAM based DNA Alignment Accelerator Architecture,” in ACM/ESDA/IEEE Design Automation Conference (DAC), 2018, pp. 1–6. [97] S. Angizi, J. Sun, W. Zhang, and D. Fan, “AlignS: A Processing-In-Memory Accelerator for DNA Short Read Alignment Leveraging SOT-MRAM,” in ACM/ IEEE Design Automation Conference (DAC), 2019, pp. 1–6. [98] F. Zokaee, H. R. Zarandi, and L. Jiang, “AligneR: A Process-in-Memory Architecture for Short Read Alignment in ReRAMs,” IEEE Computer Architecture Letters, vol. 17, no. 2, pp. 237–240, 2018. [99] W. Huangfu, X. Li, S. Li, X. Hu, P. Gu, and Y. Xie, “MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019, pp. 587–599. [100] T. Batu, F. Ergun, and C. Sahinalp, “Oblivious string embeddings and edit distance approximations,” in ACM-SIAM Symposium on Discrete Algorithm (SODA), 2006, pp. 792–801. [101] M. Šošic and M. Šikic, “Edlib: A C/C ++ library for fast, exact sequence alignment using edit distance,” Bioinformatics, vol. 33, no. 9, pp. 1394–1395, 2017. [102] M. Charikar, O. Geri, M. P. Kim, and W. Kuszmaul, “On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings,” arxiv, 2018. [Online]. Available: https://arxiv.org/abs/1804.09907. [103] F. Hach, I. Sarrafi, F. Hormozdiari, C. Alkan, E. E. Eichler, and S. C. Sahinalp, “mrsFAST-Ultra: A compact, SNP-aware mapper for high performance sequencing applications,” Nucleic Acids Research, vol. 42, W494–500, 2014. [104] G. Rizk and D. Lavenier, “GASSST: Global alignment short sequence search tool,” Bioinformatics, vol. 26, no. 20, pp. 2534–2540, 2010.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87719	-
dc.description.abstract	目前，序列比對 (Read mapping) 是基因組序列分析的效能瓶頸之一，因為其需要進行複雜的近似字串比對演算法，以找出定序儀的讀取序列輸出 (即 reads) 和參考序列之間的潛在對齊位置。先前的研究表明，預對齊過濾 (pre-alignment filtering) 演算法透過過濾對齊分數較低的位置，能夠大幅減少複雜對齊算法的執行次數，並顯著提升序列比對的效能。考慮到預對齊過濾算法需要密集存取記憶體的特性，已經有許多基於記憶體內運算的處理架構被提出以提高該算法的效能。然而，我們發現將該計算架構與當前最先進的序列比對軟體 Minimap2 整合時，預對齊過濾的運算速度會顯著低於 Minimap2 的執行速度，使過濾延遲無法透過管線 (Pipeline) 化執行隱藏在 Minimap2 的執行時間中。這是由於在大多數情況下，每個讀取序列的潛在對齊位置數量遠小於記憶體內運算架構提供的運算平行度，大幅限制了預對齊過濾的效能。在本篇論文中，我們提出一個基於 3D NAND flash 的儲存內預對齊過濾架構，以提高序列比對的效能。我們利用了讀取序列資料集中的序列深度特性，設計了一種新的以讀取序列為中心的預對齊過濾方法，能夠更好的利用底層硬體提供的運算平行度。然而，若將讀取序列為中心的過濾方法應用到先前在主機系統實現的記憶體內部運算架構時，還需要在 PCIe 通道間進行額外的資料搬移才能實現。因此，我們進行了軟硬體共同設計，使得該過濾方法能夠利用 3D NAND flash 的近似比對架構直接在儲存端內部實現預對齊過濾演算法，藉此減少大量的資料搬移。實驗結果顯示，在與目前最先進的記憶體內運算架構相比，我們提出的設計在過濾階段平均可以提升 42.69 倍的吞吐量，並且可以帶來 19.69% 的端到端效能提升。	zh_TW
dc.description.abstract	Read mapping is currently the performance bottleneck of genome sequence analysis since it needs to perform costly approximate string matching to identify the potential matches between the sequencing output (i.e., reads) and an already-known reference genome sequence. Prior works have demonstrated that pre-alignment filtering, which filters out unnecessary mapping locations that will result in a poor match, can significantly improve the performance of read mapping. Considering the memory-intensive characteristic of pre-alignment filtering, processing-in-memory (PIM) designs have been proposed to improve the filtering throughput. However, we find that when integrating the PIM-based pre-alignment filter with the SOTA read mapper, Minimap2, the filtering latency overhead cannot be completely hidden as the filtering throughput lags behind the mapping throughput. The limited number of potential matching locations per read restricts the parallelism of pre-alignment filtering, even though the underlying PIM architecture enables highly parallel in-memory computing. In this work, we propose a 3D NAND flash-based in-storage pre-alignment filtering architecture to improve read mapping performance. To increase the filtering throughput, we exploit the read depth property present in most read datasets to design a new read-centric pre-alignment filtering approach. The read-centric design enables multiple reads to be filtered concurrently to better utilize the parallelism provided by the underlying hardware. Nevertheless, extra data movements across PCIe channels are required if it is implemented on the host memory side. Thus, we co-design the software and hardware to enable read-centric pre-alignment filtering to be in-situ processed in the storage, leveraging the approximate parallel search capability of 3D NAND Flash. The evaluation results show that the proposed design on average achieves 42.69x higher filtering throughput compared to the SOTA PIM solution and can provide up to 19.69% end-to-end performance improvement.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-07-19T16:05:48Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-07-19T16:05:48Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	口試委員審定書 i 致謝 iii 摘要 v Abstract vii Contents ix List of Figures xi List of Tables xiii Chapter 1 Introduction 1 Chapter 2 Background 7 2.1 GenomeSequenceAnalysis 7 2.1.1 Genome Sequencing 7 2.1.2 Read Mapping 8 2.1.3 Q-gram Pre-Alignment Filtering 9 2.2 3D NAND Flash Approximate String Matching Architecture 12 2.2.1 SSD Organization 12 2.2.2 3D NAND Flash Organization 13 2.2.3 Approximate IMS Architecture 13 Chapter 3 Motivation 17 3.1 Potential of Seed Filtering and Limitations of the State-of-the-art 17 3.2 Read-centric Filtering Methodology 19 3.3 Opportunity of 3D NAND AIMS Architecture 20 Chapter 4 Read-centric Q-gram filtering 23 4.1 Read Clustering with LSH 24 4.2 Cluster-wise Q-gram Filtering 26 Chapter 5 Proposed System 29 5.1 Cluster-wise Seeding 31 5.2 Approximate Q-gram Filtering 32 5.3 Approximate Q-gram Filtering in AIMS 35 Chapter 6 Evaluation Methodology 39 Chapter 7 Evaluation 43 7.1 Performance Analysis 43 7.2 Clustering Ratio Analysis 45 7.3 Hardware Configuration Analysis 46 Chapter 8 Related Work 49 Chapter 9 Conclusion 53 References 55	-
dc.language.iso	en	-
dc.subject	序列比對	zh_TW
dc.subject	記憶體內搜索	zh_TW
dc.subject	記憶體內運算	zh_TW
dc.subject	固態硬碟	zh_TW
dc.subject	儲存內運算架構	zh_TW
dc.subject	三維快閃記憶體	zh_TW
dc.subject	In Storage Computing	en
dc.subject	3D NAND Flash	en
dc.subject	In Memory Search	en
dc.subject	Computing-in-Memory	en
dc.subject	SSD	en
dc.subject	Read Mapping	en
dc.title	基於 3D NAND 快閃記憶體近似字串比對架構之儲存內 DNA 序列比對系統	zh_TW
dc.title	An In-Storage DNA Read Mapping System based on 3D-NAND Flash Approximate String Matching Architecture	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	碩士	-
dc.contributor.oralexamcommittee	鄭湘筠;王克中	zh_TW
dc.contributor.oralexamcommittee	Hsiang-Yun Cheng;Keh-Chung Wang	en
dc.subject.keyword	固態硬碟,儲存內運算架構,記憶體內運算,三維快閃記憶體,記憶體內搜索,序列比對,	zh_TW
dc.subject.keyword	SSD,In Storage Computing,Computing-in-Memory,3D NAND Flash,In Memory Search,Read Mapping,	en
dc.relation.page	67	-
dc.identifier.doi	10.6342/NTU202300763	-
dc.rights.note	未授權	-
dc.date.accepted	2023-05-05	-
dc.contributor.author-college	電機資訊學院	-
dc.contributor.author-dept	資訊工程學系	-
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-111-2.pdf 未授權公開取用	2.57 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。