利用儲存內計算加速向量資料庫搜尋

宋沛誠; Pei-Cheng Sung

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97065

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	楊佳玲	zh_TW
dc.contributor.advisor	Chia-Lin Yang	en
dc.contributor.author	宋沛誠	zh_TW
dc.contributor.author	Pei-Cheng Sung	en
dc.date.accessioned	2025-02-26T16:17:28Z	-
dc.date.available	2025-02-27	-
dc.date.copyright	2025-02-26	-
dc.date.issued	2025	-
dc.date.submitted	2025-02-11	-
dc.identifier.citation	[1] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, 2013. [2] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems, 2020. [3] R. Chen, B. Liu, H. Zhu, Y. Wang, Q. Li, B. Ma, Q. Hua, J. Jiang, Y. Xu, H. Deng et al., “Approximate nearest neighbor search under neural similarity metric for large-scale recommendation,” in ACM International Conference on Information & Knowledge Management, 2022. [4] Q. Chen, B. Zhao, H. Wang, M. Li, C. Liu, Z. Li, M. Yang, and J. Wang, “Spann: Highly-efficient billion-scale approximate nearest neighborhood search,” Advances in Neural Information Processing Systems, 2021. [5] Y. Xu, H. Liang, J. Li, S. Xu, Q. Chen, Q. Zhang, C. Li, Z. Yang, F. Yang, Y. Yang et al., “Spfresh: Incremental in-place update for billion-scale vector search,” in Symposium on Operating Systems Principles, 2023. [6] Y. A. Malkov and D. A. Yashunin, “Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. [7] C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,” arXiv, 2017. [8] S. Jayaram Subramanya, F. Devvrit, H. V. Simhadri, R. Krishnawamy, and R. Kadekodi, “Diskann: Fast accurate billion-point nearest neighbor search on a single node,” Advances in Neural Information Processing Systems, 2019. [9] J. Ni, X. Xu, Y. Wang, C. Li, J. Yao, S. Xiao, and X. Zhang, “Diskann++: Efficient page-based search over isomorphic mapped graph index using query-sensitivity entry vertex,” arXiv, 2023. [10] Z. Zhu, J. Liu, G. Dai, S. Zeng, B. Li, H. Yang, and Y. Wang, “Processingin-hierarchical-memory architecture for billion-scale approximate nearest neighbor search,” in ACM/IEEE Design Automation Conference, 2023. [11] J. Kwak, S. Lee, K. Park, J. Jeong, and Y. H. Song, “Cosmos+ openssd: Rapid prototype for flash storage systems,” ACM Transactions on Storage, 2020. [12] D. Reinsel, J. Gantz, and J. Rydning, “Data age 2025: The evolution of data to life-critical; don＇t focus on big data, focus on the data that＇s big,” IDC, Tech. Rep., 2017. [13] E. Mehmood and T. Anees, “Distributed real-time etl architecture for unstructured big data,” Knowledge and Information Systems, 2022. [14] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov, “Devise: A deep visual-semantic embedding model,” Advances in Neural Information Processing Systems, 2013. [15] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, 2013. [16] J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu et al., “Milvus: A purpose-built vector data management system,” in International Conference on Management of Data, 2021. [17] M. Wang, X. Xu, Q. Yue, and Y. Wang, “A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search,” arXiv, 2021. [18] H. Jegou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010. [19] I. NVM Express, “Nvm express base specification, revision 2.1,” https://nvmexpress. org/specifications/, 2024. [20] K.-C. Ho, P.-C. Fang, H.-P. Li, C.-Y. M. Wang, and H.-C. Chang, “A 45nm 6b/cellcharge-trapping flash memory using ldpc-based ecc and drift-immune soft-sensing engine,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013. [21] S. Li and T. Zhang, “Improving multi-level nand flash memory storage reliability using concatenated bch-tcm coding,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2009. [22] Y. Kang, Y.-s. Kee, E. L. Miller, and C. Park, “Enabling cost-effective data processing with smart ssd,” in Symposium on Mass Storage Systems and Technologies, 2013. [23] G. Koo, K. K. Matam, T. I, H. K. G. Narra, J. Li, H.-W. Tseng, S. Swanson, and M. Annavaram, “Summarizer: trading communication with computing near storage,” in IEEE/ACM International Symposium on Microarchitecture, 2017. [24] J. Do, S. Sengupta, and S. Swanson, “Programmable solid-state storage in future cloud datacenters,” Communications of the ACM, 2019. [25] A. Barbalace and J. Do, “Computational storage: Where are we today?” in Conference on Innovative Data Systems Research, 2021. [26] I. Jo, D.-H. Bae, A. S. Yoon, J.-U. Kang, S. Cho, D. D. Lee, and J. Jeong, “Yoursql: a high-performance database system leveraging in-storage computing,” Proceedings of the VLDB Endowment, 2016. [27] B. Gu, A. S. Yoon, D.-H. Bae, I. Jo, J. Lee, J. Yoon, J.-U. Kang, M. Kwon, C. Yoon, S. Cho et al., “Biscuit: A framework for near-data processing of big data workloads,” ACM SIGARCH Computer Architecture News, 2016. [28] C. Zou and A. A. Chien, “Assasin: Architecture support for stream computing to accelerate computational storage,” in IEEE/ACM International Symposium on Microarchitecture, 2022. [29] J. H. Lee, H. Zhang, V. Lagrange, P. Krishnamoorthy, X. Zhao, and Y. S. Ki, “Smartssd: Fpga accelerated near-storage data analytics on ssd,” IEEE Computer Architecture Letters, 2020. [30] S. Salamat, A. Haj Aboutalebi, B. Khaleghi, J. H. Lee, Y. S. Ki, and T. Rosing, “Nascent: Near-storage acceleration of database sort on smartssd,” in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2021. [31] H. Jang, J. Song, J. Jung, J. Park, Y. Kim, and J. Lee, “Smart-infinity: Fast large language model training using near-storage processing on a real system,” in IEEE International Symposium on High-Performance Computer Architecture, 2024. [32] Y. Chen, J. Xu, C. Wei, Y. Wang, X. Yuan, Y. Zhang, X. Yu, Y. Chen, Z. Wang, S. He et al., “Bm-store: A transparent and high-performance local storage architecture for bare-metal clouds enabling large-scale deployment,” in IEEE International Symposium on High-Performance Computer Architecture, 2023. [33] Q. Chen, H. Wang, M. Li, G. Ren, S. Li, J. Zhu, J. Li, C. Liu, L. Zhang, and J. Wang, “SPTAG: A library for fast approximate nearest neighbor search,” https://github.com/Microsoft/SPTAG, 2018. [34] H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg, “Datasets for approximate nearest neighbor search,” http://corpus-texmex.irisa.fr/, 2011. [35] Microsoft, “SPACEV1B: A Billion-Scale Vector Dataset for Text Descriptors,” https://github.com/microsoft/SPTAG/tree/main/datasets/SPACEV1B, 2020. [36] S. Chen, A. C. Zhou, Y. Shi, Y. Li, and X. Yao, “Memanns: Enhancing billion-scale anns efficiency with practical pim hardware,” arXiv, 2024. [37] J. Jang, H. Choi, H. Bae, S. Lee, M. Kwon, and M. Jung, “CXL-ANNS:Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search,” in USENIX Annual Technical Conference, 2023. [38] S. Liang, Y. Wang, Z. Yuan, C. Liu, H. Li, and X. Li, “Vstore: in-storage graph based vector search accelerator,” in ACM/IEEE Design Automation Conference, 2022. [39] J.-H. Kim, Y.-R. Park, J. Do, S.-Y. Ji, and J.-Y. Kim, “Accelerating large-scale graph-based nearest neighbor search on a computational storage platform,” IEEE Transactions on Computers, 2022. [40] Y. Lee, H. Choi, S. Min, H. Lee, S. Beak, D. Jeong, J. W. Lee, and T. J. Ham, “Anna: Specialized architecture for approximate nearest neighbor search,” in IEEE International Symposium on High-Performance Computer Architecture, 2022. [41] S. Zeng, Z. Zhu, J. Liu, H. Zhang, G. Dai, Z. Zhou, S. Li, X. Ning, Y. Xie, H. Yang et al., “Df-gas: a distributed fpga-as-a-service architecture towards billion-scale graph-based approximate nearest neighbor search,” in IEEE/ACM International Symposium on Microarchitecture, 2023.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97065	-
dc.description.abstract	向量資料庫（Vector Databases, VDBs）透過向量搜尋實現高效的資訊檢索，在現代資料處理中至關重要。雖然近似最近鄰搜尋（Approximate Nearest Neighbor Search, ANNS）方法（如基於圖的和基於叢集的方法）解決了大型資料集在計算上的挑戰。然而，大量資料移動所導致的效能瓶頸，限制了向量搜尋的可擴展性和效率。透過儲存內計算 (In-storage Computing) 技術能夠有效緩解此瓶頸所造成的性能損失。然而，過去的研究缺乏存儲內計算裝置內部的量化分析，既無法直觀的說明效能提升的原因，系統中也可能存在潛在的效能瓶頸。本研究提出了一種基於 COSMOS+ OpenSSD 平台的儲存內向量搜尋裝置設計與實現。借助實際平台的分析發現直觀的設計會導致指令管線化效能下降並針對此問題提出改善方案。我們的方法在距離計算功能的效上相較於現有的儲存內計算解決方案提升了 1.49 倍，達到了理想效能的 88%。此外，我們的方案在端到端吞吐量上，與儲存內計算方法相比，提升了 1.42 ~ 1.44 倍; 與使用傳統 SSD 方法相比，提升了 1.75 ~ 2.03 倍。最後，我們分析了引入儲存內計算所帶來的新的系統瓶頸，並且針對此瓶頸提出未來的研究建議。	zh_TW
dc.description.abstract	Vector Databases (VDBs) enable efficient information retrieval through vector search, crucial in modern data processing. While Approximate Nearest Neighbor Search (ANNS) methods address computational challenges, data movement bottlenecks limit the scalability and efficiency of vector search. Prior In-storage computing (ISC) reserch offers a solution but lacks quantitative analysis on real devices, making it difficult to explain performance gains and identify potential bottlenecks. This study presents the design and implementation of an in-storage vector search device based on the COSMOS+ OpenSSD platform. Through real-device analysis, we identify a key bottleneck where a naive PE design leads to pipeline inefficiencies between commands. To address this, we propose an optimized approach that improves distance computation performance by 1.49x, achieving 88% of the theoretical peak performance. Additionally, our design enhances end-to-end throughput by 1.42x ~ 1.44x compared to existing ISC solutions and by 1.75x ~ 2.03x over traditional SSD-based methods. Finally, we analyze new system bottlenecks introduced by ISC and provide insights for future research directions.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-26T16:17:28Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2025-02-26T16:17:28Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	口試委員審定書i 致謝ii 中文摘要iii 英文摘要iv 目次vi 圖次ix 表次xi 第一章Introduction 1 第二章Background & Motivation 3 2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1.1 Vector database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1.2 SSD Architecture & Open Source SSD Platform . . . . . . . . . . . 7 2.1.2.1 SSD Architecture . . . . . . . . . . . . . . . . . . . . 7 2.1.2.2 Open Source SSD Platform . . . . . . . . . . . . . . . 9 2.1.3 In-Storage Computing . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 第三章Implementation & Analysis 18 3.1 In-storage Distance Calculation . . . . . . . . . . . . . . . . . . . . . . 18 3.2 In-storage Distance Calculation Design . . . . . . . . . . . . . . . . . . 19 3.3 Processing Element Design . . . . . . . . . . . . . . . . . . . . . . . . 20 3.4 In-storage Distance Calculation Implementation . . . . . . . . . . . . . 23 3.5 Performance Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . 26 3.6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.6.1 Impact of In-Storage Distance Calculation on PCIe Bottleneck . . . 28 3.6.2 Root cause analysis of the delay between read trigger and read transfer 29 第四章Methodology 31 4.1 Insight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.1.1 Async Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.1.2 Auto Argument Switch . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2 Processing Element Design . . . . . . . . . . . . . . . . . . . . . . . . 33 4.2.1 Async Setup Implementation . . . . . . . . . . . . . . . . . . . . . 33 4.2.2 Auto Argument Switch Implementation . . . . . . . . . . . . . . . 35 4.3 Async PE Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 第五章Evaluation 39 5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2 Stress Test Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.1 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.2 Time Breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.3 End-to-End Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.3.1 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.3.2 Cluster Size and Performance Correlation . . . . . . . . . . . . . . 44 5.4 Future Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 第六章Related Works 48 6.1 SSD-Based ANNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 6.2 Processing-in-Memory for ANNS . . . . . . . . . . . . . . . . . . . . 49 6.3 Compute Express Link (CXL) for ANNS . . . . . . . . . . . . . . . . 50 6.4 In-Storage Computing-Based ANNS . . . . . . . . . . . . . . . . . . . 50 6.5 Other Hardware Accelerators for ANNS . . . . . . . . . . . . . . . . . 51 第七章Conclusion 52 參考文獻54	-
dc.language.iso	en	-
dc.title	利用儲存內計算加速向量資料庫搜尋	zh_TW
dc.title	Accelerate Search in Vector Databases with In-Storage Computing	en
dc.type	Thesis	-
dc.date.schoolyear	113-1	-
dc.description.degree	碩士	-
dc.contributor.oralexamcommittee	張原豪;鄭湘筠	zh_TW
dc.contributor.oralexamcommittee	Yuan-Hao Chang;Hsiang-Yun Cheng	en
dc.subject.keyword	儲存內計算,近數據計算,向量資料庫,近似最鄰近搜尋,場效可程式化邏輯閘陣列,	zh_TW
dc.subject.keyword	In-Storage Computing,Near Data Processing,Vector Database,Approximate Nearest Neighbor Search,FPGA,	en
dc.relation.page	59	-
dc.identifier.doi	10.6342/NTU202500592	-
dc.rights.note	同意授權(全球公開)	-
dc.date.accepted	2025-02-11	-
dc.contributor.author-college	電機資訊學院	-
dc.contributor.author-dept	資訊網路與多媒體研究所	-
dc.date.embargo-lift	2026-02-28	-
顯示於系所單位：	資訊網路與多媒體研究所

文件中的檔案：

檔案	大小	格式
ntu-113-1.pdf 此日期後於網路公開 2026-02-28	10.57 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。