Energy-Efficient In-Memory Vector Similarity Searching System

江浩瑋; Hao-Wei Chiang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101548

Title:	Energy-Efficient In-Memory Vector Similarity Searching System
Author:	Hao-Wei Chiang (江浩瑋)
Advisor:	An-Yeu Wu (吳安宇)
Keywords:	In-memory search, Vector similarity search, Ternary Content Addressable Memory
Publication Year:	2026
Degree:	Master's
Abstract:	Vector similarity search (VSS) plays a critical role in a wide range of data-intensive applications, such as retrieval-augmented generation, recommendation systems, and image retrieval. However, performing VSS on conventional von Neumann architectures incurs significant energy and latency costs because data must move frequently between memory and processing units. To mitigate this, recent research has adopted in-memory search (IMS) architectures that bring computation closer to the data. The preferred memory type depends on the scale of the target dataset. For small-scale applications, SRAM-based designs offer fast access and high reliability.
In contrast, NAND flash memory is more promising for large-scale datasets containing millions or billions of vectors because of its high density and storage capacity. In this thesis, we present energy-efficient IMS solutions for both small- and large-scale VSS scenarios. For small-scale VSS on SRAM, most applications focus on exact search and therefore demand high search accuracy. Two-stage search frameworks built on dual-mode SRAM arrays have emerged as a promising approach for this scenario. Such a framework uses ternary content-addressable memory (TCAM) operation for coarse filtering and in-memory computing (IMC) operation for fine-grained similarity computation, meeting the accuracy requirement at low energy cost. However, this approach still suffers from inefficient candidate filtering and limited search accuracy. To address these issues, we introduce a voting-based post-processing mechanism that improves filtering efficiency, together with an extended angular encoding method that enhances search accuracy in both the filtering and refinement stages. We further design and implement a pipelined Top-K unit and integrate it with SRAM-based dual-mode memory arrays on a single chip. Compared with prior state-of-the-art in-memory computing accelerators, the proposed design achieves significant improvements in both energy efficiency and accuracy.

For large-scale VSS on NAND flash, the key challenges arise from hardware limitations: limited precision, long search latency, and device variations. To overcome them, we propose a set of algorithm-hardware co-optimization strategies. Multi-bit Thermometer Code provides reliable distance encoding. Asymmetric Vector Similarity Search reduces the number of search iterations. Hardware-Aware Training mitigates the accuracy loss caused by device variations. Together, these methods enable efficient and scalable VSS with high retrieval accuracy while significantly reducing energy consumption.
In summary, this thesis proposes a comprehensive set of techniques that span different memory technologies to support energy-efficient in-memory VSS across varying data scales. The proposed frameworks have been rigorously evaluated on a range of tasks, including few-shot learning and approximate nearest neighbor search, demonstrating their effectiveness in both accuracy and energy efficiency.
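The abstract combines two ideas: a coarse, bitwise filtering stage followed by a precise refinement stage, and thermometer coding of quantized values. A thermometer code represents level q out of L as q ones followed by L − q zeros. As a result, the Hamming distance between two such codes equals the absolute difference of their levels, and summing over dimensions gives the L1 distance between the quantized vectors.

The sketch below is a generic software illustration of these ideas, not the thesis's hardware design. It makes the following assumptions:

- Uniform quantization over [-1, 1].
- Hamming distance as a stand-in for the bitwise match counting done inside the memory array.
- L1 distance on full-precision vectors for refinement.

The voting-based post-processing, extended angular encoding, asymmetric search, and hardware-aware training described above are not modeled. All function names are hypothetical.

```python
from typing import List, Sequence


def quantize(x: float, levels: int, lo: float = -1.0, hi: float = 1.0) -> int:
    """Assumed uniform quantizer: clamp x to [lo, hi], map to an integer in [0, levels]."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * levels)


def thermometer(q: int, levels: int) -> List[int]:
    """Thermometer code: q ones followed by (levels - q) zeros."""
    return [1] * q + [0] * (levels - q)


def encode(vec: Sequence[float], levels: int) -> List[int]:
    """Concatenate the per-dimension thermometer codes of a quantized vector."""
    bits: List[int] = []
    for x in vec:
        bits.extend(thermometer(quantize(x, levels), levels))
    return bits


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of mismatching bits (stand-in for an in-memory bitwise match count)."""
    return sum(x != y for x, y in zip(a, b))


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Full-precision L1 distance used in the refinement stage."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_stage_search(query: Sequence[float], database: Sequence[Sequence[float]],
                     levels: int = 8, n_candidates: int = 4, k: int = 2) -> List[int]:
    """Hypothetical two-stage search returning the indices of the k best matches."""
    q_code = encode(query, levels)
    db_codes = [encode(v, levels) for v in database]
    # Stage 1: coarse filtering by Hamming distance on thermometer codes
    # (equivalent to L1 distance on the quantized levels); ties broken by index.
    coarse = sorted(range(len(database)), key=lambda i: (hamming(q_code, db_codes[i]), i))
    candidates = coarse[:n_candidates]
    # Stage 2: refine only the surviving candidates with full-precision distances.
    refined = sorted(candidates, key=lambda i: (l1(query, database[i]), i))
    return refined[:k]


if __name__ == "__main__":
    db = [[0.9, 0.9], [0.1, 0.1], [-0.8, -0.8], [0.85, 0.95]]
    print(two_stage_search([0.9, 0.92], db, levels=8, n_candidates=3, k=2))
```

In this design, only `n_candidates` vectors reach the more expensive full-precision stage. This mirrors why a cheap, bit-level first stage can save energy, although the actual savings depend entirely on the hardware.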
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101548
DOI:	10.6342/NTU202600487
Full-Text License:	Not authorized
Full-Text Release Date:	N/A
Department:	Graduate Institute of Electronics Engineering

Files in This Item:

File	Size	Format
ntu-114-1.pdf (restricted access)	6.12 MB	Adobe PDF

