Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101548

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 吳安宇 | zh_TW |
| dc.contributor.advisor | An-Yeu Wu | en |
| dc.contributor.author | 江浩瑋 | zh_TW |
| dc.contributor.author | Hao-Wei Chiang | en |
| dc.date.accessioned | 2026-02-11T16:16:24Z | - |
| dc.date.available | 2026-02-12 | - |
| dc.date.copyright | 2026-02-11 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-01-30 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101548 | - |
| dc.description.abstract | 向量相似度搜索(Vector Similarity Search)在眾多資料密集型應用中扮演關鍵角色,例如檢索增強生成(Retrieval-Augmented Generation)、推薦系統(Recommendation System)與影像檢索(Image Retrieval)等。然而在傳統馮紐曼架構(von Neumann Architecture)中執行向量相似度搜索,會因為記憶體與運算單元間頻繁的資料傳輸,造成顯著的能耗與延遲開銷。為了改善此問題,近年來研究者提出了將計算靠近資料來源的記憶體內搜索(In-Memory Search)架構。根據資料規模的不同,可選擇不同型態的記憶體:針對小規模應用,SRAM 架構具備快速存取與高可靠性的優勢;而對於含有數百萬至數十億筆向量的大規模資料集,NAND 快閃記憶體則因其高儲存密度與容量優勢而更具潛力。本論文針對這兩類不同規模的應用情境,提出高能源效率的記憶體內搜索解決方案。
以 SRAM 為基礎的小規模向量搜索中,多數的應用場景對於搜索的精確度有較高的要求。在此場景中,基於雙模式SRAM記憶體陣列的兩階段式搜索框架是一種極具潛力的方法,其透過 TCAM 模式進行粗篩選,並利用 IMC 模式做更精確的相似度計算,這種方式在維持高精確度要求的同時,也降低了所需的能量消耗。然而,這類設計仍面臨一些挑戰,包括向量過濾效率低下以及搜尋精度受限等問題。為了解決這些問題,我們提出投票式(voting-based)的後處理機制來提升過濾效率,並延伸現行的角度編碼,以強化向量搜索的準確度。此外,我們實作了具管線化的 Top-K 單元,並將上述功能與 SRAM 雙模式記憶體整合於單一晶片中。實驗結果顯示,該設計在能耗與準確度方面皆優於現有的先進記憶體內運算加速器。 針對使用 NAND 快閃記憶體的大規模向量搜索,其主要面對的挑戰包括有限的資料精度、較長的搜尋延遲以及元件本身的硬體變異性。為了克服這些問題,我們提出一系列演算法與硬體協同優化的策略,包括用於提高數值精度的多位元溫度計編碼(Multi-bit Thermometer Code)、可減少搜尋次數的非對稱向量相似度搜索(Asymmetric Vector Similarity Search),以及考量硬體變異性的硬體覺察式訓練(Hardware-Aware Training)。這些方法能有效提升搜索準確度,同時顯著降低能源消耗,實現具擴展性的高效能向量搜索系統。 總結而言,本論文提出一套涵蓋不同記憶體技術的整合性方法,實現在不同資料規模下的能源效率最佳化架構。所提出的系統已在多種應用場景中進行實驗驗證,包括少樣本學習(Few-shot Learning)與近似最鄰近搜尋(Approximate Nearest Neighbor Search),展現了其在準確度與能源效率方面的優越表現。 | zh_TW |
| dc.description.abstract | Vector similarity search (VSS) plays a critical role in a wide range of data-intensive applications, such as retrieval-augmented generation, recommendation systems, and image retrieval. However, performing VSS in conventional von Neumann architectures incurs significant energy and latency costs due to frequent data transfers between memory and processing units. To mitigate this, recent research has adopted in-memory search (IMS) architectures that bring computation closer to the data. Depending on the scale of the target dataset, different types of memory are preferred. For small-scale applications, SRAM-based designs offer fast access and high reliability. In contrast, NAND flash memory becomes more promising for large-scale datasets involving millions or billions of vectors due to its high density and storage capacity. In this thesis, we present energy-efficient IMS solutions that address both small- and large-scale VSS scenarios.
For small-scale VSS using SRAM, most applications focus on exact search, which requires higher search accuracy. Two-stage search frameworks built on dual-mode SRAM arrays have emerged as a promising approach in this scenario: TCAM operation performs coarse filtering, and IMC operation performs fine-grained similarity computation, meeting the accuracy requirement at low energy cost. However, this approach still faces challenges such as inefficient candidate filtering and limited search accuracy. To address these issues, we introduce a voting-based post-processing mechanism that improves filtering efficiency, together with an extended angular encoding method that enhances search accuracy in both the filtering and refinement stages. We further design and implement a pipelined Top-K unit and integrate it with SRAM-based dual-mode memory arrays on a single chip. The proposed design achieves significant improvements in both energy efficiency and accuracy over prior state-of-the-art designs. For large-scale VSS using NAND flash, the key challenges are hardware limitations such as limited precision, long search latency, and device variations. To overcome these issues, we propose a set of algorithm-hardware co-optimization strategies, including a Multi-bit Thermometer Code for reliable distance encoding, Asymmetric Vector Similarity Search for reducing search iterations, and Hardware-Aware Training to mitigate accuracy loss caused by device variations. Together, these methods enable efficient and scalable VSS with high retrieval accuracy while significantly reducing energy consumption. In summary, this thesis proposes a comprehensive set of techniques spanning different memory technologies to support energy-efficient in-memory VSS across varying data scales. 
The proposed frameworks have been rigorously evaluated on a range of tasks, including few-shot learning and approximate nearest neighbor search, demonstrating their effectiveness in both accuracy and energy efficiency. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-02-11T16:16:24Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-02-11T16:16:24Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
Abstract (Chinese) iii
ABSTRACT v
CONTENTS vii
LIST OF FIGURES xi
LIST OF TABLES xv
Chapter 1 Introduction 1
1.1 Vector Similarity Search 1
1.1.1 Introduction to Vector Similarity Search 1
1.1.2 Applications of Vector Similarity Search 2
1.1.3 Memory Bottleneck 4
1.2 Part A: SRAM-based IMS for Small-scale VSS 7
1.2.1 Common Hardware for SRAM-based IMS 7
1.2.2 Dual-mode SRAM-based Memory Array 9
1.2.3 Design Challenges 11
1.2.4 Research Contributions 12
1.3 Part B: NAND-based Multi-bit Content Addressable Memory for Large-scale VSS 14
1.3.1 NAND-based Multi-bit Content Addressable Memory 14
1.3.2 Design Challenges 15
1.3.3 Research Contributions 16
1.4 Thesis Organization 18
Chapter 2 Review of In-memory Vector Similarity Search Frameworks 21
2.1 Principles of Two-stage Search Framework 21
2.1.1 Filtering Stage with TCAM 22
2.1.2 Refinement Stage with IMC 23
2.1.3 Hardware Architecture of Dual-mode Memory Array 25
2.2 Challenges of Two-stage Search Frameworks 27
2.2.1 Latency and Energy Overhead in Filtering Stage 27
2.2.2 Limited Accuracy of Distance-based Similarity Metrics 28
2.3 Principles of NAND-based MCAM 29
2.3.1 Architecture of NAND-based MCAM 29
2.3.2 VSS with NAND-based MCAM 31
2.4 Challenges of NAND-based MCAM for VSS 33
2.4.1 Limited Precision 34
2.4.2 Bottleneck Effect 35
2.4.3 Long Search Latency 38
2.4.4 Device Variations 39
2.5 Summary 40
Chapter 3 Small-scale VSS with SRAM-based Dual-mode Memory Array 43
3.1 Post-Processing Optimization 43
3.1.1 A Deep Look at the Filtering Stage 43
3.1.2 Voting Mechanism 45
3.1.3 Analysis of Energy-Accuracy Trade-off for Voting Mechanism 47
3.2 Angular Encoding Design 49
3.2.1 Locality Sensitive Hashing and Its Challenge 49
3.2.2 Extended Locality Sensitive Hashing 51
3.2.3 Discriminative Power Analysis 52
3.3 Evaluation Result 54
3.3.1 Experimental Settings 54
3.3.2 Filtering Overhead Analysis 55
3.3.3 Evaluation of the Proposed Two-stage Search Approach 56
3.4 Design and Implementation of SRAM-based Dual-mode IMS Engine 58
3.4.1 Overview of the System Architecture 58
3.4.2 Design of the Top-K Unit 60
3.4.3 Chip Implementation and Dataflow Design 62
3.4.4 Comparison with the State-of-the-art Designs 64
3.5 Summary 65
Chapter 4 Large-scale VSS with NAND-based Multi-bit Content Addressable Memory 67
4.1 Multi-bit Thermometer Code 67
4.1.1 Encoding Design 68
4.1.2 Mismatch Level Analysis for MTMC 70
4.2 Asymmetric Vector Similarity Search 71
4.2.1 Asymmetric Setting 72
4.2.2 Accuracy Compensation through Modified Quantization-Aware Training 73
4.3 Hardware-aware Training 75
4.3.1 Hardware-Aware Training Flow 75
4.3.2 Training Flow for Many-Class Few-Shot Learning 77
4.3.3 Training Flow for Approximate Nearest-Neighbor Search 79
4.4 Evaluation Result 80
4.4.1 Experimental Settings 81
4.4.2 Pareto Front of Energy-Accuracy Trade-off 82
4.4.3 Comparison between SVSS and AVSS 85
4.5 Summary 86
Chapter 5 Conclusion 87
5.1 Main Contributions 87
5.2 Future Directions 88
References 91 | - |
| dc.language.iso | en | - |
| dc.subject | 記憶體內搜索 | - |
| dc.subject | 向量相似度搜索 | - |
| dc.subject | 三元可定址內容記憶體 | - |
| dc.subject | In-memory search | - |
| dc.subject | Vector similarity search | - |
| dc.subject | Ternary Content Addressable Memory | - |
| dc.title | 高效能記憶體內向量相似度比對系統 | zh_TW |
| dc.title | Energy-Efficient In-Memory Vector Similarity Searching System | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳坤志;魏一勤;鄭湘筠 | zh_TW |
| dc.contributor.oralexamcommittee | Kun-Chih Chen;I-Chyn Wey;Hsiang-Yun Cheng | en |
| dc.subject.keyword | 記憶體內搜索,向量相似度搜索,三元可定址內容記憶體 | zh_TW |
| dc.subject.keyword | In-memory search,Vector similarity search,Ternary Content Addressable Memory | en |
| dc.relation.page | 95 | - |
| dc.identifier.doi | 10.6342/NTU202600487 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2026-02-02 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Electronics Engineering | - |
| dc.date.embargo-lift | N/A | - |
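The two-stage search flow summarized in the English abstract (coarse TCAM-style filtering on compact codes, then exact IMC-style refinement on the survivors) can be sketched in software. The sketch below is illustrative only: the toy data, the 32-bit sign-random-projection codes, and all function and variable names are assumptions for this example, not the encoding or parameters used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy database: 1000 vectors of dimension 64.
db = rng.standard_normal((1000, 64))
query = rng.standard_normal(64)

# Offline: sign-random-projection LSH (binary codes), a software
# stand-in for the codes held in the TCAM for the filtering stage.
planes = rng.standard_normal((64, 32))          # 32 hash bits
db_codes = db @ planes > 0                      # (1000, 32) boolean

def two_stage_search(q, k=10, num_candidates=64):
    # Stage 1 (filtering): Hamming distance between binary codes,
    # emulating an approximate-match sweep over the TCAM.
    q_code = q @ planes > 0
    hamming = (db_codes != q_code).sum(axis=1)
    candidates = np.argpartition(hamming, num_candidates)[:num_candidates]

    # Stage 2 (refinement): exact cosine similarity on the survivors,
    # emulating the fine-grained IMC dot-product pass.
    cand = db[candidates]
    sims = (cand @ q) / (np.linalg.norm(cand, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]
    return candidates[order]

top10 = two_stage_search(query)
```

The point of the split is that the cheap Hamming pass touches every stored vector while the expensive exact pass touches only `num_candidates` of them, which is where the energy saving comes from.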
| Appears in Collections: | Graduate Institute of Electronics Engineering |

Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (restricted access) | 6.12 MB | Adobe PDF |
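As a minimal illustration of the distance property behind the thermometer-style encoding named in the abstract: the Hamming distance between two plain thermometer codes equals the absolute difference of the encoded values, which is what lets a CAM mismatch count act as an L1 distance. The Multi-bit Thermometer Code in the thesis generalizes this to multi-bit cells; the sketch below covers only the classic single-bit case, with hypothetical helper names.

```python
import numpy as np

def thermometer(v, levels):
    # Unary/thermometer code: value v in [0, levels] -> `levels` bits,
    # of which the lowest v are set.
    return np.arange(levels) < v

def hamming(a, b):
    # Bitwise mismatch count between two equal-length codes.
    return int(np.sum(a != b))

# The codes share their lowest min(a, b) set bits, differ in the next
# |a - b| positions, and agree (all zeros) above, so the mismatch
# count equals the L1 distance between the encoded values.
assert hamming(thermometer(3, 8), thermometer(7, 8)) == abs(3 - 7)
```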
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
