  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 電子工程學研究所
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101470
Full metadata record
DC Field | Value | Language
dc.contributor.advisor 吳安宇 zh_TW
dc.contributor.advisor An-Yeu Wu en
dc.contributor.author 黃其澤 zh_TW
dc.contributor.author Chi-Tse Huang en
dc.date.accessioned 2026-02-03T16:31:46Z -
dc.date.available 2026-02-04 -
dc.date.copyright 2026-02-03 -
dc.date.issued 2026 -
dc.date.submitted 2026-01-19 -
dc.identifier.uri http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101470 -
dc.description.abstract With the exponential growth of data generated worldwide, the International Data Corporation (IDC) projects that global annual data generation will reach 527.5 trillion GB by 2029. Meanwhile, the development of few-shot learning (FSL), together with the rise of generative artificial intelligence (Generative AI) and retrieval-augmented generation (RAG), has shown that combining external memory with vector similarity search (VSS) can overcome AI's difficulty in generalizing to unlearned data distributions and reduce the likelihood of hallucination in large language models (LLMs). Consequently, memory-augmented artificial intelligence and vector similarity matching have become core architectures of modern AI applications.

Existing search systems face a memory transfer bottleneck: even though processors compute far faster than memory can be read or written, data processing throughput is still limited by memory bandwidth. In the traditional von Neumann architecture, the separation of processor and memory creates a severe memory wall that cannot keep up with large-scale retrieval demands. Non-von Neumann architectures based on in-memory search (IMS) are therefore emerging: ternary content-addressable memory (TCAM) embeds the search-and-compare unit inside the memory cell to remove the data transfer bottleneck, and its low power, low latency, and high density make it a key technology for energy-efficient, large-scale vector similarity search.

This dissertation aims to improve the energy efficiency and accuracy of TCAM-based vector similarity search through algorithm-architecture co-optimization. Although TCAM offers highly parallel search, existing VSS systems still face three major challenges. First, exact-match TCAM (EX-TCAM) requires many search iterations and extremely long codewords, leading to high latency and area cost. Second, there is a metric mismatch between cosine similarity and hardware-realizable metrics such as Hamming distance or the L∞ norm, degrading search accuracy. Finally, although best-match TCAM (Best-TCAM) can perform cosine similarity search, it still requires extremely long codewords, which hinders deployment on edge devices.

To overcome these difficulties, this dissertation proposes banded vector similarity search (banded VSS), which uses statistical features to narrow the search range and reduce the number of search iterations, and develops an encoding scheme that supports range-to-range matching to shorten the codeword length. It also introduces an L∞ norm-based training mechanism that improves search accuracy and keeps vector features and search behavior consistent between software and hardware. Building on the concept of approximate computing, it further proposes range fidelity-aware range encoding, which exploits the error tolerance of the search process for lossy encoding optimization, greatly reducing codeword length and hardware cost while maintaining search accuracy. In addition, this dissertation proposes segmented cosine similarity and its architecture, which reinterprets Hamming distance so that it can effectively support angular comparison of high-dimensional vectors. Finally, this dissertation presents a vector similarity search architecture for in-memory search and integrates the in-memory search macro with digital modules in a TSMC 28 nm process to realize the above algorithms in silicon. Measurement results verify that the search chip achieves very high energy efficiency and accuracy in few-shot learning applications.
zh_TW
dc.description.abstract With the exponential growth of data, the International Data Corporation (IDC) projects that global annual data generation will reach 527.5 trillion GB by 2029. Simultaneously, the development of few-shot learning (FSL) and the rise of retrieval-augmented generation (RAG) have demonstrated that combining external memory with vector similarity search (VSS) can address the limitations of AI in handling unlearned data distributions, while reducing hallucinations in large language models (LLMs). Consequently, memory-augmented AI and VSS have become core architectures for modern AI applications.

However, existing search systems face memory transfer bottlenecks. Even though processor speeds far exceed memory read/write speeds, data processing throughput remains limited by memory bandwidth. In traditional von Neumann architectures, the separation of processor and memory creates a severe “memory wall.” In-memory search (IMS) based on non-von Neumann architectures is emerging as a solution: ternary content-addressable memory (TCAM) embeds comparison units directly within memory cells to resolve this bottleneck. With its low power, low latency, and high density, TCAM improves search energy efficiency and is becoming a key technology for large-scale VSS.
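As a minimal illustration of the TCAM search described above (a software sketch under simplifying assumptions, not the dissertation's circuit design): each stored word is ternary, with '0', '1', or 'X' (don't care) per cell, and a query matches a row if every non-'X' cell equals the query bit. Hardware compares all rows in a single step; the loop below only emulates that parallelism.

```python
# Software model of ternary (don't-care) matching in a TCAM.
# A stored cell 'X' matches any query bit; '0'/'1' must match exactly.

def row_matches(stored: str, query: str) -> bool:
    """Return True if the query word matches the stored ternary word."""
    return all(c == 'X' or c == q for c, q in zip(stored, query))

def tcam_search(table: list[str], query: str) -> list[int]:
    """Return indices of all matching rows (the hit vector).
    In hardware every row is compared simultaneously; this loop emulates it."""
    return [i for i, row in enumerate(table) if row_matches(row, query)]

table = ["10X1", "0XX0", "1111"]
print(tcam_search(table, "1011"))  # row 0 matches: 'X' absorbs the third bit
```

The don't-care state is what lets a single stored word cover a *range* of queries, which the encoding schemes in this work exploit.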

This dissertation aims to improve the energy efficiency and accuracy of TCAM-based VSS through algorithm-architecture co-optimization. Despite the advantages of high parallelism, existing VSS systems face three major challenges: First, Exact-Match TCAM (EX-TCAM) requires multiple iterative searches and extremely long encoding lengths, resulting in high latency and area overhead. Second, there is a metric mismatch between Cosine Similarity and hardware-realizable metrics such as Hamming Distance or the L∞ norm, causing a degradation in search accuracy. Finally, while Best-Match TCAM (Best-TCAM) can achieve cosine similarity search, it requires excessive encoding lengths that are unfavorable for deployment on edge devices.
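The metric mismatch named in the second challenge can be seen with a toy example (hypothetical vectors chosen for illustration, not data from the dissertation): cosine similarity ignores vector magnitude while the L∞ norm does not, so the two metrics can pick different nearest neighbors for the same query.

```python
# Toy demonstration of the cosine-vs-L∞ metric mismatch.
import math

def cosine(u, v):
    """Cosine similarity: direction only, magnitude cancels out."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def linf(u, v):
    """L∞ (Chebyshev) distance: sensitive to magnitude."""
    return max(abs(a - b) for a, b in zip(u, v))

q = [1.0, 0.1]
cands = {"a": [1.8, 0.4],    # nearly same direction as q, but scaled up
         "b": [1.0, -0.05]}  # slightly different direction, same scale

cos_nn = max(cands, key=lambda k: cosine(q, cands[k]))  # picks 'a'
linf_nn = min(cands, key=lambda k: linf(q, cands[k]))   # picks 'b'
print(cos_nn, linf_nn)  # the two metrics disagree on the nearest neighbor
```

This is why a hardware-friendly proxy metric needs either normalization-aware training (as in the L∞-based training mechanism) or a redesigned angular metric (as in segmented cosine similarity).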

This dissertation proposes banded VSS, which utilizes statistical features to narrow the search scope, thereby reducing the number of search iterations, and develops a range encoding scheme supporting range-to-range matching to shorten codeword length. Simultaneously, we introduce an L∞ norm-based training mechanism to enhance search accuracy, ensuring consistency of search behavior across software and hardware. Furthermore, we propose range fidelity-aware range encoding, which exploits the error tolerance inherent in the search process to perform lossy encoding optimization, reducing codeword length while maintaining search accuracy. Additionally, we propose segmented cosine similarity and its corresponding framework, built on a reinterpretation of Hamming distance that enables angular comparison of high-dimensional vectors. Finally, we implement the aforementioned algorithms on a chip by integrating IMS and digital modules in TSMC 28 nm technology. Measurement results verify that this chip achieves high energy efficiency and accuracy in FSL.
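The iterative mechanism that banded VSS accelerates can be sketched as follows (a simplified illustration with assumed parameters, not the BORE algorithm itself): an EX-TCAM-style search widens an L∞ tolerance step by step until some stored vector matches the query, so skipping tolerances below a statistically known lower bound removes wasted iterations.

```python
# Sketch of iterative EX-TCAM-style L∞ search with iteration skipping.

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def iterative_search(db, q, t_start=0, t_step=1, t_max=16):
    """Widen tolerance t until a hit; return (hit indices, iteration count)."""
    iters, t = 0, t_start
    while t <= t_max:
        iters += 1
        hits = [i for i, v in enumerate(db) if linf(v, q) <= t]
        if hits:
            return hits, iters
        t += t_step
    return [], iters  # no match within the tolerance budget

db = [[8, 3], [2, 9], [5, 5]]
q = [0, 0]

naive = iterative_search(db, q)              # starts probing at t = 0
banded = iterative_search(db, q, t_start=5)  # skip iterations below a bound
print(naive, banded)  # same hit set, far fewer iterations when skipping
```

Both calls return the same nearest entry; starting at the bound merely removes the guaranteed-empty early iterations, which is the source of the energy savings.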
en
dc.description.provenance Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-02-03T16:31:46Z
No. of bitstreams: 0
en
dc.description.provenance Made available in DSpace on 2026-02-03T16:31:46Z (GMT). No. of bitstreams: 0 en
dc.description.tableofcontents Acknowledgements iii
Chinese Abstract v
Abstract vii
Contents xi
List of Figures xvii
List of Tables xxi
Chapter 1 Introduction 1
1.1 Background 1
1.1.1 Growing Gap between Vector Database 1
1.1.2 Memory Wall Bottleneck 2
1.2 In-Memory Search Architecture 4
1.3 Redundancy-aware Search Complexity Reduction 7
1.3.1 Design Challenges 7
1.3.2 Research Contributions 8
1.4 Fidelity-aware Encoding Optimization 10
1.4.1 Design Challenges 10
1.4.2 Research Contributions 10
1.5 Hamming Distance-Compatible Angular Metric 11
1.5.1 Design Challenges 12
1.5.2 Research Contributions 13
1.6 TCAM-based Framework for Angular VSS 13
1.6.1 Design Challenges 14
1.6.2 Research Contributions 14
1.7 Dissertation Organization 15
Chapter 2 Review of Related Works 17
2.1 Memory-Augmented Artificial Intelligence 17
2.2 TCAM-based VSS Framework 19
2.2.1 Implementations of EX-TCAM-based VSS 20
2.2.2 Implementations of Best-TCAM-based VSS 21
2.3 Pre-processing Techniques for TCAM-based VSS 22
2.3.1 Block Encoding in the Spatial Domain 23
2.3.2 Probabilistic Hashing in the Angular Domain 25
2.3.3 Pre-Processing Techniques with Range-based Format 26
2.3.4 Ternary Encoding Scheme for Range Representation 28
2.4 Summary 29
Chapter 3 Banded VSS with Optimized Range-to-Range Encoding (BORE) 31
3.1 Overview of Proposed Framework 31
3.2 Proposed Banded L∞ Norm Distance-based Search 33
3.2.1 Challenges of Iterative Mechanism for EX-TCAM-based VSS 33
3.2.2 Observations of Characteristics across Search Iterations 34
3.2.3 Motivational Experiments 35
3.2.4 Banded L∞ Norm Distance Search (L∞b ) with Iteration Skipping 37
3.3 Proposed Range Encoding Scheme Supporting Range-to-Range Search 38
3.3.1 Challenges of Exploiting Banded VSS for Codeword Length 38
3.3.2 Range Redistribution Strategy for Banded VSS 39
3.3.3 Range Encoding Optimization for Range-to-Range Search with Constrained Programming 40
3.4 Proposed Distance-based Training Mechanism for VSS 44
3.4.1 Challenges of Training Mechanism with Cosine Similarity 44
3.4.2 L∞ Norm Distance-based Training Mechanism 44
3.5 Performance Evaluation 45
3.5.1 Simulation Setup 45
3.5.2 Energy-Accuracy Trade-off 46
3.5.3 Analysis of EX-TCAM-based Search Methods 49
3.5.4 Analysis of Similarity Metrics in Training Stage 49
3.5.5 Comparison with Best-TCAM-based VSS 49
3.6 Summary 51
Chapter 4 Fidelity-aware Optimization for Range Encoding (FORE) 53
4.1 Challenges of Codeword Reduction 53
4.2 Proposed Lossy Optimization for Range Encoding Scheme Design 54
4.2.1 Range Fidelity (RF) in Range Encoding 54
4.2.2 Fidelity-aware Optimization for Lossy Range Encoding 56
4.3 Hardware Implementation of IMS-based VSS Engine 58
4.3.1 Proposed In-Memory Search Architecture 58
4.3.2 Design of Encoder for Range Encoding Module 59
4.4 Experimental Results 61
4.4.1 Settings of Dataset and Simulation 61
4.4.2 Energy-Accuracy Trade-off 61
4.4.3 Analysis of EX-TCAM-based Search Methods 65
4.4.4 Analysis of Range Fidelity 67
4.4.5 Comparison with Best-TCAM-based VSS 68
4.5 Measurement Results 69
4.6 Summary 70
Chapter 5 Segmented Cosine Similarity 73
5.1 Proposed Bounds for Angular Similarity Metrics 73
5.1.1 Challenges of Implementing Angular Metrics in TCAM-based VSS 73
5.1.2 Motivational Experiments 74
5.1.3 Angular Perspective for Interpreting Hamming Distance 75
5.1.4 Rigorous Bound of Angular Difference 77
5.2 Proposed Hamming Distance-Compatible Segmented Cosine Similarity 80
5.2.1 Challenges of Proxy Metric Development in Angular Domain 80
5.2.2 Operational Estimates of Angular Difference 80
5.2.3 Boundary-aware Alignment Technique of Estimates 82
5.3 Proposed Complementary Estimates 84
5.3.1 Motivational Experiments 84
5.3.2 Processing Flow of Complementary Estimates 85
5.4 Performance Evaluation 86
5.4.1 Simulation Setup 86
5.4.2 Evaluation of Recall Rate 88
5.4.3 Analysis of Curse of Dimensionality 89
5.5 Summary 89
Chapter 6 Angular TCAM-based VSS Framework with Segmented Cosine Similarity (Seg-Cos) 91
6.1 Proposed Angular Pre-processing Techniques for TCAM Architecture 91
6.1.1 Challenge of Integrating Segmented Cosine Similarity into IMS-based Architecture 91
6.1.2 Processing Flow of IMS-based Segmented Cosine Similarity 92
6.2 Proposed Encoding for TCAM-based Segmented Cosine Similarity 94
6.2.1 Challenges of Encoding Scheme Design for Circular Distance Calculation 94
6.2.2 Motivational Experiments 95
6.2.3 Möbius Encoding for Circular Range Representation 95
6.3 Proposed Codeword-Length-Aware Adaptive Don’t Care Masking 97
6.4 Proposed Complementary Querying with Codeword Inversion 98
6.5 Performance Evaluation 99
6.5.1 Simulation Setup 99
6.5.2 Energy-Recall Trade-off 101
6.5.3 Analysis of Strength between Different Estimates 102
6.5.4 Analysis of Optimal Don’t Care Ratio 103
6.6 Summary 103
Chapter 7 Conclusions and Future Works 105
7.1 Design Achievements 105
7.2 Future Works 106
References 109
-
dc.language.iso en -
dc.subject 記憶體內搜索 (IMS) -
dc.subject 三元內容定址記憶體 (TCAM) -
dc.subject 向量相似度搜尋 (VSS) -
dc.subject In-Memory Search (IMS) -
dc.subject Ternary Content-Addressable Memory (TCAM) -
dc.subject Vector Similarity Search (VSS) -
dc.title 適用於記憶體增強型人工智慧之高效能向量相似度比對技術 zh_TW
dc.title Energy-Efficient Vector Similarity Search for Memory-Augmented Artificial Intelligence en
dc.type Thesis -
dc.date.schoolyear 114-1 -
dc.description.degree 博士 (Doctoral) -
dc.contributor.oralexamcommittee 鄭湘筠;李進福;魏一勤;曾柏皓;張原豪;粘儆夫 zh_TW
dc.contributor.oralexamcommittee Hsiang-Yun Cheng;Jin-Fu Li;I-Chyn Wey;Po-Hao Tseng;Yuan-Hao Chang;Chin-Fu Nien en
dc.subject.keyword 記憶體內搜索 (IMS),三元內容定址記憶體 (TCAM),向量相似度搜尋 (VSS) zh_TW
dc.subject.keyword In-Memory Search (IMS),Ternary Content-Addressable Memory (TCAM),Vector Similarity Search (VSS) en
dc.relation.page 115 -
dc.identifier.doi 10.6342/NTU202600061 -
dc.rights.note Authorization granted (worldwide open access) -
dc.date.accepted 2026-01-20 -
dc.contributor.author-college College of Electrical Engineering and Computer Science -
dc.contributor.author-dept Graduate Institute of Electronics Engineering -
dc.date.embargo-lift 2026-02-04 -
Appears in Collections: Graduate Institute of Electronics Engineering

Files in This Item:
File | Size | Format
ntu-114-1.pdf | 15.92 MB | Adobe PDF