Low-Latency Edge AI Communication with Pub/Sub Middleware via GPU Memory Sharing

官澔恩; Hao-En Kuan

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97774

Title:	Low-Latency Edge AI Communication with Pub/Sub Middleware via GPU Memory Sharing (透過共享 GPU 記憶體的發佈/訂閱中介軟體實現低延遲邊緣人工智慧通訊)
Author:	Hao-En Kuan (官澔恩)
Advisor:	Shih-Hao Hung (洪士灝)
Keywords:	Pub/Sub Middleware, GPU, Shared Memory, Real-Time Dynamic Memory Allocator
Publication Year:	2025
Degree:	Master's
Abstract:	Real-time Edge AI applications often require efficient GPU-based data processing and communication. Because these applications are typically highly modularized, the publish-subscribe (pub/sub) pattern is widely used to deliver data among components. However, existing pub/sub middleware introduces significant latency due to redundant memory copies between GPU and host memory. To address this, we propose GPU-Aware Pub/Sub communication (GAPS), a universal solution that integrates shared CUDA memory with existing pub/sub middleware such as Zenoh-pico and Iceoryx. GAPS reduces data transfer latency by letting publishers and subscribers share GPU memory, which eliminates unnecessary memory copies. We propose an independent shared CUDA memory manager that creates a shared CUDA memory pool for each topic when the topic is initialized. For fine-grained allocation from the pool, we modify Two-Level Segregated Fit (TLSF), a real-time dynamic memory allocator, making it process-safe and capable of managing GPU memory. Additionally, we develop PyGAPS, an extension that accelerates the publication of PyTorch tensors by eliminating serialization overhead in AI-driven applications. Our evaluation demonstrates that GAPS significantly reduces end-to-end latency and improves the throughput of simplified computer vision pipelines (by up to 1.5× in the segmentation task and 3.8× in the classification task), making it a robust solution for real-time Edge AI.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97774
DOI:	10.6342/NTU202501397
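Full-Text Authorization:	Authorized (campus access only)
Electronic Full-Text Release Date:	2025-07-17
Appears in Departments:	Department of Computer Science and Information Engineering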
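
The abstract names Two-Level Segregated Fit (TLSF) as the allocator behind the per-topic memory pool. As a rough illustration only, the sketch below shows the size-class mapping used by standard TLSF, not the modified, process-safe GPU variant described in the thesis. The function names and the choice `SLI = 4` are assumptions made for this example.

```python
# Illustrative sketch of the standard TLSF size-class mapping.
# This is NOT the thesis's modified allocator; names and SLI are assumed.

SLI = 4  # assumed: log2 of the number of second-level classes per first-level range


def mapping_insert(size: int) -> tuple[int, int]:
    """Return the (first-level, second-level) class where a free block of
    `size` bytes would be filed."""
    if size < (1 << SLI):
        raise ValueError("this sketch only handles sizes >= 2**SLI")
    fl = size.bit_length() - 1              # index of the highest set bit
    sl = (size >> (fl - SLI)) ^ (1 << SLI)  # next SLI bits below the highest bit
    return fl, sl


def mapping_search(size: int) -> tuple[int, int]:
    """Return the first class whose blocks are all guaranteed to satisfy a
    request of `size` bytes (the request is rounded up to the next class)."""
    if size < (1 << SLI):
        raise ValueError("this sketch only handles sizes >= 2**SLI")
    fl = size.bit_length() - 1
    rounded = size + (1 << (fl - SLI)) - 1
    return mapping_insert(rounded)
```

Under these assumptions, a free block of 460 bytes is filed under class (8, 12), while a request for 460 bytes starts its search at class (8, 13), where every block is at least 464 bytes. Both lookups take constant time, which is what makes TLSF attractive for real-time allocation.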

Files in this document:

File	Size	Format
ntu-113-2.pdf (access restricted to NTU campus IP addresses; off campus, please use the VPN service)	2.05 MB	Adobe PDF


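Documents in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically stated otherwise.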
