NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97774
Title: 透過共享 GPU 記憶體的發佈/訂閱中介軟體實現低延遲邊緣人工智慧通訊
Low-Latency Edge AI Communication with Pub/Sub Middleware via GPU Memory Sharing
Authors: 官澔恩
Hao-En Kuan
Advisor: 洪士灝
Shih-Hao Hung
Keyword: 發佈/訂閱中介軟體, 圖形處理器, 共享記憶體, 即時動態記憶體配置器,
Pub/Sub Middleware, GPU, Shared Memory, Real-Time Dynamic Memory Allocator
Publication Year: 2025
Degree: 碩士 (Master's)
Abstract: 即時邊緣人工智慧應用通常需要有效率的 GPU 資料處理與傳輸。由於這類應用通常高度模組化,因此廣泛使用發佈-訂閱模式以在各個元件之間傳遞資料。然而,現有的發佈-訂閱中介軟體在 GPU 與主記憶體之間會產生多餘的記憶體複製,導致顯著的延遲。為了解決這個問題,我們提出了認知 GPU 的發佈-訂閱通訊機制(GPU-Aware Pub/Sub communication,簡稱 GAPS),這是一種通用解決方案,將共享的 CUDA 記憶體與現有的發佈-訂閱中介軟體(如 Zenoh-pico 和 Iceoryx)整合在一起。GAPS 透過讓發佈者與訂閱者共享 GPU 記憶體,來消除不必要的記憶體複製,進而大幅降低資料傳輸延遲。在我們的設計中,我們提出了一個獨立的共享 CUDA 記憶體管理器,會在每個「主題」初始化時,為該「主題」建立一個共享的 CUDA 記憶體池。為了在此記憶體池實現細粒度的記憶體分配,我們修改了一種即時動態記憶體配置器 Two-Level Segregated Fit(TLSF),使其具備多執行緒安全性且能管理 GPU 記憶體。此外,我們還開發了 PyGAPS,一個用於加速發佈 PyTorch 張量的延伸版本,能消除在人工智慧應用中的序列化開銷。根據我們的實驗結果,GAPS 顯著降低端到端延遲,並提升簡化的電腦視覺流程的吞吐量(在影像分割任務中提升最多達 1.5 倍,在分類任務中提升最多達 3.8 倍),是一個適用於即時邊緣人工智慧的穩健解決方案。
Real-time Edge AI applications often require efficient GPU-based data processing and communication. Since such applications are typically highly modularized, the publish–subscribe (pub/sub) pattern is widely used to deliver data among components. However, existing pub/sub middleware introduces significant latency due to redundant memory copies between GPU and host memory. To address this, we propose GPU-Aware Pub/Sub communication (GAPS), a universal solution that integrates shared CUDA memory with existing pub/sub middleware, such as Zenoh-pico and Iceoryx. GAPS minimizes data transfer latency by enabling GPU memory sharing between publishers and subscribers, eliminating unnecessary memory copies. In our work, we propose an independent shared CUDA memory manager that creates a shared CUDA memory pool for each topic during the topic's initialization. For fine-grained allocation from the pool, we modify Two-Level Segregated Fit (TLSF), a real-time dynamic memory allocator, making it process-safe and capable of managing GPU memory. Additionally, we develop PyGAPS, an extension that accelerates the publication of PyTorch tensors, eliminating serialization overhead in AI-driven applications. Our evaluation demonstrates that GAPS significantly reduces end-to-end latency and improves the throughput of simplified computer vision pipelines (by up to 1.5× in the segmentation task and 3.8× in the classification task), making it a robust solution for real-time Edge AI.
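The allocator the abstract builds on, Two-Level Segregated Fit (TLSF), achieves O(1) allocation by mapping each block size to a pair of free-list indices: a first level by power-of-two range, and a second level that linearly subdivides each range. As background only (this is the generic TLSF mapping, not the thesis's modified, GPU-capable implementation; `SL_BITS` and the function name are illustrative), the core index computation can be sketched as:

```python
# Illustrative sketch of TLSF's two-level size-to-free-list mapping.
SL_BITS = 4  # log2 of second-level subdivisions; 4-5 is a common TLSF choice


def tlsf_mapping(size: int) -> tuple[int, int]:
    """Map a block size to (first-level, second-level) free-list indices.

    The first level classifies blocks into power-of-two ranges
    [2^fl, 2^(fl+1)); the second level splits each range into
    2**SL_BITS equal slots, so a fitting free list is found in O(1).
    """
    # Assume a minimum block size so that fl >= SL_BITS; real TLSF
    # handles smaller "tiny" blocks with a dedicated first-level range.
    assert size >= (1 << SL_BITS)
    fl = size.bit_length() - 1                      # floor(log2(size))
    sl = (size >> (fl - SL_BITS)) - (1 << SL_BITS)  # linear slot within the range
    return fl, sl
```

For example, a 1000-byte request falls in range [512, 1024), slot 15 of 16, i.e. `(9, 15)`, while exactly 1024 bytes maps to `(10, 0)`. The thesis's contribution layers process safety and GPU (CUDA) memory management on top of this scheme.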
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97774
DOI: 10.6342/NTU202501397
Fulltext Rights: 同意授權(限校園內公開) (authorized; full text available on campus only)
Embargo lift date: 2025-07-17
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
ntu-113-2.pdf (2.05 MB, Adobe PDF; access limited to NTU IP range)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
