Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92095
Title: 卷積神經網路之深度學習加速器架構設計
Architecture Design of Deep Learning Accelerator for CNN
Authors: 邵長威
Chang-Wei Shao
Advisor: 方煒
Wei Fang
Keyword: 計算機架構, 深度學習加速器, 深度卷積神經網路, 人工智慧, 演算法;
Computer Architecture, Deep Learning Accelerator (DLA), Deep Convolutional Neural Networks (DCNN), Artificial Intelligence (AI), Algorithms
Publication Year: 2024
Degree: Master's (碩士)
Abstract: 由於深度卷積神經網路 (DCNN) 在人工智慧 (AI) 的應用中對計算力的需求指數成長,對於專用硬體的架構設計來說是一大挑戰。小型系統的功能支援上通常不齊全,而大型晶片變得過於複雜。在不同功能的運算上,同樣仰賴太多模組的切換,導致過多資料搬移,有高延遲與部分模組使用率不高的問題。
這項研究提出了一種基於統計方法的改進結構,包括參數分析、演算法劃分後映射到硬體,以在暫存器傳輸級 (RTL) 建構出一個管線的硬體架構。
研究結果顯示,本論文的架構在單核處理元件 (PE) 上比起其他評比 (e.g., MIT Eyeriss, UCLA DCNN Acc., Google TPU) 的設計,有較佳的硬體重用率、硬體共用性且支援最多功能,也減少了激活函數時的延遲,並確保了所有功能的運算都在核心內完成,且擁有相同的週波時間。
The demand for computing power in deep convolutional neural networks (DCNNs) for artificial intelligence (AI) applications is growing exponentially, which poses significant challenges for the design of application-specific hardware architectures. Smaller systems typically lack essential functionality, while large-scale chips become overly complex. Operations with different functions also rely on switching among too many modules, resulting in excessive data movement, high latency, and underutilization of certain modules.
This research proposes a statistics-based improved structure comprising parameter analysis, algorithm partitioning, and mapping onto hardware, in order to construct a pipelined hardware architecture at the Register Transfer Level (RTL).
The results indicate that this architecture achieves better hardware reuse, hardware sharing, and functional coverage on a single processing element (PE) than the benchmark designs (e.g., MIT Eyeriss, UCLA DCNN Acc., Google TPU). It also reduces activation-function latency and ensures that all operations complete within the core with consistent cycle times.
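To make the workload concrete, the sketch below shows the core computation such accelerators map onto processing elements (PEs): a 2-D convolution whose output pixels are multiply-accumulate (MAC) reductions, with the ReLU activation fused into the same pass rather than handled by a separate module. This is a minimal, generic illustration of the DCNN kernel; nothing in it comes from the thesis itself, and the input sizes and values are arbitrary assumptions.

```python
# Illustrative sketch only: a valid (no-padding) 2-D convolution with a
# fused ReLU, the basic DCNN operation that a hardware PE performs as a
# stream of multiply-accumulate (MAC) steps. Not taken from the thesis.

def conv2d_relu(image, kernel):
    """Valid 2-D convolution of `image` with `kernel`, then ReLU."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Each output pixel is one MAC reduction over the kernel window,
            # the per-cycle work assigned to a processing element.
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            # Fusing ReLU here avoids a separate activation pass and the
            # extra data movement it would cost in hardware.
            row.append(max(acc, 0))
        out.append(row)
    return out

# Arbitrary 3x3 input and 2x2 kernel for demonstration.
feature_map = conv2d_relu(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    [[1, 0],
     [0, 1]],
)
print(feature_map)  # [[6, 8], [12, 14]]
```

Keeping the activation inside the same loop mirrors the abstract's point that completing all functions in-core avoids shuttling intermediate results between modules.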
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92095
DOI: 10.6342/NTU202400510
Fulltext Rights: Authorized (worldwide public access)
Appears in Collections: Department of Biomechatronics Engineering (生物機電工程學系)

Files in This Item:
File: ntu-112-1.pdf   Size: 3.76 MB   Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
