Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87567
Title: | Sparsity-based Activation Compression Engine Design for Low-memory Access in DLA |
Authors: | 王則勛 Tse-Hsun Wang |
Advisor: | 吳安宇 An-Yeu Wu |
Keyword: | Activation compression, Zero-value compression, Block compression, Bypass mechanism, K-lossy compression, Scalable architecture |
Publication Year: | 2022 |
Degree: | Master's |
Abstract: | As deep learning (DL) has grown in popularity, it has become an important approach to solving many kinds of problems. However, DL requires a large amount of computation, which has traditionally been performed in the cloud. In recent years, the amount of data has grown exponentially, so cloud-based DL systems face the challenges of long data-transmission times and data-privacy leakage. To address these issues, most research aims to move DL inference to edge devices, where deep learning accelerators (DLAs) are deployed to improve computational efficiency. However, a DLA still consumes a large amount of energy, for two reasons: computation and data transmission. We focus on reducing the energy consumed by data transmission, which is the bottleneck in current DLAs and is often overlooked. When computing a layer on a DLA, the input activations are fetched from DRAM, and after computation the output activations are written back to DRAM; this data movement between the DLA and DRAM causes high energy consumption. In this thesis, we use activation compression (AC) techniques to reduce the traffic between the DLA and DRAM and thus reduce overall energy consumption.
We exploit the high sparsity of the activations produced by the ReLU function. Zero-value compression (ZVC) is combined with block compression and a bypass mechanism, achieving a 2.39× compression ratio. We also propose two K-lossy compression techniques, mixed-K lossy compression and K-lossy-aware training; with only a 0.4% accuracy drop, they achieve a 3.73× compression ratio. Finally, combining the above algorithms, we propose a scalable compression/decompression engine and implement it in hardware. The proposed scalable architecture outperforms the state-of-the-art with 19% higher throughput at only 8% area overhead. The overall system's energy consumption is verified with DRAMSim2, showing that our method reduces DRAM read and write energy by 56%. |
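To make the compression ideas in the abstract concrete, here is a minimal sketch of zero-value compression on an activation block, plus one possible reading of a K-lossy step (keep only the K largest-magnitude activations per block). The bitmask layout, function names, and the top-K interpretation are illustrative assumptions, not the thesis's exact formats.

```python
# Hedged sketch: ZVC stores a 1-bit presence mask plus the nonzero values,
# so highly sparse ReLU outputs shrink substantially. The K-lossy step below
# is an assumed interpretation (zero out all but the K largest magnitudes).

def zvc_compress(block):
    """Encode a block as (bitmask, list of nonzero values)."""
    mask = [1 if v != 0 else 0 for v in block]
    values = [v for v in block if v != 0]
    return mask, values

def zvc_decompress(mask, values):
    """Rebuild the original block from the bitmask and nonzero values."""
    it = iter(values)
    return [next(it) if m else 0 for m in mask]

def k_lossy(block, k):
    """Keep only the k largest-magnitude activations; zero the rest
    (an assumed reading of K-lossy compression, raising sparsity for ZVC)."""
    if k >= len(block):
        return list(block)
    order = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0 for i, v in enumerate(block)]

block = [0, 3, 0, 0, -1, 0, 5, 0]
mask, vals = zvc_compress(block)
assert zvc_decompress(mask, vals) == block   # ZVC alone is lossless
lossy = k_lossy(block, 2)                    # keeps 5 and 3, drops -1
```

In this sketch the compressed size is one bit per element plus one word per nonzero, so a block that is mostly zeros compresses well; the K-lossy pass trades a small accuracy loss for more zeros and thus a higher ratio, mirroring the 2.39× lossless vs. 3.73× lossy figures reported above.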
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87567 |
DOI: | 10.6342/NTU202210009 |
Fulltext Rights: | Not authorized |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-111-1.pdf (Restricted Access) | 4.11 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.