Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70635
Title: | Design and Implementation of Configurable Texture Unit for Convolutional Neural Network (應用於卷積神經網路之可調變式紋理單元) |
Author: | Yi-Hsiang Chen (陳貽祥) |
Advisor: | 簡韶逸 |
Keywords: | convolutional neural network (CNN), graphics processing unit (GPU), texture unit, configurable, hardware, Deep |
Year of publication: | 2018 |
Degree: | Master's |
Abstract: | In recent years, advances in deep neural networks (DNNs) have brought renewed attention to artificial intelligence (AI). Among the many classes of neural networks, the convolutional neural network (CNN) is the most popular model; it extracts features through multiple convolution operations. Despite its many advantages, the sheer number of convolutions means a CNN needs a device with high computing power to run. Because of its highly parallel computing capability, the graphics processing unit (GPU) is currently the most common hardware for running CNN models. However, this computation does not reach maximum hardware utilization: although the texture units in a GPU can handle filtering operations, they are seldom activated while the shader cores execute convolutions. Leaving those texture units idle most of the time also hurts energy efficiency.
To address this issue, we propose a new texture unit architecture called the Tensor and Texture Processing Unit (TATPU). With a small incremental hardware overhead, we modify the texture unit to be more configurable and multi-purpose so that it supports CNN computation. The unit is divided into two paths, one fetching 2-D data and the other fetching weight coefficients, which operate at the same time. As a result, besides providing ordinary texture functionality, it can offload work from the shader cores in other applications. Our experimental results show that the configurability of the TATPU exceeds that of other programmable texture units: it supports more filter types and larger window sizes, and its throughput is about 1.47 and 2.94 times as high, respectively. Moreover, its performance on CNN applications is nearly 18.5 times that of the shader cores, and it offloads an additional 46.746 GOPS using only 39.9% of the area of a conventional texture unit. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70635 |
DOI: | 10.6342/NTU201802799 |
Full-text license: | Paid license |
Appears in collections: | Graduate Institute of Electronics Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf (not currently authorized for public access) | 7.15 MB | Adobe PDF |
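Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.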