Architecture Design for Real-Time CNN-Based Super Resolution Using Multi-Layer Computation (基於CNN方法與多層計算之即時超級解析度架構設計)

Chung-Yan Chih; 池忠諺

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68137

Title:	Architecture Design for Real-Time CNN-Based Super Resolution Using Multi-Layer Computation (基於CNN方法與多層計算之即時超級解析度架構設計)
Author:	Chung-Yan Chih (池忠諺)
Advisor:	Liang-Gee Chen (陳良基)
Keywords:	Super resolution accelerator, Deconvolution modification, Multi-layer computation, Data reuse design, Memory efficient design
Year of publication:	2017
Degree:	Master's
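Abstract:	(Translated from the Chinese abstract) As technology advances, the screen resolution of multimedia devices keeps increasing, so much of the content being played has a lower resolution than the screen. Super-resolution algorithms can raise the resolution of images or video and recover more detail, ensuring good visual quality. Meanwhile, neural network algorithms from deep learning have achieved strong results in many computer vision fields in recent years, including super resolution. Because current super-resolution algorithms are computationally heavy, neither software nor existing hardware acceleration achieves real-time high-resolution output; our target is therefore 60 frames per second at Full-HD output resolution. We focus on deep-learning-based super-resolution algorithms and, after comparison, choose FSRCNN for implementation. Because the heavy computation of deep learning prevents real-time processing, we propose a customized hardware architecture to accelerate it. Unlike the layer-by-layer processing of existing neural network accelerators, we propose a cross-layer parallel computation architecture to reduce the bandwidth requirement. In our experiments, this approach reduces the bandwidth requirement by nearly 95% compared with layer-by-layer processing, and with Level B data reuse, data and memory are used efficiently. To reuse the processing elements, we convert the deconvolution layer into multiple convolutions of different sizes. Finally, we propose a super-resolution accelerator that supports 2x, 3x, and 4x upscaling at 60 frames per second, 19 times the speed of the software execution of FSRCNN.
(English abstract) In recent years, the display resolution of end devices has been increasing rapidly. Super resolution aims to improve visual quality when low-resolution content is played on high-resolution displays. A high frame rate is also important for smooth playback. Meanwhile, deep learning has grown rapidly in recent years and outperforms earlier methods in many computer vision domains, including super resolution. In super resolution, we find that learning-based methods outperform non-learning-based methods, as expected. However, learning-based methods require massive computation, so real-time performance is hard to achieve. In fact, even architectures based on non-learning-based methods rarely achieve real-time performance. Most existing hardware supports only 2x upscaling, and most architectures' output resolution is below full-HD. We propose a customized ASIC implementation of super resolution that delivers 60 fps full-HD or higher output at different upscaling factors. Our architecture uses a learning-based super-resolution algorithm. We first compare high-resolution (HR) networks with low-resolution (LR) networks and choose an LR network because of its lower computation cost. We then choose the Fast Super-Resolution Convolutional Neural Network (FSRCNN) for implementation.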
Because the original FSRCNN contains a deconvolution layer while most of the network is convolution, we convert the deconvolution filter into multi-size convolution filters to raise hardware utilization. We propose an architecture for this hardware-friendly FSRCNN, define the bandwidth problem, and adopt a different strategy to reduce bandwidth and memory cost. Instead of the layer-wise computation commonly used in convolutional neural network hardware accelerators, we use multi-layer computation to reduce the bandwidth requirement and increase throughput. We arrange the data flow and store reusable data using Level B data reuse, making the architecture memory efficient. We implement an FSRCNN accelerator with lower bandwidth, smaller on-chip memory, and a high frame rate.
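The abstract does not spell out how the deconvolution layer is converted into multi-size convolutions. One common way to do this, shown here only as a minimal 1-D sketch and not necessarily the thesis's exact scheme, is to split a stride-s transposed convolution into s ordinary convolutions, one per output phase. The function names, the 9-tap kernel, and the test data below are assumptions chosen for illustration.

```python
def transposed_conv1d(x, w, stride):
    """Reference: scatter each input sample, scaled by the kernel, into the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


def split_kernel(w, stride):
    """Sub-filter for output phase p keeps taps p, p+stride, p+2*stride, ..."""
    return [w[p::stride] for p in range(stride)]


def transposed_conv1d_by_phases(x, w, stride):
    """Same result, computed as `stride` ordinary (gather-style) convolutions."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for p, sub in enumerate(split_kernel(w, stride)):
        q = 0
        # Phase p owns output positions q*stride + p.
        while q * stride + p < len(out):
            acc = 0.0
            for j, wj in enumerate(sub):
                i = q - j  # input sample that meets sub-filter tap j
                if 0 <= i < len(x):
                    acc += x[i] * wj
            out[q * stride + p] = acc
            q += 1
    return out


# Hypothetical data: 4 input samples and a 9-tap kernel (illustrative values).
x = [1.0, 2.0, 3.0, 4.0]
w = [float(k) for k in range(1, 10)]
for s in (2, 3, 4):
    assert transposed_conv1d(x, w, s) == transposed_conv1d_by_phases(x, w, s)
```

With a 9-tap kernel, `split_kernel` yields sub-filters of lengths 5 and 4 for stride 2, and 3, 2, 2, 2 for stride 4, which shows why the resulting convolution filters can have different sizes. Each phase is a plain multiply-accumulate over neighboring inputs, so in principle it can run on the same processing elements as the convolution layers.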
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68137
DOI:	10.6342/NTU201704425
Full-text license:	Paid license
Appears in collections:	Graduate Institute of Electronics Engineering

Files in this item:

File	Size	Format
ntu-106-1.pdf (not currently authorized for public access)	6.06 MB	Adobe PDF


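Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.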
