Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78023
Title: | Exact Nearest Patch Matching Accelerator for Computational Photography (計算攝影學之區塊匹配加速器) |
Author: | Zih-Yu Chao (趙子宇) |
Advisor: | 簡韶逸 |
Keywords: | computational photography, patch matching |
Year of publication: | 2017 |
Degree: | Master's |
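Abstract: | No image sensor is perfect; computational photography uses computation to predict a clean, high-quality image. In addition, as technology advances, devices such as smartphones and digital cameras make high-resolution photographs increasingly easy to capture, and this trend means that existing algorithms take much longer to process them.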
This study presents a hardware accelerator for computational photography applications such as image super-resolution and image denoising. Patch matching is a key building block for many of these applications. Patches are overlapping blocks within an image, and patch matching finds the most similar patch(es) between a source image and a target image in order to establish the correspondence between them. Replacing the most time-consuming part of this operation with hardware yields a large speedup over software implementations on current image processors. We use 400 circuits, each computing a partial sum (one column) of squared differences, to evaluate all target patches in one search range in parallel, outputting one column of partial sums per cycle. 400 special accumulators keep the current sum correct by subtracting the previous partial sum and adding the next one, so that partial sums are reused in the adjacent search range. The accumulators use shift registers to access the data to be added to and subtracted from the current sum.
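The column-wise partial-sum reuse described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis's circuit: the function names (`column_ssd`, `sliding_patch_ssd`, `brute_force_ssd`), the list-of-lists image layout, the pre-padded target image, and the non-negative offset convention are all hypothetical choices made for illustration. A `deque` stands in for each accumulator's shift register.

```python
from collections import deque


def column_ssd(src, tgt, sy, x, off, patch):
    """Partial sum: squared differences down one patch column.

    Assumption: tgt is padded so that every index used here is valid.
    """
    dy, dx = off
    return sum((src[sy + r][x] - tgt[sy + dy + r][x + dx]) ** 2
               for r in range(patch))


def sliding_patch_ssd(src, tgt, sy, offsets, patch):
    """Yield (sx, {offset: ssd}) for each source patch position along row sy.

    One accumulator per candidate offset. Each step adds the newest column
    partial sum and subtracts the one that just left the patch window.
    """
    width = len(src[0])
    fifo = {off: deque() for off in offsets}  # models the shift register
    acc = {off: 0 for off in offsets}         # models the accumulator
    for x in range(width):
        for off in offsets:
            new = column_ssd(src, tgt, sy, x, off, patch)
            fifo[off].append(new)
            acc[off] += new
            if len(fifo[off]) > patch:
                acc[off] -= fifo[off].popleft()
        if x >= patch - 1:
            yield x - patch + 1, dict(acc)


def brute_force_ssd(src, tgt, sy, sx, off, patch):
    """Reference: full patch SSD recomputed from scratch (no reuse)."""
    dy, dx = off
    return sum((src[sy + r][sx + c] - tgt[sy + dy + r][sx + dx + c]) ** 2
               for r in range(patch) for c in range(patch))
```

In this model each new patch position costs one column of squared differences per candidate instead of a full patch. The abstract gives the number of units (400) but not the search range dimensions, so the set of offsets is left as a parameter here.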
Finally, we perform partial sorting of all the candidate sums in parallel by modifying a bit-by-bit rank order filter. It compares the bits at the same position across all candidates rather than the whole numbers, and propagates the information from the most significant bit to the least significant bit. Careful pipelining and scheduling, parallel computation, the use of SRAM as a local buffer, and a trade-off between search range size and algorithm performance greatly reduce the bandwidth and working frequency needed for patch matching, which is widely used in computational photography applications. A flexible image processor realizes 4K (3840x2160) resolution real-time (30 fps) patch matching at 250 MHz. The proposed hardware implementation will benefit many tasks in computational photography. |
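The bit-by-bit comparison described in the abstract can be sketched in software as a generic bit-serial rank selection. This is an illustration of the general technique only, not the thesis's modified filter: the function name `bit_serial_select`, its parameters, and the tie-breaking rule (lowest index first) are assumptions. Values are assumed to be unsigned integers that fit in `width` bits.

```python
def bit_serial_select(values, k, width):
    """Return (kth_smallest_value, sorted indices of the k smallest values).

    Scans from the most significant bit to the least significant bit and,
    at each bit position, looks only at that bit of every candidate.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    active = set(range(len(values)))  # candidates still tied with the k-th value
    chosen = []                       # candidates already known to be smaller
    remaining = k                     # how many more candidates must be picked
    kth = 0                           # k-th smallest value, built bit by bit
    for bit in range(width - 1, -1, -1):
        zeros = [i for i in active if not (values[i] >> bit) & 1]
        if remaining <= len(zeros):
            # The k-th value has a 0 here; candidates with a 1 are too large.
            active = set(zeros)
        else:
            # The k-th value has a 1 here; candidates with a 0 are smaller.
            chosen.extend(zeros)
            remaining -= len(zeros)
            active -= set(zeros)
            kth |= 1 << bit
    # Every remaining active candidate equals the k-th value; break ties by index.
    chosen.extend(sorted(active)[:remaining])
    return kth, sorted(chosen)
```

For example, `bit_serial_select([9, 3, 14, 3, 7, 0, 12, 5], 3, 4)` returns `(3, [1, 3, 5])`. The decision at each bit position depends only on a count of zero bits among the active candidates, which is why this style of selection suits parallel hardware.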
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78023 |
DOI: | 10.6342/NTU201700228 |
Full-text license: | Paid license |
Appears in collections: | Graduate Institute of Electronics Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
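ntu-106-R03943127-1.pdf (not currently authorized for public access) | 2.75 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.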