Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/39024
Title: | 針對離散小波轉換以及移動補償式時間濾波之積體電路架構設計與分析 VLSI Architecture and Analysis of Discrete Wavelet Transform and Motion-Compensated Temporal Filtering |
Author: | Chao-Tsung Huang 黃朝宗 |
Advisor: | 陳良基 |
Keywords: | VLSI architecture, motion-compensated temporal filtering, discrete wavelet transform, video and image compression |
Publication year: | 2005 |
Degree: | Doctoral |
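Abstract: | The discrete wavelet transform (DWT) has brought revolutionary progress to still-image and video compression. This dissertation discusses the integrated-circuit design and memory analysis of the DWT in three parts, one per dimension: the one-dimensional DWT, the two-dimensional DWT, and motion-compensated temporal filtering (MCTF), which performs the DWT along the temporal axis.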
Chapter 1 introduces the role of the DWT in video compression algorithms and the considerations for hardware implementation. Part I begins in Chapter 2 with 1-D DWT algorithms and existing hardware architectures. Chapter 3 proposes the flipping structure, which raises the operating frequency of lifting-based structures, and Chapter 4 proposes a new design category that provides the smallest logic gate count. Part II, on the 2-D DWT, begins in Chapter 5 with an analysis of how different image scan methods affect performance. Chapter 6 proposes a generic 2-D line-based architecture that can adopt any DWT hardware module and minimizes internal memory. Chapter 7 discusses the implementation of the line-based internal memory further; the proposed multiple-lifting scheme reduces the area and access power of the internal memory. Chapter 8 discusses the boundary extension problem and the hardware architecture of the shape-adaptive DWT. Part III, on MCTF, begins in Chapter 9: after introducing the algorithm, it presents the memory analysis of one-level MCTF and the proposed data reuse schemes. Chapter 10 further discusses the system analysis of multi-level MCTF and finally proposes a highly flexible system architecture. Chapter 11 concludes with the main contributions of this dissertation and future directions.
The discrete wavelet transform (DWT) has revolutionized block-based image coding and closed-loop video coding systems. In this dissertation, VLSI architectures and memory analysis of the DWT in three dimensions are discussed in three parts: the one-dimensional (1-D) DWT, the two-dimensional (2-D) DWT, and motion-compensated temporal filtering (MCTF), which performs the DWT in the temporal direction. Because the 1-D DWT, 2-D DWT, and MCTF are pixel-level, frame-level, and group-of-pictures-level operations, the designs target the processing element, module, and system levels, respectively. The implementation method of the 1-D DWT is closely tied to the algorithm view. In Part I of this dissertation, different algorithm views of the DWT are surveyed first: the two-channel filter bank, polyphase decomposition, the lifting scheme, and B-spline factorization. Previous 1-D DWT architectures can be classified as convolution-based or lifting-based. Second, we propose a flipping structure that shortens the critical path of lifting-based DWT architectures without any hardware overhead. Lifting-based architectures are usually adopted because of their lower computational complexity and in-place implementation. However, the critical path is potentially long owing to the serial connection of triangular matrices. The flipping structure multiplies the inverses on the timing accumulation path to reduce the critical path efficiently, instead of using conventional pipelining, which introduces many registers.
The case studies of the JPEG 2000 (9,7) filter and the linear (6,10) filter demonstrate the efficiency of the flipping structure. Third, a new category of DWT implementation based on B-spline factorization is proposed, which can use fewer multipliers. For Daubechies wavelets, it is guaranteed to reduce the number of multiplications by about one half compared with convolution-based architectures. In contrast, the lifting scheme cannot reduce the computation complexity for even linear DWT filters. In case studies of the (6,10), (10,18), and (9,7) filters, the proposed B-spline-based architecture shows superior performance in terms of logic gate count. The 2-D DWT is a frame-based computation, so the performance of a hardware implementation is dominated by external memory bandwidth and internal memory size. In Part II of this dissertation, different scan methods are first surveyed in detail and classified into five categories. An overlapped stripe-based scan is proposed to provide a better trade-off for memory requirements. Second, generic line-based 2-D DWT architectures are proposed, which can adopt any kind of 1-D DWT module. For the 1-level 2-D line-based architecture, the line buffer is separated into a data buffer and a temporal buffer. We propose a data flow for the data buffer and a mapping method for the temporal buffer, which together minimize the line buffer size. Two multi-level 2-D line-based DWT architectures are also proposed; both minimize external memory access but differ in hardware utilization. Third, we propose a memory-efficient implementation for line-based 2-D DWT, called the multiple-lifting scheme. The implementation issues of the temporal buffer are discussed first. The proposed multiple-lifting scheme then provides a new implementation method for the temporal buffer. It reduces the temporal buffer access frequency so that the two-port SRAM can be replaced by a single-port SRAM.
The reduction in access frequency also decreases the power consumption of the temporal buffer proportionally. Hardware designs for the (9,7) filter, evaluated with the Artisan 0.18 µm cell library and RAM compiler, confirm the area and power reduction. Fourth, an efficient VLSI implementation of the 2-D shape-adaptive DWT (SA-DWT) is proposed. The SA-DWT must handle boundary extension for very short signal segments. We implement it with a stage-based boundary extension strategy and a shape-adaptive boundary handling unit. SA-DWT designs with the JPEG 2000 lossy (9,7) filter and the MPEG-4 VTC (9,3) filter are implemented to demonstrate the efficiency. Furthermore, the SA-DWT implementation with the (9,7) filter is fabricated with a core area of 1.68 × 1.68 mm² in a TSMC 0.25 µm process. This chip can process 1-level 2-D SA-DWT in real time for HDTV 1080p 30 fps sequences when working at 50 MHz. MCTF performs the DWT in the temporal direction with motion compensation (MC). MCTF has become the core technology in interframe wavelet video coding and in the next-generation video coding standard MPEG SVC. In Part III of this dissertation, the first research work on VLSI architecture and memory analysis of MCTF is presented. First, memory issues of one-level MCTF are discussed. The 5/3 MCTF consists of prediction and update stages. The former is analyzed in terms of macroblock-level and frame-level data reuse schemes separately. After reviewing previous macroblock-level reuse schemes, we propose a new Level C+ scheme that provides a good trade-off between the Level C and Level D schemes. Based on the open-loop nature of the prediction, we propose three frame-level data reuse schemes: Double Reference Frames ME, Double Current Frames ME, and modified Double Current Frames ME. The analysis of 5/3 MCTF is based on the combination of the prediction and update stages and covers external memory bandwidth and storage size.
Second, system issues of multi-level MCTF are discussed, including computation complexity, external memory bandwidth and storage size, and coding delay. The computation complexities are very similar for different MCTF configurations, but the other three system issues differ considerably. They depend on the adopted macroblock-level and frame-level reuse schemes, the decomposition level, inter- or intra-coded lowpass frames, and 5/3 or 1/3 MCTF. Based on simulation results, the impact of the latter three parameters on coding performance is evaluated. Accordingly, a flexible and efficient system architecture is proposed for multi-level MCTF. It can adapt the temporal prediction structure to any-level 5/3 MCTF, 1/3 MCTF, or hierarchical B-frames, and even to closed-loop MCP with two reference frames. In summary, this dissertation presents a fast lifting-based architecture, named the flipping structure, and a new design category based on B-spline factorization that can provide the smallest gate count for 1-D DWT processing element design. For the 2-D DWT, a generic line-based architecture is proposed that minimizes on-chip memory and can adopt any kind of DWT module. Furthermore, a memory-efficient implementation of the temporal buffer, called the multiple-lifting scheme, is presented to reduce memory area and access power. In addition, the boundary extension of SA-DWT is addressed by the proposed stage-based boundary handling units. As for MCTF, system-level implementation issues are considered. Both block-level and frame-level data reuse schemes are discussed for one-level MCTF. Based on the analysis results, a flexible and efficient system architecture is proposed for multi-level MCTF, which can support many configurations of MCTF systems. |
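The lifting scheme that the abstract builds on (for the flipping structure, the multiple-lifting scheme, and the prediction and update stages of 5/3 MCTF) can be illustrated with a minimal software sketch. The code below is not taken from the dissertation and does not model any of its hardware architectures. It assumes the integer 5/3 filter for brevity (the abstract's case studies use the (9,7), (6,10), and (10,18) filters), an even-length integer signal, and whole-sample symmetric boundary extension. The function names `lifting_53_forward` and `lifting_53_inverse` are hypothetical.

```python
def lifting_53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints.

    Illustrative sketch only: even length and symmetric extension are assumed.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length signal with at least 2 samples")
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict stage: each odd sample minus the floor-average of its two even
    # neighbours. The missing right neighbour of the last odd sample is mirrored.
    high = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update stage: each even sample plus a rounded quarter of the two
    # neighbouring detail samples. The missing left detail is mirrored.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    return low, high


def lifting_53_inverse(low, high):
    """Undo lifting_53_forward by running the same stages in reverse order."""
    n = len(low)
    # Undo the update stage first, then the predict stage.
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    odd = [high[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = even
    x[1::2] = odd
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
low, high = lifting_53_forward(signal)
assert lifting_53_inverse(low, high) == signal  # perfect reconstruction
```

Each stage only adds a function of the other half of the samples, so the inverse subtracts the same quantity and reconstruction is exact even with integer rounding. The update stage cannot start until the predict stage has produced its outputs; this serial dependency between stages is what the abstract refers to as a potentially long critical path.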
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/39024 |
Full-text license: | Paid license |
Appears in collections: | Graduate Institute of Electronics Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
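ntu-94-1.pdf (not currently authorized for public access) | 1.01 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.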