Free-viewpoint 3DTV: Algorithm and Architecture Design

Pei-Kuei Tsung; 叢培貴

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66192

Title:	Free-viewpoint 3DTV: Algorithm and Architecture Design (自由視角3D電視：演算法與架構設計)
Author:	Pei-Kuei Tsung (叢培貴)
Advisor:	Liang-Gee Chen (陳良基)
Keywords:	System-on-a-Chip (SoC), Video, Free-viewpoint, 3D, 3DTV, FTV, Virtual Reality, Compression, Codec, Motion Estimation, Disparity Estimation, Parallel Architecture, Low Power, MVC, Multiview Video Coding, H.264, MPEG, AVC
Year of publication:	2012
Degree:	Doctoral
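Abstract:	Reality has always been a human dream. Long ago, figures such as Da Vinci were already exploring how much "reality" humans could reproduce. In recent years, advances in display technology have made it increasingly practical to raise image display from the original two-dimensional plane to a highly realistic three-dimensional one, and multi-view and stereoscopic display applications such as 3D stereoscopic TV and free-viewpoint TV have proliferated. Yet even with ever-higher video resolutions and ever more stereoscopic 3D technology, people are still stretching their imagination in pursuit of "reality". This dissertation first examines the principles of human 3D perception and how they differ from what the current 3DTV industry can offer. Based on that analysis, we propose adding interaction between the viewer and the video source (TV, film), together with the concept of free viewpoint, to provide the motion parallax that current 3DTV lacks. With this interaction in place, current 3DTV can evolve into free-viewpoint 3DTV and support applications such as virtual reality.

Video signal processing in a free-viewpoint 3DTV environment consists mainly of two parts: free-viewpoint synthesis and real-time multiview video coding and decoding. First, physical constraints make it impossible to capture an unlimited number of viewpoints at the camera side, so free viewpoint switching requires virtual view synthesis to generate virtual views between the views actually captured. Because the target image does not exist, obtaining the highest image quality from limited information is an important problem in virtual view synthesis. In addition, as with existing video signals, real-time video compression with high coding efficiency remains a very important area. In a free-viewpoint 3DTV environment, multiple views must be compressed at the same time, so the amount of data to transmit grows in proportion to the number of views. Moreover, because high-definition 1920x1080 and even ultra-high-resolution 4096x2160 formats have become specifications that video applications cannot ignore, real-time video compression and decompression is an increasingly difficult requirement.

This dissertation is divided into three parts, covering virtual view synthesis, multiview video compression, and overall system implementation for free-viewpoint 3DTV. The first part addresses algorithm design for virtual view synthesis. To improve the quality of the synthesized virtual views while reducing the required computation, we first propose a single-iteration virtual view synthesis algorithm that lowers computational complexity. Next, targeting the quality of the synthesized views, we propose a single-iteration picture repair algorithm that repairs the picture while the virtual view is being computed. For pre-processing and post-processing, we propose inter-view luminance and chrominance compensation (Inter-view Color Calibration) and depth-aware hole filling (Depth-based Inpainting), which improve picture quality before and after virtual view synthesis, respectively. At the end of the first part, we integrate these algorithms on a current 3D notebook platform. By combining an XBOX Kinect system, which detects the user's position, with the proposed virtual view synthesis algorithm, we provide a prototype of future free-viewpoint 3DTV.

The second part addresses the algorithm and hardware architecture design of the video compression system in a 3DTV system. First, for bandwidth analysis of current MVC compression systems, we introduce the precedence-constraint algorithm from graph theory, so that for each coding structure the algorithm yielding the minimum bandwidth can be selected according to its precedence constraints. We also present the first MVC single-chip encoder published in the literature. Implemented in TSMC 90 nm process, the proposed MVC single-chip encoder provides real-time view scalability from a single view at 4096×2160, through three views at 1920×1080, to seven views at 1280×720. This scalability lets the design support both multiview 3DTV and ultra-high-definition QFHD applications.

Finally, in the third part, to deliver the most realistic images, we propose integrating a multiview MVC decoder and a free-viewpoint synthesizer in a single SoC. With the highest current video resolution, Quad Full-HD (4096x2160 pixels), a free-viewpoint output specification of 216 Quad Full-HD frames per second as required by the largest current number of views, and an MVC decoder of the highest current specification capable of real-time Quad Full-HD@30fps decoding, our SoC can realize not only today's 2D and 3D TV but also future multi-user glasses-free 3DTV and virtual reality applications. Using TSMC 40 nm process, we implemented the world's first, and so far only, free-viewpoint 3DTV set-top box SoC. The chip runs at a maximum clock frequency of 240 MHz and supports a maximum output of Quad Full-HD@216fps. It also supports free-viewpoint synthesis over all six dimensions in 3D space: three dimensions of rotation and three of translation. Compared with 3DTV chips previously published at top international chip conferences such as ISSCC and the VLSI Symposium, we improve power efficiency by 6.6× to 229× and overall system specification by 9× to 40.5×. In short, this dissertation proposes a range of possibilities, from different angles, for future 3D video systems. With the techniques presented here, people can take a further step in defining "reality".

3DTV is a promising mainstream for next-generation TV systems. High-resolution 3DTV gives users a vivid viewing experience. Moreover, free-viewpoint view synthesis (FVVS) extends common two-view stereo 3D vision into virtual reality by generating an unlimited number of views at any desired viewpoint. This dissertation introduces the algorithms and VLSI architecture designs of a free-viewpoint 3DTV system in three parts: visual quality improvement, system analysis and implementation of the 3DTV coding system, and system integration of the 3DTV set-top box SoC. In the visual quality improvement part, several algorithms are proposed to solve perception issues in free-viewpoint virtual view synthesis. In the 3DTV coding system part, the system analysis and VLSI architecture design of the MVC encoder are introduced. The proposed MVC encoder achieves real-time 4096 × 2160p H.264/AVC encoding and HDTV MVC encoding. Finally, the free-viewpoint 3DTV system is integrated into the world's first single-chip free-viewpoint 3DTV set-top box SoC, which includes the MVC decoder and the free-viewpoint synthesizer. Its 216fps 4096 × 2160p throughput allows 9 view angles to be displayed in parallel in real time.

In the first part of this dissertation, the visual quality improvement algorithms in FVVS are introduced. To provide better free-viewpoint video quality, these algorithms are designed for the inter-view color calibration, virtual view synthesis, and post-processing blocks. To reduce the computational complexity and the complex scheduling of FVVS, a single-iteration view interpolation algorithm is proposed. The single-iteration scheme reduces redundant computation by 86%.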
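To make the notion of view interpolation concrete, the following is a minimal sketch of generic depth-based interpolation on a single scanline. It is not the single-iteration algorithm proposed in the dissertation; the function names, the rectified-camera assumption, and the disparity convention (a left pixel at x matches the right pixel at x - d) are all assumptions made for illustration.

```python
def warp_scanline(pixels, disparity, shift_scale):
    """Forward-warp one scanline; nearer pixels (larger disparity) win conflicts."""
    width = len(pixels)
    out = [None] * width          # None marks a hole (no source pixel landed here)
    best = [-1.0] * width         # largest disparity seen so far at each target
    for x, (value, d) in enumerate(zip(pixels, disparity)):
        target = round(x + shift_scale * d)
        if 0 <= target < width and d > best[target]:
            out[target] = value
            best[target] = d
    return out


def interpolate_view(left, right, disp_left, disp_right, alpha):
    """Synthesize a scanline at position alpha (0 = left view, 1 = right view)."""
    from_left = warp_scanline(left, disp_left, -alpha)
    from_right = warp_scanline(right, disp_right, 1.0 - alpha)
    result = []
    for a, b in zip(from_left, from_right):
        if a is not None and b is not None:
            result.append((1.0 - alpha) * a + alpha * b)  # blend both views
        elif a is not None:
            result.append(a)
        else:
            result.append(b)  # may still be None: a hole left for inpainting
    return result


# Hypothetical scene with a constant disparity of 2 pixels.
left = [10, 20, 30, 40, 50, 60]
right = [30, 40, 50, 60, 70, 80]
disp = [2.0] * 6
middle = interpolate_view(left, right, disp, disp, 0.5)
```

Pixels visible in only one view are copied from that view, and positions that neither view reaches stay `None`, which is where a hole-filling step such as inpainting would take over.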
Further, the artifacts caused by the imperfect depth map are eliminated by the proposed running interpolation and background erosion within the same single iteration. Then, a hybrid color compensation scheme is proposed as the pre-processing step. Based on inter-view color correspondence estimation, a linear and smooth light field model is established. As a result, color mismatch and ghost effects in the synthesized virtual view frames are eliminated. In addition, regions with strong reflection are detected from the outliers of the linear regression and optimized by a hybrid reflection model, so that proper reflection behavior is shown. Compared with virtual views without color compensation, the proposed method improves PSNR by about 0.26-0.42 dB. After the pre-processing engine and the view synthesis engine, a hybrid inpainting algorithm is presented as the post-processing engine. Motion-oriented, depth-based, and conventional anisotropic filter diffusion approaches are combined to achieve better visual quality, so that an appropriate solution is available for each type of image artifact. Simulation results show that the proposed hybrid inpainting algorithm delivers better results in both perceptual quality and objective metrics. Finally, a real-time viewpoint-aware 3D video synthesis system is developed at the end of the first part. After GPU-CPU co-optimization, a throughput of 1280 × 720p at 30fps is achieved on a 4-core notebook.

The algorithm and architecture design for the 3DTV coding system is proposed in the second part of this dissertation. First, a new bandwidth analysis scheme for various MVC structures is proposed. The concept of precedence constraints from graph theory is adopted to derive the processing order in an MVC structure. In addition, two scheduling flows in MVC are proposed for systematic analysis.
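The idea of deriving a processing order from precedence constraints can be illustrated with a topological sort. The sketch below is only an illustration, not the dissertation's bandwidth analysis scheme: the frame labels and the reference structure in `refs` are hypothetical, and Kahn's algorithm is a standard technique used here as a stand-in.

```python
from collections import deque


def processing_order(refs):
    """Return a frame order in which every frame follows all frames it references.

    refs maps a frame label to the list of frames it predicts from
    (hypothetical labels, not a structure taken from the dissertation).
    """
    frames = set(refs)
    for deps in refs.values():
        frames.update(deps)
    indegree = {f: 0 for f in frames}   # number of unprocessed references
    users = {f: [] for f in frames}     # frames that reference f
    for frame, deps in refs.items():
        for dep in deps:
            indegree[frame] += 1
            users[dep].append(frame)
    ready = deque(sorted(f for f in frames if indegree[f] == 0))
    order = []
    while ready:
        frame = ready.popleft()
        order.append(frame)
        for user in sorted(users[frame]):
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)
    if len(order) != len(frames):
        raise ValueError("cyclic reference structure")
    return order


# Hypothetical two-view structure, labelled V<view>T<time>.
refs = {
    "V0T0": [],
    "V0T2": ["V0T0"],
    "V0T1": ["V0T0", "V0T2"],
    "V1T0": ["V0T0"],
    "V1T2": ["V1T0", "V0T2"],
    "V1T1": ["V1T0", "V1T2", "V0T1"],
}
order = processing_order(refs)
```

Any order returned this way respects the prediction dependencies. Choosing among the valid orders, for example to minimize memory bandwidth, is a separate optimization that this sketch does not attempt.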
Combined with the level-C+ data reuse scheme, several design points can be derived. Hardware resource allocation can be systematically defined through the trade-off between system memory bandwidth and on-chip memory. Then, toward the MVC encoder design, several issues in video encoder design for 3DTV applications are discussed. The system analysis shows that design methods previously used for single-view video coding require dramatically more hardware resources and cannot be employed directly. To deal with these design challenges, solutions are proposed for each module in the MVC encoder, including cache-based and predictor-centered IMDE, hybrid open-close loop intra prediction, and FPPDD CABAC. After adopting all the proposed algorithm and architecture optimizations, an MVC single-chip encoder is implemented in TSMC 90nm process. The proposed MVC encoder supports real-time HD MVC encoding from three views at 1920 × 1080p full HD to seven views at 1280 × 720 HDTV. Furthermore, single-view 4096 × 2160p QFHD H.264/AVC encoding is also supported. With the proposed VLSI techniques, real-time 3D video applications become feasible.

The third part of this dissertation introduces the SoC integration of the world's first free-viewpoint 3DTV set-top box SoC. To reduce the hardware complexity of the warping engine, which is the key module of the SoC, a new 3D warping engine model is presented. We show that the rationality and low cost of the linear interpolation approach (LIA) make it suitable for hardware design. In addition, redundant information in the fractional bits of the parameters is further reduced by the precision fitting scheme. As a result, 95.9 % and 69.5 % of the area are saved in the homographic matrix rendering and vector transform stages, respectively, with a negligible PSNR overhead of 0.0059 dB. At the system scheduling level, a hardware-oriented 6D FVVS flow is proposed.
Through the proposed texture reorder and the corresponding inverse reorder scheme, on-chip data scheduling fits conventional block pipelining even under different viewpoints and geometries. Then, the DWRFS scheduling and the on-chip texture buffer optimization provide frame-level and DRAM word-level bandwidth savings, respectively. After integrating all these technologies, about 95.7 % of the system memory bandwidth is saved. Moreover, instead of conventional line-buffer-based data scheduling and the corresponding horizontal-shift-only geometry, the proposed hardware-oriented 6D FVVS flow supports full 6D free-viewpoint geometries. Finally, after the MVC decoder integration and other VLSI architecture contributions, the proposed free-viewpoint 3DTV set-top box SoC is realized in TSMC 40nm technology. An MVC decoder and a free-viewpoint synthesizer are integrated to support various free-viewpoint 3DTV applications. With the FVVS engine, users can explore 3D scenes in virtual reality from any desired direction and position, rather than only seeing stereo 3DTV from horizontally shifted view angles. The 6D FVVS flow overcomes the view angle limitation of state-of-the-art 3DTV chips and supports full geometries, including 3-dimensional rotation and 3-dimensional translation. The 216fps 4096 × 2160p throughput of the proposed 3DTV set-top box SoC can be used to display 9 different virtual views in parallel in real time. Compared with state-of-the-art 3DTV chips, it delivers 9× to 40.5× higher throughput. With the proposed DWRFS scheme and texture reorder cache design, 93 % of the system memory bandwidth is saved. Furthermore, with the aid of advanced TSMC 40nm CMOS technology, a power efficiency of 27.5MPixels/mW is achieved, which is 6.6× to 229× higher than state-of-the-art 3DTV chips. In brief, the 3D technologies presented in this dissertation lead the way to a possible next-generation free-viewpoint 3DTV system.
The research contributions in this dissertation mark another step toward the human dream of "reality". We sincerely hope that these contributions can open a new era of digital multimedia life.
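As a rough check on the throughput figures quoted in the abstract, the following sketch computes the pixel loads they imply. It assumes that the 216 fps output is shared evenly among the 9 views, which the abstract does not state explicitly, and the variable names are illustrative only.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 4096, 2160      # Quad Full-HD frame size in pixels
TOTAL_FPS = 216                 # synthesized output frames per second
VIEWS = 9                       # virtual views displayed in parallel

pixels_per_frame = WIDTH * HEIGHT
pixel_rate = pixels_per_frame * TOTAL_FPS   # output pixels per second

# Assumption: the output frame rate is divided evenly among the views.
fps_per_view = TOTAL_FPS / VIEWS

# Encoder operating points: pixels per time instant, summed over all views.
encoder_points = {
    "1 view, 4096x2160": 1 * 4096 * 2160,
    "3 views, 1920x1080": 3 * 1920 * 1080,
    "7 views, 1280x720": 7 * 1280 * 720,
}

print(f"synthesizer output: {pixel_rate / 1e9:.2f} Gpixel/s")
print(f"frame rate per view: {fps_per_view:.0f} fps")
for name, pixels in encoder_points.items():
    print(f"{name}: {pixels / 1e6:.2f} Mpixel per time instant")
```

Under these assumptions the synthesizer output works out to about 1.91 Gpixel/s, or 24 fps per view, and the three encoder operating points carry pixel loads of the same order (roughly 6.2 to 8.8 Mpixel per time instant), which is consistent with trading resolution for view count.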
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66192
Full-text license:	Paid license
Appears in department:	Graduate Institute of Electronics Engineering

Files in this item:

File	Size	Format
ntu-101-1.pdf (not currently authorized for public access)	11.61 MB	Adobe PDF
