Analysis and Implementation of Reconfigurable Architecture of Motion Estimation for HDTV Applications

Sung-Fang Tsai; 蔡松芳

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25068

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳良基(Liang-Gee Chen)
dc.contributor.author	Sung-Fang Tsai	en
dc.contributor.author	蔡松芳	zh_TW
dc.date.accessioned	2021-06-08T06:01:31Z	-
dc.date.copyright	2007-07-30
dc.date.issued	2007
dc.date.submitted	2007-07-26
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25068	-
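dc.description.abstract	With the widespread adoption of multimedia products, video has become an indispensable feature of digital mobile devices, and its resolution keeps rising as technology improves, moving toward high-definition video applications. Because the data volume of high-resolution video is enormous, compression is required to reduce it so that it can be transmitted smoothly over communication networks of limited bandwidth. Motion estimation effectively reduces temporal redundancy in video compression and has therefore been adopted by many video standards. Motion estimation enables efficient coding but also accounts for most of the computation; in H.264, for example, more than 90% of the encoder's complexity falls on motion estimation. The problem is even more severe for high-definition video, and power consumption is a very important factor in mobile devices. This thesis therefore proposes a power-scalable reconfigurable motion estimation processor and a reconfigurable memory system that supports two-dimensional random access and data caching, and verifies both in a power-aware system environment that can impose different power requirements. There has been much previous research on motion estimation. Fast algorithms fall into two main categories: fast full-search algorithms and fast algorithms. The first category, fast full-search algorithms, is represented by the successive elimination algorithm, partial distortion elimination, and similar methods. The second category, fast algorithms, includes gradient-descent search, three-step search, four-step search, and hexagon search; these reduce the number of search points very effectively and are therefore widely adopted in software implementations. On the hardware side, researchers have proposed many different architectures, such as 1-D arrays, 2-D arrays, 1-D tree architectures, and 2-D tree architectures. To reduce the large external bandwidth and power consumption of motion estimation, these hardware architectures must reuse data that has already been read. At the system level, Level-C data reuse effectively reduces external access requirements, but for the larger search ranges needed by high-definition video it requires too much on-chip memory area. In addition, at the search-point level, earlier fast algorithms and architectures often do not match each other well, and the memory arrangement also creates a data-access bottleneck: only one-dimensional data reuse is possible, resulting in a low data reuse rate. To solve this problem, this thesis proposes a motion estimation algorithm with a corresponding architecture, together with a reconfigurable motion estimation architecture, to raise the data reuse rate. For the proposed motion estimation algorithm, a power-scalable fast algorithm is proposed that exploits the characteristics of video signals; different power requirements are met by changing the amount of computation used in the search and the thresholds. The algorithm is based on four-step search, which has a relatively high data reuse rate, and performs multiple searches within the search window. The efficiency of the trade-off between power and quality is also an important consideration: when power is reduced, quality should be maintained as far as possible within the limited computation budget. To improve this trade-off efficiency, a content-aware method of choosing the starting point is used, which computes likely starting positions from the motion characteristics of neighboring blocks. For the proposed motion estimation architecture, a 2-D adder tree architecture is chosen as the basis because it offers a better data reuse rate and, under the H.264 coding standard, needs no extra registers to store partial difference results for variable block sizes. To raise the data reuse rate between points on the search path, a reference frame buffer is proposed that supports shifting in four directions (up, down, left, and right); by adding one more dimension of data reuse, it supports irregular data flows efficiently and further improves the power and performance of motion estimation. In addition, module-level clock gating is used to further reduce power in idle states. When this architecture runs the proposed fast algorithm, the data reuse rate is limited because the data for the starting point must be reloaded at the beginning of every search. An advanced search flow is therefore proposed that chains consecutive search paths together, merging the scattered data flows into a single data flow and making the search more efficient. For the memory architecture, if motion estimation is to move correctly in two dimensions within the search window, the memory must also support the required two-dimensional arbitrary parallel access, which the traditional interleaved memory organization cannot provide. Moreover, the Level-C search window memory commonly used in motion estimation designs incurs many extra external accesses and a large number of on-chip memory bits when used for high-definition video. To mitigate these problems, a reconfigurable memory system is proposed that can serve modules with different access patterns in the system at the same time. It consists mainly of two parts, the reconfigurable datapath and the block translation cache. The reconfigurable datapath is controlled by software: changing the settings of its programmable configuration control generates different shift signals for the routers at the memory inputs and outputs. Memory requests from the data processing modules are mapped, according to these shift signals, to different memory interleaving schemes that support two-dimensional arbitrary access for different block sizes; at the same time, a memory space locator maps the memory requirements of different modules to different memory mapping configurations. The block translation cache holds the data most likely to be reused on chip within a limited memory size. Because motion estimation with a fast algorithm touches only part of the reference frame, a small amount of memory that holds the data most likely to be used is enough to eliminate redundant external accesses. Exploiting the correlation of memory accesses in video processing, a block-based access scheme is used to improve the efficiency of the tag memory. In the chip implementation, two circuit techniques are used to reduce power consumption. The first is module-level clock gating, which controls whether the clock signal enters a module according to whether the module is idle; latch-based gating circuits are used so that the control signal does not cause clock glitches. The second is lowering the supply voltage, which further reduces power consumption but increases signal delay, so the design must take this into account and avoid overly long delay paths. The high-definition reconfigurable motion estimator is implemented in the TSMC 0.18 μm CMOS 1P6M process. According to actual measurements, it supports frame sizes from QCIF to HDTV1920p with power consumption from 0.24 mW to 49.86 mW. For CIF at 30 frames per second, at 13.5 MHz and a 1.3 V supply, the power consumption is 2.13 mW, a considerable improvement over previous designs. In the system implementation, the design is integrated into a low-power, power-scalable H.264 encoder. The proposed reconfigurable integer motion estimation provides the scalability required by the power-aware system and integrates correctly with the system-level power-aware algorithm; the power of the whole system is 2.8 mW to 67.2 mW. In the system environment used, the proposed reconfigurable motion estimation and reconfigurable memory system provide a suitable trade-off between power and quality, and data reuse at the algorithm, architecture, and memory levels reduces the power of data access to a very low level. These results show that the proposed architecture is well suited to today's mobile devices that require high-definition video.	zh_TW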
dc.description.abstract	Motion estimation is an efficient encoding tool, but it also consumes a large share of an encoder's computation, and the problem becomes more serious in high-definition video applications. This thesis proposes a reconfigurable motion estimation engine with power scalability. In addition, a reconfigurable memory system is proposed to support its random-access requirements. Both are successfully verified in two different systems. Exploiting the characteristics of video signals and building on a fast search algorithm, a parallel architecture with efficient data reuse (DR) techniques and a hardware-oriented algorithm is proposed. A content-adaptive parallel variable-block-size (VBS) fast search algorithm is first designed with inter-/intra-candidate DR capability in hardware, so that computational complexity is greatly reduced. It makes the amount of computation controllable, which provides power scalability, and through content adaptation it reduces computation with graceful degradation of coding performance. The motion estimation architecture is based on a systolic array and a 2-D adder tree and supports movement in four directions, which allows the irregular search paths of fast algorithms to be supported efficiently. An advanced search flow is applied to support inter-candidate DR and to reduce latency cycles. Integer motion estimation (IME) requires 2-D parallel access to the search window (SW) memory, which traditional memory organizations cannot provide. A reconfigurable memory system is therefore proposed. It allows reconfigurable engines and dedicated accelerators with various access patterns to access data through a run-time configurable memory system. The reconfigurable memory system contains three hierarchies: the block translation cache, the reconfigurable datapath, and the physical memories. The reconfigurable datapath supports arbitrary parallel 2-D access patterns, including row, column, block, and subsampled access, through run-time reconfiguration. The block translation cache uses one tag entry to represent a block of pixels in a frame. In the chip implementation, the proposed reconfigurable IME uses the TSMC 0.18 μm CMOS 1P6M process. According to chip measurement results, it supports frame sizes from QCIF at 0.24 mW up to HDTV1920p at 49.86 mW. In the system implementation, it has been verified in a low-power, power-aware H.264 encoder, also fabricated in the TSMC 0.18 μm CMOS 1P6M process, whose final power consumption is 2.8 mW to 67.2 mW. In these different systems, the proposed reconfigurable motion estimation and reconfigurable memory provide a proper trade-off between power and quality. The proposed architecture is well suited to mobile applications that require high-definition video.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T06:01:31Z (GMT). No. of bitstreams: 1 ntu-96-R94943005-1.pdf: 1668139 bytes, checksum: 71abbf34acad985b7690d29fc9e011a1 (MD5) Previous issue date: 2007	en
dc.description.tableofcontents	1 Introduction; 1.1 Video Coding Systems and standards; 1.2 Concept of Motion Estimation; 1.3 Power Issues; 1.4 Research Contributions; 1.4.1 Power-scalable Reconfigurable Motion Estimation for HDTV application; 1.4.2 Reconfigurable Memory System; 1.5 Thesis Organization; 2 Related Motion Estimation and Memory System Researches; 2.1 Motion Estimation; 2.1.1 Existing Block-Matching Algorithms of Motion Estimation; 2.1.2 Previous Architectures of Motion Estimation; 2.2 Existing Research for Memory System; 2.2.1 Search Window Data Reuse; 2.2.2 Memory Organization for Parallel Access; 3 Algorithm and Architecture Design of Reconfigurable Motion Estimation; 3.1 Fundamental and Problem Definition; 3.1.1 Power Aware Computing and Content Awareness; 3.1.2 Power Reduction Techniques; 3.2 Proposed Hardware-oriented Fast Algorithm; 3.2.1 Data Reuse Consideration; 3.2.2 Adaptive Moving Window Expansion; 3.2.3 Content-Aware Moving Window Shrinking; 3.2.4 Content-Aware ME Pre-Skip Algorithm; 3.2.5 Procedure of content-adaptive parallel-VBS fast search; 3.2.6 Summary; 3.3 Reconfigurable IME Architecture Design; 3.3.1 Parallel Hardware with Inter-candidate Data Reuse; 3.3.2 Proposed Techniques for Inter-Candidate DR; 3.3.3 Architecture Design with ROM-Based Control Core; 3.4 Simulation Results; 3.4.1 Performance of the Proposed Hardware Oriented Fast Algorithm; 3.4.2 Performance of the Proposed Architecture for Inter-Candidate Data Reuse; 4 Design of Reconfigurable Memory System; 4.1 Basic Concept of Reconfigurable Memory; 4.2 Reconfigurable and Scalable Parallel 2D Access; 4.2.1 Ladder-shaped SW arrangement; 4.2.2 Architecture Design of Reconfigurable 2D Access Control; 4.3 Block-based Translation Cache; 4.3.1 Preliminary; 4.3.2 Caching Algorithm; 4.3.3 Experimental Results; 5 Chip Implementation; 5.1 Low Power Techniques; 5.1.1 Power Evaluation and Measurement; 5.1.2 Module-wise Clock Gating; 5.1.3 Voltage-scaling; 5.2 Design Flow; 5.3 Design for Test Considerations; 5.3.1 Test Consideration; 5.4 Implementation Results; 5.4.1 Reconfigurable IME Engine for HDTV Applications; 5.4.2 System Implementation at Low-power and Power-aware Encoder; 6 Conclusion
dc.language.iso	en
dc.title	高畫質視訊可重組式移動估計之架構分析及實作	zh_TW
dc.title	Analysis and Implementation of Reconfigurable Architecture of Motion Estimation for HDTV Applications	en
dc.type	Thesis
dc.date.schoolyear	95-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	劉志尉(Chih-Wei Liu),蔡宗漢(Tsung-Han Tsai),陳永昌(Yung-Chang Chen),賴永康(Yeong-Kang Lai)
dc.subject.keyword	Motion estimation, video coding, reconfigurable architecture, power-scalable system, high-definition video	zh_TW
dc.subject.keyword	Motion Estimation,Video Coding,Reconfigurable Architecture,Power aware system,HDTV,	en
dc.relation.page	120
dc.rights.note	Not authorized
dc.date.accepted	2007-07-29
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electronics Engineering	zh_TW
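
The abstract above describes a fast block-matching search that starts from a point predicted from neighboring blocks and then refines it step by step. The following is a minimal Python sketch of that general idea, not the thesis's algorithm: the names `sad`, `median_mv`, and `step_search`, the median predictor, the step sizes (2, then 1), and the search range are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by the candidate (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def median_mv(a, b, c):
    """Component-wise median of three neighboring motion vectors
    (an assumed predictor for the starting point)."""
    return (sorted((a[0], b[0], c[0]))[1], sorted((a[1], b[1], c[1]))[1])


def step_search(cur, ref, bx, by, n, start=(0, 0), search_range=7):
    """Greedy step search: test the 8 neighbors at distance `step` around
    the current best candidate, adopt any strictly better one, and halve
    the step (2, then 1) once no neighbor improves."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Candidate must stay inside the search range and the frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    # Fall back to the zero vector if the predicted start is unusable.
    best = start if valid(*start) else (0, 0)
    best_cost = sad(cur, ref, bx, by, best[0], best[1], n)
    step = 2
    while step >= 1:
        cx, cy = best
        moved = False
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                cand = (cx + ox, cy + oy)
                if cand == (cx, cy) or not valid(*cand):
                    continue
                cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
                if cost < best_cost:
                    best, best_cost, moved = cand, cost, True
        if not moved:
            step //= 2
    return best, best_cost
```

A call such as `step_search(cur, ref, bx, by, 16, start=median_mv(left, top, top_right))` would seed the search with the neighbor prediction. The sketch evaluates candidates one at a time in software, whereas the hardware described in the abstract processes pixels in parallel and reuses reference data between neighboring candidates.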
Appears in Collections:	Graduate Institute of Electronics Engineering
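
The abstract states that the block translation cache uses one tag entry to represent a block of pixels in a frame. A minimal Python sketch of such a block-tagged cache follows. The class name `BlockCache`, the LRU replacement policy, and the block and capacity sizes are assumptions for illustration only, since this record does not describe the thesis's caching algorithm.

```python
from collections import OrderedDict


class BlockCache:
    """Software model of a cache whose tags identify whole pixel blocks."""

    def __init__(self, frame, block=4, capacity=8):
        self.frame = frame            # stands in for external frame memory
        self.block = block            # block edge length in pixels
        self.capacity = capacity      # maximum number of cached blocks
        self.entries = OrderedDict()  # tag (block_x, block_y) -> pixel rows
        self.hits = 0
        self.misses = 0

    def _fetch_block(self, tag):
        # One external access brings in the whole block for this tag.
        bx, by = tag
        b = self.block
        return [row[bx * b:(bx + 1) * b]
                for row in self.frame[by * b:(by + 1) * b]]

    def read(self, x, y):
        """Return pixel (x, y), loading its block on a miss."""
        b = self.block
        tag = (x // b, y // b)
        if tag in self.entries:
            self.hits += 1
            self.entries.move_to_end(tag)         # mark as most recent
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recent
            self.entries[tag] = self._fetch_block(tag)
        return self.entries[tag][y % b][x % b]
```

Because a single tag covers every pixel of a block, neighboring reads share one tag lookup and one external fetch. That is the kind of tag-memory saving the abstract attributes to block-based access.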

Files in This Item:

File	Size	Format
ntu-96-1.pdf   Not currently authorized for public access	1.63 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.