Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42958
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
dc.contributor.author | Tung-Hsing Wu | en |
dc.contributor.author | 吳東興 | zh_TW |
dc.date.accessioned | 2021-06-15T01:30:26Z | - |
dc.date.available | 2011-07-23 | |
dc.date.copyright | 2009-07-23 | |
dc.date.issued | 2009 | |
dc.date.submitted | 2009-07-21 | |
dc.identifier.citation | [1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. Image Proc., vol. 13, no. 4, Apr. 2004.
[2] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, “Face Detection in Color Images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 5, May 2002.
[3] A.-C. Tsai, J.-F. Wang, J.-F. Yang, and W.-G. Lin, “Effective Subblock-Based Pixel-Based Fast Direction Detections for H.264 Intra Prediction,” IEEE Trans. Circuits Syst. Video Techn., vol. 18, no. 7, July 2008.
[4] User Manual of SMIMS VeriEnterprise PCI Verification Board, SMIMS Technology Corporation.
[5] Y.-W. Huang, T.-C. Chen, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, C.-S. Chen, C.-F. Shen, S.-Y. Ma, T.-C. Wang, B.-Y. Hsieh, H.-C. Fang, and L.-G. Chen, “A 1.3TOPS H.264/AVC Single-Chip Encoder for HDTV Applications,” in Proceedings of IEEE International Solid-State Circuits Conference (ISSCC 2005), Feb. 2005.
[6] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, “Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder,” IEEE Trans. Circuits Syst. Video Techn., vol. 16, pp. 673–688, June 2006.
[7] J. G. Robson, “Spatial and temporal contrast sensitivity functions of the visual system,” J. Opt. Soc. Amer., vol. 56, pp. 1141–1142, 1966.
[8] S. Daly, “Engineering Observations from Spatiovelocity and Spatiotemporal Visual Models,” in IS&T/SPIE Conference on Human Vision and Electronic Imaging, vol. 3299, Jan. 1998.
[9] C.-H. Chou and Y.-C. Li, “A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable-Distortion Profile,” IEEE Trans. Circuits Syst. Video Techn., vol. 5, no. 6, Dec. 1995.
[10] C.-H. Chou and C.-W. Chen, “A Perceptually Optimized 3-D Subband Codec for Video Communication over Wireless Channels,” IEEE Trans. Circuits Syst. Video Techn., vol. 6, no. 2, Apr. 1996.
[11] A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, “Visibility of wavelet quantization noise,” IEEE Trans. Image Proc., vol. 6, no. 8, pp. 1164–1175, Aug. 1997.
[12] J. Malo, J. Gutierrez, I. Epifanio, F. Ferri, and J. M. Artigas, “Perceptual feedback in multigrid motion estimation using an improved DCT quantization,” IEEE Trans. Image Proc., vol. 10, no. 10, pp. 1411–1427, Oct. 2001.
[13] C.-W. Tang, C.-H. Chen, Y.-H. Yu, and C.-J. Tsai, “Visual Sensitivity Guided Bit Allocation for Video Coding,” IEEE Trans. Multimedia, vol. 8, no. 1, Feb. 2006.
[14] ITU-R Recommendation BT.500-11, Methodology for the Subjective Assessment of the Quality of Television Pictures, ITU, 2002.
[15] M. H. Pinson and S. Wolf, “Comparing subjective video quality testing methodologies,” in Visual Communications and Image Processing, vol. 5150, pp. 573–582, 2003.
[16] F. X. J. Lukas and Z. L. Budrikis, “Picture quality prediction based on a visual model,” IEEE Trans. Commun., vol. COM-30, no. 7, pp. 1679–1692, July 1982.
[17] S.-C. Pei and C.-L. Lai, “Very low bit-rate coding algorithm for stereo video with spatio-temporal HVS model and binary correlation disparity estimator,” IEEE J. Select. Areas Commun., vol. 16, no. 1, pp. 98–107, Jan. 1998.
[18] H. Yee, S. Pattanaik, and D. P. Greenberg, “Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments,” ACM Trans. Graph., vol. 20, no. 1, pp. 39–65, Jan. 2001.
[19] D. H. Kelly, “Motion and vision II: Stabilized spatio-temporal threshold surface,” J. Opt. Soc. Amer., vol. 69, pp. 1340–1349, 1979.
[20] T. Adiono, T. Isshiki, K. Ito, T. Ohtsuka, D. Li, C. Honsawek, and H. Kunieda, “Face focus coding under H.263+ video coding standard,” in Proc. Int. Conf. Asia-Pacific Circuits and Systems, Dec. 2000, pp. 461–464.
[21] C.-W. Wong, O. C. Au, B. Meng, and H.-K. Lam, “Perceptual rate control for low-delay video communications,” in Proc. Int. Conf. Multimedia and Expo, July 2003, vol. 3, pp. 361–364.
[22] S. Lee, M. S. Pattichis, and A. C. Bovik, “Foveated video compression with optimal rate control,” IEEE Trans. Image Proc., vol. 10, no. 7, pp. 977–992, July 2001.
[23] W. James, The Principles of Psychology. New York: Holt, 1890, pp. 403–404.
[24] H.-C. Nothdurft, “Salience from feature contrast: additivity across dimensions,” Vis. Res., vol. 40, no. 10–12, pp. 1183–1201, June 2000.
[25] L. Itti, C. Koch, and E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 11, Nov. 1998.
[26] H. R. Wu, “A new distortion measure for video coding blocking artifacts,” in Proc. ICCT, May 1996, vol. 2, pp. 365–374.
[27] A. R. Koene and L. Zhaoping, “Feature-specific interactions in salience from combined feature contrasts: Evidence for a bottom-up saliency map in V1,” Journal of Vision, vol. 7, no. 6, July 2007.
[28] C. Koch and S. Ullman, “Shifts in selective visual attention: towards the underlying neural circuitry,” Human Neurobiology, vol. 4, no. 4, pp. 219–227, 1985.
[29] S. Treue and J. C. M. Trujillo, “Feature-based attention influences motion processing gain in macaque visual cortex,” Nature, vol. 399, pp. 575–579, June 1999.
[30] A. B. Watson, J. Hu, and J. F. McGowan III, “DVQ: A digital video quality metric based on human vision,” J. Elect. Imag., vol. 10, no. 1, pp. 20–29, Jan. 2001.
[31] A. J. Ahumada and H. A. Peterson, “Luminance-model-based DCT quantization for color image compression,” in Proc. SPIE Human Vision, Visual Processing, and Digital Display III, 1992, vol. 1666, pp. 365–374.
[32] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-reference perceptual quality assessment of JPEG compressed images,” in Proceedings of the IEEE 2002 International Conference on Image Processing, 2002, pp. 477–480.
[33] T. O. Aydin, R. Mantiuk, K. Myszkowski, and H.-P. Seidel, “Dynamic Range Independent Image Quality Assessment,” in Proceedings of ACM SIGGRAPH 2008, 2008, vol. 27, pp. 1–10.
[34] W. Lin, Z. Lu, X. Yang, E. Ong, and S. Yao, “Modeling Visual Attention’s Modulatory Aftereffect on Visual Sensitivity and Quality Evaluation,” IEEE Trans. Image Proc., vol. 14, no. 11, Nov. 2005.
[35] E. Ong, X. Yang, W. Lin, Z. Lu, S. Yao, X. Lin, S. Rahardja, and C. Boon, “Perceptual Quality and Objective Quality Measurements of Compressed Videos,” Journal of Visual Communication and Image Representation, vol. 17, no. 4, pp. 717–737, Aug. 2006.
[36] X. K. Yang, W. S. Ling, Z. K. Lu, E. P. Ong, and S. S. Yao, “Just noticeable distortion model and its applications in video coding,” Signal Processing: Image Communication, vol. 20, no. 7, pp. 662–680, Aug. 2005.
[37] X. K. Yang, W. S. Ling, Z. K. Lu, E. P. Ong, and S. S. Yao, “Motion-Compensated Residue Preprocessing in Video Coding Based on Just-Noticeable-Distortion Profile,” IEEE Trans. Circuits Syst. Video Techn., vol. 15, no. 6, pp. 742–752, June 2005.
[38] Y. Liu, Z. G. Li, and Y. C. Soh, “A Novel Rate Control Scheme for Low Delay Video Communication of H.264/AVC Standard,” IEEE Trans. Circuits Syst. Video Techn., vol. 17, no. 1, Jan. 2007.
[39] J. M. Wolfe and T. S. Horowitz, “What Attributes Guide the Deployment of Visual Attention and How Do They Do It?,” Nature Rev. Neuroscience, vol. 5, pp. 1–7, 2004.
[40] “Subjective test results for the CfP on Scalable Video Coding Technology,” MPEG Meeting Doc. N6383, Mar. 2004.
[41] “Subjective test results for the CfP on Multi-view Video Coding,” MPEG Meeting Doc. N7779, Jan. 2009.
[42] Tung-Hsing Wu, Guan-Lin Wu, and Shao-Yi Chien, “Bio-Inspired Perceptual Video Encoding Based on H.264/AVC,” in IEEE International Symposium on Circuits and Systems, May 2009. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/42958 | - |
dc.description.abstract | Existing multimedia technology has deeply influenced daily life. Owing to constraints on computational complexity and transmission bandwidth, high-quality image and video processing still has room for improvement. Although the latest video compression standard, H.264/AVC, provides compression ratios of tens to hundreds, the final receiver of video is the human eye. Traditional standards use only the Peak Signal-to-Noise Ratio (PSNR) as the quality index for compressed video and give little consideration to the characteristics of the human visual system (HVS), so the bit allocation in video compression is not optimized for human perception. It is therefore important to allocate bit rate effectively to different video content within a limited bandwidth: with proper bit allocation, important regions of a frame receive more bits while unimportant regions receive fewer, so the bit rate can be reduced effectively while better compressed video quality is provided. The key is how to account for and exploit the perceptual characteristics of the HVS; however, a system that models these characteristics and applies them in an encoder to reduce bit rate usually requires enormous computational complexity and system bandwidth, so it often cannot achieve real-time processing within an encoding system. We therefore propose a human eye perception evaluation engine algorithm that enhances the bit allocation of video encoders, and further propose an efficient hardware architecture design.
The main goal of this thesis is to model the perceptual characteristics of the HVS with a perception evaluation engine that analyzes the content of the video frame currently being processed and determines how many bits that data should be allocated. | zh_TW |
dc.description.abstract | Existing multimedia technology now pervades daily life. Owing to limits on computational complexity and transmission bandwidth, high-quality video processing still needs improvement. The newest video compression standard, H.264/AVC, offers compression ratios from tens to hundreds, but the final receiver of video information is the human eye. The traditional standard uses only the Peak Signal-to-Noise Ratio (PSNR) as the quality index for a compressed video bit stream; PSNR does not consider the properties of the human visual system (HVS), so the bit allocation of the video bit stream is usually not optimized for human perception. Allocating bit rate effectively across different video content within a limited bandwidth is therefore important. With proper bit allocation, such as more bits for important areas of a frame and fewer bits for indifferent areas, the bit rate can be reduced; equivalently, the compressed video shows better perceptual quality than another compressed video at the same bit rate. The key to such bit allocation is accounting for human perception in the HVS. However, a system that models human perceptual properties and reduces the bit rate of the video bit stream while offering the same perceptual quality usually requires enormous computational complexity and system bandwidth, and thus cannot satisfy the real-time requirements of video encoding systems. We therefore propose a bio-inspired human eye perception evaluation algorithm that improves the bit allocation of video encoders, and we further propose an efficient hardware architecture.
The main target of this thesis is modeling the properties of the HVS: a perception evaluation engine must analyze the content of the current video frame and determine the bit allocation for that data. | en |
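The abstract's central idea, giving salient regions more bits and indifferent regions fewer, amounts to modulating each macroblock's quantization parameter (QP) by a perceptual weight. The sketch below is a minimal, hypothetical illustration of that principle; the function name, linear offset mapping, and weight values are assumptions for illustration, not the thesis's actual model.

```python
# Hypothetical sketch of perception-aware bit allocation: each macroblock's
# QP is offset by a perceptual weight in [0, 1], so salient regions get
# finer quantization (more bits) and indifferent regions coarser (fewer).
# H.264/AVC constrains QP to the range 0..51.

def perceptual_qp(base_qp, weights, max_offset=6, qp_min=0, qp_max=51):
    """Map per-macroblock perceptual weights in [0, 1] to QP values.

    weight 1.0 (most salient)  -> base_qp - max_offset (finer quantization)
    weight 0.0 (least salient) -> base_qp + max_offset (coarser quantization)
    """
    qps = []
    for w in weights:
        # Linear mapping: offset ranges from +max_offset down to -max_offset.
        offset = round(max_offset * (1.0 - 2.0 * w))
        # Clamp to the legal H.264/AVC QP range.
        qps.append(min(qp_max, max(qp_min, base_qp + offset)))
    return qps

# Example: three macroblocks -- a salient region (e.g. a face), a neutral
# region, and background.
print(perceptual_qp(28, [1.0, 0.5, 0.0]))  # -> [22, 28, 34]
```

At the same average QP, shifting bits toward salient macroblocks this way is what lets the encoder reduce bit rate without a visible quality loss.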
dc.description.provenance | Made available in DSpace on 2021-06-15T01:30:26Z (GMT). No. of bitstreams: 1 ntu-98-R96943006-1.pdf: 7322488 bytes, checksum: 051fade32e5612867777985f03f8d8ae (MD5) Previous issue date: 2009 | en |
dc.description.tableofcontents | Abstract ix
1 INTRODUCTION 1
1.1 The Compression in Video Coding Standard 1
1.2 Human Visual System 2
1.3 Quality Index 5
1.4 Thesis Organization 6
2 THE RELATED RESEARCH WORKS OF VISUAL CONSIDERATION FOR VIDEO CODING 9
2.1 Efficient Coders with Visual Consideration 9
2.2 Visual Attention 10
2.3 Visual Sensitivity 13
2.4 Visual Quality Assessment 14
3 PROPOSED PERCEPTUAL MODEL 15
3.1 Overview of the Proposed Algorithm 16
3.2 Perceptual Model for Encoding Intra Frames 17
3.3 Perceptual Model for Encoding Inter Frames 19
3.3.1 Spatial Consideration 19
3.3.2 Temporal Consideration 28
3.4 Fusion of the Perceptual Models 32
3.4.1 QP of MB of Intra Frames 32
3.4.2 QP of MB of Inter Frames 32
4 HARDWARE IMPLEMENTATION OF THE PROPOSED PERCEPTUAL MODEL 35
4.1 Hardware Configuration 36
4.2 Color Contrast 38
4.3 Simplified Skin Color Detection 44
4.4 SSIM 44
4.5 JND 46
4.6 Chip Design Flow and Specification 49
5 FPGA VERIFICATION AND EMULATION 55
6 EXPERIMENTAL RESULTS OF THE PROPOSED PERCEPTUAL MODEL 61
6.1 Bit-rate Reduction 61
6.2 Subjective Perception Experiment 62
7 CONCLUSIONS 77
Reference 79 | |
dc.language.iso | zh-TW | |
dc.title | Perception-Aware Video Encoder: Hardware Architecture Design of a Human Eye Perception Evaluation Engine and Its Application to an H.264 Video Encoder | zh_TW |
dc.title | Perception-Aware Video Encoder: Hardware Architecture Design of Bio-Inspired Human Eyes Perception Evaluation Engine for H.264 Video Encoder | en |
dc.type | Thesis | |
dc.date.schoolyear | 97-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 陳美娟(Mei-Juan Chen),張添烜(Tian-Sheuan Chang),童怡新(Yi-Shin Tung),黃毓文(Yu-Wen Huang) | |
dc.subject.keyword | H.264 video coding, human visual system, attention model, perceptual model, contrast sensitivity function, contrast | zh_TW |
dc.subject.keyword | Video encoder, H.264, Inter, Intra, Human visual system, Attention model, Perceptual model, SSIM, JND, Motion, Contrast sensitivity function, Contrast | en |
dc.relation.page | 84 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2009-07-21 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-98-1.pdf (currently not authorized for public access) | 7.15 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.