Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47777
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳宏銘(Homer H. Chen) | |
dc.contributor.author | Wen-Fu Lee | en |
dc.contributor.author | 李文甫 | zh_TW |
dc.date.accessioned | 2021-06-15T06:17:58Z | - |
dc.date.available | 2013-08-12 | |
dc.date.copyright | 2010-08-12 | |
dc.date.issued | 2010 | |
dc.date.submitted | 2010-08-10 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47777 | - |
dc.description.abstract | Visual attention is an important feature of the human visual system that can help image processing and compression techniques perform better.
In this thesis, we propose a computational model that extracts low-level and high-level features from video and combines the two levels of features through machine learning to predict the distribution of human visual attention. The adoption of the low-level features (color, orientation, and motion) is based on studies of human visual cells, while the adoption of the human face as the high-level feature is based on studies of human communication patterns. Experimental results confirm that the overall performance of this integrated two-level model is more robust than that of models using only a single level of features. In previous attention-prediction models, the regions predicted to be salient could deviate from the actual human fixation locations. The proposed model learns the relationship between features and visually attended regions and uses this relationship to reduce such potential mismatches. In addition, to improve the learning efficacy of the model, we select representative training samples according to the distribution of human fixations. Experimental results confirm that the proposed model can effectively predict the distribution of human visual attention. | zh_TW |
dc.description.abstract | Visual attention is an important characteristic of the human visual system and is useful for image processing and compression. This thesis proposes a computational scheme that adopts both low-level and high-level features to predict visual attention from video signals. The low-level and high-level features are fused by machine learning. The adoption of the low-level features (color, orientation, and motion) is based on the study of visual cells, whereas the adoption of the human face as a high-level feature is based on the study of media communications. We show that such a scheme is more robust than schemes that use only low-level or only high-level features. Unlike conventional techniques, our scheme learns the relationship between features and visual attention to avoid perceptual mismatch between the estimated saliency and the actual human fixations. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme. (A minimal code sketch of this fusion step appears after the metadata table below.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T06:17:58Z (GMT). No. of bitstreams: 1 ntu-99-R97942039-1.pdf: 2639498 bytes, checksum: 04cedbbb026a5aa96abe31264323453b (MD5) Previous issue date: 2010 | en |
dc.description.tableofcontents | Acknowledgements i
Chinese Abstract iii
ABSTRACT v
CONTENTS vii
LIST OF FIGURES ix
LIST OF TABLES xiii
Chapter 1 Introduction 1
1.1 Visual Attention Model 2
1.2 Problem Statement 3
1.3 Research Contribution 4
1.4 Thesis Organization 5
Chapter 2 Methodology for Modeling Attention 7
2.1 Fixation Data Collection 7
2.1.1 Eye Tracking Apparatus 8
2.1.2 Subjects 9
2.1.3 Stimuli 9
2.1.4 Experimental Setup 10
2.1.5 Results 11
2.2 Fixation Density Estimation 12
2.3 Feature Extraction 14
2.3.1 Color 14
2.3.2 Motion 16
2.3.3 Orientation 18
2.3.4 Face 18
2.4 Training Sample Selection 18
2.4.1 Frame Selection 19
2.4.2 Pixel Selection 20
2.5 Regressor Training 22
2.6 Saliency Estimation 23
Chapter 3 Experimental Results 25
3.1 Objective Evaluation 25
3.1.1 ROC Analysis 25
3.1.2 LCC Analysis 26
3.2 Subjective Evaluation 30
3.3 The Particular Case of a Center Filter 33
3.4 Discussion 34
Chapter 4 Conclusion 37
REFERENCE 39 | |
dc.language.iso | en | |
dc.title | A Study on Visual Attention Modeling via Learning-Based Fusion of Spatiotemporal Video Features | zh_TW |
dc.title | Learning-Based Fusion of Spatiotemporal Visual Attention Cues for Video | en |
dc.type | Thesis | |
dc.date.schoolyear | 98-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 陳良基,簡韶逸,葉素玲,王家祥 | |
dc.subject.keyword | visual attention, saliency map, human visual system, eye tracking experiment, fixation distribution, regression | zh_TW |
dc.subject.keyword | Visual attention, saliency map, human visual system, eye tracking experiment, fixation distribution, regression | en |
dc.relation.page | 43 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2010-08-11 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
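As a concrete illustration of the fusion step described in the abstract, the sketch below stacks per-pixel cue maps (color, orientation, motion, face) into feature vectors and regresses them onto a fixation density map, so that the learned regressor produces the fused saliency map. This is a minimal sketch only: it assumes the cue maps and the fixation density map are already computed (synthetic data stands in for them here), and it uses scikit-learn's SVR as a generic support vector regressor; all names, shapes, and parameters are illustrative and are not taken from the thesis.

```python
# Minimal sketch of learning-based fusion of saliency cues via regression.
# Synthetic data stands in for real cue maps; names are illustrative only.
import numpy as np
from sklearn.svm import SVR

H, W = 36, 64                       # coarse saliency-map resolution (illustrative)
rng = np.random.default_rng(0)

# Stand-ins for the four cue maps: color, orientation, motion, face.
feature_maps = rng.random((4, H, W))
# Stand-in for a ground-truth fixation density map from eye tracking.
fixation_density = rng.random((H, W))

# One 4-D feature vector per pixel.
X = feature_maps.reshape(4, -1).T   # shape (H*W, 4)
y = fixation_density.ravel()        # shape (H*W,)

# Pick representative training pixels (most and least fixated), a crude
# stand-in for the thesis's fixation-distribution-based sample selection.
order = np.argsort(y)
train_idx = np.concatenate([order[:200], order[-200:]])

# Learn the fusion: regress fixation density on the stacked cue values,
# then apply the regressor to every pixel to obtain the fused saliency map.
regressor = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
saliency = regressor.predict(X).reshape(H, W)
print(saliency.shape, float(saliency.min()), float(saliency.max()))
```

In the actual pipeline, the regression targets would come from the eye-tracking fixation density maps of Chapter 2, and sample selection would follow the frame and pixel selection procedure of Section 2.4 rather than the simple extremes used here.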
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-99-1.pdf (currently not authorized for public access) | 2.58 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.