NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74817
Full metadata record

DC Field: Value [Language]
dc.contributor.advisor: 雷欽隆 (Chin-Laung Lei)
dc.contributor.author: Chih-Fan Hsu [en]
dc.contributor.author: 許之凡 [zh_TW]
dc.date.accessioned: 2021-06-17T09:08:09Z
dc.date.available: 2019-12-02
dc.date.copyright: 2019-12-02
dc.date.issued: 2019
dc.date.submitted: 2019-11-20
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74817
dc.description.abstract: (translated from the Chinese) With the growth of network bandwidth, hardware devices, and algorithms, multimedia applications have gradually become part of people's daily activities. System designs decided solely by developers no longer satisfy what consumers demand of multimedia services; user-centered design that enables systems to provide better services has become the main goal of modern system design. Among the many design aspects, immersing users in a multimedia service, giving them the feeling of being in the real world, is regarded as an effective means of improving user experience (UX). How to make users feel immersed while using a service has therefore become an important research topic.
This dissertation focuses on techniques that use gaze direction to immerse consumers and thereby improve the UX of multimedia services, especially techniques that alter images. For users, multimedia systems mainly serve everyday entertainment or interpersonal communication needs. Accordingly, we focus on two killer applications: virtual reality and live video communication. Blurring the boundary between the real and virtual worlds to immerse users is precisely the main purpose of both.
In the virtual reality study, we investigate an emerging technique: foveated rendering. Exploiting the property of human vision that the center of the visual field is sharp while the periphery is blurred, the technique reallocates computing resources to the more important image regions and thus raises the image quality users perceive without additional computing resources. Because the technique is still at an early stage, no systematic method exists for measuring its quality of experience (QoE). We therefore evaluate foveated images with four mainstream subjective assessment methods and propose a unified measure, the perceptual ratio, for assessing image quality. We further use the perceptual ratio to measure the efficiency and consistency of the subjective assessment methods, so that future work on foveated rendering can choose the assessment method best suited to its experimental needs.
In the live video communication study, we tackle the problem that current systems still cannot establish eye contact. Eye contact is one of the main channels for conveying information in communication: it reveals the interlocutors' attentiveness to and confidence in the ongoing conversation and thereby raises communication quality. Because today's video communication systems cannot establish eye contact, a gap remains between online and face-to-face conversation. We propose a deep-learning-based system that corrects the user's gaze: it postprocesses images in real time and dynamically, according to the relative positions of the user and the camera, to correct the user's gaze and establish eye contact in online communication. We validate the model and the system with subjective and objective evaluations, and we open-source the system to support the future development of live video communication. [zh_TW]
dc.description.abstract: Following the increase of network bandwidth, the improvement of hardware devices, and the development of algorithms, multimedia applications have gradually become part of our daily lives. Traditional system design, determined solely by developers, has become insufficient to meet consumers' demands; improving the user experience (UX) of applications according to users' opinions has become important. Among the numerous design considerations of multimedia systems, making consumers immersed in the services is a convincing means of improving the UX of systems. Hence, immersing users in services has become an important research topic.
In this dissertation, we explore technologies that use gaze direction to immerse users and thereby improve UX, especially technologies that alter the image. Two killer applications, virtual reality (VR) and live video communication, are selected as our research targets because, from the consumer's viewpoint, multimedia applications are used chiefly for entertainment and communication. Moreover, making users immersed is the goal of both applications.
In the study of VR, we examine a new technology, foveated rendering. Foveated rendering leverages the human visual system to increase perceived video quality under limited computing resources. However, no general and systematic framework exists for subjectively evaluating foveated images. To address this, we measure the quality of foveated rendering images with four common subjective assessment methods and propose a unified quality of experience (QoE) metric, the perceptual ratio, to evaluate image quality. Furthermore, we measure the consistency and the efficiency of the subjective assessment methods according to the perceptual ratio. Our results can serve as a foundation for the future development of foveated rendering technologies.
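To make the metric concrete, the following is a minimal Python sketch of a perceptual-ratio-style measure. The definition used here is our assumption for illustration, not necessarily the thesis's exact formulation: the fraction of trials in which participants judged the foveated image indistinguishable from the full-resolution reference.

```python
# Minimal sketch of a "perceptual ratio"-style QoE measure.
# Assumption (not necessarily the thesis's exact definition): the fraction
# of trials in which the foveated image was judged indistinguishable from
# the full-resolution reference.

def perceptual_ratio(judgments):
    """judgments: booleans, True = foveated image judged indistinguishable
    from the full-resolution reference image."""
    if not judgments:
        raise ValueError("no trials recorded")
    return sum(judgments) / len(judgments)

# Hypothetical responses from ten participants for one foveal parameter setting.
responses = [True, True, False, True, True, True, False, True, True, True]
print(f"perceptual ratio: {perceptual_ratio(responses):.2f}")  # -> 0.80
```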
In the study of live video communication, we address a well-known but challenging problem: missing eye contact in live video communication. Eye contact is an important nonverbal cue that conveys attentiveness and confidence in communication. However, current systems cannot establish eye contact in video communication, so a gap between video and face-to-face communication remains. We solve the problem by proposing a deep-learning-based gaze correction system that postprocesses the image in real time, correcting users' gaze directions to establish eye contact in the video communication. The correction is based on the positions of the interlocutors' heads and the camera. The effectiveness of the proposed gaze correction system is evaluated both objectively and subjectively. Furthermore, the implemented system is open source and is expected to support the future development of live video communication. [en]
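As a rough illustration of the geometry behind such a correction (a sketch under our own assumptions; the thesis details the actual estimation in Chapter 5), the required eyeball rotation can be approximated as the angular difference, seen from the user's eye, between the camera and the on-screen window showing the interlocutor's face:

```python
import math

def eyeball_rotation_angles(eye_pos, camera_pos, target_pos):
    """Yaw/pitch (degrees) the rendered eyes must rotate so a gaze aimed at
    target_pos (the on-screen window showing the interlocutor's face) appears
    aimed at camera_pos instead. Positions are (x, y, z) in centimeters in a
    common frame with the user's eye near the origin."""
    def direction(src, dst):
        dx, dy, dz = (d - s for s, d in zip(src, dst))
        yaw = math.degrees(math.atan2(dx, dz))    # horizontal angle
        pitch = math.degrees(math.atan2(dy, dz))  # vertical angle
        return yaw, pitch

    cam_yaw, cam_pitch = direction(eye_pos, camera_pos)
    tgt_yaw, tgt_pitch = direction(eye_pos, target_pos)
    return cam_yaw - tgt_yaw, cam_pitch - tgt_pitch

# Hypothetical setup: the user's eye 50 cm from the screen, the webcam
# mounted 15 cm above the window rendering the interlocutor's face.
yaw, pitch = eyeball_rotation_angles((0, 0, 0), (0, 15, 50), (0, 0, 50))
print(f"required correction: yaw {yaw:.1f} deg, pitch {pitch:.1f} deg")
# -> yaw 0.0 deg, pitch 16.7 deg
```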
dc.description.provenance: Made available in DSpace on 2021-06-17T09:08:09Z (GMT). No. of bitstreams: 1. ntu-108-D04921011-1.pdf: 4583274 bytes, checksum: d73028c19dab7577afaf447c122750e2 (MD5). Previous issue date: 2019 [en]
dc.description.tableofcontents:
Chinese Abstract i
Abstract ii
Acknowledgements iv
List of Contents v
List of Figures viii
List of Tables xi
Chapter 1 Introduction 1
Chapter 2 Preliminary 6
2.1 Foveated Rendering in Virtual Reality 6
2.2 Eye Contact in Live Video Communication 8
2.2.1 Hardware-based Approaches 10
2.2.2 Software-based Approaches 11
Chapter 3 Evaluating Foveated Rendering 15
3.1 Experiment Materials 15
3.1.1 Testbed 15
3.1.2 Foveated Images 15
3.1.3 Tested Scenes 16
3.1.4 Fixation Points 17
3.2 Experiment Procedure 19
3.2.1 Stages 20
3.2.2 Foveal Parameters 21
3.3 Experiment Results 24
3.3.1 Single-Stimulus Absolute Category Rating 25
3.3.2 Double-Stimulus Quality Comparison 26
3.3.3 Descending Method 27
3.3.4 Ascending Method 27
3.4 Comparative Analysis 28
3.4.1 Perceptual Ratio: A Unified QoE Metric 28
3.4.2 Efficiency and Consistency of Quality Assessment Methods 31
3.4.3 Performance of Subjective Assessment Methods 32
3.4.4 Modeling of Perceptual Ratios 35
Chapter 4 A Gaze Redirection Model Based on a Convolutional Neural Network 37
4.1 Preprocessing 38
4.1.1 A Warping-based Convolutional Neural Network 40
4.1.2 Advanced Loss Functions 41
4.1.3 Training the Network 43
4.2 Dataset Collection 44
4.2.1 The DIRL Gaze Dataset 45
4.2.2 Collection Setup and Procedure 48
4.3 Experiment Results 49
4.3.1 Redirecting Gaze on Heterogeneous Datasets 50
4.3.2 Ablation Study 51
4.4 Image Quality Evaluations 53
4.5 People with Eyeglasses 57
4.6 Discussions 58
Chapter 5 A Gaze Correction System 59
5.1 Gaze Correction in Live Video Communication 59
5.2 Estimating the Eyeball Rotation Angles 60
5.2.1 Redirecting Gazes by A Convolutional Neural Network 63
5.2.2 Implementation Details 64
5.3 Establishing Eye Contact in Live Video Communication 65
5.4 Quality Evaluation for Our Gaze Correction System 68
5.4.1 Tested Scenarios 68
5.4.2 Experiment Procedure 69
5.4.3 Experiment Results 71
5.5 Discussions 79
Chapter 6 Improving the Detection Accuracy of CNN-Based Facial Landmark Detection Approaches 80
6.1 Facial Landmark Detection Approaches 81
6.1.1 Regression Approaches 81
6.1.2 Heatmap Approaches 84
6.2 A Hybrid Facial Landmark Detection Model 88
6.3 Model Implementation and Training 91
6.4 Experiment Results 92
6.4.1 Quantitative Evaluations 92
6.4.2 Ablation Study 98
6.5 Discussions 99
Chapter 7 Conclusions and Future Research Opportunities 102
Reference 107
dc.language.iso: en
dc.subject: 視線交會 (eye contact) [zh_TW]
dc.subject: 即時視訊溝通 (live video communication) [zh_TW]
dc.subject: 注視點重點成像 (foveated rendering) [zh_TW]
dc.subject: 體驗特質 (quality of experience) [zh_TW]
dc.subject: 使用者經驗 (user experience) [zh_TW]
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: user experience [en]
dc.subject: quality of experience [en]
dc.subject: foveated rendering [en]
dc.subject: live video communication [en]
dc.subject: eye contact [en]
dc.subject: deep learning [en]
dc.title: 朝用戶沉浸之多媒體應用邁進:以視線為研究 [zh_TW]
dc.title: Toward User Immersed Multimedia Applications: Studies on Eye Gaze [en]
dc.type: Thesis
dc.date.schoolyear: 108-1
dc.description.degree: 博士 (Doctor of Philosophy)
dc.contributor.coadvisor: 陳昇瑋 (Sheng-Wei Chen)
dc.contributor.oralexamcommittee: 郭斯彥 (Sy-Yen Kuo), 顏嗣鈞 (Hsu-chun Yen), 王勝德 (Sheng-De Wang), 蕭旭君 (Hsu-Chun Hsiao), 廖弘源 (Mark Liao)
dc.subject.keyword: 使用者經驗, 體驗特質, 注視點重點成像, 即時視訊溝通, 視線交會, 深度學習 [zh_TW]
dc.subject.keyword: user experience, quality of experience, foveated rendering, live video communication, eye contact, deep learning [en]
dc.relation.page: 115
dc.identifier.doi: 10.6342/NTU201904300
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2019-11-21
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) [zh_TW]
Appears in collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
ntu-108-1.pdf (not authorized for public access): 4.48 MB, Adobe PDF