Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74817
| Title: | Toward User Immersed Multimedia Applications: Studies on Eye Gaze |
| Author: | Chih-Fan Hsu (許之凡) |
| Advisor: | Chin-Laung Lei (雷欽隆) |
| Co-advisor: | Sheng-Wei Chen (陳昇瑋) |
| Keywords: | user experience, quality of experience, foveated rendering, live video communication, eye contact, deep learning |
| Publication Year: | 2019 |
| Degree: | Doctoral |
| Abstract: | With the growth of network bandwidth and the advances in hardware and algorithms, multimedia applications have gradually become part of our daily lives. Traditional system design driven solely by developers no longer meets consumers' demands for multimedia services; improving the user experience (UX) of applications according to users' opinions has become a primary design goal. Among the many design considerations, immersing users in a service, so that it feels like being in the real world, is a convincing way to improve UX. Making users feel immersed has therefore become an important research topic.
This dissertation focuses on technologies that use gaze direction to immerse users and thereby improve the UX of multimedia services, in particular technologies that alter the image. Because consumers use multimedia systems mainly for entertainment and interpersonal communication, we target two killer applications: virtual reality (VR) and live video communication. Blurring the boundary between the real and virtual worlds and immersing users is the main goal of both.
In the VR study, we investigate an emerging technique, foveated rendering. Exploiting the property of the human visual system that the center of the visual field is sharp while the periphery is blurry, foveated rendering reallocates computing resources to the more important image regions and thus increases the perceived image quality under a fixed computing budget. Because the technique is still at an early stage, no general and systematic framework exists for subjectively evaluating foveated images. We therefore evaluate foveated images with four mainstream subjective assessment methods and propose a unified quality of experience (QoE) metric, the perceptual ratio, to measure image quality. We further use the perceptual ratio to quantify the efficiency and consistency of the assessment methods, so that future developers of foveated rendering can choose the most suitable method for their experimental needs. These results can serve as a foundation for the future development of foveated rendering technologies.
In the study of live video communication, we address a well-known but challenging problem: the lack of eye contact. Eye contact is an important nonverbal cue that conveys attentiveness and confidence and thereby improves the quality of communication. Current systems, however, cannot establish eye contact in video communication, so a gap remains between video and face-to-face conversation. We propose a deep learning-based gaze correction system that post-processes images in real time, dynamically correcting the user's gaze according to the relative positions of the interlocutors' heads and the camera so that eye contact can be established online. We evaluate the model and the system both objectively and subjectively, and we release the system as open source to support the future development of live video communication. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74817 |
| DOI: | 10.6342/NTU201904300 |
| Full-Text Authorization: | Fee-based authorization |
| Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-108-1.pdf (restricted; not publicly available) | 4.48 MB | Adobe PDF |
Unless otherwise indicated in their copyright terms, all items in this repository are protected by copyright, with all rights reserved.
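
The abstract describes foveated rendering as reallocating rendering quality so that the region around the gaze point stays sharp while the periphery is degraded. The Python sketch below illustrates that general idea on a single image by blending a sharp frame with a blurred copy using a radial falloff around the gaze point. It is a minimal illustration of the principle only, not the dissertation's implementation; the blur-based degradation, the falloff shape, the file names, and all parameter values are assumptions.

```python
# Minimal sketch of the foveated-imaging idea: full detail near the gaze point,
# degraded quality in the periphery. Not the dissertation's implementation.
import numpy as np
import cv2  # OpenCV, used here for Gaussian blur and image I/O


def foveate(image, gaze_xy, fovea_radius=80.0):
    """Blend a sharp image with a blurred copy using a radial mask around gaze_xy."""
    h, w = image.shape[:2]
    # Heavily blurred stand-in for the low-quality periphery.
    blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=8)

    # Distance of every pixel from the gaze point.
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2)

    # Smooth falloff: 1.0 inside the fovea, fading to 0.0 one radius further out.
    weight = np.clip(1.0 - (dist - fovea_radius) / fovea_radius, 0.0, 1.0)
    if image.ndim == 3:
        weight = weight[..., None]  # broadcast over color channels

    return (weight * image + (1.0 - weight) * blurred).astype(image.dtype)


if __name__ == "__main__":
    frame = cv2.imread("frame.png")             # hypothetical input frame
    out = foveate(frame, gaze_xy=(640, 360))    # assume gaze at the center of a 1280x720 frame
    cv2.imwrite("foveated.png", out)
```

In an actual foveated renderer the savings come from rendering the periphery at lower resolution or shading rate rather than blurring a finished frame; the blend above only mimics the perceptual effect that the subjective assessments in the dissertation evaluate.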

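The abstract also describes a deep learning-based gaze correction system that post-processes frames in real time according to the positions of the interlocutors' heads and the camera. The sketch below only shows the structure such a real-time loop might take; `GazeCorrectionNet`, `estimate_target_angles`, and the hard-coded positions are hypothetical placeholders, not the dissertation's model, API, or geometry.

```python
# Hedged structural sketch of a real-time gaze-correction loop:
# capture a frame, derive a target gaze direction from camera/face geometry,
# let a learned model adjust the eyes, then display the corrected frame.
import cv2
import numpy as np


class GazeCorrectionNet:
    """Placeholder for a deep gaze-correction model (e.g., an eye-region warper)."""

    def correct(self, frame, target_angles):
        # A real model would warp or synthesize the eye regions toward target_angles.
        return frame  # identity stand-in


def estimate_target_angles(camera_pos, remote_face_pos):
    """Hypothetical geometry step: yaw/pitch from the camera toward the on-screen face."""
    dx, dy, dz = np.asarray(remote_face_pos) - np.asarray(camera_pos)
    return float(np.arctan2(dx, dz)), float(np.arctan2(dy, dz))


def run(camera_index=0):
    model = GazeCorrectionNet()
    cap = cv2.VideoCapture(camera_index)
    camera_pos = (0.0, -0.15, 0.0)      # assumed camera offset relative to the screen (meters)
    remote_face_pos = (0.0, 0.05, 0.5)  # assumed position of the interlocutor's face window
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        angles = estimate_target_angles(camera_pos, remote_face_pos)
        corrected = model.correct(frame, angles)
        cv2.imshow("corrected", corrected)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    run()
```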