Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83109

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 簡韶逸 | zh_TW |
| dc.contributor.advisor | Shao-Yi Chien | en |
| dc.contributor.author | 楊凱翔 | zh_TW |
| dc.contributor.author | Kai-Siang Yang | en |
| dc.date.accessioned | 2023-01-08T17:07:56Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-01-06 | - |
| dc.date.issued | 2022 | - |
| dc.date.submitted | 2022-11-16 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83109 | - |
| dc.description.abstract | 由於近年來虛擬實境(VR)的發展,VR頭戴式顯示器(HMD)配備越來越高解析度的顯示螢幕,以達到更好的沉浸式體驗。在VR HMD中觀看影片時,使用者同一時間只可以看到一個很小的區域,稱為視埠。這種現象會造成在視埠外的區域串流高解析度的影像時,因為使用者無法看到視埠外的區域,而浪費傳送高解析度畫面所使用的頻寬。這個問題的解決方案之一就是注視點串流。注視點串流以高解析度傳輸我們所注視的區域、以低解析度傳輸剩餘的全景圖。注視點串流不僅可以節省網路頻寬,還可以模擬人眼的生理機能以改善視覺輻輳調節衝突。
在本論文中,我們提出了第一個具有未來注視點預測的注視點串流系統。我們的注視點預測模型使用深度學習來提前獲得未來注視點的位置,以加速端到端延遲並提高串流的影片刷新率。此外,我們的注視預測模型不會影響用戶的觀看體驗。我們還提出了分層注視點渲染,它在我們的注視點渲染中結合了Unity的物件。分層注視點渲染是將裁剪後的高解析度區域渲染到Unity中的平面物件,而不是調整全景區域的大小並進行圖像拼接。該技術可以降低計算成本,減少GPU的使用量,在沒有視訊壓縮協定的前提下,我們提出的注視點串流系統可以節省至少90%的頻寬。與其他的注視點串流系統相比,我們實現更低的端到端延遲,更高的串流影片刷新率,並節省更高的頻寬。 | zh_TW |
| dc.description.abstract | Owing to the development of virtual reality (VR), VR head-mounted displays (HMDs) are equipped with increasingly high-resolution displays to achieve a more immersive experience. While viewing videos in a VR HMD, a user can see only a small region at a time, known as the viewport. When streaming 360° panoramic videos to VR HMDs, transmitting the high-resolution content that lies outside the viewport therefore wastes Internet bandwidth, because the user cannot see it. One solution to this problem is foveated streaming, which transmits the region we are looking at in high resolution and the remaining panoramic region in low resolution. Foveated streaming not only saves Internet bandwidth but also mimics the physiology of the human eye, which helps mitigate the vergence-accommodation conflict (VAC).
In this thesis, we propose the first foveated streaming system with future gaze prediction. Our gaze prediction model uses deep learning to obtain future gaze positions in advance, which reduces the end-to-end latency and increases the streaming frame rate without affecting the user's viewing experience. We also propose layered foveated rendering, which combines Unity objects with the foveated rendering in our streaming system: instead of resizing the panoramic region and stitching images together, the cropped high-resolution region is rendered onto a plane object in Unity. This technique lowers the computational cost, reduces GPU usage, and saves at least 90% of the bandwidth when no video compression protocol is applied (an illustrative pixel-budget sketch follows this metadata record). Compared with other foveated streaming systems, ours achieves lower end-to-end latency, a higher streaming frame rate, and greater bandwidth savings. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-01-08T17:07:56Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-01-08T17:07:56Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Abstract i
List of Figures v
List of Tables vii
1 Introduction 1
1.1 Foveated Streaming 1
1.2 Gaze Prediction 5
1.3 Contribution 6
1.4 Thesis Organization 7
2 Related Work 9
2.1 Foveated Streaming 9
2.1.1 2D rectangular videos 9
2.1.2 360° panoramic videos 10
2.2 Gaze Prediction 11
2.2.1 Saliency Detection 12
3 Methodology 15
3.1 Overview of 360° Foveated Streaming system 15
3.2 Interactive Unity Interface Design 17
3.3 Equirectangular to Perspective Image Projection 18 (see the projection sketch after this record)
3.3.1 Create mesh at the origin in 3D space 19
3.3.2 Rotate all points from the origin to the front of our gaze 19
3.3.3 Convert 3D points to 2D vectors on image coordinate system 20
3.3.4 Mapping backward to obtain expected perspective image from the original equirectangular image 21
3.4 Overview of Gaze Prediction 22
3.5 Notations 22
3.6 Gaze Prediction Network Architecture 24
3.6.1 Saliency and Motion Feature Encoder 24
3.6.2 Gaze Tracking Encoder 24
3.6.3 Gaze Displacement Prediction 25
4 Experiments 27
4.1 Dataset 27
4.2 Training and Implementation Details 28
4.2.1 Gaze Prediction 28
4.2.2 Foveated Streaming System 28
4.3 Performance Evaluation 29
4.4 Inference Speed Evaluation 31
4.5 Bandwidth Analysis 32
4.6 Ablation Study 34
4.7 User Experiments 35
5 Conclusion 41
Reference 43 | - |
| dc.language.iso | en | - |
| dc.subject | 虛擬實境 | zh_TW |
| dc.subject | 360°影片 | zh_TW |
| dc.subject | 全景影片 | zh_TW |
| dc.subject | 影片串流 | zh_TW |
| dc.subject | 注視點串流 | zh_TW |
| dc.subject | 注視點渲染 | zh_TW |
| dc.subject | 360° Videos | en |
| dc.subject | Foveated Streaming | en |
| dc.subject | Foveated Rendering | en |
| dc.subject | Video Streaming | en |
| dc.subject | Virtual Reality | en |
| dc.subject | Panoramic Videos | en |
| dc.title | 基於深度學習之注視點預測用於360°全景影片注視點串流 | zh_TW |
| dc.title | Foveated Streaming with Deep Learning Gaze Prediction on 360° Videos | en |
| dc.title.alternative | Foveated Streaming with Deep Learning Gaze Prediction on 360° Videos | - |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 陳炳宇;莊永裕;曹昱 | zh_TW |
| dc.contributor.oralexamcommittee | Bing-Yu Chen;Yung-Yu Chuang;Yu Tsao | en |
| dc.subject.keyword | 注視點渲染, 注視點串流, 影片串流, 全景影片, 360°影片, 虛擬實境 | zh_TW |
| dc.subject.keyword | Foveated Rendering, Foveated Streaming, Video Streaming, Panoramic Videos, 360° Videos, Virtual Reality | en |
| dc.relation.page | 48 | - |
| dc.identifier.doi | 10.6342/NTU202210004 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2022-11-17 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電子工程學研究所 | - |
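
The abstract states that, without video compression, the proposed foveated streaming saves at least 90% of the bandwidth. A simple pixel-budget comparison makes that claim concrete: instead of the full-resolution panorama, only a high-resolution crop around the gaze point plus a downscaled copy of the whole panorama is transmitted. The sketch below is purely illustrative; the panorama size, crop size, and downscale factor are assumed values, not the configuration used in the thesis.

```python
# Illustrative raw-pixel budget for foveated streaming.
# All resolutions are assumptions for this sketch, not the thesis's settings.

PANO_W, PANO_H = 3840, 1920   # full equirectangular panorama (assumed)
FOVEA_W, FOVEA_H = 640, 640   # high-resolution crop around the gaze point (assumed)
DOWNSCALE = 4                 # per-axis downscale factor for the background (assumed)

full_pixels = PANO_W * PANO_H
foveated_pixels = (FOVEA_W * FOVEA_H
                   + (PANO_W // DOWNSCALE) * (PANO_H // DOWNSCALE))

saving = 1.0 - foveated_pixels / full_pixels
print(f"full panorama:    {full_pixels:,} px per frame")
print(f"foveated stream:  {foveated_pixels:,} px per frame")
print(f"raw-pixel saving: {saving:.1%}")  # roughly 88% under these assumptions
```

With a smaller crop or a stronger background downscale the raw-pixel saving exceeds 90%; the exact figure depends on the resolutions used in the system and on whether a compression protocol is layered on top.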
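Sections 3.3.1 through 3.3.4 of the table of contents above outline how a perspective viewport image is obtained from the equirectangular panorama: create a planar mesh in front of the origin, rotate it toward the gaze direction, convert the 3D points to image coordinates, and map backward into the panorama. The following is a minimal NumPy/OpenCV sketch of those four steps; the function name and the field-of-view, gaze, and output-size parameters are hypothetical, and the thesis performs this projection inside its Unity pipeline rather than in Python.

```python
import numpy as np
import cv2


def equirect_to_perspective(equi, fov_deg, yaw_deg, pitch_deg, out_w, out_h):
    """Sample a perspective (viewport) image from an equirectangular panorama."""
    h_equi, w_equi = equi.shape[:2]

    # 3.3.1: create a planar mesh one unit in front of the origin (z = 1).
    half = np.tan(np.deg2rad(fov_deg) / 2.0)
    xs = np.linspace(-half, half, out_w)
    ys = np.linspace(-half * out_h / out_w, half * out_h / out_w, out_h)
    xv, yv = np.meshgrid(xs, ys)
    pts = np.stack([xv, -yv, np.ones_like(xv)], axis=-1)  # (out_h, out_w, 3)

    # 3.3.2: rotate all points from the +z axis toward the gaze direction.
    yaw, pitch = np.deg2rad(yaw_deg), np.deg2rad(pitch_deg)
    r_pitch = np.array([[1, 0, 0],
                        [0, np.cos(pitch), -np.sin(pitch)],
                        [0, np.sin(pitch),  np.cos(pitch)]])
    r_yaw = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                      [ 0,           1, 0          ],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
    pts = pts @ r_pitch.T @ r_yaw.T

    # 3.3.3: convert the 3D points to longitude/latitude, then to pixel
    # coordinates of the equirectangular image.
    x, y, z = pts[..., 0], pts[..., 1], pts[..., 2]
    lon = np.arctan2(x, z)                              # [-pi, pi]
    lat = np.arcsin(y / np.linalg.norm(pts, axis=-1))   # [-pi/2, pi/2]
    map_x = ((lon / np.pi + 1.0) / 2.0 * w_equi).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * h_equi).astype(np.float32)

    # 3.3.4: map backward (bilinear sampling) from the original panorama.
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)
```

As a hypothetical usage, `equirect_to_perspective(pano, fov_deg=90, yaw_deg=30, pitch_deg=0, out_w=640, out_h=640)` would extract a 640×640 view rotated 30° to the right of the panorama's center.
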
| Appears in Collections: | 電子工程學研究所 |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-0156221026503014.pdf | 3.1 MB | Adobe PDF | View/Open |
Items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
