Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91088
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 謝宏昀 | zh_TW |
dc.contributor.advisor | Hung-Yun Hsieh | en |
dc.contributor.author | 吳奕寶 | zh_TW |
dc.contributor.author | Yi-Pao Wu | en |
dc.date.accessioned | 2023-10-24T17:03:27Z | - |
dc.date.available | 2024-12-31 | - |
dc.date.copyright | 2023-10-24 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-09 | - |
dc.identifier.citation | [1] B. Sullivan and A. Kaszynski, “PyVista: 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK),” Journal of Open Source Software, vol. 4, no. 37, p. 1450, May 2019. [Online]. Available: https://doi.org/10.21105/joss.01450
[2] P. Sturm, “Pinhole camera model,” 2014.
[3] Y. Liu, B. Han, F. Qian, A. Narayanan, and Z.-L. Zhang, “Vues: Practical mobile volumetric video streaming through multiview transcoding,” in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking, 2022, pp. 514–527.
[4] S. Mystakidis, “Metaverse,” Encyclopedia, vol. 2, no. 1, pp. 486–497, 2022.
[5] Z. Liu, Q. Li, X. Chen, C. Wu, S. Ishihara, J. Li, and Y. Ji, “Point cloud video streaming: Challenges and solutions,” IEEE Network, vol. 35, no. 5, pp. 202–209, 2021.
[6] A. Yaqoob, T. Bi, and G.-M. Muntean, “A survey on adaptive 360 video streaming: Solutions, challenges and opportunities,” IEEE Communications Surveys & Tutorials, vol. 22, no. 4, pp. 2801–2838, 2020.
[7] K. Lee, J. Yi, Y. Lee, S. Choi, and Y. M. Kim, “GROOT: A real-time streaming system of high-fidelity volumetric videos,” in Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, 2020, pp. 1–14.
[8] S. Gül, D. Podborski, J. Son, G. S. Bhullar, T. Buchholz, T. Schierl, and C. Hellge, “Cloud rendering-based volumetric video streaming system for mixed reality services,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 357–360.
[9] S. Gül, D. Podborski, T. Buchholz, T. Schierl, and C. Hellge, “Low-latency cloud-based volumetric video streaming using head motion prediction,” in Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, 2020, pp. 27–33.
[10] “Google Draco,” https://google.github.io/draco/.
[11] W. R. Mark, L. McMillan, and G. Bishop, “Post-rendering 3D warping,” in Proceedings of the 1997 Symposium on Interactive 3D Graphics, 1997, pp. 7–ff.
[12] C. Fehn, “Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV,” in Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291. SPIE, 2004, pp. 93–104.
[13] W.-Y. Chen, Y.-L. Chang, S.-F. Lin, L.-F. Ding, and L.-G. Chen, “Efficient depth image based rendering with edge dependent depth filter and interpolation,” in 2005 IEEE International Conference on Multimedia and Expo. IEEE, 2005, pp. 1314–1317.
[14] P. Ndjiki-Nya, M. Koppel, D. Doshkov, H. Lakshman, P. Merkle, K. Muller, and T. Wiegand, “Depth image-based rendering with advanced texture synthesis for 3-D video,” IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 453–465, 2011.
[15] M. Tanimoto, M. P. Tehrani, T. Fujii, and T. Yendo, “Free-viewpoint TV,” IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 67–76, 2010.
[16] A. Smolic, “3D video and free viewpoint video—from capture to display,” Pattern Recognition, vol. 44, no. 9, pp. 1958–1968, 2011.
[17] Y. Mori, N. Fukushima, T. Yendo, T. Fujii, and M. Tanimoto, “View generation with 3D warping using depth information for FTV,” Signal Processing: Image Communication, vol. 24, no. 1–2, pp. 65–72, 2009.
[18] I. Daribo and H. Saito, “A novel inpainting-based layered depth video for 3DTV,” IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 533–541, 2011.
[19] H.-Y. Huang and S.-Y. Huang, “Fast hole filling for view synthesis in free viewpoint video,” Electronics, vol. 9, no. 6, p. 906, 2020.
[20] S. Fachada, D. Bonatto, M. Teratani, and G. Lafruit, “View synthesis tool for VR immersive video,” 2022.
[21] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[22] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, pp. 1–14, 2017.
[23] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5505–5514.
[24] S. Li, K. Wang, Y. Gao, X. Cai, and M. Ye, “Geometric warping error aware CNN for DIBR oriented view synthesis,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 1512–1521.
[25] M. Tatarchenko, A. Dosovitskiy, and T. Brox, “Multi-view 3D models from single images with a convolutional network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14. Springer, 2016, pp. 322–337.
[26] T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros, “View synthesis by appearance flow,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016, pp. 286–301.
[27] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
[28] Q. Wang, Z. Wang, K. Genova, P. P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. Funkhouser, “IBRNet: Learning multi-view image-based rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4690–4699.
[29] F. Qian, L. Ji, B. Han, and V. Gopalakrishnan, “Optimizing 360 video delivery over cellular networks,” in Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, 2016, pp. 1–6.
[30] Y. Bao, H. Wu, T. Zhang, A. A. Ramli, and X. Liu, “Shooting a moving target: Motion-prediction-based transmission for 360-degree videos,” in 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016, pp. 1161–1170.
[31] X. Liu, Q. Xiao, V. Gopalakrishnan, B. Han, F. Qian, and M. Varvello, “360 innovations for panoramic video streaming,” in Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017, pp. 50–56.
[32] F. Qian, B. Han, Q. Xiao, and V. Gopalakrishnan, “Flare: Practical viewport-adaptive 360-degree video streaming for mobile devices,” in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 2018, pp. 99–114.
[33] S. Sengupta, N. Ganguly, S. Chakraborty, and P. De, “HotDASH: Hotspot aware adaptive video streaming using deep reinforcement learning,” in 2018 IEEE 26th International Conference on Network Protocols (ICNP). IEEE, 2018, pp. 165–175.
[34] M. Hosseini and C. Timmerer, “Dynamic adaptive point cloud streaming,” in Proceedings of the 23rd Packet Video Workshop, 2018, pp. 25–30.
[35] J. Van Der Hooft, T. Wauters, F. De Turck, C. Timmerer, and H. Hellwagner, “Towards 6DoF HTTP adaptive streaming through point cloud compression,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 2405–2413.
[36] B. Han, Y. Liu, and F. Qian, “ViVo: Visibility-aware mobile volumetric video streaming,” in Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, 2020, pp. 1–13.
[37] S. Gül, S. Bosse, D. Podborski, T. Schierl, and C. Hellge, “Kalman filter-based head motion prediction for cloud-based mixed reality,” in Proceedings of the 28th ACM International Conference on Multimedia, ser. MM ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 3632–3641. [Online]. Available: https://doi.org/10.1145/3394171.3413699
[38] L. Xie, Z. Xu, Y. Ban, X. Zhang, and Z. Guo, “360ProbDASH: Improving QoE of 360 video streaming using tile-based HTTP adaptive streaming,” in Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 315–323.
[39] Z. Xu, X. Zhang, K. Zhang, and Z. Guo, “Probabilistic viewport adaptive streaming for 360-degree videos,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018, pp. 1–5.
[40] J. Zou, C. Li, C. Liu, Q. Yang, H. Xiong, and E. Steinbach, “Probabilistic tile visibility-based server-side rate adaptation for adaptive 360-degree video streaming,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 1, pp. 161–176, 2019.
[41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[42] N. Somraj, “Pose-warping for view synthesis / DIBR,” 2020. [Online]. Available: https://github.com/NagabhushanSN95/Pose-Warping
[43] M. Krivokuca, P. A. Chou, and P. Savill, “8i voxelized surface light field (8iVSLF) dataset,” ISO/IEC JTC1/SC29/WG11 MPEG, input document m42914, 2018.
[44] S. Graf and H. Luschgy, Foundations of Quantization for Probability Distributions. Springer, 2007.
[45] G. Pages and J. Printems, “Optimal quadratic quantization for numerics: The Gaussian case,” 2003.
[46] P. Zador, “Asymptotic quantization error of continuous signals and the quantization dimension,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 139–149, 1982.
[47] H. Luschgy and G. Pages, “Greedy vector quantization,” Journal of Approximation Theory, vol. 198, pp. 111–131, 2015.
[48] J. Max, “Quantizing for minimum distortion,” IRE Transactions on Information Theory, vol. 6, pp. 7–12, 1960.
[49] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[50] T. Kohonen, Self-Organization and Associative Memory. Springer Science & Business Media, 2012, vol. 8.
[51] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
[52] S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic programming,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2341–2368, 2013. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91088 | - |
dc.description.abstract | 實現元宇宙的主要研究議題之一是量體視訊的串流,其需要極高的頻寬消耗、極低的延遲要求以及顯著的解碼負擔。本研究探索利用邊緣渲染 (edge rendering) 的串流系統,系統中根據視野預測結果對量體視訊的 2D 視角進行轉碼。然而,視野預測的不準確性可能會因為其在偏移視點上產生畫面而降低轉碼影像的品質。在最先進的邊緣輔助量體視訊串流系統中,選擇生成多個轉碼視角的位置是根據均勻步長移動預測位置,這種方法沒有將視野預測模型不同的準確性納入考慮,可能會顯著降低預渲染視圖的品質。本研究將虛擬視角合成技術納入串流系統,並基於充分的模擬結果建立轉碼畫面的品質模型,該模型代表畫面品質與位置偏移之間的關係。以此品質模型作為目標,本研究提出建立在最佳量化問題之上的最佳化框架,用於選擇生成多個轉碼視角的位置,以最佳化期望品質,考慮了實證模擬結果而非僅依賴歐氏距離。基於這個最佳化框架,本研究設計了一種結合無梯度最佳化方法與競爭性學習向量量化的演算法,該演算法考慮用戶位置的機率分佈和品質模擬結果,動態地決定最佳的視角轉碼位置。我們的模擬結果顯示,我們提出的演算法相比最先進的量體視訊串流系統方法,在影片串流過程中可以在 55% 至 83% 的時間帶來畫面品質的提升。 | zh_TW |
dc.description.abstract | One of the major research topics in enabling the metaverse is the streaming of volumetric video, which comes with ultra-high bandwidth consumption, an ultra-low latency requirement, and significant decoding overhead. This work explores a streaming system with edge rendering, where 2D views of the volumetric video are transcoded at the edge server according to the viewport prediction result. However, the inherent inaccuracy of viewport prediction may degrade the quality of transcoded frames that are rendered at a deviated viewpoint. In the state-of-the-art edge-assisted volumetric streaming system, the positions at which multiple transcoded views are generated are selected by shifting the predicted position with a uniform step size; such a multiview generation approach ignores the varying accuracy of viewport prediction and can significantly degrade the quality of pre-rendered views. This work incorporates virtual view synthesis techniques into the streaming system and, based on thorough simulation results, establishes a quality model that captures the relation between frame quality and position deviation. With this quality model as the objective, an optimization framework built upon the optimal quantization problem is formulated to select the positions for generating multiple transcoded views that maximize expected quality, taking empirical simulation results into account instead of relying solely on Euclidean distance. We propose an algorithm that integrates concepts from gradient-free optimization with a competitive learning vector quantization process to achieve maximal expected quality. The algorithm judiciously determines the best positions at which to transcode the views, considering the probability distribution of the user's position and the empirical simulation results. Our evaluations indicate that the proposed algorithms outperform the baseline methods of the state-of-the-art transcoded volumetric streaming system, with an improvement ratio ranging from 55% to 83% on a segment-by-segment basis. (A toy sketch of this position-selection idea appears after the metadata record below.) | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-24T17:03:27Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-10-24T17:03:27Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND AND RELATED WORK
2.1 Virtual View Synthesis
2.1.1 Depth Image
2.1.2 Depth Image Based Rendering
2.1.3 Pinhole Camera Model
2.1.4 3D Image Warping
2.1.5 Deep Learning-Based Method
2.2 Related Work
2.2.1 360-Degree Video Streaming
2.2.2 Direct Volumetric Video Streaming
2.2.3 Transcoded Volumetric Video Streaming
2.2.4 Multiview Generation Method in Transcoded Volumetric Streaming System
CHAPTER 3 SYSTEM MODEL
3.1 System Architecture
3.2 Viewport Prediction Model
3.2.1 Probability Distribution of User's Position
3.3 Multiview Generation
3.4 View Selection
3.5 Virtual View Synthesis Model
CHAPTER 4 OBSERVATION ON REALISTIC SIMULATION FOR TRANSCODING
4.1 Quality Metric
4.2 Simulation Flow
4.2.1 Generation of SSIM Map for Neighbor Views
4.2.2 Generation of SSIM Map for Synthesized Views
4.3 Observation on SSIM Maps
4.3.1 Observation on SSIM Maps for Neighbor Views
4.3.2 Observation on SSIM Maps for Synthesized Views
4.3.3 Observation on Different Center Views
4.3.4 Observation on Different Distance to the Object
4.4 Insight on SSIM Maps for Optimization
CHAPTER 5 PROBLEM FORMULATION
5.1 Optimal Quantization Problem
5.2 Formulation of Multiview Generation Problem
5.2.1 1-Dimensional Quantization with Uniform Interval
5.2.2 Independent Optimal 1-Dimensional Quantization
5.2.3 2-Dimensional Quantization
CHAPTER 6 OPTIMAL MULTIVIEW GENERATION ALGORITHMS
6.1 1D Quantization Method
6.1.1 1D Quantization with Uniform Interval
6.1.2 Independent 1D Optimal Quantization Method
6.2 2D Quantization Method
6.2.1 Competitive Learning Vector Quantization (CLVQ)
6.2.2 Gradient-Free Competitive Learning Vector Quantization
6.2.3 Cross-frame Optimization
CHAPTER 7 EVALUATION AND ANALYSIS
7.1 Simulation Setup
7.2 Performance of Incorporating Virtual View Synthesis
7.3 Performance of Multiview Generation Algorithms
7.3.1 Baseline Algorithm for Comparison
7.3.2 Performance of Proposed Methods
7.4 Analysis of System Performance
7.4.1 Observation on Step Size in 1D Quantization
7.4.2 Comparison between Different Probability Distributions
7.4.3 Observation on Algorithm Design
CHAPTER 8 CONCLUSION AND FUTURE WORK
8.1 Conclusion
8.2 Future Work
8.2.1 Optimality Analysis
8.2.2 Practical Implementation and Experiment
8.2.3 Multi-user Scenario
APPENDIX A — PROOF OF EQUATION (6.2)
APPENDIX B — COMPUTATION OF EQUATION (6.3)
APPENDIX C — COMPUTATION OF EQUATION (6.4)
REFERENCES | - |
dc.language.iso | en | - |
dc.title | 量體視訊串流下之多視角轉碼最佳化 | zh_TW |
dc.title | Optimization of Multiview Generation for Volumetric Video Streaming with Edge Transcoding | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 廖婉君;高榮鴻 | zh_TW |
dc.contributor.oralexamcommittee | Wanjiun Liao;Rung-Hung Gau | en |
dc.subject.keyword | 虛擬實境,量體視訊,影像串流,邊緣渲染,競爭式學習向量量化 | zh_TW |
dc.subject.keyword | virtual reality, volumetric video, video streaming, edge rendering, competitive learning vector quantization | en |
dc.relation.page | 79 | - |
dc.identifier.doi | 10.6342/NTU202303932 | - |
dc.rights.note | Authorized (open access worldwide) | - |
dc.date.accepted | 2023-08-11 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
dc.date.embargo-lift | 2026-01-01 | - |
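To make the abstract's formulation concrete: one plausible reading of its optimal-quantization setup, with the user position $X \sim \mu$ given by the viewport prediction model and a quality model $Q$ fitted from the SSIM simulations, is to choose view positions $c_1, \dots, c_k$ that maximize

$\max_{c_1,\dots,c_k} \; \mathbb{E}_{X\sim\mu}\big[\max_{1\le i\le k} Q(c_i, X)\big]$,

which reduces to classical optimal (Euclidean) quantization when $Q(c, X) = -\lVert c - X \rVert^2$. The sketch below illustrates the competitive-learning idea the abstract describes. It is a minimal toy, not the thesis's algorithm: it assumes a Gaussian viewport-prediction error and an exponential quality model in place of the empirical SSIM maps, all function names and parameters are illustrative, and the gradient-free refinement of Chapter 6 is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def quality(view_pos, user_pos):
    # Toy stand-in for the thesis's empirical SSIM-based quality model:
    # quality decays as the user's actual position deviates from the
    # position at which the view was transcoded. The exponential shape
    # and the 0.05 scale are assumptions, not values from the thesis.
    d = np.linalg.norm(view_pos - user_pos)
    return np.exp(-(d ** 2) / 0.05)

def sample_user_pos(predicted, sigma=0.2):
    # Assumed Gaussian viewport-prediction error around the predicted
    # position; the thesis derives this distribution from its viewport
    # prediction model instead.
    return predicted + rng.normal(scale=sigma, size=predicted.shape)

def clvq_positions(predicted, k=4, n_iter=5000, lr0=0.5):
    """Pick k transcoding positions by competitive-learning VQ.

    Each iteration draws a plausible user position, finds the candidate
    view position ("codeword") that would serve it with the highest
    quality, and nudges that codeword toward the sample with a
    decaying learning rate.
    """
    codebook = np.stack([sample_user_pos(predicted) for _ in range(k)])
    for t in range(n_iter):
        x = sample_user_pos(predicted)
        winner = max(range(k), key=lambda i: quality(codebook[i], x))
        codebook[winner] += (lr0 / (1.0 + lr0 * t)) * (x - codebook[winner])
    return codebook

if __name__ == "__main__":
    # k candidate positions at which the edge server could pre-render views.
    print(clvq_positions(predicted=np.array([0.0, 0.0])))
```

In this sketch the "winner" rule plays the role of nearest-codeword matching in classical CLVQ, with the quality model replacing negative Euclidean distance as the similarity measure, mirroring the abstract's point that empirical simulation results are used instead of distance alone.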
Appears in Collections: | Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (available online after 2026-01-01) | 11.33 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.