Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79602

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳沛遠(Pei-Yuan Wu) | |
| dc.contributor.author | Peng-Wen Chen | en |
| dc.contributor.author | 陳芃彣 | zh_TW |
| dc.date.accessioned | 2022-11-23T09:04:56Z | - |
| dc.date.available | 2021-11-05 | |
| dc.date.available | 2022-11-23T09:04:56Z | - |
| dc.date.copyright | 2021-11-05 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-10-15 | |
| dc.identifier.citation | [1] M. Almquist, V. Almquist, V. Krishnamoorthi, N. Carlsson, and D. Eager. The Prefetch Aggressiveness Tradeoff in 360° Video Streaming, pages 258–269. Association for Computing Machinery, New York, NY, USA, 2018. [2] Y. Bai and D. Wang. On the comparison of trilinear, cubic spline, and fuzzy interpolation methods in the high-accuracy measurements. IEEE Transactions on Fuzzy Systems, 18(5):1016–1022, 2010. [3] A. Borji, H. R. Tavakoli, D. N. Sihite, and L. Itti. Analysis of scores, datasets, and models in visual saliency prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 921–928, 2013. [4] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand. What do different evaluation metrics tell us about saliency models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3):740–757, 2018. [5] Q. Chang, S. Zhu, and L. Zhu. Temporal-spatial feature pyramid for video saliency detection. arXiv preprint arXiv:2105.04213, 2021. [6] H.-T. Cheng, C.-H. Chao, J.-D. Dong, H.-K. Wen, T.-L. Liu, and M. Sun. Cube padding for weakly-supervised saliency prediction in 360 videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1420–1429, 2018. [7] B. Coors, A. P. Condurache, and A. Geiger. SphereNet: Learning spherical representations for detection and classification in omnidirectional images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 518–533, 2018. [8] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara. Predicting human eye fixations via an LSTM-based saliency attentive model. IEEE Transactions on Image Processing, 27(10):5142–5154, 2018. [9] E. J. David, J. Gutiérrez, A. Coutrot, M. P. Da Silva, and P. L. Callet. A dataset of head and eye movements for 360 videos. In Proceedings of the 9th ACM Multimedia Systems Conference, pages 432–437, 2018. [10] E. J. David, J. Gutiérrez, A. Coutrot, M. P. Da Silva, and P. L. Callet. A dataset of head and eye movements for 360 videos. In Proceedings of the 9th ACM Multimedia Systems Conference, pages 432–437, 2018. [11] R. Droste, J. Jiao, and J. A. Noble. Unified image and video saliency modeling. In European Conference on Computer Vision, pages 419–435. Springer, 2020. [12] J. Gutiérrez, E. J. David, A. Coutrot, M. P. Da Silva, and P. Le Callet. Introducing UN Salient360! benchmark: A platform for evaluating visual attention models for 360 contents. In 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–3. IEEE, 2018. [13] H.-N. Hu, Y.-C. Lin, M.-Y. Liu, H.-T. Cheng, Y.-J. Chang, and M. Sun. Deep 360 pilot: Learning a deep agent for piloting through 360 sports videos. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1396–1405. IEEE, 2017. [14] S. Jain, P. Yarlagadda, S. Jyoti, S. Karthik, R. Subramanian, and V. Gandhi. ViNet: Pushing the limits of visual modality for audio-visual saliency prediction. arXiv preprint arXiv:2012.06170, 2020. [15] S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221–231, 2012. [16] W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, et al. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017. [17] H. Lee, T. H. Jung, M. C. tom Dieck, and N. Chung. Experiencing immersive virtual reality in museums. Information & Management, 57(5):103229, 2020. [18] K. Min and J. J. Corso. TASED-Net: Temporally-aggregating spatial encoder-decoder network for video saliency detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2394–2403, 2019. [19] U. Neumann, T. Pintaric, and A. Rizzo. Immersive panoramic video. In Proceedings of the Eighth ACM International Conference on Multimedia, pages 493–494, 2000. [20] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017. [21] S. Peleg and M. Ben-Ezra. Stereo panorama with a single camera. In Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. PR00149), volume 1, pages 395–401. IEEE, 1999. [22] R. J. Peters, A. Iyer, L. Itti, and C. Koch. Components of bottom-up gaze allocation in natural images. Vision Research, 45(18):2397–2416, 2005. [23] J. Radianti, T. A. Majchrzak, J. Fromm, and I. Wohlgenannt. A systematic review of immersive virtual reality applications for higher education: Design elements, lessons learned, and research agenda. Computers & Education, 147:103778, 2020. [24] F. Škola, S. Rizvić, M. Cozza, L. Barbieri, F. Bruno, D. Skarlatos, and F. Liarokapis. Virtual reality with 360-video storytelling in cultural heritage: Study of presence, engagement, and immersion. Sensors, 20(20):5851, 2020. [25] W. Wang, J. Shen, F. Guo, M.-M. Cheng, and A. Borji. Revisiting video saliency: A large-scale benchmark and a new model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4894–4903, 2018. [26] R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989. [27] X. Wu, Z. Wu, J. Zhang, L. Ju, and S. Wang. SalSAC: A video saliency prediction model with shuffled attentions and correlation-based ConvLSTM. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 12410–12417, 2020. [28] S. Xie, C. Sun, J. Huang, Z. Tu, and K. Murphy. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 305–321, 2018. [29] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pages 802–810, 2015. [30] M. Xu, Y. Song, J. Wang, M. Qiao, L. Huo, and Z. Wang. Predicting head movement in panoramic video: A deep reinforcement learning approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(11):2693–2708, 2018. [31] Y. Zhang, F. Dai, Y. Ma, H. Li, Q. Zhao, and Y. Zhang. Saliency prediction network for 360° videos. IEEE Journal of Selected Topics in Signal Processing, 14(1):27–37, 2019. [32] Z. Zhang, Y. Xu, J. Yu, and S. Gao. Saliency detection in 360 videos. In Proceedings of the European Conference on Computer Vision (ECCV), pages 488–503, 2018. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79602 | - |
| dc.description.abstract | Panoramic videos are widely used in immersive content, virtual tours, surveillance systems, and many other applications. Compared with planar videos, a panoramic video carries far more information, which makes predicting salient regions in such information-rich imagery considerably harder. In this thesis, we propose a visual saliency prediction model that predicts salient regions directly on equirectangular-projection videos. Unlike previous approaches, which adopt recurrent neural network architectures for saliency prediction, we use 3D convolutions in the encoder and generalize the SphereNet kernel to construct the decoder. We further analyze the statistics of the viewing bias present in different panoramic video datasets and in different types of panoramic videos, which provides insight for the design of a fusion mechanism that adaptively fuses the predicted saliency map with the viewing bias (a minimal sketch of this fusion idea is given after the metadata table below). The proposed model achieves the best results across datasets (e.g., Salient360!, PVS, and Sport360). | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-23T09:04:56Z (GMT). No. of bitstreams: 1 U0001-1209202114564600.pdf: 6314627 bytes, checksum: ca8dce9a0de19b785a2d460cbad397ec (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee (i) Acknowledgements (iii) 摘要 (v) Abstract (vii) Contents (ix) List of Figures (xi) List of Tables (xiii) Chapter 1 Introduction (1) Chapter 2 Related Work (5) 2.1 Visual Saliency on Planar Videos (5) 2.2 Visual Saliency on 360° Videos (6) Chapter 3 Method (9) 3.1 Network Structure (9) 3.1.1 Spatial-Temporal Encoder (9) 3.1.2 360 Kernel Decoder (10) 3.2 Initial Frame Center Bias (11) 3.3 Potential Center Bias (13) 3.3.1 Center Bias Analysis (13) 3.3.2 Learned Center Bias Fusing (16) Chapter 4 Experiment (19) 4.1 Dataset (19) 4.2 Implementation Detail (20) 4.2.1 Loss Function (20) 4.2.2 Training and Testing (21) 4.2.3 Evaluation Metric (22) 4.3 Experimental Result (23) 4.3.1 Ablation Study (23) 4.3.2 Comparison Result (24) Chapter 5 Conclusion (27) References (29) Appendix A — Supplementary Material (35) A.1 Testing result on Salient360! (35) A.2 Reproducing Experiment (35) A.2.1 SPN model architecture implementation (36) A.2.2 Training Detail (38) A.2.2.1 Input Data (38) A.2.2.2 Loss function (39) A.2.2.3 Training (39) A.2.3 Testing Result (40) A.2.4 Training and Inference Time (40) | |
| dc.language.iso | zh-TW | |
| dc.title | 全景影片視覺顯著性預測與視覺偏差 | zh_TW |
| dc.title | Viewing Bias Matters in 360° Videos Visual Saliency Prediction | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 陳銘憲(Ming-Syan Chen),王鈺強(Yu-Chiang Wang),林昌鴻 | |
| dc.subject.keyword | 視覺顯著性預測,深度學習,全景影片, | zh_TW |
| dc.subject.keyword | Visual Saliency Detection,deep learning,panorama videos, | en |
| dc.relation.page | 40 | |
| dc.identifier.doi | 10.6342/NTU202103129 | |
| dc.rights.note | 同意授權(全球公開) | |
| dc.date.accepted | 2021-10-15 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
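
The abstract above describes a fusion mechanism that adaptively combines the network's predicted saliency map with a viewing-bias prior. The thesis itself gives the exact formulation; the PyTorch sketch below only illustrates one plausible reading, in which a learned per-frame weight blends the predicted map with a fixed bias map. All names here (`BiasFusion`, `weight_head`, `bias_map`) are hypothetical and not taken from the thesis.

```python
# Hypothetical sketch of an adaptive viewing-bias fusion step.
# NOT the thesis implementation: module and parameter names are invented,
# and the actual fusion rule in the thesis may differ.
import torch
import torch.nn as nn

class BiasFusion(nn.Module):
    """Blend a predicted saliency map with a viewing-bias prior."""

    def __init__(self, bias_map: torch.Tensor):
        super().__init__()
        # Fixed viewing-bias prior (e.g., an equator-centred map estimated
        # from dataset statistics), shape (1, 1, H, W).
        self.register_buffer("bias_map", bias_map)
        # Small head that predicts a per-frame fusion weight in (0, 1)
        # from global statistics of the predicted saliency map.
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(1, 1), nn.Sigmoid(),
        )

    def forward(self, saliency: torch.Tensor) -> torch.Tensor:
        # saliency: (B, 1, H, W) map from the encoder-decoder network.
        alpha = self.weight_head(saliency).view(-1, 1, 1, 1)
        fused = (1 - alpha) * saliency + alpha * self.bias_map
        # Renormalise so each fused map sums to 1, as saliency maps usually do.
        return fused / fused.flatten(1).sum(dim=1).view(-1, 1, 1, 1)

# Example usage with a dummy equator-weighted bias prior:
H, W = 120, 240
rows = torch.linspace(0, 1, H).view(1, 1, H, 1)
bias = torch.exp(-((rows - 0.5) ** 2) / 0.02).expand(1, 1, H, W).clone()
model = BiasFusion(bias / bias.sum())
out = model(torch.rand(2, 1, H, W))  # -> (2, 1, H, W), each map summing to 1
```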

Appears in Collections: 電信工程學研究所
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1209202114564600.pdf | 6.02 MB | Adobe PDF | View/Open |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
