360MVSNet: Deep Multi-View Stereo Network with 360° Images

Ching-Ya Chiu; 邱靖雅

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80664

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	莊永裕(Yung-Yu Chuang)
dc.contributor.author	Ching-Ya Chiu	en
dc.contributor.author	邱靖雅	zh_TW
dc.date.accessioned	2022-11-24T03:12:13Z	-
dc.date.available	2022-01-01
dc.date.available	2022-11-24T03:12:13Z	-
dc.date.copyright	2021-11-04
dc.date.issued	2021
dc.date.submitted	2021-10-21
dc.identifier.citation	H. Aanæs, R. R. Jensen, G. Vogiatzis, E. Tola, and A. B. Dahl. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, pages 1–16, 2016. I. Armeni, S. Sax, A. R. Zamir, and S. Savarese. Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105, 2017. N. D. Campbell, G. Vogiatzis, C. Hernández, and R. Cipolla. Using multiple hypotheses to improve depth-maps for multi-view stereo. In European Conference on Computer Vision, pages 766–779. Springer, 2008. D. Cernea. OpenMVS: Multi-view stereo reconstruction library. 2020. A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV), 2017. R. Chen, S. Han, J. Xu, and H. Su. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1538–1547, 2019. S. Cheng, Z. Xu, S. Zhu, Z. Li, L. E. Li, R. Ramamoorthi, and H. Su. Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2524–2534, 2020. T. S. Cohen, M. Geiger, J. Köhler, and M. Welling. Spherical cnns. arXiv preprint arXiv:1801.10130, 2018. B. O. Community. Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam, 2018. J. S. De Bonet and P. Viola. Poxels: Probabilistic voxelized volume reconstruction. In Proceedings of International Conference on Computer Vision (ICCV), pages 418–425, 1999. C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Daniilidis. 3d object classification and retrieval with spherical cnns. CoRR, abs/1711.06721, 2017. Y. Furukawa and C. Hernández. Multi-view stereo: A tutorial. Found. Trends. Comput. Graph. Vis., 9(1–2):1–148, June 2015. Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2009. S. Galliani, K. Lasinger, and K. Schindler. Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of the IEEE International Conference on Computer Vision, pages 873–881, 2015. X. Gu, Z. Fan, S. Zhu, Z. Dai, F. Tan, and P. Tan. Cascade cost volume for high-resolution multi-view stereo and stereo matching. 2019. M. Ji, J. Gall, H. Zheng, Y. Liu, and L. Fang. Surfacenet: An end-to-end 3d neural network for multiview stereopsis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2307–2315, 2017. S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I. IEEE, 2001. A. Kar, C. Häne, and J. Malik. Learning a multi-view stereo machine. 2017. A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach, and A. Bry. End-to-end learning of geometry and context for deep stereo regression, 2017. V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In European Conference on Computer Vision, pages 82–96. Springer, 2002. K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. International Journal of Computer Vision, 38(3):199–218, 2000. M. Lhuillier and L. Quan. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):418–433, 2005. S. Li. Binocular spherical stereo. IEEE Transactions on Intelligent Transportation Systems, 9(4):589–600, 2008. P. Moulon. Image datasets. https://github.com/openMVG/Image_datasets, 2019. P. Moulon, P. Monasse, R. Perrot, and R. Marlet. Openmvg: Open multiple view geometry. In International Workshop on Reproducible Research in Pattern Recognition, pages 60–74. Springer, 2016. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019. O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015. J. L. Schönberger, E. Zheng, M. Pollefeys, and J.-M. Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016. S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxel coloring. International Journal of Computer Vision, 35(2):151–173, 1999. Y.-C. Su and K. Grauman. Learning spherical convolution for fast features from 360 imagery. Advances in Neural Information Processing Systems, 30:529–539, 2017. Y.-C. Su and K. Grauman. Kernel transformer networks for compact spherical convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9442–9451, 2019. E. Tola, C. Strecha, and P. Fua. Efficient large-scale multi-view stereo for ultra high-resolution image sets. Machine Vision and Applications, 23(5):903–920, 2012. F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai. Bifuse: Monocular 360 depth estimation via bi-projection fusion. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020. N.-H. Wang, B. Solarte, Y.-H. Tsai, W.-C. Chiu, and M. Sun. 360sd-net: 360° stereo depth estimation with learnable cost volume. In International Conference on Robotics and Automation (ICRA), 2020. C. Won, J. Ryu, and J. Lim. Omnimvs: End-to-end learning for omnidirectional stereo matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019. C. Won, J. Ryu, and J. Lim. Sweepnet: Wide-baseline omnidirectional depth estimation. 2019 International Conference on Robotics and Automation (ICRA), May 2019. J. Yang, W. Mao, J. M. Alvarez, and M. Liu. Cost volume pyramid based depth inference for multi-view stereo. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020. Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan. Mvsnet: Depth inference for unstructured multi-view stereo. European Conference on Computer Vision (ECCV), 2018. N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras. Omnidepth: Dense depth estimation for indoors spherical panoramas. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80664	-
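dc.description.abstract	The goal of multi-view stereo is to recover the three-dimensional information of a scene from multiple images and their corresponding camera parameters. In recent years, with the development of deep learning, many papers have achieved excellent results on multi-view stereo. However, when reconstructing large scenes, existing methods require considerable manual effort to ensure that the captured images overlap sufficiently. We therefore propose a new idea: using panoramic images as the input to multi-view stereo to infer the 3D geometry of a scene. The advantage of panoramas is that they capture complete information about the environment and provide wide, continuous coverage in a single image. To this end, we propose 360MVSNet, a deep learning multi-view stereo model for 360° images. So that training takes into account the geometric information provided by a 360° camera, we propose a spherical sweeping method, which projects image features onto spheres of different radii according to the hypothesized depths. Using multi-scale stereo cost volumes and measuring the model's uncertainty at each scale, we predict image depth stage by stage and generate high-resolution depth maps. In addition, we build a large synthetic dataset, EQMVS, which contains around 50,000 RGB images together with depth maps and camera parameters. Experimental results show that our model reconstructs scenes more completely on both the test dataset and real-world scenes, while also outperforming other methods quantitatively.	zh_TW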
dc.description.provenance	Made available in DSpace on 2022-11-24T03:12:13Z (GMT). No. of bitstreams: 1 U0001-2010202100242300.pdf: 3884653 bytes, checksum: 2b7a8bc111a9d8f020424f230473b116 (MD5) Previous issue date: 2021	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee; Acknowledgements; 摘要 (Abstract in Chinese); Abstract; Contents; List of Figures; List of Tables; Denotation; Chapter 1 Introduction: 1.1 Introduction; Chapter 2 Related Work: 2.1 Tradition Multiview Stereo, 2.2 Deep Learning Multiview Stereo, 2.3 Omnidirectional Depth Estimation and Stereo Matching; Chapter 3 Spherical Multiview Stereo: 3.1 Feature Extraction, 3.2 Spherical Sweeping, 3.3 Multi Scale Cost Volume, 3.4 Depth Regression and Loss Function; Chapter 4 Synthetic Dataset: EQMVS: 4.1 Data Acquisition; Chapter 5 Experiments and Results: 5.1 Implementation Details, 5.2 Performance, 5.3 Ablation Study; Chapter 6 Conclusion; References
dc.language.iso	en
dc.subject	電腦視覺	zh_TW
dc.subject	多視角立體視覺	zh_TW
dc.subject	360度影像	zh_TW
dc.subject	三維場景重建	zh_TW
dc.subject	全景影像	zh_TW
dc.subject	深度學習	zh_TW
dc.subject	Equirectangular Image	en
dc.subject	Computer Vision	en
dc.subject	Deep Learning	en
dc.subject	Multi-View Stereo	en
dc.subject	3D Scene Reconstruction	en
dc.subject	360° Image	en
dc.title	360MVSNet:基於360°影像之多視角立體視覺深度模型	zh_TW
dc.title	360MVSNet: Deep Multi-View Stereo Network with 360° Images	en
dc.date.schoolyear	109-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	朱宏國(Hung-Kuo Chu), 李明穗(Ming-Sui Lee)
dc.subject.keyword	多視角立體視覺, 360度影像, 三維場景重建, 全景影像, 深度學習, 電腦視覺	zh_TW
dc.subject.keyword	Multi-View Stereo, 360° Image, 3D Scene Reconstruction, Equirectangular Image, Deep Learning, Computer Vision	en
dc.relation.page	39
dc.identifier.doi	10.6342/NTU202103906
dc.rights.note	Authorization granted (access restricted to campus)
dc.date.accepted	2021-10-22
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia	zh_TW
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Size	Format
U0001-2010202100242300.pdf Access restricted to NTU campus IP addresses (off campus, please use the VPN remote access service)	3.79 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.