Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66482
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳良基(Liang-Gee Chen) | |
dc.contributor.author | Chih-Hsuan Lo | en |
dc.contributor.author | 羅志軒 | zh_TW |
dc.date.accessioned | 2021-06-17T00:38:20Z | - |
dc.date.available | 2021-02-17 | |
dc.date.copyright | 2020-02-17 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2020-02-06 | |
dc.identifier.citation | [1] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, "Deep ordinal regression network for monocular depth estimation," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 2002-2011.
[2] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," in Advances in Neural Information Processing Systems, 2014, pp. 2366-2374.
[3] F. Liu, C. Shen, G. Lin, and I. Reid, "Learning depth from single monocular images using deep convolutional neural fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2024-2039, 2015.
[4] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 270-279.
[5] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, "Unsupervised learning of depth and ego-motion from video," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1851-1858.
[6] P.-Y. Chen, A. H. Liu, Y.-C. Liu, and Y.-C. F. Wang, "Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2624-2632.
[7] A. Atapour-Abarghouei and T. P. Breckon, "Veritatem dies aperit: Temporally consistent depth prediction enabled by a multi-task geometric and semantic scene understanding approach," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3373-3384.
[8] Z. Zhang, Z. Cui, C. Xu, Z. Jie, X. Li, and J. Yang, "Joint task recursive learning for semantic segmentation and depth estimation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 235-251.
[9] Z. Zhang, Z. Cui, C. Xu, Y. Yan, N. Sebe, and J. Yang, "Pattern-affinitive propagation across depth, surface normal and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4106-4115.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[11] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7-42, 2002.
[12] C. Wu et al., "VisualSFM: A visual structure from motion system," 2011.
[13] B. Li, C. Shen, Y. Dai, A. Van Den Hengel, and M. He, "Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1119-1127.
[14] D. Xu, E. Ricci, W. Ouyang, X. Wang, and N. Sebe, "Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5354-5362.
[15] R. Mahjourian, M. Wicke, and A. Angelova, "Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5667-5675.
[16] Z. Yin and J. Shi, "GeoNet: Unsupervised learning of dense depth, optical flow and camera pose," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1983-1992.
[17] Z. Yang, P. Wang, Y. Wang, W. Xu, and R. Nevatia, "LEGO: Learning edge with geometry all at once by watching videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 225-234.
[18] Y. Kuznietsov, J. Stuckler, and B. Leibe, "Semi-supervised deep learning for monocular depth map prediction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6647-6655.
[19] A. CS Kumar, S. M. Bhandarkar, and M. Prasad, "Monocular depth prediction using generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 300-308.
[20] F. Aleotti, F. Tosi, M. Poggi, and S. Mattoccia, "Generative adversarial networks for unsupervised monocular depth prediction," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 0-0.
[21] A. Pilzer, D. Xu, M. Puscas, E. Ricci, and N. Sebe, "Unsupervised adversarial depth estimation using cycled generative networks," in 2018 International Conference on 3D Vision (3DV). IEEE, 2018, pp. 587-595.
[22] R. Chen, F. Mahmood, A. Yuille, and N. J. Durr, "Rethinking monocular depth estimation with adversarial training," arXiv preprint arXiv:1808.07528, 2018.
[23] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[24] J. Nath Kundu, P. Krishna Uppala, A. Pahuja, and R. Venkatesh Babu, "AdaDepth: Unsupervised content congruent adaptation for depth estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2656-2665.
[25] A. Mousavian, H. Pirsiavash, and J. Košecká, "Joint semantic segmentation and depth estimation with deep convolutional networks," in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 611-619.
[26] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231-1237, 2013.
[27] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213-3223.
[28] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, "The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3234-3243.
[29] Z. Yang, P. Wang, W. Xu, L. Zhao, and R. Nevatia, "Unsupervised learning of geometry from videos with edge-aware depth-normal consistency," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[30] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[31] A. Ranjan, V. Jampani, L. Balles, K. Kim, D. Sun, J. Wulff, and M. J. Black, "Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12240-12249. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66482 | - |
dc.description.abstract | 對於場景理解來說,單眼深度預測(monocular depth estimation)是一個重要的判斷依據,雖然目前大量的監督式和非監督式機器學習方法被提出,並在單眼深度預測上取得長足的進展,但通常大部分的方法在物體邊界以及細節上無法獲得很好的結果,而這些部份的深度資訊在生活應用上卻是相對重要部份。
在這篇論文當中,我們提出一個全新的「幾何結構表示學習方法」 (geometry-aware representation learning),透過加入語意分割的資訊將物體幾何結構納入單眼深度預測中,搭配上一系列的特殊條件判別器,用於統整物體結構和視覺外觀,最終有效幫助非監督式單眼深度預測,改善之前大部分方法於物體邊界和細節上不準確的問題。 透過在公開資料集上定量和定性分析,證明我們的表示學習方法在非監督 式單眼深度預測上比肩於目前其餘最先進方法的結果,並於特定物體上取得明顯的進步。 | zh_TW |
dc.description.abstract | Monocular depth estimation plays an important role in scene understanding. While a number of supervised and unsupervised learning approaches for monocular depth estimation have been proposed, their promising quantitative performance might not necessarily reflect satisfactory quality of the depth outputs due to inaccurate object boundaries. In this paper, we propose a novel approach of geometry-aware representation learning, which takes object geometry into account with the aid of semantic scene understanding (i.e., semantic segmentation). With a series of unique combinatorial conditional discriminators deployed over visual appearance and geometric representations, improved unsupervised depth estimation can be achieved. Experiments on the KITTI dataset successfully verify that our model performs favorably against recent approaches. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T00:38:20Z (GMT). No. of bitstreams: 1 ntu-108-R06943020-1.pdf: 2901697 bytes, checksum: 961ddaf20bdd4645cb263631639ca370 (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Abstract vii
1 Introduction 1
2 Background 5
2.1 Depth Estimation 5
2.2 Adversarial Learning 6
2.3 Multi-task Learning for Visual Analysis 7
2.4 Summary 7
3 The Proposed Method 9
3.1 Overview 9
3.2 Joint Exploitation of Depth and Object Semantics for Representation Learning 10
3.3 Geometry-Aware Representation Learning 13
3.4 Learning Across Data Domains 15
3.5 Full Objective 16
4 Experiment 19
4.1 Dataset 19
4.2 Implementation Details 20
4.3 Quantitative Results 21
4.4 Ablation Study 23
4.5 Qualitative Results 25
5 Conclusion 27
Bibliography 28 | |
dc.language.iso | en | |
dc.title | 幾何感知表示學習用於非監督式單眼深度估計 | zh_TW |
dc.title | Geometry-Aware Representation Learning For Unsupervised Monocular Depth Estimation | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-1 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 王鈺強(Yu-Chiang Wang) | |
dc.contributor.oralexamcommittee | 陳駿丞(Jun-Cheng Chen),陳祝嵩(Chu-Song Chen) | |
dc.subject.keyword | 場景理解,單眼深度預測,語義分割,領域自適應,多任務學習,非監督學習,表示學習 | zh_TW |
dc.subject.keyword | scene understanding, monocular depth estimation, semantic segmentation, domain adaptation, multi-task learning, unsupervised learning, representation learning | en |
dc.relation.page | 32 | |
dc.identifier.doi | 10.6342/NTU202000321 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2020-02-07 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
Appears in Collections: | 電子工程學研究所 (Graduate Institute of Electronics Engineering)
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 2.83 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated by their specific license terms.