Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16380
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 王凡(Farn Wang) | |
dc.contributor.author | Yi Chiu | en |
dc.contributor.author | 邱熠 | zh_TW |
dc.date.accessioned | 2021-06-07T18:12:13Z | - |
dc.date.copyright | 2020-08-04 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-07-29 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16380 | - |
dc.description.abstract | Visual localization is important for many computer vision applications, such as autonomous driving, virtual reality, and navigation. Deep learning has achieved remarkable success across computer vision fields, including object detection, image classification, and action recognition. In recent years, deep neural networks have also been applied to estimating the six-degrees-of-freedom camera pose (position and orientation) from a single image. Traditionally, one deep model is required to learn the localization information of a single scene. However, different scenes often share common features; if a separate model is trained for each scene, the models cannot share the knowledge extracted from these features with one another, even though this learned information could help pose prediction in new scenes. In addition, maintaining multiple independently trained models is a burden for applications on mobile devices. This thesis proposes a continual learning approach for the visual localization problem that effectively learns a single compact model to predict camera poses across multiple scenes. During learning, information from previously learned scenes is not forgotten, and knowledge extracted from earlier scenes can be reused. To the best of our knowledge, this thesis is the first work to combine continual learning with visual localization. Experimental results show that our method can incrementally learn poses for multiple scenes and achieves better localization accuracy than training on each scene individually. To better reflect real-world conditions, we collected a large-scale dataset on the National Taiwan University campus to evaluate our approach. | zh_TW |
dc.description.abstract | Visual localization, which aims to acquire accurate camera pose estimates in a known scene, is important in many computer vision applications such as autonomous driving, virtual reality, and navigation. Deep learning has achieved dominant success in a variety of computer vision tasks, including object detection, image classification, and action recognition. Recently, deep neural networks have also been applied to estimating the six-degrees-of-freedom (6-DOF) camera pose (position and orientation) from a single image. Traditionally, one deep model is required to learn each scene. However, it is common for multiple scenes to share features. By training each scene individually, the models are unable to share the knowledge extracted from these features with one another, even though this learned information may assist prediction on new scenes. Additionally, maintaining multiple independent models is a burden for mobile devices. This thesis proposes a continual learning approach for the visual localization problem that effectively learns a compact model for estimating poses across multiple scenes without forgetting. During training, the knowledge learned from earlier scenes can also be reused. To the best of our knowledge, this thesis is the first work that combines continual learning and visual localization. Experimental results show that our method can incrementally learn a compact model for multiple scenes with better accuracy than individual scene training. To better reflect real situations, we gathered a large-scale dataset on the National Taiwan University campus to benchmark our approach. | en |
dc.description.provenance | Made available in DSpace on 2021-06-07T18:12:13Z (GMT). No. of bitstreams: 1 U0001-2907202015333900.pdf: 19970106 bytes, checksum: b17b068c2e675f99d663876a8881f4d9 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Thesis Committee Certification i Acknowledgements ii Chinese Abstract iii Abstract iv 1 Introduction 1 1.1 Motivation 1 1.2 Thesis Organization 2 2 Related Work 4 2.1 Camera Relocalization 4 2.2 Continual Lifelong Learning 5 2.3 Summary 7 3 Continual Lifelong Learning for Camera Relocalization 8 3.1 Pose Regressor 8 3.1.1 Architecture 8 3.1.2 Loss Function 10 3.2 Continual Learning Method 10 3.2.1 Method Overview 10 3.2.2 The CPG approach with Column-wise Expansion 11 4 Experiments 14 4.1 Implementation Details 14 4.2 Datasets 14 4.3 Major Results 17 4.3.1 Experimental Results on Cambridge Landmarks 17 4.3.2 Experimental Results on NTU Campus Scenes 19 4.3.3 Experimental Results on Different Weather or Time Conditions 20 4.4 Ablation Studies 22 5 Conclusion 24 Bibliography 25 | |
dc.language.iso | en | |
dc.title | 基於永續深度學習機制之六自由度相機定位研究 | zh_TW |
dc.title | Continual Lifelong Learning of Deep Convolutional Networks for 6-DOF Camera Relocalization | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master | |
dc.contributor.coadvisor | 陳祝嵩(Chu-Song Chen) | |
dc.contributor.oralexamcommittee | 洪一平(Yi-Ping Hung) | |
dc.subject.keyword | Deep Learning, Continual Learning, Lifelong Learning, Camera Relocalization, Visual Localization, Image-based Localization | zh_TW |
dc.subject.keyword | Deep Learning, Continual Learning, Lifelong Learning, Camera Relocalization, Visual Localization, Image-based Localization | en |
dc.relation.page | 31 | |
dc.identifier.doi | 10.6342/NTU202002038 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2020-07-30 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
Appears in Collections: | Department of Electrical Engineering
Files in this item:
File | Size | Format | |
---|---|---|---|
U0001-2907202015333900.pdf (currently restricted from public access) | 19.5 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.