Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50098
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 傅立成(Li-Chen Fu) | |
dc.contributor.author | Hung-Hao Chen | en |
dc.contributor.author | 陳紘豪 | zh_TW |
dc.date.accessioned | 2021-06-15T12:29:40Z | - |
dc.date.available | 2023-08-20 | |
dc.date.copyright | 2020-09-17 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-15 | |
dc.identifier.citation | [1] S. Ren, K. He, R. Girshick, and J. Sun, 'Faster r-cnn: Towards real-time object detection with region proposal networks,' in Advances in neural information processing systems, 2015, pp. 91-99. [2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, 'You only look once: Unified, real-time object detection,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788. [3] K. He, X. Zhang, S. Ren, and J. Sun, 'Deep residual learning for image recognition,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778. [4] P. C. Ng and S. Henikoff, 'SIFT: Predicting amino acid changes that affect protein function,' Nucleic acids research, vol. 31, no. 13, pp. 3812-3814, 2003. [5] N. Dalal and B. Triggs, 'Histograms of oriented gradients for human detection,' in 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), 2005, vol. 1, pp. 886-893. [6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, 'Imagenet: A large-scale hierarchical image database,' in 2009 IEEE conference on computer vision and pattern recognition, 2009, pp. 248-255. [7] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, 'The pascal visual object classes (voc) challenge,' International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010. [8] T.-Y. Lin et al., 'Microsoft coco: Common objects in context,' in European conference on computer vision, 2014, pp. 740-755. [9] M. Cordts et al., 'The cityscapes dataset for semantic urban scene understanding,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3213-3223. [10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, 'Imagenet classification with deep convolutional neural networks,' in Advances in neural information processing systems, 2012, pp. 1097-1105. [11] K. Simonyan and A. 
Zisserman, 'Very deep convolutional networks for large-scale image recognition,' arXiv preprint arXiv:1409.1556, 2014. [12] R. Girshick, J. Donahue, T. Darrell, and J. Malik, 'Rich feature hierarchies for accurate object detection and semantic segmentation,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587. [13] R. Girshick, 'Fast r-cnn,' in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448. [14] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, 'Focal loss for dense object detection,' in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980-2988. [15] J. Redmon and A. Farhadi, 'Yolov3: An incremental improvement,' arXiv preprint arXiv:1804.02767, 2018. [16] J. Redmon and A. Farhadi, 'YOLO9000: better, faster, stronger,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263-7271. [17] C. Chen, 'Extracting cognition out of images for the purpose of autonomous driving,' Princeton University, 2016. [18] M. Barnard. Tesla Google Disagree About LIDAR — Which Is Right? 2016. Access on: May 29, 2020. [Online]. Available: https://cleantechnica.com/2016/07/29/tesla-google-disagree-lidar-right/ [19] W. Bao, B. Xu, and Z. Chen, 'Monofenet: Monocular 3d object detection with feature enhancement networks,' IEEE Transactions on Image Processing, vol. 29, pp. 2753-2765, 2019. [20] M. Ding et al., 'Learning depth-guided convolutions for monocular 3d object detection,' in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 1000-1001. [21] Y. Chen, S. Liu, X. Shen, and J. Jia, 'Dsgn: Deep stereo geometry network for 3d object detection,' in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12536-12545. [22] J. 
Sun et al., 'Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation,' in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10548-10557. [23] B. Yang, M. Liang, and R. Urtasun, 'Hdnet: Exploiting hd maps for 3d object detection,' in Conference on Robot Learning, 2018, pp. 146-155. [24] B. Yang, W. Luo, and R. Urtasun, 'Pixor: Real-time 3d object detection from point clouds,' in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018, pp. 7652-7660. [25] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, 'Pointpillars: Fast encoders for object detection from point clouds,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12697-12705. [26] Z. Wu et al., '3d shapenets: A deep representation for volumetric shapes,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1912-1920. [27] Y. Zhou and O. Tuzel, 'Voxelnet: End-to-end learning for point cloud based 3d object detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490-4499. [28] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, 'Pointnet: Deep learning on point sets for 3d classification and segmentation,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652-660. [29] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, 'Pointnet++: Deep hierarchical feature learning on point sets in a metric space,' in Advances in neural information processing systems, 2017, pp. 5099-5108. [30] S. Shi, X. Wang, and H. Li, 'Pointrcnn: 3d object proposal generation and detection from point cloud,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 770-779. [31] A. Geiger, P. Lenz, and R. Urtasun, 'Are we ready for autonomous driving? 
the kitti vision benchmark suite,' in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3354-3361. [32] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander, 'Joint 3d proposal generation and object detection from view aggregation,' in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1-8. [33] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, 'Frustum pointnets for 3d object detection from rgb-d data,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 918-927. [34] M. Liang, B. Yang, S. Wang, and R. Urtasun, 'Deep continuous fusion for multi-sensor 3d object detection,' in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 641-656. [35] D.-S. Hong, H.-H. Chen, P.-Y. Hsiao, L.-C. Fu, and S.-M. Siao, 'CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving,' Image and Vision Computing, vol. 100, p. 103955, 2020. [36] P. Viola and M. Jones, 'Rapid object detection using a boosted cascade of simple features,' in Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, 2001, vol. 1, pp. I-I. [37] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, 'Object detection with discriminatively trained part-based models,' IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627-1645, 2009. [38] O. Ronneberger, P. Fischer, and T. Brox, 'U-net: Convolutional networks for biomedical image segmentation,' in International Conference on Medical image computing and computer-assisted intervention, 2015, pp. 234-241. [39] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, 'Enet: A deep neural network architecture for real-time semantic segmentation,' arXiv preprint arXiv:1606.02147, 2016. [40] J. Long, E. Shelhamer, and T. 
Darrell, 'Fully convolutional networks for semantic segmentation,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440. [41] V. Badrinarayanan, A. Kendall, and R. Cipolla, 'Segnet: A deep convolutional encoder-decoder architecture for image segmentation,' IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481-2495, 2017. [42] I. Sobel and G. Feldman, 'A 3x3 isotropic gradient operator for image processing,' Pattern Classification and Scene Analysis, pp. 271-272, 1973. [43] V. Nair and G. E. Hinton, 'Rectified linear units improve restricted boltzmann machines,' in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807-814. [44] J. Kiefer and J. Wolfowitz, 'Stochastic estimation of the maximum of a regression function,' The Annals of Mathematical Statistics, vol. 23, no. 3, pp. 462-466, 1952. [45] D. P. Kingma and J. Ba, 'Adam: A method for stochastic optimization,' Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014. [46] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, 'Gradient-based learning applied to document recognition,' Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. [47] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, 'Feature pyramid networks for object detection,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117-2125. [48] M. D. Zeiler and R. Fergus, 'Visualizing and understanding convolutional networks,' in European conference on computer vision, 2014, pp. 818-833. [49] K. He, G. Gkioxari, P. Dollár, and R. Girshick, 'Mask r-cnn,' in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969. [50] S. Lloyd, 'Least squares quantization in PCM,' IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982. [51] X. 
Chen et al., '3d object proposals for accurate object class detection,' in Advances in Neural Information Processing Systems, 2015, pp. 424-432. [52] A. Paszke et al., 'Automatic differentiation in pytorch,' 2017. [53] A. Simonelli, S. R. Bulo, L. Porzi, M. López-Antequera, and P. Kontschieder, 'Disentangling monocular 3d object detection,' in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1991-1999. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50098 | - |
dc.description.abstract | In recent years, research and development in autonomous driving has grown increasingly popular and mature, and the overall performance of autonomous driving decision-making has become more accurate thanks to advances in both hardware and software. Many technology companies have begun developing advanced driver-assistance systems (ADAS), in which 3D object detection is one of the most important and indispensable components: with the help of 3D object detection, a system can accurately localize obstacles on the road and make the most appropriate decisions. Most of today's autonomous vehicles are equipped with cameras and LiDAR sensors. Since each of these two sensor types has inherent strengths and weaknesses, this thesis proposes a 3D object detection network that combines the advantages of both sensors through data fusion. We first analyze current state-of-the-art 3D object detection methods, all of which use deep learning and neural networks to extract representative features from RGB images and LiDAR point clouds for prediction. However, these methods typically compress the point cloud into a bird's-eye view (BEV) before applying conventional convolutional neural networks for feature extraction. Based on this observation, we propose a novel 3D object detection network that takes raw point clouds directly as input to preserve the original data. In addition, we propose a region-of-interest-based data fusion method: by fusing data only within regions of interest, the time spent fusing uninteresting regions is eliminated, improving runtime speed. To validate the proposed method, we evaluate it on the KITTI dataset, currently regarded as the most challenging 3D object detection benchmark; the results show that our method achieves a mean average precision above 80%. | zh_TW |
dc.description.abstract | Over the past few years, research and development in autonomous driving technology has flourished, and performance has improved significantly in both hardware and software. In particular, 3D object detection is an indispensable key technique for autonomous driving. This thesis proposes a 3D object detector for on-road vehicles that takes both LiDAR point clouds and RGB images as inputs and produces accurate 3D bounding boxes for the detected vehicles. We present a novel two-stream fusion-based 3D object detection network, called Regional Fusion Network (RF-Net), which includes a multi-scale feature aggregation module and a regional fusion layer to provide region-of-interest-level (RoI-level) fusion between RGB images and LiDAR point clouds. The salient feature of our work is that RF-Net consumes raw LiDAR point clouds directly, without any quantization step, to avoid loss of information. First, rough estimates of foreground objects are generated through the LiDAR stream and the RGB stream simultaneously. The proposed multi-scale feature aggregation module exploits both high-level and low-level RGB features to capture objects ranging from small to large. The proposed regional fusion layer then combines point-wise features from the LiDAR stream with multi-scale spatial features from the RGB stream to generate fully fused features for further 3D box refinement. Experimental results on the challenging KITTI Vision Benchmark show that the proposed RF-Net outperforms other state-of-the-art methods in mean average precision (mAP). The ablation studies further demonstrate that our design choices improve the quality of 3D object detection. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T12:29:40Z (GMT). No. of bitstreams: 1 U0001-1108202015495700.pdf: 17943405 bytes, checksum: 561e22c8b0d10544b87b9eb7b9729a54 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Thesis Committee Certification i Acknowledgements ii Chinese Abstract iii ABSTRACT iv CONTENTS v LIST OF FIGURES viii LIST OF TABLES x Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Related Work 6 1.2.1 Image-based 3D object detector 6 1.2.2 LiDAR-based 3D object detector 7 1.2.3 Fusion-based 3D object detector 8 1.3 Contribution 8 1.4 Thesis Organization 9 Chapter 2 Preliminaries 11 2.1 Convolutional Neural Network 11 2.1.1 Convolutional Layers 12 2.1.2 Pooling Layers 15 2.1.3 Activation Functions 17 2.1.4 Fully Connected Layers 19 2.1.5 Stochastic Gradient Descent (SGD) Optimizer 20 2.1.6 Adam Optimizer 20 2.1.7 AlexNet 21 2.1.8 VGGNet 22 2.1.9 ResNet 23 2.1.10 Fine-tuning 24 2.1.11 Attention Mechanism 25 2.2 Object Detection Frameworks 26 2.2.1 Faster R-CNN 27 2.2.2 PointNet++ 28 Chapter 3 Regional Fusion Network 30 3.1 Problem Formulation 30 3.2 Regional Fusion Network Overview 31 3.3 Regional Fusion Network Architecture Design 32 3.3.1 Preprocessing of Point Clouds 33 3.3.2 Backbone Network 35 3.3.3 Multi-scale Feature Aggregation Module 40 3.3.4 Regional Fusion Layer 41 3.4 Loss Function 44 Chapter 4 Experiments 47 4.1 KITTI Vision Benchmark 47 4.2 Experimental Setup 49 4.3 Evaluation Metric 50 4.3.1 Intersection over Union (IoU) 50 4.3.2 Mean Average Precision (mAP) 53 4.4 Experimental Result on KITTI Testing Set 54 4.4.1 3D Detection Benchmark 55 4.4.2 BEV Detection Benchmark 56 4.5 Ablation Studies 58 4.6 Qualitative Results and Discussion 60 Chapter 5 Conclusions 63 REFERENCE 64 | |
dc.language.iso | en | |
dc.title | Regional Fusion Network Based on Images and Point Clouds for 3D On-road Object Detection | zh_TW |
dc.title | RF-Net: Regional Fusion Network for 3D On-road Object Detection Based on RGB Images and Point Clouds | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master | |
dc.contributor.coadvisor | 蕭培墉(Pei-Yung Hsiao) | |
dc.contributor.oralexamcommittee | 黃世勳(Shih-Shinh Huang),方瓊瑤(Chiung-Yao Fang),傅楸善(Chiou-Shann Fuh) | |
dc.subject.keyword | Deep learning, Data fusion, 3D object detection, Autonomous driving | zh_TW |
dc.subject.keyword | Deep learning, Data fusion, 3D object detection, Autonomous driving | en |
dc.relation.page | 69 | |
dc.identifier.doi | 10.6342/NTU202002968 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2020-08-17 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
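The abstract describes RoI-level fusion: for each region of interest, point-wise features from the LiDAR stream are combined with image features pooled from the RGB stream, rather than fusing the whole scene globally. A minimal sketch of that idea follows. It is illustrative only, not RF-Net's actual implementation: the pinhole projection, the average pooling, and all names (`project_to_image`, `roi_pool`, `regional_fuse`) are assumptions, and the real network operates on learned tensors with KITTI calibration matrices.

```python
# Illustrative sketch of RoI-level fusion, not the thesis's actual code.
# Assumed: a toy pinhole camera (fx, fy, cx, cy), a 2D feature map stored
# as feature_map[row][col] -> feature vector, and one 2D box per RoI.

def project_to_image(point, calib):
    """Project a 3D point (x, y, z) to pixel coordinates (u, v)
    with a toy pinhole model; real KITTI uses calibration matrices."""
    fx, fy, cx, cy = calib
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def roi_pool(feature_map, box):
    """Average-pool the image features inside a pixel box (u0, v0, u1, v1)."""
    u0, v0, u1, v1 = (int(round(c)) for c in box)
    cells = [feature_map[v][u]
             for v in range(max(v0, 0), min(v1 + 1, len(feature_map)))
             for u in range(max(u0, 0), min(u1 + 1, len(feature_map[0])))]
    dim = len(cells[0])
    return [sum(c[d] for c in cells) / len(cells) for d in range(dim)]

def regional_fuse(points, point_feats, feature_map, box2d, calib):
    """RoI-level fusion: for each point whose projection falls inside the
    2D box, concatenate its point-wise feature with the pooled RGB feature.
    Fusion happens per region, so pixels outside the RoI cost nothing."""
    pooled = roi_pool(feature_map, box2d)
    fused = []
    for p, f in zip(points, point_feats):
        u, v = project_to_image(p, calib)
        if box2d[0] <= u <= box2d[2] and box2d[1] <= v <= box2d[3]:
            fused.append(f + pooled)  # list concat stands in for channel concat
    return fused
```

For example, with a 4x4 feature map of constant 2-dim features, a point projecting inside the box yields a fused 3-dim feature (its 1-dim point feature plus the 2-dim pooled RGB feature), while points projecting outside the RoI are skipped entirely, which is the claimed speed advantage of regional over global fusion.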
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-1108202015495700.pdf (currently not authorized for public access) | 17.52 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.