Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84641
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 施吉昇 (Chi-Sheng Shih)
dc.contributor.author: Yi-Hung Kuo [en]
dc.contributor.author: 郭羿宏 [zh_TW]
dc.date.accessioned: 2023-03-19T22:18:42Z
dc.date.copyright: 2022-09-19
dc.date.issued: 2022
dc.date.submitted: 2022-09-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84641
dc.description.abstract: 使用光達點雲的三維物件偵測被廣泛地應用於自駕車以及路側系統的物件偵測與追蹤。然而使用單光達點雲進行偵測的表現會受到物件遮蔽的影響。許多過往的研究提出將物件點雲補齊來提升三維物件偵測的表現。多數研究的模型輸入為單一物件的點雲,需要額外的前處理將物件點雲從光達掃描中取出。有些研究則是以整個光達掃描點雲作為模型輸入,但仍需要物件的三維標記,而新場景標記的取得耗時且困難。本研究利用路側系統多光達掃描之特性,設計了一個自主標記的流程。該流程對去背景之多光達融合點雲使用 DBSCAN 演算法得到物件三維標記,並以此在多光達融合點雲上取得物件之多光達視角。補點模型的訓練則是將多光達點雲拆成單一光達點雲作為輸入,並以物件之多光達視角點雲作為訓練的輸出。此流程使得補點模型得以在沒有人工標記下完成訓練:體積像素 (Voxel) IoU 相對於補點前增加約 17%,體積像素 (Voxel) recall 相對於補點前提升約 22%。DBSCAN 的 3D AP@IoU=0.25 在補點後提升約 20%,並超越使用雙光達作為輸入的結果。深度學習模型的偵測表現亦有提升,如 PointPillars 及 PV-RCNN 在 KITTI 資料集中 Easy 類別的 3D AP@IoU=0.5 分別提升約 3% 和 1%。 [zh_TW]
dc.description.abstract: 3D object detection using LiDAR-scanned point clouds is widely used in applications such as self-driving vehicles and roadside vehicle detection and tracking. However, detection performance with single-LiDAR-scanned point clouds suffers from occlusion. Several previous works proposed completing the object point cloud to improve 3D detection. Most of these works take individual object point clouds as input, which requires additional preprocessing to extract objects from the LiDAR scans. Other works complete object point clouds from full LiDAR scans, but they require human-labeled 3D object bounding boxes, which are difficult and costly to obtain for new scenarios. This work designs a self-labeling method that exploits the characteristics of point clouds scanned by the multiple LiDARs of a roadside unit. The method labels objects by running DBSCAN on the background-removed point cloud fused from multiple LiDARs, and the resulting labels are used to extract object points from the fused point cloud. The point completion model is then trained with the single-LiDAR-scanned point cloud as input and the extracted object points from the fused point cloud as the training target, so it can be trained without human-labeled bounding boxes. The voxel IoU of the completed point clouds increased by about 17% compared with the raw point clouds, and the voxel recall increased by about 22%. The detection performance of DBSCAN (3D AP@IoU=0.25) increased by about 20% after applying point completion, surpassing the result obtained with two LiDARs as input. Deep-learning-based detectors also benefit: the 3D AP@IoU=0.5 of PointPillars and PV-RCNN for easy targets in the KITTI dataset increased by about 3% and 1%, respectively. [en]
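For concreteness, the sketch below illustrates the two ideas the abstract leans on: generating object labels by clustering the background-removed, multi-LiDAR fused point cloud with DBSCAN, and scoring a completed point cloud against its fused-view target with voxel-wise IoU and recall. It is a minimal sketch assuming NumPy and scikit-learn; the function names (label_objects, voxelize, voxel_iou_recall), the clustering parameters, and the 0.1 m voxel size are illustrative assumptions, not the thesis's actual implementation or settings.

```python
# Minimal sketch (not the thesis's implementation): self-labeling objects with
# DBSCAN on a background-removed, multi-LiDAR fused point cloud, and computing
# voxel-wise IoU / recall between a completed cloud and its fused-view target.
# Parameter values and function names are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN


def label_objects(fused_points, eps=0.5, min_samples=10):
    """Cluster foreground points (N x 3 array) into per-object point sets.

    `fused_points` is assumed to already have background (road, buildings)
    removed; DBSCAN groups the remaining points into objects, and noise
    points (label -1) are discarded.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(fused_points)
    return [fused_points[labels == k] for k in np.unique(labels) if k != -1]


def voxelize(points, voxel_size=0.1):
    """Map points to the set of voxel indices they occupy."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))


def voxel_iou_recall(completed, target, voxel_size=0.1):
    """Voxel-wise IoU and recall of a completed object cloud against its
    multi-LiDAR fused-view target (the self-labeled training output)."""
    a, b = voxelize(completed, voxel_size), voxelize(target, voxel_size)
    inter = len(a & b)
    iou = inter / len(a | b) if (a | b) else 0.0
    recall = inter / len(b) if b else 0.0
    return iou, recall
```

In this sketch, feeding label_objects the fused cloud would yield the per-object multi-LiDAR views used as training targets, and voxel_iou_recall is roughly how the ~17% IoU and ~22% recall improvements quoted above would be measured, albeit with the thesis's own voxel size rather than the placeholder value used here.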
dc.description.provenance: Made available in DSpace on 2023-03-19T22:18:42Z (GMT). No. of bitstreams: 1
U0001-1409202214243600.pdf: 90251231 bytes, checksum: 9917ed4aafdf57d776a8dc61f6e07bc5 (MD5)
Previous issue date: 2022 [en]
dc.description.tableofcontents:
口試委員會審定書 (Oral Defense Committee Certification)
致謝 (Acknowledgements)
摘要 (Chinese Abstract)
Abstract
1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Organization
2 Background and Related Works
  2.1 Background
  2.2 Related Works
    2.2.1 PointFlow
    2.2.2 Semantic Point Generation
    2.2.3 Behind the Curtain Detector
3 System Architecture and Problem Definition
  3.1 System Architecture and Term Definition
  3.2 Goals and Requirements
  3.3 Problem Definition
  3.4 Challenges
4 Design and Implementation
  4.1 Network Architecture
  4.2 Non-occluded 3-Dimensional Point Cloud Data
    4.2.1 Data Collection
    4.2.2 Self-labeling Dataset Generation
  4.3 Training Algorithm
    4.3.1 Preprocessing
    4.3.2 Training Loss
    4.3.3 Postprocessing
5 Experiment Evaluation
  5.1 Evaluation Datasets
  5.2 Evaluation Metrics and Methodology
    5.2.1 Evaluation Metrics
    5.2.2 Methodology
  5.3 Quantitative Results
    5.3.1 Voxel-wise point completion performance
    5.3.2 DBSCAN detector improvements with point completion model
    5.3.3 Point completion performance compared with PointFlow
    5.3.4 Deep-learning-based detector improvement with point completion
    5.3.5 Impact on weak classification performance by point completion model
  5.4 Qualitative Results
    5.4.1 Performance comparison between different point cloud inputs to the cluster-based algorithm
    5.4.2 Visual comparison between two different labeling-rule-labeled datasets
    5.4.3 Performance of the model in untrained RoI
  5.5 Ablation Studies
    5.5.1 Performance of adding point clouds from multiple LiDARs
    5.5.2 Effect of different distance kernels on performance
    5.5.3 Effect of different voxel sizes on performance
    5.5.4 The metrics impact resulting from labeling rules
6 Conclusion
  6.1 Future Work
Bibliography
dc.language.iso: zh-TW
dc.subject: 點雲補點 [zh_TW]
dc.subject: 多光達 [zh_TW]
dc.subject: 路側系統 [zh_TW]
dc.subject: 自監督 [zh_TW]
dc.subject: 基於體積像素 [zh_TW]
dc.subject: Roadside unit [en]
dc.subject: Self-supervised [en]
dc.subject: Point cloud completion [en]
dc.subject: Voxel-based [en]
dc.subject: Multiple LiDARs [en]
dc.title: 自監督使用單光達生成多光達物件點雲視角 [zh_TW]
dc.title: Self-supervised Multi-LiDAR Object View Generation using Single LiDAR [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 傅立成 (Li-Chen Fu), 林忠緯 (Chung-Wei Lin), 涂嘉恒 (Chia-Heng Tu)
dc.subject.keyword: 點雲補點, 自監督, 多光達, 基於體積像素, 路側系統 [zh_TW]
dc.subject.keyword: Point cloud completion, Self-supervised, Multiple LiDARs, Voxel-based, Roadside unit [en]
dc.relation.page: 68
dc.identifier.doi: 10.6342/NTU202203392
dc.rights.note: 同意授權(限校園內公開) (Authorization granted; restricted to on-campus access)
dc.date.accepted: 2022-09-16
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
dc.date.embargo-lift: 2022-09-19
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: U0001-1409202214243600.pdf (access restricted to NTU campus IP addresses; off-campus users should connect via the VPN service)
Size: 88.14 MB
Format: Adobe PDF