Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98742

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 鄭文皇 | zh_TW |
| dc.contributor.advisor | Wen-Huang Cheng | en |
| dc.contributor.author | 楊盛評 | zh_TW |
| dc.contributor.author | Sheng-Ping Yang | en |
| dc.date.accessioned | 2025-08-18T16:18:43Z | - |
| dc.date.available | 2025-08-19 | - |
| dc.date.copyright | 2025-08-18 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-08 | - |
| dc.identifier.citation | S. Ardianto, H.-M. Hang, and W.-H. Cheng. Fast vehicle detection and tracking on fisheye traffic monitoring video using CNN and bounding box propagation. In ICIP, 2022. F. Barbato, E. Camuffo, S. Milani, and P. Zanuttigh. Continual road-scene semantic segmentation via feature-aligned symmetric multi-modal network. In ICIP, 2024. J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In ICCV, 2019. A.-Q. Cao and R. De Charette. MonoScene: Monocular 3D semantic scene completion. In CVPR, 2022. L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018. T. Chen, X. Ying, J. Yang, R. Wang, R. Guo, B. Xing, and J. Shi. VPDetR: End-to-end vanishing point detection transformers. In AAAI, 2024. A. Das, S. Das, G. Sistu, J. Horgan, U. Bhattacharya, E. Jones, M. Glavin, and C. Eising. Revisiting modality imbalance in multimodal pedestrian detection. In ICIP, 2023. A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012. D. Guo, D.-P. Fan, T. Lu, C. Sakaridis, and L. Van Gool. Vanishing-point-guided video semantic segmentation of driving scenes. In CVPR, 2024. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. Y. Huang, W. Zheng, Y. Zhang, J. Zhou, and J. Lu. Tri-perspective view for vision-based 3D semantic occupancy prediction. In CVPR, 2023. H. Jiang, T. Cheng, N. Gao, H. Zhang, T. Lin, W. Liu, and X. Wang. Symphonize 3D semantic scene completion with contextual instance queries. In CVPR, 2024. S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, and I. So Kweon. VPGNet: Vanishing point guided network for lane and road marking detection and recognition. In ICCV, 2017. F. Li, H. Zhang, H. Xu, S. Liu, L. Zhang, L. M. Ni, and H.-Y. Shum. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation. In CVPR, 2023. Y. Li, S. Li, X. Liu, M. Gong, K. Li, N. Chen, Z. Wang, Z. Li, T. Jiang, F. Yu, et al. SSCBench: A large-scale 3D semantic scene completion benchmark for autonomous driving. In IROS, 2024. Y. Li, Z. Yu, C. Choy, C. Xiao, J. M. Alvarez, S. Fidler, C. Feng, and A. Anandkumar. VoxFormer: Sparse voxel transformer for camera-based 3D semantic scene completion. In CVPR, 2023. Y. Liao, J. Xie, and A. Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 2022. I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. In CVPR, 2017. Y. Wang and C. Tong. H2GFormer: Horizontal-to-global voxel transformer for 3D semantic scene completion. In AAAI, 2024. Z. Xia, Y. Liu, X. Li, X. Zhu, Y. Ma, Y. Li, Y. Hou, and Y. Qiao. SCPNet: Semantic scene completion on point cloud. In CVPR, 2023. J. Yao, C. Li, K. Sun, Y. Cai, H. Li, W. Ouyang, and H. Li. NDC-Scene: Boost monocular 3D semantic scene completion in normalized device coordinates space. In ICCV, 2023. Y. Zhang, Z. Zhu, and D. Du. OccFormer: Dual-path transformer for vision-based 3D semantic occupancy prediction. In ICCV, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98742 | - |
| dc.description.abstract | 本論文旨在解決自動駕駛領域中,僅使用單目相機進行三維語義場景補全(Semantic Scene Completion, SSC)時所面臨的關鍵挑戰,特別是對遠距離與微小物件感知準確度不足的問題。現有方法在處理因透視投影而在影像中變得微小、特徵模糊的遠方物體時,常因注意力分散而導致性能下降,進而對行車安全構成潛在威脅。為解決此問題,本研究提出一個名為「消失點聚合器」(Vanishing Point Aggregator, VPA)的創新架構。該方法的核心觀察在於:於駕駛場景影像中,消失點周圍自然聚集了來自遠距離場景的重要視覺資訊。VPA 引入一種新型的「消失點查詢」(Vanishing Point Query, VPQ),專門用以強化此關鍵區域的特徵提取;並透過跨來源注意力融合機制,將富含遠場細節的 VPQ 與擷取全域物件語義的「標準實例查詢」進行整合,進而構建出更具完整性與辨識力的場景特徵表徵。
本研究於兩個具代表性的公開資料集——SemanticKITTI 與 SSCBench-KITTI-360 上進行系統性實驗與分析。實驗結果顯示,所提出的 VPA 模型在多項指標上皆達成目前最佳水準,尤其在遠距離區域與如行人、交通號誌等安全關鍵的微小物件類別上,顯著提升預測準確率。上述成果證實了本方法在提升單目 SSC 任務中遠場感知能力方面的有效性,對強化自動駕駛系統的環境感知穩定性與整體安全性具備實質貢獻。 | zh_TW |
| dc.description.abstract | Semantic Scene Completion (SSC) aims to jointly predict semantic categories and 3D occupancy of a scene from coarse inputs, which is crucial for providing reliable perception in autonomous driving. In this paper, we enhance existing SSC models by unveiling the vanishing point region, specifically addressing challenges posed by tiny objects and voxels distant from the monocular camera. At the core of our method, we propose the Vanishing Point Aggregator (VPA) to prioritize features in high-density central areas. The proposed VPA seamlessly integrates the Vanishing Point Query (VPQ) with the vanilla instance query via a cross-attention fusion mechanism to refine feature representation. To evaluate the effectiveness of our method, we conduct comprehensive experiments on two standard SSC benchmarks and demonstrate that our method achieves SOTA performance. Our approach significantly improves the performance across various semantic classes, including a notable gain of 0.37 mIoU on SemanticKITTI and 0.5 mIoU on SSCBench-KITTI-360 for tiny objects. Ablation studies further validate the efficacy of our innovative query fusion strategy, showcasing its capability in long-range predictions for SSC tasks. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T16:18:43Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-18T16:18:43Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract iv Contents vi List of Figures ix List of Tables xii Chapter 1 Introduction 1 1.1 Publication 5 Chapter 2 Related Works 6 2.1 3D Semantic Scene Completion 6 2.2 Deformable Attention 7 2.3 Two-Stage Architecture: VoxFormer 10 2.4 Query-Centric Method: Symphonies 13 2.5 Vanishing Point-related Approaches 17 Chapter 3 Method 19 3.1 Problem Formulation 19 3.2 Motivations and Observations 20 3.3 Overview 22 3.4 Proposed Method 25 3.4.1 Vanishing Point Query (VPQ) Initialization 25 3.4.2 Vanishing Point Query Update with Regional Features 27 3.4.3 Cross-Source Query Update with Initial Voxel Features 29 3.4.4 Initial Scene Voxel Feature Generation 30 3.4.5 Iterative Feature Co-Refinement Module 33 3.4.6 Segmentation Head 34 3.4.7 Loss Functions 35 Chapter 4 Experiments 37 4.1 Datasets 37 4.2 Evaluation Metrics 38 4.3 Implementation Details 39 4.4 Baseline Methods 40 4.5 Quantitative Results 40 4.5.1 Results on SemanticKITTI Validation Set 40 4.5.2 Results on SSCBench-KITTI-360 Test Set 44 4.6 Ablation Study 45 4.6.1 Impact of Query Design and Interaction 45 4.6.2 Performance Analysis across Distance Ranges 46 4.6.3 Effect of Vanishing Point Region Size 48 Chapter 5 Conclusion 50 5.1 Conclusion 50 5.2 Future Work 51 References 52 | - |
| dc.language.iso | en | - |
| dc.subject | 自動駕駛 | zh_TW |
| dc.subject | 語義場景補全 | zh_TW |
| dc.subject | 消失點 | zh_TW |
| dc.subject | 小物件偵測 | zh_TW |
| dc.subject | Semantic Scene Completion | en |
| dc.subject | Vanishing Point | en |
| dc.subject | Autonomous Driving | en |
| dc.subject | Tiny Object Detection | en |
| dc.title | 運用消失點引導全景訊息預測 | zh_TW |
| dc.title | Vanishing-point Guided Semantic Scene Completion | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 莊永裕;帥宏翰;簡韶逸;黃敬群 | zh_TW |
| dc.contributor.oralexamcommittee | Yung-Yu Chuang;Hong-Han Shuai;Shao-Yi Chien;Ching-Chun Huang | en |
| dc.subject.keyword | 自動駕駛,語義場景補全,消失點,小物件偵測 | zh_TW |
| dc.subject.keyword | Autonomous Driving, Semantic Scene Completion, Vanishing Point, Tiny Object Detection | en |
| dc.relation.page | 54 | - |
| dc.identifier.doi | 10.6342/NTU202503382 | - |
| dc.rights.note | Authorized (worldwide open access) | - |
| dc.date.accepted | 2025-08-12 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
| dc.date.embargo-lift | 2025-08-19 | - |
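The cross-attention fusion of the Vanishing Point Query with the vanilla instance query, as described in the abstract, can be sketched as follows. This is a minimal single-head illustration in NumPy, not the thesis implementation: the function name `cross_attention_fuse`, the query counts, the feature dimension, and the random projection weights standing in for learned parameters are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(vpq, inst_q, d_k=64, seed=0):
    """Fuse vanishing-point queries (VPQ) with instance queries via
    single-head cross-attention: each VPQ attends over all instance
    queries, and the attended result is added back residually.
    Shapes: vpq (n_vp, d), inst_q (n_inst, d) -> (n_vp, d)."""
    rng = np.random.default_rng(seed)
    d = vpq.shape[1]
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    Q, K, V = vpq @ Wq, inst_q @ Wk, inst_q @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n_vp, n_inst)
    # Residual connection keeps the original far-field VPQ content.
    return vpq + attn @ V

vp_queries = np.zeros((4, 128))     # hypothetical: 4 vanishing-point queries
inst_queries = np.ones((100, 128))  # hypothetical: 100 instance queries
fused = cross_attention_fuse(vp_queries, inst_queries)
print(fused.shape)  # (4, 128)
```

The residual form means the fused queries retain the far-field detail captured near the vanishing point while absorbing global object semantics from the instance queries, matching the refinement role the abstract attributes to the VPA.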
| Appears in Collections: | Department of Computer Science and Information Engineering | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 15.31 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
