NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
Full metadata record
(DC Field: Value [Language])
dc.contributor.advisor: 丁肇隆 [zh_TW]
dc.contributor.advisor: Chao-Lung Ting [en]
dc.contributor.author: 劉品佑 [zh_TW]
dc.contributor.author: Pin-Yu Liu [en]
dc.date.accessioned: 2026-04-30T16:15:18Z
dc.date.available: 2026-05-01
dc.date.copyright: 2026-04-30
dc.date.issued: 2026
dc.date.submitted: 2026-04-10
dc.identifier.citation:
[1] G. Jocher, J. Qiu, and A. Chaurasia, Ultralytics YOLO. (Jan. 2023). Python. [Online]. Available: https://github.com/ultralytics/ultralytics
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2012. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[4] R. Girshick, “Fast R-CNN,” Sep. 27, 2015, arXiv: arXiv:1504.08083. doi: 10.48550/arXiv.1504.08083.
[5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Jan. 06, 2016, arXiv: arXiv:1506.01497. doi: 10.48550/arXiv.1506.01497.
[6] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into High Quality Object Detection,” Dec. 03, 2017, arXiv: arXiv:1712.00726. doi: 10.48550/arXiv.1712.00726.
[7] W. Liu et al., “SSD: Single Shot MultiBox Detector,” in Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9905, 2016, pp. 21–37. doi: 10.1007/978-3-319-46448-0_2.
[8] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully Convolutional One-Stage Object Detection,” Aug. 20, 2019, arXiv: arXiv:1904.01355. doi: 10.48550/arXiv.1904.01355.
[9] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint Triplets for Object Detection,” Apr. 19, 2019, arXiv: arXiv:1904.08189. doi: 10.48550/arXiv.1904.08189.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” May 09, 2016, arXiv: arXiv:1506.02640. doi: 10.48550/arXiv.1506.02640.
[11] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” Apr. 08, 2018, arXiv: arXiv:1804.02767. doi: 10.48550/arXiv.1804.02767.
[12] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” Apr. 23, 2020, arXiv: arXiv:2004.10934. doi: 10.48550/arXiv.2004.10934.
[13] G. Jocher, YOLOv5 by Ultralytics. (May 2020). Python. doi: 10.5281/zenodo.3908559.
[14] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” Feb. 21, 2015, arXiv: arXiv:1405.0312. doi: 10.48550/arXiv.1405.0312.
[15] Y. Liu, P. Sun, N. Wergeles, and Y. Shang, “A survey and performance evaluation of deep learning methods for small object detection,” Expert Syst. Appl., vol. 172, p. 114602, Jun. 2021, doi: 10.1016/j.eswa.2021.114602.
[16] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” Apr. 19, 2017, arXiv: arXiv:1612.03144. doi: 10.48550/arXiv.1612.03144.
[17] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” Sep. 18, 2018, arXiv: arXiv:1803.01534. doi: 10.48550/arXiv.1803.01534.
[18] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection,” Jul. 27, 2020, arXiv: arXiv:1911.09070. doi: 10.48550/arXiv.1911.09070.
[19] R. Sunkara and T. Luo, “No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects,” Aug. 07, 2022, arXiv: arXiv:2208.03641. doi: 10.48550/arXiv.2208.03641.
[20] A. V. Etten, “You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery,” May 24, 2018, arXiv: arXiv:1805.09512. doi: 10.48550/arXiv.1805.09512.
[21] F. C. Akyon, S. O. Altinuc, and A. Temizel, “Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection,” in 2022 IEEE International Conference on Image Processing (ICIP), Oct. 2022, pp. 966–970. doi: 10.1109/ICIP46576.2022.9897990.
[22] “What is Non-Max Merging?,” Roboflow Blog. [Online]. Available: https://blog.roboflow.com/non-max-merging/
[23] D. Du et al., “VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.
[24] N. Cohen, J. Gattuso, and K. MacLennan-Brown, CCTV operational requirements manual. St Albans: Home Office Scientific Development Branch, 2009.
[25] R. B. Miller, “Response time in man-computer conversational transactions,” in Proceedings of the December 9-11, 1968, fall joint computer conference, part I on - AFIPS ’68 (Fall, part I), San Francisco, California: ACM Press, 1968, p. 267. doi: 10.1145/1476589.1476628.
[26] M. Liu, X. Wang, A. Zhou, X. Fu, Y. Ma, and C. Piao, “UAV-YOLO: Small Object Detection on Unmanned Aerial Vehicle Perspective,” Sensors, vol. 20, no. 8, p. 2238, Jan. 2020, doi: 10.3390/s20082238.
[27] Z. Zhang, “Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images,” Drones, vol. 7, no. 8, p. 526, Aug. 2023, doi: 10.3390/drones7080526.
[28] Y. Li et al., “SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images,” Remote Sens., vol. 16, no. 16, p. 3057, Jan. 2024, doi: 10.3390/rs16163057.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
dc.description.abstract: UAV aerial imagery features wide fields of view, complex backgrounds, and very small target scales, so conventional object detection models in such scenes often suffer from feature loss and high miss rates. To address this, this study proposes an improved detection framework based on YOLOv11, aimed at raising small object detection performance from the drone perspective.
The model architecture is optimized first: the original feature fusion layers are replaced with a Bidirectional Feature Pyramid Network (BiFPN), whose learnable weighting mechanism strengthens cross-scale feature integration, and Space-to-Depth Convolution (SPD-Conv) is introduced into the shallow layers of the backbone, replacing conventional strided convolution with lossless downsampling to preserve the fine-grained information of tiny targets. Second, for high-resolution imagery, the Slicing Aided Hyper Inference (SAHI) strategy is integrated at the inference stage, using sliding-window slicing and Non-Maximum Merging (NMM) to further maximize recall on small objects.
Experimental results on the challenging VisDrone-DET2019 benchmark show that the proposed model reaches an mAP@50 of 50.40% in standard full-image inference, surpassing the 40.31% of the YOLOv11s baseline; with SAHI, the mAP@50 rises further to 61.82%. Compared with existing state-of-the-art models, the method matches the accuracy of heavy models (e.g., DRONE-YOLO, 76.2M parameters) while using only 12.02M parameters, confirming an excellent balance between accuracy and computational efficiency and strong practical potential. [zh_TW]
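To make the two architectural changes above concrete, the following is a minimal PyTorch sketch of a BiFPN-style fast normalized fusion node and an SPD-Conv downsampling block, following the general formulations in the EfficientDet and SPD-Conv papers; the module names, channel sizes, and SiLU activation are illustrative assumptions, not the thesis's exact implementation.

    import torch
    import torch.nn as nn

    class WeightedFusion(nn.Module):
        # BiFPN-style fast normalized fusion: each input feature map gets a
        # learnable non-negative weight, normalized so the weights sum to ~1.
        def __init__(self, n_inputs: int, eps: float = 1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))
            self.eps = eps

        def forward(self, feats):
            w = torch.relu(self.w)            # keep weights non-negative
            w = w / (w.sum() + self.eps)      # cheap normalization, no softmax
            return sum(wi * f for wi, f in zip(w, feats))

    class SPDConv(nn.Module):
        # Space-to-depth downsampling: each 2x2 spatial block moves into the
        # channel axis (lossless), then a stride-1 convolution mixes channels,
        # replacing a conventional stride-2 convolution.
        def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
            super().__init__()
            self.scale = scale
            self.conv = nn.Conv2d(in_ch * scale ** 2, out_ch, 3, 1, 1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.SiLU()

        def forward(self, x):
            s = self.scale
            # (B, C, H, W) -> (B, C*s*s, H/s, W/s) via interleaved slicing.
            x = torch.cat([x[..., i::s, j::s] for i in range(s) for j in range(s)], dim=1)
            return self.act(self.bn(self.conv(x)))

Normalizing by the weight sum rather than applying a softmax is the "fast" fusion variant proposed for BiFPN: it keeps fusion cheap while still letting the network learn how much each scale contributes.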
dc.description.abstract: UAV imagery is characterized by wide fields of view, complex backgrounds, and minute target scales, which pose significant challenges to traditional object detection models in the form of feature loss and missed detections. This study proposes an improved YOLOv11 framework to enhance small object detection in drone-view scenarios.
The architecture is optimized by replacing the feature fusion layer with a Bidirectional Feature Pyramid Network (BiFPN) to strengthen cross-scale feature integration. Simultaneously, Space-to-Depth Convolution (SPD-Conv) is introduced into the backbone's shallow layers, using lossless downsampling to preserve fine-grained information. To address high-resolution imagery, the Slicing Aided Hyper Inference (SAHI) strategy is integrated during inference, employing sliding-window slicing and Non-Maximum Merging (NMM) to maximize recall.
Experimental results on the VisDrone-DET2019 benchmark show that the proposed model achieves an mAP@50 of 50.40% in full-frame mode, outperforming the baseline YOLOv11s (40.31%). With SAHI, the mAP@50 further increases to 61.82%. Compared to state-of-the-art models, this method achieves accuracy comparable to heavy models (e.g., DRONE-YOLO, 76.2M parameters) while using only 12.02M parameters, demonstrating a superior balance between accuracy and computational efficiency. [en]
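For the inference-side strategy, the sketch below shows sliced inference with the sahi package from reference [21]; it is a usage sketch, not the thesis's exact pipeline. The weight path, image path, slice size, and overlap ratios are placeholder values, and the model_type string may need to be "yolov8" instead of "ultralytics" depending on the installed sahi version.

    # pip install sahi ultralytics
    from sahi import AutoDetectionModel
    from sahi.predict import get_sliced_prediction

    # Wrap trained detector weights; "best.pt" is a placeholder path.
    detection_model = AutoDetectionModel.from_pretrained(
        model_type="ultralytics",      # backend for Ultralytics YOLO weights
        model_path="best.pt",
        confidence_threshold=0.25,
        device="cuda:0",
    )

    # Tile the image into overlapping windows, run the detector per tile,
    # then merge overlapping boxes back in full-image coordinates.
    result = get_sliced_prediction(
        "drone_image.jpg",
        detection_model,
        slice_height=640,
        slice_width=640,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
        postprocess_type="NMM",        # Non-Maximum Merging, as in the abstract
    )
    result.export_visuals(export_dir="outputs/")

The overlap between adjacent slices matters: an object cut by one slice boundary appears whole in a neighboring slice, and the merging step then reconciles the duplicate detections.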
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-30T16:15:18Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2026-04-30T16:15:18Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 1
1.3 Research Objectives 2
1.4 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Development of Object Detection 3
2.1.1 Two-Stage Detectors 3
2.1.2 One-Stage Detectors 4
2.2 The YOLO Family of Algorithms 5
2.3 Challenges in Small Object Detection 6
2.4 Bidirectional Feature Pyramid Network (BiFPN) 8
2.5 Space-to-Depth Convolution (SPD-Conv) 9
2.6 High-Resolution Image Processing Strategies 11
2.6.1 Training Stage: Image Slicing Training Strategy 11
2.6.2 Inference Stage: Slicing Aided Hyper Inference (SAHI) 13
Chapter 3 Methodology 16
3.1 Dataset Selection 16
3.2 Improved Model Architecture 19
3.2.1 YOLOv11 Baseline Model 20
3.2.2 Improving the Feature Fusion Mechanism with BiFPN 21
3.2.3 Improving the Backbone with SPD-Conv 22
3.3 High-Resolution Image Strategies 24
3.3.1 Slicing Training Strategy 24
3.3.2 SAHI Inference Strategy 26
Chapter 4 Experimental Results and Analysis 28
4.1 Experimental Environment and Evaluation Metrics 28
4.1.1 Experimental Environment Configuration 28
4.1.2 Evaluation Metrics 30
4.2 Ablation Study of Architectural Improvements 32
4.2.1 Benefit Analysis of BiFPN 33
4.2.2 Benefit Analysis of SPD-Conv 33
4.2.3 Overall Evaluation 34
4.3 Benefit Analysis of the SAHI Inference Strategy 36
4.4 Comprehensive Comparison with Mainstream Models 40
4.5 Visualization Results and Analysis 42
Chapter 5 Conclusions and Future Work 47
5.1 Conclusions 47
5.2 Future Work 48
References 49
dc.language.iso: zh_TW
dc.subject: 無人機 (UAV)
dc.subject: 物件偵測 (Object Detection)
dc.subject: YOLOv11
dc.subject: 小物件偵測 (Small Object Detection)
dc.subject: BiFPN
dc.subject: SPD-Conv
dc.subject: SAHI
dc.subject: UAV
dc.subject: Object Detection
dc.subject: YOLOv11
dc.subject: Small Object Detection
dc.subject: BiFPN
dc.subject: SPD-Conv
dc.subject: SAHI
dc.title: 無人機空拍影像之小物件偵測優化 [zh_TW]
dc.title: Optimization of Small Object Detection in UAV Aerial Imagery [en]
dc.type: Thesis
dc.date.schoolyear: 114-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 張恆華; 陳昭宏; 陳彥廷; 謝傳璋 [zh_TW]
dc.contributor.oralexamcommittee: Herng-Hua Chang; Jau-Horng Chen; Yen-Ting Chen; Chuan-Cheung Tse [en]
dc.subject.keyword: 無人機 (UAV), 物件偵測 (Object Detection), YOLOv11, 小物件偵測 (Small Object Detection), BiFPN, SPD-Conv, SAHI [zh_TW]
dc.subject.keyword: UAV, Object Detection, YOLOv11, Small Object Detection, BiFPN, SPD-Conv, SAHI [en]
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202600891
dc.rights.note: Authorized for release (campus-only access)
dc.date.accepted: 2026-04-10
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)
dc.date.embargo-lift: 2026-05-01
Appears in collections: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)

Files in this item:
File: ntu-114-2.pdf (4.37 MB, Adobe PDF)
Access restricted to NTU campus IP addresses (off-campus users should connect via the NTU VPN service).