NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
Full metadata record
(DC Field: Value [Language])
dc.contributor.advisor: 丁肇隆 [zh_TW]
dc.contributor.advisor: Chao-Lung Ting [en]
dc.contributor.author: 劉品佑 [zh_TW]
dc.contributor.author: Pin-Yu Liu [en]
dc.date.accessioned: 2026-04-30T16:15:18Z
dc.date.available: 2026-05-01
dc.date.copyright: 2026-04-30
dc.date.issued: 2026
dc.date.submitted: 2026-04-10
dc.identifier.citation:
[1] G. Jocher, J. Qiu, and A. Chaurasia, Ultralytics YOLO. (Jan. 2023). Python. [Online]. Available: https://github.com/ultralytics/ultralytics
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2012. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[4] R. Girshick, “Fast R-CNN,” Sep. 27, 2015, arXiv: arXiv:1504.08083. doi: 10.48550/arXiv.1504.08083.
[5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Jan. 06, 2016, arXiv: arXiv:1506.01497. doi: 10.48550/arXiv.1506.01497.
[6] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into High Quality Object Detection,” Dec. 03, 2017, arXiv: arXiv:1712.00726. doi: 10.48550/arXiv.1712.00726.
[7] W. Liu et al., “SSD: Single Shot MultiBox Detector,” in Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9905, 2016, pp. 21–37. doi: 10.1007/978-3-319-46448-0_2.
[8] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully Convolutional One-Stage Object Detection,” Aug. 20, 2019, arXiv: arXiv:1904.01355. doi: 10.48550/arXiv.1904.01355.
[9] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint Triplets for Object Detection,” Apr. 19, 2019, arXiv: arXiv:1904.08189. doi: 10.48550/arXiv.1904.08189.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” May 09, 2016, arXiv: arXiv:1506.02640. doi: 10.48550/arXiv.1506.02640.
[11] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” Apr. 08, 2018, arXiv: arXiv:1804.02767. doi: 10.48550/arXiv.1804.02767.
[12] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” Apr. 23, 2020, arXiv: arXiv:2004.10934. doi: 10.48550/arXiv.2004.10934.
[13] G. Jocher, YOLOv5 by Ultralytics. (May 2020). Python. doi: 10.5281/zenodo.3908559.
[14] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” Feb. 21, 2015, arXiv: arXiv:1405.0312. doi: 10.48550/arXiv.1405.0312.
[15] Y. Liu, P. Sun, N. Wergeles, and Y. Shang, “A survey and performance evaluation of deep learning methods for small object detection,” Expert Syst. Appl., vol. 172, p. 114602, Jun. 2021, doi: 10.1016/j.eswa.2021.114602.
[16] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” Apr. 19, 2017, arXiv: arXiv:1612.03144. doi: 10.48550/arXiv.1612.03144.
[17] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” Sep. 18, 2018, arXiv: arXiv:1803.01534. doi: 10.48550/arXiv.1803.01534.
[18] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection,” Jul. 27, 2020, arXiv: arXiv:1911.09070. doi: 10.48550/arXiv.1911.09070.
[19] R. Sunkara and T. Luo, “No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects,” Aug. 07, 2022, arXiv: arXiv:2208.03641. doi: 10.48550/arXiv.2208.03641.
[20] A. V. Etten, “You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery,” May 24, 2018, arXiv: arXiv:1805.09512. doi: 10.48550/arXiv.1805.09512.
[21] F. C. Akyon, S. O. Altinuc, and A. Temizel, “Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection,” in 2022 IEEE International Conference on Image Processing (ICIP), Oct. 2022, pp. 966–970. doi: 10.1109/ICIP46576.2022.9897990.
[22] “What is Non-Max Merging?,” Roboflow Blog. [Online]. Available: https://blog.roboflow.com/non-max-merging/
[23] D. Du et al., “VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.
[24] N. Cohen, J. Gattuso, and K. MacLennan-Brown, CCTV operational requirements manual. St Albans: Home Office Scientific Development Branch, 2009.
[25] R. B. Miller, “Response time in man-computer conversational transactions,” in Proceedings of the December 9-11, 1968, fall joint computer conference, part I on - AFIPS ’68 (Fall, part I), San Francisco, California: ACM Press, 1968, p. 267. doi: 10.1145/1476589.1476628.
[26] M. Liu, X. Wang, A. Zhou, X. Fu, Y. Ma, and C. Piao, “UAV-YOLO: Small Object Detection on Unmanned Aerial Vehicle Perspective,” Sensors, vol. 20, no. 8, p. 2238, Jan. 2020, doi: 10.3390/s20082238.
[27] Z. Zhang, “Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images,” Drones, vol. 7, no. 8, p. 526, Aug. 2023, doi: 10.3390/drones7080526.
[28] Y. Li et al., “SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images,” Remote Sens., vol. 16, no. 16, p. 3057, Jan. 2024, doi: 10.3390/rs16163057.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
dc.description.abstract: UAV aerial imagery features wide fields of view, complex backgrounds, and very small target scales, so conventional object detection models in such scenes often suffer from feature loss and high miss rates. To address this, this study proposes an improved detection framework based on YOLOv11, aimed at raising small object detection performance from the drone perspective.
The model architecture is optimized first: the original feature fusion layers are replaced with a Bidirectional Feature Pyramid Network (BiFPN), whose learnable weighting mechanism strengthens cross-scale feature integration, and Space-to-Depth Convolution (SPD-Conv) is introduced into the shallow layers of the backbone, replacing conventional strided convolution with lossless downsampling to preserve the fine-grained information of tiny targets. Second, for high-resolution imagery, the Slicing Aided Hyper Inference (SAHI) strategy is integrated at the inference stage, using sliding-window slicing and Non-Maximum Merging (NMM) to further maximize recall on small objects.
Experimental results on the challenging VisDrone-DET2019 benchmark show that the proposed model reaches an mAP@50 of 50.40% in standard full-image inference, surpassing the 40.31% of the YOLOv11s baseline; with SAHI, the mAP@50 rises further to 61.82%. Compared with existing state-of-the-art models, the method matches the accuracy of heavy models (e.g., DRONE-YOLO, 76.2M parameters) while using only 12.02M parameters, confirming an excellent balance between accuracy and computational efficiency and strong practical potential. [zh_TW]
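To make the two architectural changes above concrete, the following is a minimal PyTorch sketch of a BiFPN-style fast normalized fusion node and an SPD-Conv downsampling block, following the general formulations in the EfficientDet and SPD-Conv papers; the module names, channel sizes, and SiLU activation are illustrative assumptions, not the thesis's exact implementation.

    import torch
    import torch.nn as nn

    class WeightedFusion(nn.Module):
        # BiFPN-style fast normalized fusion: each input feature map gets a
        # learnable non-negative weight, normalized so the weights sum to ~1.
        def __init__(self, n_inputs: int, eps: float = 1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))
            self.eps = eps

        def forward(self, feats):
            w = torch.relu(self.w)            # keep weights non-negative
            w = w / (w.sum() + self.eps)      # cheap normalization, no softmax
            return sum(wi * f for wi, f in zip(w, feats))

    class SPDConv(nn.Module):
        # Space-to-depth downsampling: each 2x2 spatial block moves into the
        # channel axis (lossless), then a stride-1 convolution mixes channels,
        # replacing a conventional stride-2 convolution.
        def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
            super().__init__()
            self.scale = scale
            self.conv = nn.Conv2d(in_ch * scale ** 2, out_ch, 3, 1, 1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.SiLU()

        def forward(self, x):
            s = self.scale
            # (B, C, H, W) -> (B, C*s*s, H/s, W/s) via interleaved slicing.
            x = torch.cat([x[..., i::s, j::s] for i in range(s) for j in range(s)], dim=1)
            return self.act(self.bn(self.conv(x)))

Normalizing by the weight sum rather than applying a softmax is the "fast" fusion variant proposed for BiFPN: it keeps fusion cheap while still letting the network learn how much each scale contributes.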
dc.description.abstract: UAV imagery is characterized by wide fields of view, complex backgrounds, and minute target scales, which pose significant challenges to traditional object detection models in the form of feature loss and missed detections. This study proposes an improved YOLOv11 framework to enhance small object detection in drone-view scenarios.
The architecture is optimized by replacing the feature fusion layer with a Bidirectional Feature Pyramid Network (BiFPN) to strengthen cross-scale feature integration. Simultaneously, Space-to-Depth Convolution (SPD-Conv) is introduced into the backbone's shallow layers, using lossless downsampling to preserve fine-grained information. To address high-resolution imagery, the Slicing Aided Hyper Inference (SAHI) strategy is integrated during inference, employing sliding-window slicing and Non-Maximum Merging (NMM) to maximize recall.
Experimental results on the VisDrone-DET2019 benchmark show that the proposed model achieves an mAP@50 of 50.40% in full-frame mode, outperforming the baseline YOLOv11s (40.31%). With SAHI, the mAP@50 further increases to 61.82%. Compared to state-of-the-art models, this method achieves accuracy comparable to heavy models (e.g., DRONE-YOLO, 76.2M parameters) while using only 12.02M parameters, demonstrating a superior balance between accuracy and computational efficiency. [en]
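For the inference-side strategy, the sketch below shows sliced inference with the sahi package from reference [21]; it is a usage sketch, not the thesis's exact pipeline. The weight path, image path, slice size, and overlap ratios are placeholder values, and the model_type string may need to be "yolov8" instead of "ultralytics" depending on the installed sahi version.

    # pip install sahi ultralytics
    from sahi import AutoDetectionModel
    from sahi.predict import get_sliced_prediction

    # Wrap trained detector weights; "best.pt" is a placeholder path.
    detection_model = AutoDetectionModel.from_pretrained(
        model_type="ultralytics",      # backend for Ultralytics YOLO weights
        model_path="best.pt",
        confidence_threshold=0.25,
        device="cuda:0",
    )

    # Tile the image into overlapping windows, run the detector per tile,
    # then merge overlapping boxes back in full-image coordinates.
    result = get_sliced_prediction(
        "drone_image.jpg",
        detection_model,
        slice_height=640,
        slice_width=640,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
        postprocess_type="NMM",        # Non-Maximum Merging, as in the abstract
    )
    result.export_visuals(export_dir="outputs/")

The overlap between adjacent slices matters: an object cut by one slice boundary appears whole in a neighboring slice, and the merging step then reconciles the duplicate detections.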
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-30T16:15:18Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2026-04-30T16:15:18Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 1
1.3 Research Objectives 2
1.4 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Development of Object Detection 3
2.1.1 Two-Stage Detectors 3
2.1.2 One-Stage Detectors 4
2.2 The YOLO Family of Algorithms 5
2.3 Challenges in Small Object Detection 6
2.4 Bidirectional Feature Pyramid Network (BiFPN) 8
2.5 Space-to-Depth Convolution (SPD-Conv) 9
2.6 High-Resolution Image Processing Strategies 11
2.6.1 Training Stage: Image Slicing Training Strategy 11
2.6.2 Inference Stage: Slicing Aided Hyper Inference (SAHI) 13
Chapter 3 Methodology 16
3.1 Dataset Selection 16
3.2 Improved Model Architecture 19
3.2.1 YOLOv11 Baseline Model 20
3.2.2 Improving the Feature Fusion Mechanism with BiFPN 21
3.2.3 Improving the Backbone with SPD-Conv 22
3.3 High-Resolution Image Strategies 24
3.3.1 Slicing Training Strategy 24
3.3.2 SAHI Inference Strategy 26
Chapter 4 Experimental Results and Analysis 28
4.1 Experimental Environment and Evaluation Metrics 28
4.1.1 Experimental Environment Configuration 28
4.1.2 Evaluation Metrics 30
4.2 Ablation Study of Architectural Improvements 32
4.2.1 Benefit Analysis of BiFPN 33
4.2.2 Benefit Analysis of SPD-Conv 33
4.2.3 Overall Evaluation 34
4.3 Benefit Analysis of the SAHI Inference Strategy 36
4.4 Comprehensive Comparison with Mainstream Models 40
4.5 Visualization Results and Analysis 42
Chapter 5 Conclusions and Future Work 47
5.1 Conclusions 47
5.2 Future Work 48
References 49
dc.language.iso: zh_TW
dc.subject: 無人機 (UAV)
dc.subject: 物件偵測 (Object Detection)
dc.subject: YOLOv11
dc.subject: 小物件偵測 (Small Object Detection)
dc.subject: BiFPN
dc.subject: SPD-Conv
dc.subject: SAHI
dc.subject: UAV
dc.subject: Object Detection
dc.subject: YOLOv11
dc.subject: Small Object Detection
dc.subject: BiFPN
dc.subject: SPD-Conv
dc.subject: SAHI
dc.title: 無人機空拍影像之小物件偵測優化 [zh_TW]
dc.title: Optimization of Small Object Detection in UAV Aerial Imagery [en]
dc.type: Thesis
dc.date.schoolyear: 114-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 張恆華; 陳昭宏; 陳彥廷; 謝傳璋 [zh_TW]
dc.contributor.oralexamcommittee: Herng-Hua Chang; Jau-Horng Chen; Yen-Ting Chen; Chuan-Cheung Tse [en]
dc.subject.keyword: 無人機 (UAV), 物件偵測 (Object Detection), YOLOv11, 小物件偵測 (Small Object Detection), BiFPN, SPD-Conv, SAHI [zh_TW]
dc.subject.keyword: UAV, Object Detection, YOLOv11, Small Object Detection, BiFPN, SPD-Conv, SAHI [en]
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202600891
dc.rights.note: Authorized for release (campus-only access)
dc.date.accepted: 2026-04-10
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)
dc.date.embargo-lift: 2026-05-01
Appears in collections: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)

Files in this item:
File: ntu-114-2.pdf (4.37 MB, Adobe PDF)
Access restricted to NTU campus IP addresses (off-campus users should connect via the NTU VPN service).