  1. NTU Theses and Dissertations Repository
  2. College of Engineering
  3. Department of Engineering Science and Ocean Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
Title: 無人機空拍影像之小物件偵測優化
Optimization of Small Object Detection in UAV Aerial Imagery
Authors: 劉品佑
Pin-Yu Liu
Advisor: 丁肇隆
Chao-Lung Ting
Keyword: UAV, Object Detection, YOLOv11, Small Object Detection, BiFPN, SPD-Conv, SAHI
Publication Year: 2026
Degree: Master
Abstract:
UAV imagery is characterized by wide fields of view, complex backgrounds, and minute target scales, posing significant challenges for traditional object detection models regarding feature loss and missed detections. This study proposes an improved YOLOv11 framework to enhance small object detection in drone-view scenarios.
The architecture is optimized by replacing the feature fusion layer with a Bidirectional Feature Pyramid Network (BiFPN) to strengthen cross-scale feature integration. Simultaneously, Space-to-Depth Convolution (SPD-Conv) is introduced into the backbone’s shallow layers, utilizing lossless downsampling to preserve fine-grained information. To address high-resolution imagery, the Slicing Aided Hyper Inference (SAHI) strategy is integrated during inference, employing sliding window slicing and Non-Maximum Merging (NMM) to maximize recall.
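The lossless downsampling behind SPD-Conv can be illustrated with a minimal sketch. This is a NumPy stand-in for the space-to-depth rearrangement only (the subsequent convolution is omitted); the function name and shapes are illustrative, not the thesis's actual implementation:

```python
import numpy as np

def space_to_depth(x, scale=2):
    """Move each scale x scale spatial block into the channel axis.

    x: array of shape (H, W, C); returns (H//scale, W//scale, C*scale**2).
    Unlike strided convolution, every pixel value is preserved, which is
    why SPD-Conv retains fine-grained detail for tiny targets.
    """
    h, w, c = x.shape
    x = x.reshape(h // scale, scale, w // scale, scale, c)
    x = x.transpose(0, 2, 1, 3, 4)          # group block offsets together
    return x.reshape(h // scale, w // scale, c * scale * scale)

# a 4x4 single-channel feature map with values 0..15
feat = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
out = space_to_depth(feat)
print(out.shape)      # (2, 2, 4)
print(out[0, 0])      # [0. 1. 4. 5.] -- the top-left 2x2 block, intact
```

Halving the spatial resolution quadruples the channel count, so the feature map shrinks without discarding information; a stride-1 convolution then mixes the rearranged channels.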
Experimental results on the VisDrone-DET2019 benchmark show that the proposed model achieves an mAP@50 of 50.40% in full-frame inference, outperforming the baseline YOLOv11s (40.31%). With SAHI, mAP@50 further increases to 61.82%. Compared with state-of-the-art models, the method matches the accuracy of heavy models (e.g., DRONE-YOLO, 76.2M parameters) while using only 12.02M parameters, demonstrating a strong balance between accuracy and computational efficiency and strong potential for practical deployment.
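The sliced-inference step described above can be sketched as follows. The `detect_fn` interface and window-start logic are illustrative assumptions, not the SAHI library's actual API, and the final Non-Maximum Merging pass is only indicated in a comment:

```python
def slice_starts(length, slice_size, step):
    """Window start offsets covering [0, length) with full-size slices."""
    starts = list(range(0, max(length - slice_size, 0) + 1, step))
    if starts[-1] + slice_size < length:
        starts.append(length - slice_size)  # final window flush with the edge
    return starts

def sliced_detect(detect_fn, img_w, img_h, slice_size=640, overlap=0.2):
    """Run a detector on overlapping windows and map each returned box
    from slice-local back to full-image coordinates.

    detect_fn(x0, y0, x1, y1) -> list of (bx0, by0, bx1, by1, score)
    boxes in the window's local coordinate frame.
    """
    step = int(slice_size * (1 - overlap))
    boxes = []
    for y0 in slice_starts(img_h, slice_size, step):
        for x0 in slice_starts(img_w, slice_size, step):
            x1 = min(x0 + slice_size, img_w)
            y1 = min(y0 + slice_size, img_h)
            for bx0, by0, bx1, by1, score in detect_fn(x0, y0, x1, y1):
                boxes.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    # full SAHI would now merge overlapping duplicates via Non-Maximum
    # Merging (NMM) instead of plain suppression
    return boxes

# toy detector: reports one box at the top-left corner of every slice
dummy = lambda x0, y0, x1, y1: [(0, 0, 10, 10, 0.9)]
dets = sliced_detect(dummy, img_w=1000, img_h=1000)
print(len(dets))   # 4 -- a 1000x1000 image yields a 2x2 grid of 640px slices
```

Because each slice is detected at full resolution, small objects occupy many more pixels per window than in a single downscaled full-frame pass, which is what drives the recall gain reported above.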
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102286
DOI: 10.6342/NTU202600891
Fulltext Rights: Authorized (open access within campus only)
Embargo lift date: 2026-05-01
Appears in Collections: Department of Engineering Science and Ocean Engineering

Files in This Item:
File: ntu-114-2.pdf — 4.37 MB, Adobe PDF (access limited to NTU IP range)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
