NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Graduate Institute of Networking and Multimedia
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97003
Title: BEVANet: Bilateral Efficient Visual Attention Network for Real-time Semantic Segmentation
Author: Ping-Mao Huang (黃秉茂)
Advisor: Yung-Yu Chuang (莊永裕)
Keywords: Computer Vision, Real-time Semantic Segmentation, Large Kernel Attention, Adaptive Feature Fusion
Publication Year: 2025
Degree: Master
Abstract: Real-time semantic segmentation faces a central challenge: designing efficient convolutional neural network (CNN) architectures, or reducing the computational cost of vision transformers (ViTs), while preserving real-time performance. Although ViTs excel at capturing long-range dependencies, their computational speed is often a bottleneck. Large-kernel CNNs offer similar receptive fields but struggle with multi-scale feature adaptation and global context integration. To overcome these limitations, we build on the Large Kernel Attention (LKA) mechanism and propose the Bilateral Efficient Visual Attention Network (BEVANet). The Efficient Visual Attention (EVA) module expands the receptive field and extracts visual and structural features using Sparse Decomposed Large Separable Kernel Attentions (SDLSKA), which combine regional and strip convolutions with diverse topological structures to capture multi-scale global context. The Comprehensive Kernel Selection (CKS) mechanism dynamically adapts the receptive field to further improve performance. The Deep Large Kernel Pyramid Pooling Module (DLKPPM) enriches contextual features by combining dilated convolutions with large kernel attention. The bilateral architecture facilitates frequent information exchange between the two branches, and the Boundary Guided Attention Fusion (BGAF) module uses boundary guidance to adaptively merge low-level spatial features with high-level semantic features, sharpening the network's ability to delineate blurred boundaries while retaining detailed contours and semantic context. Our model achieves 79.3% mIoU without pretraining, indicating low dependence on large pretraining datasets. After pretraining on ImageNet, it attains 81.0% mIoU, setting a new state-of-the-art benchmark for real-time semantic segmentation at 32 FPS.
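The efficiency argument behind the separable strip convolutions in SDLSKA can be illustrated numerically: a K×K kernel that factors as an outer product of a K×1 column and a 1×K row can be applied as two cheap strip passes (2K multiplications per pixel) instead of one dense pass (K² per pixel), with identical output. The sketch below is a generic demonstration of this decomposition, not the thesis code; `conv2d_full` and all names are illustrative only.

```python
import numpy as np

def conv2d_full(img, kernel):
    """Direct 'same'-size 2D cross-correlation with zero padding (reference)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
u = rng.standard_normal(7)   # vertical strip kernel (7 x 1)
v = rng.standard_normal(7)   # horizontal strip kernel (1 x 7)

# One dense 7x7 pass with the outer-product kernel ...
full = conv2d_full(img, np.outer(u, v))
# ... equals a 1x7 strip pass followed by a 7x1 strip pass.
strips = conv2d_full(conv2d_full(img, v[None, :]), u[:, None])

print(np.allclose(full, strips))  # True
```

The same identity is what makes large separable kernel attention affordable: the strip pair scales linearly in K, so receptive fields can grow large without a quadratic cost blow-up.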
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97003
DOI: 10.6342/NTU202500107
Full-text License: Authorized (open access worldwide)
Electronic Full-text Release Date: 2025-09-18
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File | Size | Format
ntu-113-1.pdf | 1.84 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
