Hybrid Task Cascade with Attention: A New Framework for Instance Segmentation

Ricardo Manzanedo

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84594

Title:	具注意力機制之混合任務級聯模型—實例分割之新框架 (Hybrid Task Cascade with Attention: A New Framework for Instance Segmentation)
Author:	Ricardo Manzanedo
Advisor:	Hung-Yun Hsieh (謝宏昀)
Keywords:	machine vision, instance segmentation, Hybrid Task Cascade (HTC), attention, transformer
Year of publication:	2022
Degree:	Master's
Abstract:	Hybrid Task Cascade (HTC) for instance segmentation has recently attracted enormous interest in the computer vision community for applications that require effective real-time detection and segmentation, such as self-driving cars, pedestrian tracking, and contact tracing. Recent instance segmentation research has focused more on backbones, data augmentation, and transformer-based architectures. These models have become increasingly complex and, consequently, slower. In this thesis, we aim to significantly reduce the size and complexity of the proposed model while maintaining the state-of-the-art performance of HTC. Compared with leading-edge developments, we found inefficient designs in previous HTC-based architectures. We therefore propose the Hybrid Task Cascade with Attention (HTCA) framework and test three different designs within it. Among the three, Hybrid Task Cascade with Attention in the deconvolutional layer (HTCA-D) performs best. HTCA-D is a novel network that integrates the state-of-the-art detector EfficientDet as the backbone, followed by a segmentation branch based on the conventional HTC mask head. The segmentation task is also reworked by incorporating a new module that focuses more on the object. Our method reduces the number of FLOPs by more than 30% and uses almost 75% less memory than the original version of HTC, which helps save time and energy during training and inference. Through validation on benchmark datasets, the lightweight HTCA not only reduces the number of parameters but also improves object detection quality. Using HTCA, we surpass our baseline mask AP and bounding box AP by +1.3 points each on the COCO dataset.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84594
DOI:	10.6342/NTU202203163
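Full-text authorization:	Authorized (on-campus access only)
Electronic full-text release date:	2022-09-27
Appears in collections:	Graduate Institute of Communication Engineering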

Files in this item:

File	Size	Format
U0001-0509202217144200.pdf Access restricted to NTU campus IP addresses (off campus, please use the VPN remote-access service)	5.17 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
