Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84594

Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 謝宏昀 (Hung-Yun Hsieh)
dc.contributor.author: Ricardo Manzanedo [en]
dc.date.accessioned: 2023-03-19T22:16:54Z
dc.date.copyright: 2022-09-27
dc.date.issued: 2022
dc.date.submitted: 2022-09-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84594
dc.description.abstract: Hybrid Task Cascade (HTC) has achieved milestone progress and many successful applications in machine-vision tasks that require real-time, effective detection and object segmentation, such as self-driving cars, pedestrian tracking, and contact tracing. Recent instance segmentation research has focused more on backbone modules, data augmentation techniques, and transformer-based architectures; these models have grown increasingly complex and therefore increasingly slow. In this thesis, we aim to significantly reduce the size and complexity of the proposed model while maintaining the state-of-the-art performance of HTC. Comparing HTC-based architectures against the current leading instance segmentation techniques, we identified inefficient aspects of their design, and we therefore propose the Hybrid Task Cascade with Attention (HTCA) framework and evaluate three different designs within it. Among the three experimental designs, Hybrid Task Cascade with Attention in the deconvolutional layer (HTCA-D) performs best. HTCA-D integrates the state-of-the-art detector EfficientDet as the backbone, followed by a segmentation branch based on the conventional HTC mask head. The segmentation task is also refined by incorporating a new module so that it focuses more on the object. Through validation on benchmark datasets, the lightweight HTCA not only reduces the number of parameters used but also improves object detection quality. Using HTCA, we gain 1.3 mask AP on the COCO dataset. [zh_TW]
dc.description.abstract: Hybrid Task Cascade (HTC) for instance segmentation has recently gained enormous interest in the computer vision community in domains that require effective detection and segmentation in real time, such as self-driving cars, pedestrian tracking, and contact tracing. Recent instance segmentation research focuses more on the backbone, data augmentation, and transformer-based architectures. These models have become more and more complex and consequently slower. In this thesis, we aim to significantly reduce the size and complexity of the proposed model while maintaining the state-of-the-art performance of HTC. We found inefficient designs in previous HTC-based architectures compared with leading-edge developments, and we therefore tested three different designs in the proposed Hybrid Task Cascade with Attention (HTCA) framework. Among the three experimental designs, Hybrid Task Cascade with Attention in the deconvolutional layer (HTCA-D) performs best. The proposed HTCA-D is a novel network that integrates the state-of-the-art detector EfficientDet as the backbone, followed by a segmentation branch based on the conventional HTC mask head. The segmentation task is also renovated by incorporating a new module to focus more on the object. Our method reduces the number of FLOPs by more than 30% and uses almost 75% less memory than the original version of HTC, saving time and energy during training and inference. Through validation with benchmark datasets, the lightweight HTCA not only reduces the number of parameters used but also enhances object detection quality. Using HTCA, we surpass our baseline mask AP and bounding box AP by +1.3 points each on the COCO dataset. [en]
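The abstract describes HTCA-D only at a high level: an EfficientDet backbone feeding an HTC-style mask head, with an attention module placed just before the deconvolutional (upsampling) layer. The PyTorch sketch below is a minimal illustration of that idea under stated assumptions, not the thesis's actual implementation; the class names (HTCADMaskHead, SpatialAttention), the gating-style attention, and all channel and resolution numbers are illustrative choices of this example.

```python
# Hypothetical sketch of the HTCA-D idea: an HTC-like mask head whose RoI features
# pass through a lightweight attention block right before the deconvolution.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Simple spatial attention gate; the attention design in the thesis may differ."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight each spatial location so the head focuses more on the object.
        return x * self.gate(x)


class HTCADMaskHead(nn.Module):
    """Conv stack -> attention -> deconv upsampling -> per-class mask logits."""

    def __init__(self, in_channels: int = 256, num_convs: int = 4, num_classes: int = 80):
        super().__init__()
        layers = []
        for _ in range(num_convs):
            layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        self.attention = SpatialAttention(in_channels)           # inserted before the deconv layer
        self.deconv = nn.ConvTranspose2d(in_channels, in_channels, 2, stride=2)
        self.predictor = nn.Conv2d(in_channels, num_classes, 1)  # per-class mask logits

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        x = self.convs(roi_feats)
        x = self.attention(x)            # HTCA-D: attend before upsampling
        x = torch.relu(self.deconv(x))
        return self.predictor(x)


# Example: 14x14 RoI features from the backbone/neck -> 28x28 mask logits.
rois = torch.randn(8, 256, 14, 14)
print(HTCADMaskHead()(rois).shape)  # torch.Size([8, 80, 28, 28])
```

In this sketch, applying the attention before the deconvolution means the re-weighting runs on the low-resolution RoI features, so the added cost is small compared with attending after upsampling, which is consistent with the abstract's emphasis on keeping the model lightweight.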
dc.description.provenance: Made available in DSpace on 2023-03-19T22:16:54Z (GMT). No. of bitstreams: 1. U0001-0509202217144200.pdf: 5289038 bytes, checksum: fe17eecb637a3e7a1f44cf2c70b49275 (MD5). Previous issue date: 2022. [en]
dc.description.tableofcontents:
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1  INTRODUCTION
CHAPTER 2  BACKGROUND AND RELATED WORK
  2.1 Introduction to Segmentation
  2.2 General Architecture of Instance Segmentation
  2.3 Related Work
    2.3.1 Backbones
    2.3.2 Necks
    2.3.3 Mask Head
CHAPTER 3  MODELS
  3.1 Motivation
  3.2 Backbone
  3.3 Mask Branch
    3.3.1 Mask R-CNN
    3.3.2 CenterMask
    3.3.3 HTC
    3.3.4 MaskFormer
    3.3.5 Mask2Former
CHAPTER 4  PROPOSED DEEP LEARNING ALGORITHM
  4.1 Motivation
  4.2 HTCA-IF: HTC with an attention module in the mask information flow
  4.3 HTCA-C: HTC with an attention module after the convolutional layers
  4.4 HTCA-D: HTC with an attention module before the deconvolutional layer
CHAPTER 5  PERFORMANCE EVALUATION
  5.1 Implementation Details
  5.2 Main Results
CHAPTER 6  CONCLUSION AND FUTURE WORK
REFERENCES
dc.language.iso: en
dc.subject: Transformer (變換器) [zh_TW]
dc.subject: Machine Vision (機器視覺) [zh_TW]
dc.subject: Instance Segmentation (實例分割) [zh_TW]
dc.subject: Hybrid Task Cascade (混合任務級聯) [zh_TW]
dc.subject: Attention (注意力機制) [zh_TW]
dc.subject: Instance Segmentation [en]
dc.subject: Attention [en]
dc.subject: Hybrid Task Cascade (HTC) [en]
dc.title: Hybrid Task Cascade with Attention: A New Framework for Instance Segmentation [zh_TW]
dc.title: Hybrid Task Cascade with Attention: A New Framework for Instance Segmentation [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 藍俊宏 (Jakey Blue), Jesus Fraile, David Jimenez
dc.subject.keyword: Machine Vision, Instance Segmentation, Hybrid Task Cascade, Attention, Transformer [zh_TW]
dc.subject.keyword: Instance Segmentation, Hybrid Task Cascade (HTC), Attention [en]
dc.relation.page: 54
dc.identifier.doi: 10.6342/NTU202203163
dc.rights.note: Authorization granted (access restricted to campus)
dc.date.accepted: 2022-09-21
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Communication Engineering (電信工程學研究所) [zh_TW]
dc.date.embargo-lift: 2022-09-27
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File | Size | Format
U0001-0509202217144200.pdf | 5.17 MB | Adobe PDF
Access is restricted to NTU campus IP addresses (off-campus users, please use the VPN remote-access service).


All items in the system are protected by copyright, with all rights reserved, unless their copyright terms state otherwise.
