NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96282
Full metadata record
dc.contributor.advisor: 李明穗 (zh_TW)
dc.contributor.advisor: Ming-Sui Lee (en)
dc.contributor.author: 郭毅遠 (zh_TW)
dc.contributor.author: I-Yuan Kuo (en)
dc.date.accessioned: 2024-12-24T16:08:45Z
dc.date.available: 2024-12-25
dc.date.copyright: 2024-12-24
dc.date.issued: 2024
dc.date.submitted: 2024-12-18
dc.identifier.citation:
[1] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.

[2] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.

[3] B. Cheng, M. D. Collins, Y. Zhu, T. Liu, T. S. Huang, H. Adam, and L.-C. Chen. Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12475–12485, 2020.

[4] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1290–1299, 2022.

[5] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 764–773, 2017.

[6] O. Elharrouss, S. Al-Maadeed, N. Subramanian, N. Ottakath, N. Almaadeed, and Y. Himeur. Panoptic segmentation: A review. arXiv preprint arXiv:2111.10250, 2021.

[7] J. He, P. Li, Y. Geng, and X. Xie. Fastinst: A simple query-based model for real-time instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23663–23672, 2023.

[8] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[10] A. Kirillov, R. Girshick, K. He, and P. Dollár. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6399–6408, 2019.

[11] A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. Panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9404–9413, 2019.

[12] H. W. Kuhn. The Hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97, 1955.

[13] F. Li, H. Zhang, S. Liu, J. Guo, L. M. Ni, and L. Zhang. Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13619–13627, 2022.

[14] F. Li, H. Zhang, H. Xu, S. Liu, L. Zhang, L. M. Ni, and H.-Y. Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3041–3050, 2023.

[15] Y. Li, X. Chen, Z. Zhu, L. Xie, G. Huang, D. Du, and X. Wang. Attention-guided unified network for panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7026–7035, 2019.

[16] Z. Li, W. Wang, E. Xie, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, and T. Lu. Panoptic segformer: Delving deeper into panoptic segmentation with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1280–1289, 2022.

[17] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.

[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.

[19] S. Liu, F. Li, H. Zhang, X. Yang, X. Qi, H. Su, J. Zhu, and L. Zhang. Dab-detr: Dynamic anchor boxes are better queries for detr. arXiv preprint arXiv:2201.12329, 2022.

[20] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

[21] Y. Rao, G. Chen, J. Lu, and J. Zhou. Counterfactual attention learning for fine-grained visual categorization and re-identification. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1025–1034, 2021.

[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[23] C. Wang, H. Liao, and I. Yeh. Designing network design strategies through gradient path analysis. arXiv preprint arXiv:2211.04800, 2022.

[24] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao. Scaled-yolov4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13029–13038, 2021.

[25] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7464–7475, 2023.

[26] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao. You only learn one representation: Unified network for multiple tasks. arXiv preprint arXiv:2105.04206, 2021.

[27] H. Wang, Y. Zhu, H. Adam, A. Yuille, and L.-C. Chen. Max-deeplab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5463–5474, 2021.

[28] X. Wang, T. Kong, C. Shen, Y. Jiang, and L. Li. Solo: Segmenting objects by locations. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, pages 649–665. Springer, 2020.

[29] X. Wang, R. Zhang, T. Kong, L. Li, and C. Shen. Solov2: Dynamic and fast instance segmentation. Advances in neural information processing systems, 33:17721–17732, 2020.

[30] Y. Wu, G. Zhang, H. Xu, X. Liang, and L. Lin. Auto-panoptic: Cooperative multi-component architecture search for panoptic segmentation. Advances in neural information processing systems, 33:20508–20519, 2020.

[31] Y. Xiong, R. Liao, H. Zhao, R. Hu, M. Bai, E. Yumer, and R. Urtasun. Upsnet: A unified panoptic segmentation network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8818–8826, 2019.

[32] Q. Yu, H. Wang, D. Kim, S. Qiao, M. Collins, Y. Zhu, H. Adam, A. Yuille, and L.-C. Chen. Cmt-deeplab: Clustering mask transformers for panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2560–2570, 2022.

[33] Q. Yu, H. Wang, S. Qiao, M. Collins, Y. Zhu, H. Adam, A. Yuille, and L.-C. Chen. kmax-deeplab: k-means mask transformer. arXiv preprint arXiv:2207.04044, 2022.

[34] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605, 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96282
dc.description.abstract (zh_TW): 本研究針對全景分割任務提出了一種新穎且高效的方法,全景分割任務旨在精確區分圖像中的所有前景與背景類別,並辨識同類別中的不同個體。現有方法在邊界預測精度不佳與重複預測問題上仍存在諸多挑戰,且許多先進方法依賴於龐大的網路架構,需大量運算資源,難以滿足實際應用需求。基於YOLOv7與FastInst的架構,本研究提出三項核心改進策略:(1) Tasks Integration,透過整合多任務的方法,解決傳統CNN-based方法邊界預測不精確問題;(2) Segmentation-based Proposal Strategy,有效避免Query-based架構中因冗餘proposals導致的重複預測;(3) Segmentation-based Intra and Counterfactual Loss,提升特徵的一致性與鑑別性,同時排除潛在誤導性特徵的影響。實驗結果表明,提出的方法顯著提升了模型的預測品質,為全景分割任務提供了一種兼具精度與效率的解決方案。
dc.description.abstract (en): This study presents a novel and efficient framework for panoptic segmentation, a task aimed at accurately delineating all foreground and background categories in an image while distinguishing individual instances within the same category. Current approaches face persistent challenges, including imprecise boundary predictions and redundant proposals resulting in duplicate predictions. Moreover, many state-of-the-art methods rely on resource-intensive network architectures, making them less practical for real-world applications. Building on the architectures of YOLOv7 and FastInst, this research introduces three core advancements: (1) Tasks Integration, which unifies multi-task learning to address boundary prediction inaccuracies inherent to traditional CNN-based methods; (2) Segmentation-based Proposal Strategy, which effectively mitigates duplicate predictions by addressing redundancy in Query-based architectures; and (3) Segmentation-based Intra and Counterfactual Loss, which enhances feature consistency and discriminability while suppressing the influence of misleading features. Experimental evaluations demonstrate that the proposed methodology achieves substantial improvements in prediction quality, offering a robust and efficient solution for panoptic segmentation tasks.
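The abstract only names the counterfactual objective; the full formulation lives in the attached PDF. As a rough orientation, the sketch below gives a minimal PyTorch rendering of the counterfactual attention idea the thesis builds on (Rao et al., reference [21]): score the prediction made with the learned attention against the prediction made with a random counterfactual attention map, and supervise the gap. Every name, shape, and the random-attention baseline here is an illustrative assumption, not the thesis's actual Segmentation-based Counterfactual Loss.

# Illustrative sketch only, in the spirit of counterfactual attention
# learning [21]; names and shapes are hypothetical, not from the thesis.
import torch
import torch.nn.functional as F

def counterfactual_attention_loss(features, attention, classifier, labels):
    # features:   (B, C, H, W) backbone feature maps
    # attention:  (B, 1, H, W) learned attention maps in [0, 1]
    # classifier: module mapping pooled (B, C) features to class logits
    # labels:     (B,) ground-truth class indices

    # Prediction under the learned attention.
    logits_fact = classifier((features * attention).mean(dim=(2, 3)))

    # Prediction under a random counterfactual attention map.
    counterfactual = torch.rand_like(attention)
    logits_cf = classifier((features * counterfactual).mean(dim=(2, 3)))

    # The "attention effect" is the gap between the two predictions;
    # supervising it pushes the attention to beat a chance baseline.
    effect = logits_fact - logits_cf.detach()
    return F.cross_entropy(effect, labels)

# Shape-level usage example (hypothetical 133-way panoptic class head):
feats = torch.randn(2, 256, 32, 32)
attn = torch.sigmoid(torch.randn(2, 1, 32, 32))
head = torch.nn.Linear(256, 133)
loss = counterfactual_attention_loss(feats, attn, head, torch.tensor([3, 7]))

Detaching the counterfactual branch is one possible design choice: it keeps the gradient focused on improving the learned attention rather than on degrading the random baseline.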
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-12-24T16:08:45Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-12-24T16:08:45Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Master’s Thesis Acceptance Certificate i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables ix

Chapter 1 Introduction 1

Chapter 2 Related Works 3
2.1 Panoptic Segmentation 3
2.2 YOLOv7 6
2.3 FastInst 7
2.4 Counterfactual Attention Learning 9

Chapter 3 Approach 11
3.1 Fundamental Problems in Panoptic Segmentation 11
3.2 Architecture 12
3.3 Tasks Integration 14
3.4 Segmentation-based Proposal Strategy 17
3.5 Segmentation-based Intra and Counterfactual Loss 21
3.5.1 Segmentation-based Intra Loss 22
3.5.2 Segmentation-based Counterfactual Loss 23

Chapter 4 Experiments 25
4.1 Implementation Details and Evaluation Protocols 25
4.2 Results and Comparisons 26

Chapter 5 Conclusions 32

References 34
dc.language.iso: zh_TW
dc.subject: 深度學習 (zh_TW)
dc.subject: 全景分割 (zh_TW)
dc.subject: 任務整合 (zh_TW)
dc.subject: 圖像分割 (zh_TW)
dc.subject: 反事實注意力 (zh_TW)
dc.subject: Counterfactual Attention (en)
dc.subject: Deep Learning (en)
dc.subject: Segmentation (en)
dc.subject: Panoptic Segmentation (en)
dc.subject: Tasks Integration (en)
dc.title: 基於任務整合與分割策略的YOLOv7全景分割系統 (zh_TW)
dc.title: Panoptic Segmentation via Tasks Integration and Segmentation-based Strategies on YOLOv7 (en)
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 碩士 [Master's]
dc.contributor.coadvisor: 廖弘源 (zh_TW)
dc.contributor.coadvisor: Hong-Yuan Liao (en)
dc.contributor.oralexamcommittee: 王建堯 (zh_TW)
dc.contributor.oralexamcommittee: Chien-Yao Wang (en)
dc.subject.keyword: 全景分割, 任務整合, 反事實注意力, 深度學習, 圖像分割 (zh_TW)
dc.subject.keyword: Panoptic Segmentation, Tasks Integration, Counterfactual Attention, Deep Learning, Segmentation (en)
dc.relation.page: 38
dc.identifier.doi: 10.6342/NTU202404744
dc.rights.note: 同意授權(全球公開) [authorized for worldwide open access]
dc.date.accepted: 2024-12-18
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science]
dc.contributor.author-dept: 資訊工程學系 [Department of Computer Science and Information Engineering]
Appears in Collections: 資訊工程學系 [Department of Computer Science and Information Engineering]

Files in This Item:
File: ntu-113-1.pdf | Size: 4.84 MB | Format: Adobe PDF