NTU Theses and Dissertations Repository
電機資訊學院 › 電信工程學研究所
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56295
Full metadata record
dc.contributor.advisor: 王鈺強 (Yu-Chiang Frank Wang)
dc.contributor.author: Yuan-Hao Lee (en)
dc.contributor.author: 李元顥 (zh_TW)
dc.date.accessioned: 2021-06-16T05:22:18Z
dc.date.available: 2021-01-20
dc.date.copyright: 2020-08-04
dc.date.issued: 2020
dc.date.submitted: 2020-07-27
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56295
dc.description.abstract (translated from zh_TW): With the help of deep learning, the accuracy of semantic segmentation models has improved substantially in recent years. However, because semantic segmentation aims to classify every pixel, training such models requires large numbers of images with complete pixel-level annotations (i.e., class masks). When training images for certain categories are scarce, correctly recognizing pixels of those categories becomes highly challenging. To address this problem, we propose a learning setting called weakly supervised few-shot semantic segmentation, in which the model may use only images and their image-level class labels (rather than per-pixel class masks) during both training and testing. By analyzing each input image together with the semantic label derived from its class name, we first generate a pseudo class mask for every training sample to serve as the supervision for subsequent learning. We then propose a meta-learned few-shot semantic segmentation model, trainable from these pseudo masks, to produce the final prediction. Our experiments show that, on standard benchmarks, our framework substantially outperforms prior state-of-the-art accuracy under the proposed weakly supervised setting, and also performs well under the common fully supervised setting.
dc.description.abstract (en): Although promising results have been achieved by recent deep learning models for few-shot semantic segmentation, most of them require a large amount of data with pixel-level ground truth labels (i.e., masks) for training. However, if such ground truth information is not sufficient for particular image categories, learning to segment such images becomes an even more challenging task. To address the above problem, we propose a novel learning framework in an extremely weakly supervised setting, where only image-level labels are observed during both training and testing (but not pixel-level masks). By observing the input image and its semantic label, we first generate its pseudo pixel-wise semantic mask, which guides the learning of our meta-trained architecture for segmentation purposes. Through extensive experiments on benchmark datasets, we show that our model achieves satisfactory performance under fully supervised settings, while performing favorably against state-of-the-art methods under weakly supervised settings.
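The pipeline the abstract describes (image-level label → pseudo pixel-wise mask → few-shot segmentation) can be illustrated with a minimal pure-Python toy. This sketch assumes a common realization of each step — thresholding a class activation map into a pseudo mask, masked average pooling into a class prototype, and cosine matching on the query — and all function names and thresholds here are illustrative assumptions, not the thesis's actual architecture:

```python
import math

def pseudo_mask(activation_map, rel_threshold=0.5):
    """Binarize a class activation map into a pseudo pixel-level mask:
    positions whose activation exceeds a fraction of the map's peak
    value are treated as pseudo-foreground."""
    peak = max(max(row) for row in activation_map)
    return [[1 if a >= rel_threshold * peak else 0 for a in row]
            for row in activation_map]

def prototype(features, mask):
    """Masked average pooling: average the per-position feature vectors
    of the support image over its pseudo-foreground positions."""
    dim = len(features[0][0])
    total, count = [0.0] * dim, 0
    for frow, mrow in zip(features, mask):
        for vec, m in zip(frow, mrow):
            if m:
                count += 1
                total = [t + v for t, v in zip(total, vec)]
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def segment_query(query_features, proto, threshold=0.5):
    """Label a query position foreground when its cosine similarity
    to the support prototype exceeds the threshold."""
    return [[1 if cosine(vec, proto) > threshold else 0 for vec in row]
            for row in query_features]

# Toy 2x2 example: the left column activates strongly, so it becomes
# pseudo-foreground; its features define the prototype.
cam = [[0.9, 0.1], [0.8, 0.2]]
mask = pseudo_mask(cam)                                  # [[1, 0], [1, 0]]
feats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
proto = prototype(feats, mask)                           # [1.0, 0.0]
query = [[[0.9, 0.1], [0.1, 0.9]]]
print(segment_query(query, proto))                       # [[1, 0]]
```

Prototype matching of this kind is the mechanism popularized by prototype-based few-shot segmentation methods; the point of the sketch is only that a pseudo mask can stand in for a ground-truth mask anywhere a mask is consumed downstream.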
dc.description.provenance: Made available in DSpace on 2021-06-16T05:22:18Z (GMT). No. of bitstreams: 1; U0001-2707202014412100.pdf: 4134280 bytes, checksum: ec10b01443cc0ba06257b2d3fcb44a43 (MD5). Previous issue date: 2020. (en)
dc.description.tableofcontents:
Abstract
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Proposed Method
3.1 Notation and Problem Formulation
3.2 Pseudo Pixel-Level Label Generation
3.3 Meta-Learning for Extremely Weakly Supervised Few-Shot Segmentation
4 Experiments
4.1 Comparison with State-of-the-art Methods
4.2 Analysis of Our Proposed Method
5 Conclusion
References
dc.language.iso: en
dc.subject: 機器學習 (zh_TW)
dc.subject: 弱監督式學習 (zh_TW)
dc.subject: 語意分割 (zh_TW)
dc.subject: 人工智慧 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: 電腦視覺 (zh_TW)
dc.subject: 少樣本學習 (zh_TW)
dc.subject: Weakly Supervised Learning (en)
dc.subject: Deep Learning (en)
dc.subject: Artificial Intelligence (en)
dc.subject: Machine Learning (en)
dc.subject: Few-Shot Learning (en)
dc.subject: Semantic Segmentation (en)
dc.subject: Computer Vision (en)
dc.title: 弱監督式影像語意分割之少樣本學習 (zh_TW)
dc.title: Extremely Weakly Supervised Few-Shot Semantic Segmentation (en)
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.author-orcid: 0000-0001-8427-1104
dc.contributor.advisor-orcid: 王鈺強 (0000-0002-2333-157X)
dc.contributor.oralexamcommittee: 邱維辰 (Wei-Chen Chiu), 林彥宇 (Yen-Yu Lin)
dc.contributor.oralexamcommittee-orcid: 邱維辰 (0000-0001-7715-8306), 林彥宇 (0000-0002-7183-6070)
dc.subject.keyword: 電腦視覺, 深度學習, 人工智慧, 機器學習, 少樣本學習, 語意分割, 弱監督式學習 (zh_TW)
dc.subject.keyword: Computer Vision, Deep Learning, Artificial Intelligence, Machine Learning, Few-Shot Learning, Semantic Segmentation, Weakly Supervised Learning (en)
dc.relation.page: 33
dc.identifier.doi: 10.6342/NTU202001909
dc.rights.note: 有償授權 (authorized for a fee)
dc.date.accepted: 2020-07-28
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
U0001-2707202014412100.pdf — 4.04 MB, Adobe PDF — restricted (not authorized for public access)


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
