NTU Theses and Dissertations Repository
電機資訊學院 › 電信工程學研究所
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56295
Full metadata record
dc.contributor.advisor: 王鈺強 (Yu-Chiang Frank Wang)
dc.contributor.author: Yuan-Hao Lee (en)
dc.contributor.author: 李元顥 (zh_TW)
dc.date.accessioned: 2021-06-16T05:22:18Z
dc.date.available: 2021-01-20
dc.date.copyright: 2020-08-04
dc.date.issued: 2020
dc.date.submitted: 2020-07-27
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56295
dc.description.abstract (translated from zh_TW): With the help of deep learning, the accuracy of semantic segmentation models has improved substantially in recent years. However, because semantic segmentation aims to classify every pixel, training such models requires large numbers of images with complete pixel-level annotations (i.e., class masks). When training images for certain categories are scarce, correctly recognizing pixels of those categories becomes highly challenging. To address this problem, we propose a learning setting called weakly supervised few-shot semantic segmentation, in which the model may use only images and their image-level class labels (rather than per-pixel class masks) during both training and testing. By analyzing each input image together with the semantic label derived from its class name, we first generate a pseudo class mask for every training sample to serve as the supervision for subsequent learning. We then propose a meta-learned few-shot semantic segmentation model, trainable from these pseudo masks, to produce the final prediction. Our experiments show that, on standard benchmarks, our framework substantially outperforms prior state-of-the-art accuracy under the proposed weakly supervised setting, and also performs well under the common fully supervised setting.
dc.description.abstract (en): Although promising results have been achieved by recent deep learning models for few-shot semantic segmentation, most of them require a large amount of data with pixel-level ground truth labels (i.e., masks) for training. However, if such ground truth information is not sufficient for particular image categories, learning to segment such images becomes an even more challenging task. To address the above problem, we propose a novel learning framework in an extremely weakly supervised setting, where only image-level labels are observed during both training and testing (but not pixel-level masks). By observing the input image and its semantic label, we first generate its pseudo pixel-wise semantic mask, which guides the learning of our meta-trained architecture for segmentation purposes. Through extensive experiments on benchmark datasets, we show that our model achieves satisfactory performance under fully supervised settings, while performing favorably against state-of-the-art methods under weakly supervised settings.
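The pipeline the abstract describes (image-level label → pseudo pixel-wise mask → few-shot segmentation) can be illustrated with a minimal pure-Python toy. This sketch assumes a common realization of each step — thresholding a class activation map into a pseudo mask, masked average pooling into a class prototype, and cosine matching on the query — and all function names and thresholds here are illustrative assumptions, not the thesis's actual architecture:

```python
import math

def pseudo_mask(activation_map, rel_threshold=0.5):
    """Binarize a class activation map into a pseudo pixel-level mask:
    positions whose activation exceeds a fraction of the map's peak
    value are treated as pseudo-foreground."""
    peak = max(max(row) for row in activation_map)
    return [[1 if a >= rel_threshold * peak else 0 for a in row]
            for row in activation_map]

def prototype(features, mask):
    """Masked average pooling: average the per-position feature vectors
    of the support image over its pseudo-foreground positions."""
    dim = len(features[0][0])
    total, count = [0.0] * dim, 0
    for frow, mrow in zip(features, mask):
        for vec, m in zip(frow, mrow):
            if m:
                count += 1
                total = [t + v for t, v in zip(total, vec)]
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def segment_query(query_features, proto, threshold=0.5):
    """Label a query position foreground when its cosine similarity
    to the support prototype exceeds the threshold."""
    return [[1 if cosine(vec, proto) > threshold else 0 for vec in row]
            for row in query_features]

# Toy 2x2 example: the left column activates strongly, so it becomes
# pseudo-foreground; its features define the prototype.
cam = [[0.9, 0.1], [0.8, 0.2]]
mask = pseudo_mask(cam)                                  # [[1, 0], [1, 0]]
feats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
proto = prototype(feats, mask)                           # [1.0, 0.0]
query = [[[0.9, 0.1], [0.1, 0.9]]]
print(segment_query(query, proto))                       # [[1, 0]]
```

Prototype matching of this kind is the mechanism popularized by prototype-based few-shot segmentation methods; the point of the sketch is only that a pseudo mask can stand in for a ground-truth mask anywhere a mask is consumed downstream.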
dc.description.provenance: Made available in DSpace on 2021-06-16T05:22:18Z (GMT). No. of bitstreams: 1; U0001-2707202014412100.pdf: 4134280 bytes, checksum: ec10b01443cc0ba06257b2d3fcb44a43 (MD5). Previous issue date: 2020. (en)
dc.description.tableofcontents:
Abstract
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Proposed Method
3.1 Notation and Problem Formulation
3.2 Pseudo Pixel-Level Label Generation
3.3 Meta-Learning for Extremely Weakly Supervised Few-Shot Segmentation
4 Experiments
4.1 Comparison with State-of-the-art Methods
4.2 Analysis of Our Proposed Method
5 Conclusion
References
dc.language.iso: en
dc.subject: 機器學習 (zh_TW)
dc.subject: 弱監督式學習 (zh_TW)
dc.subject: 語意分割 (zh_TW)
dc.subject: 人工智慧 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: 電腦視覺 (zh_TW)
dc.subject: 少樣本學習 (zh_TW)
dc.subject: Weakly Supervised Learning (en)
dc.subject: Deep Learning (en)
dc.subject: Artificial Intelligence (en)
dc.subject: Machine Learning (en)
dc.subject: Few-Shot Learning (en)
dc.subject: Semantic Segmentation (en)
dc.subject: Computer Vision (en)
dc.title: 弱監督式影像語意分割之少樣本學習 (zh_TW)
dc.title: Extremely Weakly Supervised Few-Shot Semantic Segmentation (en)
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.author-orcid: 0000-0001-8427-1104
dc.contributor.advisor-orcid: 王鈺強 (0000-0002-2333-157X)
dc.contributor.oralexamcommittee: 邱維辰 (Wei-Chen Chiu), 林彥宇 (Yen-Yu Lin)
dc.contributor.oralexamcommittee-orcid: 邱維辰 (0000-0001-7715-8306), 林彥宇 (0000-0002-7183-6070)
dc.subject.keyword: 電腦視覺, 深度學習, 人工智慧, 機器學習, 少樣本學習, 語意分割, 弱監督式學習 (zh_TW)
dc.subject.keyword: Computer Vision, Deep Learning, Artificial Intelligence, Machine Learning, Few-Shot Learning, Semantic Segmentation, Weakly Supervised Learning (en)
dc.relation.page: 33
dc.identifier.doi: 10.6342/NTU202001909
dc.rights.note: 有償授權 (authorized for a fee)
dc.date.accepted: 2020-07-28
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
U0001-2707202014412100.pdf — 4.04 MB, Adobe PDF — restricted (not authorized for public access)


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
