Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84485
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 黃乾綱 (Chien-Kang Huang)
dc.contributor.author: Yung-Chieh Chang [en]
dc.contributor.author: 張詠絜 [zh_TW]
dc.date.accessioned: 2023-03-19T22:13:06Z
dc.date.copyright: 2022-09-29
dc.date.issued: 2022
dc.date.submitted: 2022-09-24
dc.identifier.citation:
[1] I. R. I. Haque and J. Neubert, "Deep learning approaches to biomedical image segmentation," Informatics in Medicine Unlocked, 2020, 100297. doi: 10.1016/j.imu.2020.100297.
[2] J. Song, L. Xiao, M. Molaei and Z. Lian, "Multi-layer boosting sparse convolutional model for generalized nuclear segmentation from histopathology images," Knowledge-Based Systems, vol. 176, pp. 40–53, 2019. doi: 10.1016/j.knosys.2019.03.031.
[3] B. De Brabandere, D. Neven and L. Van Gool, "Semantic instance segmentation for autonomous driving," CVPR Workshops, 2017.
[4] A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, "Panoptic segmentation," CVPR, 2019, pp. 9404–9413.
[5] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, pp. 84–90, 2012.
[6] H. Noh, S. Hong and B. Han, "Learning deconvolution network for semantic segmentation," ICCV, 2015, pp. 1520–1528. doi: 10.1109/ICCV.2015.178.
[7] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," MICCAI 2015, Lecture Notes in Computer Science, vol. 9351, Springer, Cham, 2015.
[8] V. Badrinarayanan, A. Kendall and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, Dec. 2017. doi: 10.1109/TPAMI.2016.2644615.
[9] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, Apr. 2018. doi: 10.1109/TPAMI.2017.2699184.
[10] H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia, "Pyramid scene parsing network," CVPR, 2017, pp. 6230–6239. doi: 10.1109/CVPR.2017.660.
[11] J. Hu, L. Shen and G. Sun, "Squeeze-and-excitation networks," CVPR, 2018, pp. 7132–7141. doi: 10.1109/CVPR.2018.00745.
[12] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks," CVPR, 2020, pp. 11531–11539. doi: 10.1109/CVPR42600.2020.01155.
[13] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, Aug. 2000. doi: 10.1109/34.868688.
[14] J. Singh and Y. Sharma, "Encoder-decoder architectures for generating questions," Procedia Computer Science, vol. 132, pp. 1041–1048, 2018.
[15] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," CVPR, 2015, pp. 3431–3440. doi: 10.1109/CVPR.2015.7298965.
[16] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected CRFs," arXiv preprint arXiv:1412.7062, 2014.
[17] L.-C. Chen, G. Papandreou, F. Schroff and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
[18] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," ECCV 2018, Lecture Notes in Computer Science, vol. 11211, Springer, Cham, 2018. doi: 10.1007/978-3-030-01234-2_49.
[19] S. Liu and W. Deng, "Very deep convolutional neural network based image classification using small training sample size," ACPR, 2015, pp. 730–734. doi: 10.1109/ACPR.2015.7486599.
[20] K. He, X. Zhang, S. Ren and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, Sept. 2015. doi: 10.1109/TPAMI.2015.2389824.
[21] D. Bahdanau, K. Cho and Y. Bengio, "Neural machine translation by jointly learning to align and translate," ICLR, San Diego, 2015.
[22] S. Woo, J. Park, J.-Y. Lee and I. S. Kweon, "CBAM: Convolutional block attention module," ECCV 2018, Lecture Notes in Computer Science, vol. 11211, Springer, Cham, 2018. doi: 10.1007/978-3-030-01234-2_1.
[23] X. Qin, Z. Zhang, C. Huang, C. Gao, M. Dehghan and M. Jagersand, "BASNet: Boundary-aware salient object detection," CVPR, 2019, pp. 7471–7481. doi: 10.1109/CVPR.2019.00766.
[24] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville and Y. Bengio, "Generative adversarial nets," NIPS, 2014.
[25] X. Wu, Z. Wu, H. Guo, L. Ju and S. Wang, "DANNet: A one-stage domain adaptation network for unsupervised nighttime semantic segmentation," CVPR, 2021, pp. 15764–15773. doi: 10.1109/CVPR46437.2021.01551.
[26] Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang and M. Chandraker, "Learning to adapt structured output space for semantic segmentation," CVPR, 2018, pp. 7472–7481.
[27] A. Radford, L. Metz and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," ICLR, 2015.
[28] M. Cordts et al., "The Cityscapes dataset for semantic urban scene understanding," CVPR, 2016, pp. 3213–3223. doi: 10.1109/CVPR.2016.350.
[29] C. Sakaridis, D. Dai and L. Van Gool, "Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation," ICCV, 2019, pp. 7374–7383.
[30] P. Wen et al., "A-PSPNet: A novel segmentation method of renal ultrasound image," IEEE SMC, 2021, pp. 40–45. doi: 10.1109/SMC52423.2021.9658740.
[31] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," CVPR, 2018, pp. 8798–8807.
[32] K. Zhang, W. Zuo, Y. Chen, D. Meng and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[33] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[34] C. Chen, Q. Chen, J. Xu and V. Koltun, "Learning to see in the dark," CVPR, 2018, pp. 3291–3300.
[35] C. Godard, O. Mac Aodha and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," CVPR, 2017, pp. 270–279.
[36] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu and N. Sang, "Learning a discriminative feature network for semantic segmentation," CVPR, 2018, pp. 1857–1866.
[37] M. Chen, H. Xue and D. Cai, "Domain adaptation for semantic segmentation with maximum squares loss," 2019, pp. 2090–2099.
[38] T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, "Focal loss for dense object detection," ICCV, 2017, pp. 2980–2988.
[39] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang and S. P. Smolley, "Least squares generative adversarial networks," 2017, pp. 2794–2802.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84485
dc.description.abstract: In recent years, artificial intelligence and deep learning have developed rapidly and are widely applied to image segmentation tasks; image segmentation appears in everything from autonomous driving to medical diagnosis. Image segmentation can be divided into two types, instance segmentation and semantic segmentation, and the combination of these two basic tasks is called panoptic segmentation. This study focuses on applying semantic segmentation models to Taiwanese street-scene recognition. Its purpose is to address common problems of semantic segmentation models. Ordinary semantic segmentation models rely on large amounts of labeled data for training. Most labeled open-source datasets collect street scenes mainly from Western countries, and there is currently no open-source dataset dedicated to Taiwanese street scenes. Beyond data collection, many models extract insufficient features, and the overlooked feature information can lead to pixel misclassification and less precise prediction of details. To address these problems, this study proposes an improved domain adaptation model; the domain adaptation network uses transfer learning to enable training on unlabeled data. Self-collected, unlabeled Taiwanese daytime and nighttime street scenes were matched and added to the training dataset to improve predictions on Taiwanese streets. Finally, modules are introduced into the semantic segmentation network in a way that does not destroy its original architecture: an attention module to mitigate class confusion, and a residual refinement module to strengthen the prediction of details. Experimental results show that, compared with DANNet, the proposed model improves pixel accuracy by 2.56% and mean intersection-over-union by 2.31%. [zh_TW]
dc.description.abstract: Recently, artificial intelligence and deep learning have flourished and are widely used in image segmentation, which serves applications such as autonomous driving and medical diagnosis. Image segmentation can be divided into two types, instance segmentation and semantic segmentation; their combination is called panoptic segmentation. This thesis discusses semantic segmentation and its application to Taiwanese street-scene recognition. The purpose of the thesis is to solve common problems of semantic segmentation models. General semantic segmentation models rely on a massive amount of labeled data during training, and most of the open-source datasets they use are collected mainly in Western countries; currently, there is no dataset dedicated to Taiwanese street scenes. In addition to the data collection problem, many models extract insufficient features, and the information discarded with those lost features leads to incorrect pixel classification and less accurate prediction of details. To solve these problems, we propose an optimized version of a domain adaptation network. The network uses transfer learning, so training on unlabeled data becomes feasible. We collected unlabeled Taiwanese daytime and nighttime street scenes ourselves, aligned them, and added them to the training dataset to improve the prediction of Taiwanese street scenes. Furthermore, we introduced an attention module into the semantic segmentation network, without destroying the original architecture, to solve the class confusion problem, and a residual refinement module to strengthen the prediction of details. According to the experimental results, the pixel accuracy of our model is improved by 2.56% compared with DANNet, and the mean intersection over union is improved by 2.31%. [en]
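The two metrics reported in the abstract can be stated concretely. Below is a minimal sketch of pixel accuracy and mean intersection-over-union (mIoU) over integer label maps, assuming NumPy arrays of class indices; the function names are illustrative and not taken from the thesis:

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted class matches the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return (pred == gt).mean()

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union, averaged over classes that appear
    in either the prediction or the ground truth."""
    pred, gt = np.asarray(pred).ravel(), np.asarray(gt).ravel()
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

A reported improvement of 2.56% in pixel accuracy and 2.31% in mIoU refers to differences in these two quantities between the proposed model and DANNet on the same evaluation set.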
dc.description.provenance: Made available in DSpace on 2023-03-19T22:13:06Z (GMT). No. of bitstreams: 1. U0001-2309202200243600.pdf: 3365047 bytes, checksum: b4feb3dc5c4bb54b302c01aa9811473a (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Acknowledgements
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Background and Motivation
  1.2  Research Objectives
  1.3  Contributions
  1.4  Thesis Organization
Chapter 2  Literature Review
  2.1  Encoder-Decoder Architectures
    2.1.1  U-Net
  2.2  Pyramid Pooling Architectures
    2.2.1  Atrous Spatial Pyramid Pooling
    2.2.2  Pyramid Pooling
  2.3  Attention Modules
    2.3.1  Squeeze-and-Excitation Module
    2.3.2  Convolutional Block Attention Module
  2.4  Residual Refinement Module
  2.5  Deep Learning Methods
    2.5.1  Supervised Learning
    2.5.2  Unsupervised Learning
    2.5.3  Transfer Learning
  2.6  Domain Adaptation Networks
Chapter 3  Methodology
  3.1  Problem Definition
    3.1.1  Data Collection
    3.1.2  Taiwanese and Nighttime Street-Scene Recognition
    3.1.3  Class Confusion and Detail Refinement
  3.2  Overall Network Architecture
  3.3  Model Design
  3.4  Loss Functions
    3.4.1  Light Loss
    3.4.2  Semantic Segmentation Loss
    3.4.3  Static Loss
    3.4.4  Adversarial Loss
    3.4.5  Total Loss
Chapter 4  Experimental Results and Discussion
  4.1  Experimental Environment and Settings
  4.2  Data Collection
    4.2.1  Source-Domain Dataset
    4.2.2  Target-Domain Dataset
  4.3  Evaluation Metrics
  4.4  Loss-Function Parameter Settings
  4.5  Experiments on Methods of Introducing Enhancement Modules
  4.6  Experiments on Enhancement-Module Selection Strategies
  4.7  Experiments on Training-Data Quantity
  4.8  Nighttime Recognition Results and Discussion
  4.9  Taiwan Street-Scene Recognition Results and Discussion
Chapter 5  Conclusion and Future Work
  5.1  Conclusion
  5.2  Future Work
References
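The record describes inserting an attention module "without destroying the original architecture" (Sections 2.3 and 3.3 of the table of contents). One common way to realize this is a squeeze-and-excitation-style channel attention wrapped in a residual connection, so the backbone's features pass through unchanged on the identity path. The following NumPy sketch is only an illustration of that idea, not the thesis's actual module; the names `channel_attention` and `refined` and the weight shapes are assumptions:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention: squeeze (global average pool over H and W),
    excitation (two fully connected layers with ReLU and sigmoid), then
    rescale each channel. x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                # squeeze: per-channel descriptor (C,)
    z = np.maximum(w1 @ s, 0.0)            # excitation hidden layer, ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # sigmoid gate in (0, 1), shape (C,)
    return x * a[:, None, None]            # rescale channels

def refined(x, w1, w2):
    """Residual wrap: the attention branch adds a correction on top of the
    original features, so the backbone output is preserved as the identity path."""
    return x + channel_attention(x, w1, w2)
```

Because the output shape equals the input shape, such a block can be dropped between existing layers of a segmentation network without changing any surrounding layer's dimensions, which is one reading of the "without destroying the original architecture" constraint.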
dc.language.iso: zh-TW
dc.subject: 域適應 (Domain Adaptation) [zh_TW]
dc.subject: 深度學習 (Deep Learning) [zh_TW]
dc.subject: 注意力機制 (Attention Mechanism) [zh_TW]
dc.subject: 語義分割 (Semantic Segmentation) [zh_TW]
dc.subject: 遷移學習 (Transfer Learning) [zh_TW]
dc.subject: Attention Mechanism [en]
dc.subject: Deep Learning [en]
dc.subject: Transfer Learning [en]
dc.subject: Domain Adaptation [en]
dc.subject: Semantic Segmentation [en]
dc.title: 基於語義分割模型之台灣街景解析優化 (Optimization of Taiwan street-scene parsing based on a semantic segmentation model) [zh_TW]
dc.title: The optimization of semantic segmentation model for Taiwan street scene [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 丁肇隆 (Chao-Lung Ting), 張恆華 (Heng-Hua Chang), 傅楸善 (Chiu-Shan Fu)
dc.subject.keyword: 深度學習 (Deep Learning), 遷移學習 (Transfer Learning), 域適應 (Domain Adaptation), 語義分割 (Semantic Segmentation), 注意力機制 (Attention Mechanism) [zh_TW]
dc.subject.keyword: Deep Learning, Transfer Learning, Domain Adaptation, Semantic Segmentation, Attention Mechanism [en]
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202203866
dc.rights.note: Access authorized (restricted to on-campus access)
dc.date.accepted: 2022-09-26
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 工程科學及海洋工程學研究所 (Department of Engineering Science and Ocean Engineering) [zh_TW]
dc.date.embargo-lift: 2022-09-29
Appears in Collections: Department of Engineering Science and Ocean Engineering

Files in this item:
File: U0001-2309202200243600.pdf
Size / Format: 3.29 MB, Adobe PDF
Access: restricted to NTU campus IPs (use the library's VPN service from off campus)


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
