NTU Theses and Dissertations Repository › 電機資訊學院 › 電機工程學系
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74345

Full Metadata Record (DC field: value (language))
dc.contributor.advisor: 鄭振牟
dc.contributor.author: Kam-In Ng (en)
dc.contributor.author: 吳錦賢 (zh_TW)
dc.date.accessioned: 2021-06-17T08:30:54Z
dc.date.available: 2024-08-19
dc.date.copyright: 2019-08-19
dc.date.issued: 2019
dc.date.submitted: 2019-08-12
dc.identifier.citation:
[1] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, “Dual Attention Network for Scene Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[2] S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, “ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[3] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local Neural Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[5] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[6] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[7] F. Yu and V. Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions,” Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[8] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
[9] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, B. Glocker, and D. Rueckert, “Attention U-Net: Learning Where to Look for the Pancreas,” Proceedings of the International Conference on Medical Imaging with Deep Learning (MIDL), 2018.
[10] G. Ghiasi and C. C. Fowlkes, “Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[11] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, “Learning a Discriminative Feature Network for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[12] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid Scene Parsing Network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. C. Loy, D. Lin, and J. Jia, “PSANet: Point-wise Spatial Attention Network for Scene Parsing,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[14] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal, “Context Encoding for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] Y. Yuan and J. Wang, “OCNet: Object Context Network for Scene Parsing,” arXiv:1809.00916, 2018.
[16] C. Dong, C. C. Loy, and X. Tang, “Accelerating the Super-Resolution Convolutional Neural Network,” Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[17] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74345
dc.description.abstract: 語義分割是計算機視覺中必不可少但計算成本高的任務。自注意力機制(Self-attention mechanism)可以幫助提取富含上下文依賴關係的特徵,但它會為類神經網絡帶來額外的高計算量。
在這項研究中,我們基於雙重注意力機制(Dual Attention Module)提出了快速雙重注意力機制(Fast Dual Attention Module),它可以有效率且有效地提取具長距離依賴關係的訊息。此外,我們也提出了拉普拉斯金字塔解碼器(Laplacian Pyramid Decoder),它可以有效地從低解析度的語義分割結果還原高頻率的細節特徵並獲得高解析度的語義分割結果。我們將 FDAM 和 LPD 集成到 ESPNet 中,並將我們提出的網絡架構稱為快速雙重注意力拉普拉斯金字塔網絡(Fast Dual Attention Laplacian Pyramid Network)。我們在 Cityscapes 數據集上評估 FDALPNet 的準確率及計算速度。FDAM 相對於 DAM 在執行時間上降低了 76.68%;LPD 讓 ESPNet 的 mIoU score 提升了 5.41%。
FDALPNet 相對於 ESPNet 的 mIoU score 提升了 8.14%。實驗結果顯示 FDALPNet 相對於 ESPNet 的準確率有顯著的提升。
(zh_TW)
dc.description.abstract: Semantic segmentation is an essential yet computationally expensive task in computer vision. The self-attention mechanism helps capture rich contextual dependencies, but it incurs an even higher computational overhead.
In this thesis, we propose a Fast Dual Attention Module (FDAM), based on the Dual Attention Module (DAM), that captures long-range dependency information both efficiently and effectively. In addition, we introduce a Laplacian Pyramid Decoder (LPD), which effectively recovers high-frequency information from a low-resolution segmentation mask. We integrate FDAM and LPD into ESPNet and call the proposed framework the Fast Dual Attention Laplacian Pyramid Network (FDALPNet). We evaluated FDALPNet on the Cityscapes dataset. FDAM requires 76.68% less running time than DAM, and LPD improves the mIoU score by 5.41%.
The experimental results show that FDALPNet performs favorably against ESPNet in terms of accuracy: FDALPNet is 8.14% more accurate than ESPNet.
(en)
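The abstract reports accuracy as the mIoU score, the standard metric for semantic segmentation. As a minimal sketch (the class labels and class count below are illustrative, not taken from the thesis), the conventional per-class intersection-over-union averaged over classes can be computed like this:

```python
def mean_iou(pred, target, num_classes):
    """Compute mIoU over flat lists of per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example: 3 classes on an 8-pixel "image"
pred   = [0, 0, 1, 1, 2, 2, 0, 1]
target = [0, 0, 1, 2, 2, 2, 0, 0]
print(round(mean_iou(pred, target, 3), 4))  # prints 0.5833
```

Benchmark implementations (e.g. the official Cityscapes scripts) accumulate per-class intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores; this toy version shows only the core definition.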
dc.description.provenance: Made available in DSpace on 2021-06-17T08:30:54Z (GMT). No. of bitstreams: 1.
ntu-108-R05921089-1.pdf: 6214173 bytes, checksum: a1ca1aa509247a464ffd4cb866cadfc2 (MD5)
Previous issue date: 2019
(en)
dc.description.tableofcontents:
Abstract i
List of Figures v
List of Tables vi
1 Introduction 1
1.1 Motivation and objective 1
1.2 Introduction of Semantic segmentation 2
1.2.1 Definition of Semantic segmentation 2
1.2.2 mean Intersection-Over-Union score (mIoU) 2
1.3 Literature review 4
1.4 Thesis organization 8
2 Method 9
2.1 Overview 9
2.2 Laplacian Pyramid Decoder (LPD) 12
2.3 Fast Dual Attention Module (FDAM) 14
3 Experiments 19
3.1 Overview 19
3.2 Implementation and training details 19
3.3 Cityscapes dataset 20
3.4 Hyperparameter optimization 22
3.5 Ablation Study 24
3.6 Comparisons 25
3.7 Execution time 25
4 Conclusions, Contributions, and Future Works 28
4.1 Conclusions and Contribution 28
4.2 Future Works 29
Reference 30
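The Laplacian Pyramid Decoder described in the abstract builds on the classic Laplacian-pyramid idea: a coarse low-resolution signal is upsampled and a high-frequency residual is added back to recover full resolution. The toy 1-D sketch below uses nearest-neighbour up/downsampling on plain lists purely to illustrate the round trip; the thesis's decoder operates on 2-D segmentation maps and learns the residuals rather than storing them.

```python
def downsample(x):
    return x[::2]  # keep every other sample: the coarse level

def upsample(x):
    out = []
    for v in x:  # nearest-neighbour expansion back to full length
        out.extend([v, v])
    return out

def build_level(signal):
    """Split a signal into a coarse level plus a high-frequency residual."""
    coarse = downsample(signal)
    residual = [s - u for s, u in zip(signal, upsample(coarse))]
    return coarse, residual

def reconstruct(coarse, residual):
    """Recover the full-resolution signal: upsample, then add the detail back."""
    return [u + r for u, r in zip(upsample(coarse), residual)]

signal = [3, 5, 2, 8, 7, 1, 4, 6]
coarse, residual = build_level(signal)
assert reconstruct(coarse, residual) == signal  # lossless round trip
```

With a stored residual the round trip is exact; in a learned decoder the residual is predicted from features, so reconstruction quality depends on how well the network regresses the high-frequency detail.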
dc.language.iso: en
dc.subject: 語意分割 (zh_TW)
dc.subject: 影像處理 (zh_TW)
dc.subject: 計算機視覺 (zh_TW)
dc.subject: 機器學習 (zh_TW)
dc.subject: 超解析度成像 (zh_TW)
dc.subject: Semantic Segmentation (en)
dc.subject: Image processing (en)
dc.subject: Computer vision (en)
dc.subject: Machine learning (en)
dc.subject: Super resolution (en)
dc.title: 應用於語意分割的快速雙重注意力拉普拉斯金字塔網絡 (zh_TW)
dc.title: Fast Dual Attention Laplacian Pyramid Network for Semantic Segmentation (en)
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士
dc.contributor.coadvisor: 廖世偉
dc.contributor.oralexamcommittee: 張富傑, 葉羅堯
dc.subject.keyword: 語意分割, 影像處理, 計算機視覺, 機器學習, 超解析度成像 (zh_TW)
dc.subject.keyword: Semantic Segmentation, Image processing, Computer vision, Machine learning, Super resolution (en)
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU201902916
dc.rights.note: 有償授權
dc.date.accepted: 2019-08-12
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 電機工程學研究所 (zh_TW)
Appears in Collections: 電機工程學系

Files in This Item:
ntu-108-1.pdf (6.07 MB, Adobe PDF) — restricted; not publicly available

