NTU Theses and Dissertations Repository › 電機資訊學院 › 電機工程學系
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74345

Full Metadata Record (DC field: value (language))
dc.contributor.advisor: 鄭振牟
dc.contributor.author: Kam-In Ng (en)
dc.contributor.author: 吳錦賢 (zh_TW)
dc.date.accessioned: 2021-06-17T08:30:54Z
dc.date.available: 2024-08-19
dc.date.copyright: 2019-08-19
dc.date.issued: 2019
dc.date.submitted: 2019-08-12
dc.identifier.citation:
[1] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, “Dual Attention Network for Scene Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[2] S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, “ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[3] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local Neural Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[5] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[6] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[7] F. Yu and V. Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions,” Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[8] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
[9] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, B. Glocker, and D. Rueckert, “Attention U-Net: Learning Where to Look for the Pancreas,” Proceedings of the International Conference on Medical Imaging with Deep Learning (MIDL), 2018.
[10] G. Ghiasi and C. C. Fowlkes, “Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[11] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, “Learning a Discriminative Feature Network for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[12] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid Scene Parsing Network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. C. Loy, D. Lin, and J. Jia, “PSANet: Point-wise Spatial Attention Network for Scene Parsing,” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[14] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal, “Context Encoding for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] Y. Yuan and J. Wang, “OCNet: Object Context Network for Scene Parsing,” arXiv:1809.00916, 2018.
[16] C. Dong, C. C. Loy, and X. Tang, “Accelerating the Super-Resolution Convolutional Neural Network,” Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[17] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74345
dc.description.abstract: 語義分割是計算機視覺中必不可少但計算成本高的任務。自注意力機制(Self-attention mechanism)可以幫助提取富含上下文依賴關係的特徵,但它會為類神經網絡帶來額外的高計算量。
在這項研究中,我們基於雙重注意力機制(Dual Attention Module)提出了快速雙重注意力機制(Fast Dual Attention Module),它可以有效率且有效地提取具長距離依賴關係的訊息。此外,我們也提出了拉普拉斯金字塔解碼器(Laplacian Pyramid Decoder),它可以有效地從低解析度的語義分割結果還原高頻率的細節特徵並獲得高解析度的語義分割結果。我們將 FDAM 和 LPD 集成到 ESPNet 中,並將我們提出的網絡架構稱為快速雙重注意力拉普拉斯金字塔網絡(Fast Dual Attention Laplacian Pyramid Network)。我們在 Cityscapes 數據集上評估 FDALPNet 的準確率及計算速度。FDAM 相對於 DAM 在執行時間上降低了 76.68%;LPD 讓 ESPNet 的 mIoU score 提升了 5.41%。
FDALPNet 相對於 ESPNet 的 mIoU score 提升了 8.14%。實驗結果顯示 FDALPNet 相對於 ESPNet 的準確率有顯著的提升。
(zh_TW)
dc.description.abstract: Semantic segmentation is an essential yet computationally expensive task in computer vision. The self-attention mechanism helps capture rich contextual dependencies, but it incurs an even higher computational overhead.
In this thesis, we propose a Fast Dual Attention Module (FDAM), based on the Dual Attention Module (DAM), that captures long-range dependency information both efficiently and effectively. In addition, we introduce a Laplacian Pyramid Decoder (LPD), which effectively recovers high-frequency information from a low-resolution segmentation mask. We integrate FDAM and LPD into ESPNet and call the proposed framework the Fast Dual Attention Laplacian Pyramid Network (FDALPNet). We evaluated FDALPNet on the Cityscapes dataset. FDAM requires 76.68% less running time than DAM, and LPD improves the mIoU score by 5.41%.
The experimental results show that FDALPNet performs favorably against ESPNet in terms of accuracy: FDALPNet is 8.14% more accurate than ESPNet.
(en)
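The abstract reports accuracy as the mIoU score, the standard metric for semantic segmentation. As a minimal sketch (the class labels and class count below are illustrative, not taken from the thesis), the conventional per-class intersection-over-union averaged over classes can be computed like this:

```python
def mean_iou(pred, target, num_classes):
    """Compute mIoU over flat lists of per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example: 3 classes on an 8-pixel "image"
pred   = [0, 0, 1, 1, 2, 2, 0, 1]
target = [0, 0, 1, 2, 2, 2, 0, 0]
print(round(mean_iou(pred, target, 3), 4))  # prints 0.5833
```

Benchmark implementations (e.g. the official Cityscapes scripts) accumulate per-class intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores; this toy version shows only the core definition.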
dc.description.provenance: Made available in DSpace on 2021-06-17T08:30:54Z (GMT). No. of bitstreams: 1.
ntu-108-R05921089-1.pdf: 6214173 bytes, checksum: a1ca1aa509247a464ffd4cb866cadfc2 (MD5)
Previous issue date: 2019
(en)
dc.description.tableofcontents:
Abstract i
List of Figures v
List of Tables vi
1 Introduction 1
1.1 Motivation and objective 1
1.2 Introduction of Semantic segmentation 2
1.2.1 Definition of Semantic segmentation 2
1.2.2 mean Intersection-Over-Union score (mIoU) 2
1.3 Literature review 4
1.4 Thesis organization 8
2 Method 9
2.1 Overview 9
2.2 Laplacian Pyramid Decoder (LPD) 12
2.3 Fast Dual Attention Module (FDAM) 14
3 Experiments 19
3.1 Overview 19
3.2 Implementation and training details 19
3.3 Cityscapes dataset 20
3.4 Hyperparameter optimization 22
3.5 Ablation Study 24
3.6 Comparisons 25
3.7 Execution time 25
4 Conclusions, Contributions, and Future Works 28
4.1 Conclusions and Contribution 28
4.2 Future Works 29
Reference 30
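The Laplacian Pyramid Decoder described in the abstract builds on the classic Laplacian-pyramid idea: a coarse low-resolution signal is upsampled and a high-frequency residual is added back to recover full resolution. The toy 1-D sketch below uses nearest-neighbour up/downsampling on plain lists purely to illustrate the round trip; the thesis's decoder operates on 2-D segmentation maps and learns the residuals rather than storing them.

```python
def downsample(x):
    return x[::2]  # keep every other sample: the coarse level

def upsample(x):
    out = []
    for v in x:  # nearest-neighbour expansion back to full length
        out.extend([v, v])
    return out

def build_level(signal):
    """Split a signal into a coarse level plus a high-frequency residual."""
    coarse = downsample(signal)
    residual = [s - u for s, u in zip(signal, upsample(coarse))]
    return coarse, residual

def reconstruct(coarse, residual):
    """Recover the full-resolution signal: upsample, then add the detail back."""
    return [u + r for u, r in zip(upsample(coarse), residual)]

signal = [3, 5, 2, 8, 7, 1, 4, 6]
coarse, residual = build_level(signal)
assert reconstruct(coarse, residual) == signal  # lossless round trip
```

With a stored residual the round trip is exact; in a learned decoder the residual is predicted from features, so reconstruction quality depends on how well the network regresses the high-frequency detail.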
dc.language.iso: en
dc.subject: 語意分割 (zh_TW)
dc.subject: 影像處理 (zh_TW)
dc.subject: 計算機視覺 (zh_TW)
dc.subject: 機器學習 (zh_TW)
dc.subject: 超解析度成像 (zh_TW)
dc.subject: Semantic Segmentation (en)
dc.subject: Image processing (en)
dc.subject: Computer vision (en)
dc.subject: Machine learning (en)
dc.subject: Super resolution (en)
dc.title: 應用於語意分割的快速雙重注意力拉普拉斯金字塔網絡 (zh_TW)
dc.title: Fast Dual Attention Laplacian Pyramid Network for Semantic Segmentation (en)
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士
dc.contributor.coadvisor: 廖世偉
dc.contributor.oralexamcommittee: 張富傑, 葉羅堯
dc.subject.keyword: 語意分割, 影像處理, 計算機視覺, 機器學習, 超解析度成像 (zh_TW)
dc.subject.keyword: Semantic Segmentation, Image processing, Computer vision, Machine learning, Super resolution (en)
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU201902916
dc.rights.note: 有償授權
dc.date.accepted: 2019-08-12
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 電機工程學研究所 (zh_TW)
Appears in Collections: 電機工程學系

Files in This Item:
ntu-108-1.pdf (6.07 MB, Adobe PDF) — restricted; not publicly available

