NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86743

Full metadata record

DC Field: Value [Language]
dc.contributor.advisor: 吳沛遠 (Pei-Yuan Wu)
dc.contributor.author: Yu-Chieh Chao [en]
dc.contributor.author: 趙昱傑 [zh_TW]
dc.date.accessioned: 2023-03-20T00:14:50Z
dc.date.copyright: 2022-08-02
dc.date.issued: 2022
dc.date.submitted: 2022-07-28
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86743
dc.description.abstract: 由於細微的材質特徵以及多種環境資訊所導致視覺上較大的差異,想要從材質影像中提取具有區別性的特徵表示是困難的,使得材質辨識是一個具有挑戰性的任務。過去的文獻專注於使用卷積神經網路從材質物體中提取圖樣資訊。在我們的工作中,我們首次將 Vision Transformer (ViT) 應用至材質辨識,討論自我專注機制所學習的材質影像區域與區域之間的關聯性在材質辨識領域的可行性。接著,為了生成資訊量更大的特徵表示,我們提出了 CTF-Net,透過將 ViT 和卷積神經網路所生成的高階級特徵圖譜在圖譜的每個位置進行結合,實現全域關聯性和區域圖樣特徵兩種資訊的互補。除了如過去文獻只考慮高階級特徵圖譜外,我們提出了 MLCTF-Net,將 ViT 和卷積神經網路在多個神經網路層所生成的特徵圖譜都納入考量,整合了不同階級的材質特徵。最後,除了為了處理材質組間差異所使用的交叉熵,我們提出了 MLCTF-Net†,進一步採用中心損失來解決材質組內差異大的問題。透過在 DTD、KTH-TIPS2-b、FMD、GTOS、GTOS-mobile 上全面的實驗,展示了我們所提出的模型具有出色的材質分類準確度。 [zh_TW]
dc.description.abstract: With the minuscule texture primitives and the large perceptual variations under diverse contexts, it is hard to capture discriminative representations from texture images, which makes texture recognition a challenging problem. To date, previous works have focused on applying Convolutional Neural Networks (CNN) to extract pattern information from texture objects. In our work, we first investigate the efficacy of the Vision Transformer (ViT) on texture recognition, which models the global semantic relevance of texture image patches through a series of self-attention mechanisms. Next, to generate more informative representations, we propose the CNN-ViT Fusion Network (CTF-Net), which fuses high-level feature maps generated by CNN and ViT backbones, complementing the global semantic relevance learned by ViT with the pattern characteristics captured by CNN at each spatial position. Beyond considering only the high-level feature map as in previous works, we propose the Multi-Level CNN-ViT Fusion Network (MLCTF-Net), which fuses feature maps generated by CNN and ViT at multiple layers to incorporate texture features at different levels of abstraction. Finally, in addition to the cross-entropy loss used to handle inter-class variations between texture categories, we propose MLCTF-Net†, which further incorporates a center loss to address intra-class variations within each texture category. Extensive experiments on DTD, KTH-TIPS2-b, FMD, GTOS, and GTOS-mobile show that the proposed fusion networks achieve prominent performance on texture classification. [en] (An illustrative code sketch of the fusion and loss design follows this record.)
dc.description.provenance: Made available in DSpace on 2023-03-20T00:14:50Z (GMT). No. of bitstreams: 1. U0001-1907202214253300.pdf: 2594569 bytes, checksum: 8fb7505e1ae308e5eb625f85e7f98916 (MD5). Previous issue date: 2022. [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee ... i
Acknowledgements ... iii
摘要 (Abstract in Chinese) ... v
Abstract ... vii
Contents ... ix
List of Figures ... xi
List of Tables ... xiii
Chapter 1 Introduction ... 1
Chapter 2 Related Work ... 5
Chapter 3 Method ... 7
  3.1 Overview ... 7
  3.2 Ensemble-Net ... 11
  3.3 CTF-Net ... 11
  3.4 MLCTF-Net ... 12
  3.5 MLCTF-Net† ... 13
Chapter 4 Experiment ... 15
  4.1 Datasets ... 15
  4.2 Experimental Setup ... 18
  4.3 Performance Analysis ... 19
Chapter 5 Conclusion ... 23
References ... 25
dc.language.iso: en
dc.subject: 視覺變換器 [zh_TW]
dc.subject: 模型結合 [zh_TW]
dc.subject: 材質辨識 [zh_TW]
dc.subject: 深度學習 [zh_TW]
dc.subject: 卷積神經網路 [zh_TW]
dc.subject: Texture Recognition [en]
dc.subject: Deep Learning [en]
dc.subject: Convolutional Neural Network [en]
dc.subject: Vision Transformer [en]
dc.subject: Model Fusion [en]
dc.title: 基於多階級全域關聯性與區域圖樣資訊進行材質辨識 [zh_TW]
dc.title: Texture Recognition with Multi-Level Global Semantic Relevance and Local Pattern Information [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 王鈺強 (Yu-Chiang Wang), 馬偉雲 (Wei-Yun Ma)
dc.subject.keyword: 材質辨識, 模型結合, 視覺變換器, 卷積神經網路, 深度學習 [zh_TW]
dc.subject.keyword: Texture Recognition, Model Fusion, Vision Transformer, Convolutional Neural Network, Deep Learning [en]
dc.relation.page: 29
dc.identifier.doi: 10.6342/NTU202201544
dc.rights.note: 同意授權 (authorized, open access worldwide)
dc.date.accepted: 2022-07-28
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
dc.date.embargo-lift: 2022-08-02
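
The abstract above names two concrete technical ideas: fusing CNN and ViT feature maps at each spatial position, and training with cross entropy plus a center loss. The PyTorch sketch below illustrates both as this record describes them. It is a minimal sketch under stated assumptions: the record does not specify the fusion operator, backbone dimensions, or loss weight, so FusionBlock, CenterLoss, total_loss, and lam are illustrative names and choices (1x1 projections with concatenation for fusion; a simplified center loss in the spirit of Wen et al., 2016), not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    # Hypothetical fusion of a CNN feature map with ViT patch tokens at each
    # spatial position: project both to a common width, align resolutions,
    # concatenate, and mix with a 1x1 convolution.
    def __init__(self, cnn_dim, vit_dim, out_dim):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_dim, out_dim, kernel_size=1)
        self.proj_vit = nn.Conv2d(vit_dim, out_dim, kernel_size=1)
        self.mix = nn.Conv2d(2 * out_dim, out_dim, kernel_size=1)

    def forward(self, f_cnn, f_vit):
        # f_cnn: (B, C1, H, W) local pattern features from the CNN branch.
        # f_vit: (B, N, C2) patch tokens from the ViT branch, with N = h * w.
        B, N, C2 = f_vit.shape
        h = w = int(N ** 0.5)
        f_vit = f_vit.transpose(1, 2).reshape(B, C2, h, w)  # tokens -> 2D map
        f_vit = F.interpolate(f_vit, size=tuple(f_cnn.shape[-2:]),
                              mode="bilinear", align_corners=False)
        fused = torch.cat([self.proj_cnn(f_cnn), self.proj_vit(f_vit)], dim=1)
        return self.mix(fused)  # (B, out_dim, H, W)

class CenterLoss(nn.Module):
    # Simplified center loss: squared distance between each feature vector and
    # a learnable center for its class, encouraging intra-class compactness.
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

def total_loss(logits, feats, labels, center_loss, lam=0.01):
    # Cross entropy separates texture categories (inter-class variation); the
    # center term, weighted by a hypothetical lam, tightens each category's
    # feature cluster (intra-class variation), as the abstract describes.
    return F.cross_entropy(logits, labels) + lam * center_loss(feats, labels)

As a shape example, fusing a ResNet stage of shape (B, 1024, 14, 14) with ViT tokens of shape (B, 196, 768) through FusionBlock(1024, 768, 512) yields a (B, 512, 14, 14) map; a multi-level variant in the spirit of MLCTF-Net would apply such a block at several depths and combine the results.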
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format
U0001-1907202214253300.pdf | 2.53 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
