NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86743

Full metadata record

DC Field: Value [Language]
dc.contributor.advisor: 吳沛遠 (Pei-Yuan Wu)
dc.contributor.author: Yu-Chieh Chao [en]
dc.contributor.author: 趙昱傑 [zh_TW]
dc.date.accessioned: 2023-03-20T00:14:50Z
dc.date.copyright: 2022-08-02
dc.date.issued: 2022
dc.date.submitted: 2022-07-28
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86743
dc.description.abstract: 由於細微的材質特徵以及多種環境資訊所導致視覺上較大的差異,想要從材質影像中提取具有區別性的特徵表示是困難的,使得材質辨識是一個具有挑戰性的任務。過去的文獻專注於使用卷積神經網路從材質物體中提取圖樣資訊。在我們的工作中,我們首次將 Vision Transformer (ViT) 應用至材質辨識,討論自我專注機制所學習的材質影像區域與區域之間的關聯性在材質辨識領域的可行性。接著,為了生成資訊量更大的特徵表示,我們提出了 CTF-Net,透過將 ViT 和卷積神經網路所生成的高階級特徵圖譜在圖譜的每個位置進行結合,實現全域關聯性和區域圖樣特徵兩種資訊的互補。除了如過去文獻只考慮高階級特徵圖譜外,我們提出了 MLCTF-Net,將 ViT 和卷積神經網路在多個神經網路層所生成的特徵圖譜都納入考量,整合了不同階級的材質特徵。最後,除了為了處理材質組間差異所使用的交叉熵,我們提出了 MLCTF-Net†,進一步採用中心損失來解決材質組內差異大的問題。透過在 DTD、KTH-TIPS2-b、FMD、GTOS、GTOS-mobile 上全面的實驗,展示了我們所提出的模型具有出色的材質分類準確度。 [zh_TW]
dc.description.abstract: With the minuscule texture primitives and the large perceptual variations under diverse contexts, it is hard to capture discriminative representations from texture images, which makes texture recognition a challenging problem. To date, previous works have focused on applying Convolutional Neural Networks (CNN) to extract pattern information from texture objects. In our work, we first investigate the efficacy of the Vision Transformer (ViT) on texture recognition, which models the global semantic relevance of texture image patches through a series of self-attention mechanisms. Next, to generate more informative representations, we propose the CNN-ViT Fusion Network (CTF-Net), which fuses high-level feature maps generated by CNN and ViT backbones, complementing the global semantic relevance learned by ViT with the pattern characteristics captured by CNN at each spatial position. Beyond considering only the high-level feature map as in previous works, we propose the Multi-Level CNN-ViT Fusion Network (MLCTF-Net), which fuses feature maps generated by CNN and ViT at multiple layers to incorporate texture features at different levels of abstraction. Finally, in addition to the cross-entropy loss used to handle inter-class variations between texture categories, we propose MLCTF-Net†, which further incorporates a center loss to address intra-class variations within each texture category. Extensive experiments on DTD, KTH-TIPS2-b, FMD, GTOS, and GTOS-mobile show that the proposed fusion networks achieve prominent performance on texture classification. [en] (An illustrative code sketch of the fusion and loss design follows this record.)
dc.description.provenance: Made available in DSpace on 2023-03-20T00:14:50Z (GMT). No. of bitstreams: 1. U0001-1907202214253300.pdf: 2594569 bytes, checksum: 8fb7505e1ae308e5eb625f85e7f98916 (MD5). Previous issue date: 2022. [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee ... i
Acknowledgements ... iii
摘要 (Abstract in Chinese) ... v
Abstract ... vii
Contents ... ix
List of Figures ... xi
List of Tables ... xiii
Chapter 1 Introduction ... 1
Chapter 2 Related Work ... 5
Chapter 3 Method ... 7
  3.1 Overview ... 7
  3.2 Ensemble-Net ... 11
  3.3 CTF-Net ... 11
  3.4 MLCTF-Net ... 12
  3.5 MLCTF-Net† ... 13
Chapter 4 Experiment ... 15
  4.1 Datasets ... 15
  4.2 Experimental Setup ... 18
  4.3 Performance Analysis ... 19
Chapter 5 Conclusion ... 23
References ... 25
dc.language.iso: en
dc.subject: 視覺變換器 [zh_TW]
dc.subject: 模型結合 [zh_TW]
dc.subject: 材質辨識 [zh_TW]
dc.subject: 深度學習 [zh_TW]
dc.subject: 卷積神經網路 [zh_TW]
dc.subject: Texture Recognition [en]
dc.subject: Deep Learning [en]
dc.subject: Convolutional Neural Network [en]
dc.subject: Vision Transformer [en]
dc.subject: Model Fusion [en]
dc.title: 基於多階級全域關聯性與區域圖樣資訊進行材質辨識 [zh_TW]
dc.title: Texture Recognition with Multi-Level Global Semantic Relevance and Local Pattern Information [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 王鈺強 (Yu-Chiang Wang), 馬偉雲 (Wei-Yun Ma)
dc.subject.keyword: 材質辨識, 模型結合, 視覺變換器, 卷積神經網路, 深度學習 [zh_TW]
dc.subject.keyword: Texture Recognition, Model Fusion, Vision Transformer, Convolutional Neural Network, Deep Learning [en]
dc.relation.page: 29
dc.identifier.doi: 10.6342/NTU202201544
dc.rights.note: 同意授權 (authorized, open access worldwide)
dc.date.accepted: 2022-07-28
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
dc.date.embargo-lift: 2022-08-02
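
The abstract above names two concrete technical ideas: fusing CNN and ViT feature maps at each spatial position, and training with cross entropy plus a center loss. The PyTorch sketch below illustrates both as this record describes them. It is a minimal sketch under stated assumptions: the record does not specify the fusion operator, backbone dimensions, or loss weight, so FusionBlock, CenterLoss, total_loss, and lam are illustrative names and choices (1x1 projections with concatenation for fusion; a simplified center loss in the spirit of Wen et al., 2016), not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    # Hypothetical fusion of a CNN feature map with ViT patch tokens at each
    # spatial position: project both to a common width, align resolutions,
    # concatenate, and mix with a 1x1 convolution.
    def __init__(self, cnn_dim, vit_dim, out_dim):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_dim, out_dim, kernel_size=1)
        self.proj_vit = nn.Conv2d(vit_dim, out_dim, kernel_size=1)
        self.mix = nn.Conv2d(2 * out_dim, out_dim, kernel_size=1)

    def forward(self, f_cnn, f_vit):
        # f_cnn: (B, C1, H, W) local pattern features from the CNN branch.
        # f_vit: (B, N, C2) patch tokens from the ViT branch, with N = h * w.
        B, N, C2 = f_vit.shape
        h = w = int(N ** 0.5)
        f_vit = f_vit.transpose(1, 2).reshape(B, C2, h, w)  # tokens -> 2D map
        f_vit = F.interpolate(f_vit, size=tuple(f_cnn.shape[-2:]),
                              mode="bilinear", align_corners=False)
        fused = torch.cat([self.proj_cnn(f_cnn), self.proj_vit(f_vit)], dim=1)
        return self.mix(fused)  # (B, out_dim, H, W)

class CenterLoss(nn.Module):
    # Simplified center loss: squared distance between each feature vector and
    # a learnable center for its class, encouraging intra-class compactness.
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

def total_loss(logits, feats, labels, center_loss, lam=0.01):
    # Cross entropy separates texture categories (inter-class variation); the
    # center term, weighted by a hypothetical lam, tightens each category's
    # feature cluster (intra-class variation), as the abstract describes.
    return F.cross_entropy(logits, labels) + lam * center_loss(feats, labels)

As a shape example, fusing a ResNet stage of shape (B, 1024, 14, 14) with ViT tokens of shape (B, 196, 768) through FusionBlock(1024, 768, 512) yields a (B, 512, 14, 14) map; a multi-level variant in the spirit of MLCTF-Net would apply such a block at several depths and combine the results.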
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format
U0001-1907202214253300.pdf | 2.53 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
