Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95477

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 傅立成 | zh_TW |
| dc.contributor.advisor | Li-Chen Fu | en |
| dc.contributor.author | 陳烱濤 | zh_TW |
| dc.contributor.author | Chiung-Tao Chen | en |
| dc.date.accessioned | 2024-09-10T16:16:32Z | - |
| dc.date.available | 2024-09-11 | - |
| dc.date.copyright | 2024-09-10 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-09 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95477 | - |
| dc.description.abstract | 近年來,隨著對公共安全需求的增加以及監控系統的廣泛應用,行人重識別技術在相關研究中備受關注。儘管基於監督學習的方法在公開資料集上取得了顯著成果,但現實環境與訓練資料之間存在領域差異,且為每次部署建立有標註的資料集十分耗費人力,因此這些方法難以輕易地移轉至現實生活中的應用。
無監督域適應和完全無監督學習雖然在一定程度上解決了資料標注的問題,但它們往往會過度擬合於目標領域的風格,這對於多變的現實環境並不利。鑒於上述問題,我們提出了一種基於域泛化學習的全新方法。該方法包括兩個關鍵模組:一是具備語義感知能力的注意力遮罩生成模組,二是專家混合與域不變注意層。前者有助於模型有效學習不同人體部位的特徵,後者通過域不變注意去除域相關資訊,並使用專家混合使模型以不同參數處理不同數據,從而避免過度擬合於源域的分布。我們的方法在實驗中表現優異,超越了許多現有方法。這一創新方法將為行人重識別系統的實際應用帶來新的可能性,同時也推動了該領域的進一步發展。 | zh_TW |
| dc.description.abstract | In recent years, with the increasing demand for public safety and the widespread application of surveillance systems, person re-identification technology has received significant attention in related research. Although supervised learning-based methods have achieved remarkable results on public datasets, the domain discrepancy between real-world environments and training data poses a challenge. Additionally, creating labeled datasets for each deployment is labor-intensive, making it difficult for these methods to be easily transferred to real-life applications.
While unsupervised domain adaptation and fully unsupervised learning partially address the annotation problem, they tend to overfit to the style of the target domain, which is ill-suited to the dynamic nature of real-world environments. Given these challenges, we propose a novel method based on domain generalization learning. The method comprises two key modules: a Semantic Aware Mask Generator and a Mixture of Experts and Domain Invariance Attention (MoE/DIA) layer. The former helps the model effectively learn features of different body parts, while the latter removes domain-specific information through domain-invariant attention and employs a mixture of experts so that different inputs are processed with different expert parameters, thereby avoiding overfitting to the source-domain distribution. Our method performs strongly in experiments, surpassing many existing methods, and opens up new possibilities for the practical deployment of person re-identification systems. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-10T16:16:32Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-09-10T16:16:32Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員會審定書 #
誌謝 i
中文摘要 ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Review 3
1.2.1 Fully Supervised Person Re-ID 3
1.2.2 Domain Generalization Person Re-ID 6
1.2.3 Relevance-aware Mixture of Experts (RaMoE) 7
1.2.4 Meta Batch-Instance Normalization (MetaBIN) 8
1.2.5 Meta Distribution Alignment (MDA) 9
1.2.6 Style Normalization and Restitution (SNR) 10
1.2.7 Domain Invariant Representation Learning (DIR-ReID) 11
1.2.8 Part-Aware-Transformer (PAT) 12
1.3 Contribution 14
1.4 Thesis Organization 15
Chapter 2 Preliminaries 17
2.1 Deep Learning and Networks 17
2.1.1 Multi-Layer Perceptron 17
2.1.2 Attention Mechanism 18
2.1.3 Generative Adversarial Network (GAN) and Gradient Reversal Layer (GRL) 20
2.2 Transformer Networks 22
2.2.1 Transformer Encoder 22
2.2.2 Vision Transformer (ViT) 23
2.3 Objective Functions 25
2.3.1 Cross Entropy Loss 25
2.3.2 Triplet Loss 26
Chapter 3 Generalizable Person Re-identification 27
3.1 Framework Overview 27
3.2 Semantic Aware Mask Generator 28
3.2.1 Part Mask Transformation 30
3.3 Mixture of Expert Domain Invariant Attention (MoE/DIA) Transformer 31
3.3.1 Global Feature Extraction 33
3.3.2 Part Feature Extraction 34
3.3.3 Mixture of Experts and Domain Invariance Attention (MoE/DIA) Layer 34
3.4 Part Feature Learning 42
3.5 Global Feature Learning 44
3.6 Objective Functions 46
3.7 Inference Stage 46
Chapter 4 Experimental Results 48
4.1 System Configuration 48
4.2 Person Re-identification Datasets 50
4.2.1 Market-1501 Dataset 50
4.2.2 DukeMTMC-reID Dataset 50
4.2.3 CUHK03 Dataset 51
4.2.4 MSMT17 51
4.3 Implementation Detail 52
4.4 Comparison with SOTA 52
4.4.1 Multi-source DG ReID 52
4.4.2 Cross Domain ReID 53
4.5 Ablation Studies 55
4.6 System Implementation and Results 57
4.6.1 System implementation 57
4.6.2 Implementation Result 58
(1) Demo on Video from National Taiwan University Advanced Control Laboratory 58
(2) Demo on Security System in Office Building 60
(3) The attention map visualization when occlusion 62
Chapter 5 Conclusions 64
REFERENCES 65 | - |
| dc.language.iso | en | - |
| dc.subject | 語義分割 | zh_TW |
| dc.subject | 領域泛化 | zh_TW |
| dc.subject | 專家混合 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 行人重識別 | zh_TW |
| dc.subject | Deep Learning | en |
| dc.subject | Person Re-identification | en |
| dc.subject | Domain Generalization | en |
| dc.subject | Mixture of Experts | en |
| dc.subject | Semantic Segmentation | en |
| dc.title | 利用專家混合與域不變注意機制之域泛化行人重識別系統 | zh_TW |
| dc.title | Generalizable Person Re-Identification System with Mixture of Experts and Domain Invariance Attention | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 王鈺強;傅楸善;黃世勳;黃正民 | zh_TW |
| dc.contributor.oralexamcommittee | Yu-Chiang Frank Wang;Chiou-Shann Fuh;Shih-Shinh Huang;Cheng-Ming Huang | en |
| dc.subject.keyword | 深度學習,行人重識別,領域泛化,專家混合,語義分割 | zh_TW |
| dc.subject.keyword | Deep Learning, Person Re-identification, Domain Generalization, Mixture of Experts, Semantic Segmentation | en |
| dc.relation.page | 67 | - |
| dc.identifier.doi | 10.6342/NTU202403721 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2024-08-09 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電機工程學系 | - |
| dc.date.embargo-lift | 2029-08-06 | - |
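The abstract above describes a Mixture of Experts and Domain Invariance Attention (MoE/DIA) layer in which different inputs are processed with different expert parameters while domain-specific information is suppressed. The snippet below is only a minimal illustrative sketch of those two generic ideas, token-level soft mixture-of-experts routing combined with a gradient-reversal domain classifier (the GRL listed in the table of contents); it is not the thesis implementation, and every name and hyperparameter here (`MoEDomainInvariantBlock`, `num_experts`, `grl_lambda`, and so on) is an assumption made for the example.

```python
# Illustrative sketch only: token-level soft mixture-of-experts feed-forward block
# plus a gradient-reversal domain classifier. Names and hyperparameters are
# hypothetical and not taken from the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class MoEDomainInvariantBlock(nn.Module):
    def __init__(self, dim=768, num_experts=4, hidden_dim=3072,
                 num_domains=3, grl_lambda=1.0):
        super().__init__()
        self.grl_lambda = grl_lambda
        # Soft gating network: one weight per expert for every token.
        self.gate = nn.Linear(dim, num_experts)
        # Each expert is a small feed-forward network, as in a ViT MLP block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])
        # Domain classifier trained through the GRL so that the shared
        # features become less domain-discriminative.
        self.domain_head = nn.Linear(dim, num_domains)

    def forward(self, tokens):
        # tokens: (batch, seq_len, dim)
        weights = F.softmax(self.gate(tokens), dim=-1)                        # (B, L, E)
        expert_out = torch.stack([e(tokens) for e in self.experts], dim=-1)   # (B, L, D, E)
        mixed = (expert_out * weights.unsqueeze(2)).sum(dim=-1)               # (B, L, D)
        # Adversarial domain prediction from the class-token position.
        reversed_feat = GradReverse.apply(mixed[:, 0], self.grl_lambda)
        domain_logits = self.domain_head(reversed_feat)
        return mixed, domain_logits


if __name__ == "__main__":
    block = MoEDomainInvariantBlock()
    x = torch.randn(2, 129, 768)      # e.g. ViT patch tokens plus a class token
    feats, dom = block(x)
    print(feats.shape, dom.shape)     # torch.Size([2, 129, 768]) torch.Size([2, 3])
```

In a setup like this, a domain classification loss would be minimized on `domain_logits` while the gradient reversal pushes the shared features to be uninformative about the source domain, which is the standard way a GRL is used for domain-invariant representation learning.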
| Appears in Collections: | 電機工程學系 | |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf (restricted access) | 4.68 MB | Adobe PDF | |
