Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95477

Full metadata record (DC field / value / language):
dc.contributor.advisor: 傅立成 (zh_TW)
dc.contributor.advisor: Li-Chen Fu (en)
dc.contributor.author: 陳烱濤 (zh_TW)
dc.contributor.author: Chiung-Tao Chen (en)
dc.date.accessioned: 2024-09-10T16:16:32Z
dc.date.available: 2024-09-11
dc.date.copyright: 2024-09-10
dc.date.issued: 2024
dc.date.submitted: 2024-08-09
dc.identifier.citation:
[1] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognit., vol. 77, pp. 354–377, 2018.
[2] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable Person Re-identification: A Benchmark,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 1116–1124, 2015.
[3] E. Ristani, F. Solera, R. S. Zou, R. Cucchiara, and C. Tomasi, “Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking,” arXiv preprint arXiv:1609.01775, 2016.
[4] L. Wei, S. Zhang, W. Gao, and Q. Tian, “Person Transfer GAN to Bridge Domain Gap for Person Re-identification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 79–88, 2018.
[5] W. Li, R. Zhao, T. Xiao, and X. Wang, “DeepReID: Deep Filter Pairing Neural Network for Person Re-identification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 152–159, 2014.
[6] T. Morey, "Customer Data: Designing for Transparency and Trust," Harvard Business Rev., vol. 93, no. 5, pp. 96–105, 2015.
[7] Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline),” in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 501–518, 2018.
[8] F. Zheng, C. Deng, X. Sun, X. Jiang, X. Guo, Z. Yu, F. Huang, and R. Ji, “Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 8506–8514, 2019.
[9] T. Chen, S. Ding, J. Xie, Y. Yuan, W. Chen, Y. Yang, Z. Ren, and Z. Wang, “ABD-Net: Attentive but Diverse Person Re-Identification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pp. 8350–8360, 2019.
[10] M. M. Kalayeh, E. Basaran, M. Gökmen, M. E. Kamasak, and M. Shah, “Human Semantic Parsing for Person Re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1062–1071, 2018.
[11] Y. Dai, X. Li, J. Liu, Z. Tong, and L.-Y. Duan, “Generalizable person re-identification with relevance-aware mixture of experts,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021.
[12] S. Choi, T. Kim, M. Jeong, H. Park, and C. Kim, “Meta Batch-instance normalization for generalizable person re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021.
[13] H. Ni, J. Song, X. Luo, F. Zheng, W. Li, and H. T. Shen, “Meta distribution alignment for generalizable person re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022. https://doi.org/10.1109/cvpr52688.2022.00252
[14] X. Jin, C. Lan, W. Zeng, Z. Chen, and L. Zhang, “Style normalization and restitution for Generalizable Person Re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020.
[15] Y.-F. Zhang, Z. Zhang, D. Li, Z. Jia, L. Wang, and T. Tan, “Learning domain invariant representations for generalizable person re-identification,” IEEE Trans. Image Process., vol. 32, pp. 509–523, 2023.
[16] H. Ni, Y. Li, L. Gao, H. T. Shen, and J. Song, “Part-Aware Transformer for Generalizable Person Re-identification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023.
[17] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn. (ICML), 2015, pp. 448–456.
[18] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2017.
[19] J. Jiang, W. Zhang, R. Ran, W. Hu, and J. Dai, “Multi-scale transformer-based matching network for generalizable person re-identification,” IEEE Signal Processing Letters, vol. 30, pp. 1277–1281, 2023.
[20] S. He, H. Luo, P. Wang, F. Wang, H. Li, and W. Jiang, “TransReID: Transformer-based Object Re-Identification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021.
[21] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[22] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv preprint arXiv:1606.08415, 2016.
[23] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” in Proc. Int. Conf. Learn. Represent. (ICLR), 2016.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Inf. Process. Syst. (NIPS), 2017.
[25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Inf. Process. Syst. (NIPS), 2014, vol. 27.
[26] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Proc. Int. Conf. Mach. Learn. (ICML), 2015, pp. 1180–1189.
[27] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
[28] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, 1948. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
[29] K. Zhu, H. Guo, Z. Liu, M. Tang, and J. Wang, “Identity-guided human semantic parsing for person re-identification,” in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 346–363, 2020.
[30] S. Dou, C. Zhao, X. Jiang, S. Zhang, W.-S. Zheng, and W. Zuo, “Human co-parsing guided alignment for occluded person re-identification,” IEEE Trans. Image Process., vol. 32, pp. 458–470, 2023.
[31] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W. Lo, P. Dollár, and R. Girshick, “Segment Anything,” arXiv preprint arXiv:2304.02643, 2023. https://doi.org/10.48550/arxiv.2304.02643
[32] Z. Cao, G. Hidalgo, T. Simon, S. Wei, and Y. Sheikh, “OpenPose: Realtime Multi-Person 2D pose Estimation using part affinity fields,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, 2021. https://doi.org/10.1109/tpami.2019.2929257
[33] B. Li, Y. Shen, J. Yang, Y. Wang, J. Ren, T. Che, J. Zhang, and Z. Liu, "Sparse mixture-of-experts are domain generalizable learners," arXiv preprint arXiv:2206.04046, Jan. 2023. https://arxiv.org/abs/2206.04046
[34] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 248–255, 2009. doi: 10.1109/CVPR.2009.5206848
[35] Y. Gong, “A general multi-modal data learning method for person re-identification,” arXiv preprint arXiv:2101.08533, 2021.
[36] X. Pan, P. Luo, J. Shi, and X. Tang, “Two at once: Enhancing learning and generalization capacities via IBN-Net,” in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 464–479, 2018.
[37] S. Liao and L. Shao, “Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting,” in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 456–474, 2020.
[38] K. Zhou, Y. Yang, A. Cavallaro, and T. Xiang, “Learning generalisable omni-scale representations for person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., 2021.
[39] Y. Zhao, Z. Zhong, F. Yang, Z. Luo, Y. Lin, S. Li, and N. Sebe, “Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 6277–6286, 2021.
[40] S. Shankar, V. Piratla, S. Chakrabarti, S. Chaudhuri, P. Jyothi, and S. Sarawagi, “Generalizing across domains via cross-gradient training,” arXiv preprint arXiv:1804.10745, 2018.
[41] K. Zhou, Y. Yang, T. Hospedales, and T. Xiang, “Learning to generate novel domains for domain generalization,” in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 561–578, 2020.
[42] G. Jocher, “ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation,” Zenodo, Nov. 2022.
[43] L. Van Der Maaten and G. Hinton, “Visualizing Data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008. [Online]. Available: http://isplab.tudelft.nl/sites/default/files/vandermaaten08a.pdf
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95477
dc.description.abstract (zh_TW):
近年來,隨著對公共安全需求的增加以及監控系統的廣泛應用,行人重識別技術在相關研究中備受關注。儘管基於監督學習的方法在公開資料集上取得了顯著成果,但現實環境與訓練資料之間存在領域差異,且為每次部署建立有標註的資料集十分耗費人力,因此這些方法難以輕易移轉至現實生活中的應用。

無監督域適應和完全無監督學習雖然在一定程度上解決了資料標註的問題,但它們往往會過度擬合於目標領域的風格,這對於多變的現實環境並不利。

鑒於上述問題,我們提出了一種基於域泛化學習的全新方法。該方法包括兩個關鍵模組:一是具備語義感知能力的注意力遮罩生成模組,二是專家混合與域不變注意層。前者有助於模型有效學習不同人體部位的特徵,後者透過域不變注意去除域相關資訊,並利用專家混合讓模型以不同參數處理不同資料,從而避免過度擬合於源域的分布。

我們的方法在實驗中表現優異,超越了許多現有方法。這一創新方法將為行人重識別系統的實際應用帶來新的可能性,同時也推動了該領域的進一步發展。
dc.description.abstract (en):
In recent years, with the increasing demand for public safety and the widespread deployment of surveillance systems, person re-identification has received significant attention in related research. Although supervised learning-based methods achieve remarkable results on public datasets, the domain discrepancy between real-world environments and training data poses a challenge, and creating a labeled dataset for each deployment is labor-intensive, so these methods do not transfer easily to real-life applications.

While unsupervised domain adaptation and fully unsupervised learning partially address the annotation problem, they tend to overfit to the style of the target domain, which is ill-suited to the variability of real-world environments.

Given these challenges, we propose a novel method based on domain generalization learning. It comprises two key modules: a Semantic Aware Mask Generator and a Mixture of Experts with Domain-invariant Attention layer. The former helps the model learn features of different body parts effectively, while the latter removes domain-specific information through domain-invariant attention and employs a mixture of experts so that different data are processed with different parameters, thereby avoiding overfitting to the source-domain distribution.

Our method performs strongly in experiments, surpassing many existing methods. This approach opens up new possibilities for the practical deployment of person re-identification systems and advances the field further.
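To make the mechanism described in the abstract more concrete, the following is a minimal, hypothetical PyTorch-style sketch, not the thesis's actual MoE/DIA implementation: a token-wise mixture-of-experts feed-forward layer whose gate routes each token to a weighted blend of expert MLPs, paired with a domain classifier behind a gradient-reversal layer so that the shared features are pushed toward domain invariance. All module names, dimensions, and the gating scheme are illustrative assumptions.

```python
# Illustrative sketch only (hypothetical names and sizes), not the thesis's code:
# a token-wise mixture-of-experts layer plus a gradient-reversal domain head.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scaled, sign-flipped gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class MoEFeedForward(nn.Module):
    """Each token is processed by a softmax-weighted mixture of small expert MLPs."""

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # per-token routing scores
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        weights = F.softmax(self.gate(x), dim=-1)                        # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, D, E)
        return torch.einsum("btde,bte->btd", expert_out, weights)        # (B, T, D)


class DomainAdversarialHead(nn.Module):
    """Domain classifier behind gradient reversal: training it adversarially
    encourages the upstream features to become domain-invariant."""

    def __init__(self, dim: int, num_domains: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(dim, num_domains)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.classifier(GradReverse.apply(feat, self.lambd))


if __name__ == "__main__":
    tokens = torch.randn(2, 8, 128)                # (batch, tokens, embedding dim)
    moe = MoEFeedForward(dim=128)
    dom_head = DomainAdversarialHead(dim=128, num_domains=3)

    feats = moe(tokens)                            # expert-mixed token features
    domain_logits = dom_head(feats.mean(dim=1))    # pooled features to domain logits
    print(feats.shape, domain_logits.shape)        # (2, 8, 128) and (2, 3)
```

In training, the domain-classification loss from such a head would typically be added to the re-identification objectives (e.g., the cross-entropy and triplet losses listed in the table of contents), so gradients flowing back through the reversal layer penalize domain-specific cues while the experts keep separate parameters for differently distributed samples.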
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-10T16:16:32Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-09-10T16:16:32Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Oral Defense Committee Certification #
Acknowledgements i
Chinese Abstract ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Review 3
1.2.1 Fully Supervised Person Re-ID 3
1.2.2 Domain Generalization Person Re-ID 6
1.2.3 Relevance-aware Mixture of Experts (RaMoE) 7
1.2.4 Meta Batch-Instance Normalization (MetaBIN) 8
1.2.5 Meta Distribution Alignment (MDA) 9
1.2.6 Style Normalization and Restitution (SNR) 10
1.2.7 Domain Invariant Representation Learning (DIR-ReID) 11
1.2.8 Part-Aware-Transformer (PAT) 12
1.3 Contribution 14
1.4 Thesis Organization 15
Chapter 2 Preliminaries 17
2.1 Deep Learning and Networks 17
2.1.1 Multi-Layer Perceptron 17
2.1.2 Attention Mechanism 18
2.1.3 Generative Adversarial Network (GAN) and Gradient Reversal Layer (GRL) 20
2.2 Transformer Networks 22
2.2.1 Transformer Encoder 22
2.2.2 Vision Transformer (ViT) 23
2.3 Objective Functions 25
2.3.1 Cross Entropy Loss 25
2.3.2 Triplet Loss 26
Chapter 3 Generalizable Person Re-identification 27
3.1 Framework Overview 27
3.2 Semantic Aware Mask Generator 28
3.2.1 Part Mask Transformation 30
3.3 Mixture of Expert Domain Invariant Attention (MoE/DIA) Transformer 31
3.3.1 Global Feature Extraction 33
3.3.2 Part Feature Extraction 34
3.3.3 Mixture of Experts and Domain Invariance Attention (MoE/DIA) Layer 34
3.4 Part Feature Learning 42
3.5 Global Feature Learning 44
3.6 Objective Functions 46
3.7 Inference Stage 46
Chapter 4 Experimental Results 48
4.1 System Configuration 48
4.2 Person Re-identification Datasets 50
4.2.1 Market-1501 Dataset 50
4.2.2 DukeMTMC-reID Dataset 50
4.2.3 CUHK03 Dataset 51
4.2.4 MSMT17 51
4.3 Implementation Detail 52
4.4 Comparison with SOTA 52
4.4.1 Multi-source DG ReID 52
4.4.2 Cross Domain ReID 53
4.5 Ablation Studies 55
4.6 System Implementation and Results 57
4.6.1 System implementation 57
4.6.2 Implementation Result 58
(1) Demo on Video from National Taiwan University Advanced Control Laboratory 58
(2) Demo on Security System in Office Building 60
(3) Attention Map Visualization under Occlusion 62
Chapter 5 Conclusions 64
REFERENCES 65
dc.language.iso: en
dc.subject: 語義分割 (zh_TW)
dc.subject: 領域泛化 (zh_TW)
dc.subject: 專家混合 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: 行人重識別 (zh_TW)
dc.subject: Deep Learning (en)
dc.subject: Person Re-identification (en)
dc.subject: Domain Generalization (en)
dc.subject: Mixture of Experts (en)
dc.subject: Semantic Segmentation (en)
dc.title: 利用專家混合與域不變注意機制之域泛化行人重識別系統 (zh_TW)
dc.title: Generalizable Person Re-Identification System with Mixture of Experts and Domain Invariance Attention (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 王鈺強;傅楸善;黃世勳;黃正民 (zh_TW)
dc.contributor.oralexamcommittee: Yu-Chiang Frank Wang;Chiou-Shann Fuh;Shih-Shinh Huang;Cheng-Ming Huang (en)
dc.subject.keyword: 深度學習,行人重識別,領域泛化,專家混合,語義分割 (zh_TW)
dc.subject.keyword: Deep Learning, Person Re-identification, Domain Generalization, Mixture of Experts, Semantic Segmentation (en)
dc.relation.page: 67
dc.identifier.doi: 10.6342/NTU202403721
dc.rights.note: 同意授權(限校園內公開) (access authorized, campus only)
dc.date.accepted: 2024-08-09
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電機工程學系 (Department of Electrical Engineering)
dc.date.embargo-lift: 2029-08-06
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in This Item:
ntu-112-2.pdf (4.68 MB, Adobe PDF): Restricted access (not authorized for public access)