Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79072
Full metadata record (DC field: value [language])
dc.contributor.advisor: 傅立成 (Li-Chen Fu)
dc.contributor.author: Yan-Ting Lin [en]
dc.contributor.author: 林彥廷 [zh_TW]
dc.date.accessioned: 2021-07-11T15:41:42Z
dc.date.available: 2024-12-31
dc.date.copyright: 2021-03-04
dc.date.issued: 2021
dc.date.submitted: 2021-02-17
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79072
dc.description.abstract [zh_TW]: In recent years, the field of person re-identification has received increasing attention, largely because of the prevalence of deep learning and the broad range of application domains it enables, such as smart homes, health care, and surveillance systems. The main difficulties of person re-identification stem from environmental factors such as background clutter, varying viewpoints, lighting, and occlusion of pedestrians; different people with similar appearance, clothing, and body shape also strongly affect re-identification. Motivated by these challenges, this thesis designs a person re-identification deep learning model that can be deployed in real-world settings, achieving high accuracy while keeping model complexity in check.
Previous supervised person re-identification models are usually trained under hard-label supervision, in which the relative relationships between different classes tend to be ignored. This thesis adopts the concept of knowledge distillation: the inter-class relationships produced by the deeper layers of the model serve as soft labels for training the shallower layers. Together with the proposed knowledge receiver module, the knowledge learned by the deep layers can be successfully transferred to the shallow layers. The model also integrates a non-local attention module to compensate for the shortcomings of convolutional neural networks, allowing it to capture a global view and make better predictions.
To handle background clutter, this thesis proposes a new weighted pooling method that aggregates all high-response features on a feature map while suppressing low-response features from background regions. The weighted pooling remedies the drawbacks of average pooling and max pooling, effectively discarding background clutter that does not help discriminate identities, and also takes into account and integrates the multiple discriminative features a person may carry.
To demonstrate the effectiveness of this method, we conduct extensive experiments on two challenging public datasets; the results show that our method outperforms most current state-of-the-art methods.
dc.description.abstract [en]: In recent years, person re-identification (Re-ID) has attracted considerable attention in computer vision, driven by the prevalence of deep learning and a wide range of applications including smart homes, elderly care, and surveillance systems. In general, Re-ID is challenged by background clutter, occlusion, different camera viewpoints, and multiple identities with similar appearances. The shape of a human body may look completely different from different viewpoints, so tracking people across distinct cameras remains a challenging problem. These factors hinder the extraction of robust and discriminative representations.
Most fully supervised person Re-ID models are trained with one-hot ground-truth labels, so the informative inter-class relationships produced by the model itself are not fully exploited. It has been shown that deeper layers of a neural network produce features that are more discriminative and informative than shallow ones. Hence, we propose a self-distillation framework that distills knowledge of inter-class relationships within the network itself: knowledge learned in the deeper portion of the network is transferred to the shallower portion by the proposed Knowledge Receiver (KR). We also integrate a spatial non-local attention (SNLA) mechanism into the network to aggregate semantically similar pixels in the spatial domain, so that long-range (global) dependencies in feature maps can be captured.
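As a rough sketch of the self-distillation idea described above, assuming the usual temperature-softened soft-label formulation of knowledge distillation (the function name, temperature value, and tensor shapes below are illustrative assumptions, not the thesis's exact loss):

    import torch
    import torch.nn.functional as F

    def self_distillation_loss(shallow_logits: torch.Tensor,
                               deep_logits: torch.Tensor,
                               temperature: float = 4.0) -> torch.Tensor:
        """KL divergence between temperature-softened identity distributions.

        The deeper classifier's predictions serve as soft labels for the
        shallower classifier; detach() keeps this loss from updating the
        deep branch.
        """
        soft_targets = F.softmax(deep_logits.detach() / temperature, dim=1)
        log_student = F.log_softmax(shallow_logits / temperature, dim=1)
        # The T^2 factor keeps gradients comparable in scale to hard-label training.
        return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

    # Hypothetical usage: a batch of 32 crops and 751 training identities (as in Market-1501).
    shallow, deep = torch.randn(32, 751), torch.randn(32, 751)
    loss = self_distillation_loss(shallow, deep)

In a full training pipeline such a term would typically be added, with a weighting hyperparameter, alongside the cross-entropy and triplet losses listed in Chapter 2 of the table of contents.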
To tackle the above-mentioned background clutter issue, we also propose a new Re-weighted Average Pooling (RAP) that combines the advantages of average pooling and max pooling. RAP enlarges the difference in response between salient points and unimportant regions and aggregates the salient pixels, spatially merging all the important points in a feature map for the final prediction.
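One plausible realization of such a re-weighted pooling layer, sketched under the assumption that the spatial weights come from a softmax over per-position response strength (the name reweighted_average_pooling and the sharpness parameter alpha are assumptions for this example, not the thesis's exact formulation):

    import torch

    def reweighted_average_pooling(feat: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
        """Pool an (N, C, H, W) feature map into (N, C) descriptors.

        Spatial weights are a softmax over each position's mean activation,
        so salient locations dominate the pooled descriptor while
        low-response background positions are suppressed.
        """
        n, c, h, w = feat.shape
        flat = feat.reshape(n, c, h * w)                   # (N, C, HW)
        saliency = flat.mean(dim=1)                        # (N, HW) per-position response
        weights = torch.softmax(alpha * saliency, dim=1)   # (N, HW), sums to 1 per image
        return (flat * weights.unsqueeze(1)).sum(dim=-1)   # (N, C) pooled descriptor

    # Hypothetical usage on a ResNet-50-like person-crop feature map.
    pooled = reweighted_average_pooling(torch.randn(32, 2048, 16, 8))  # -> (32, 2048)

As alpha approaches zero the weights become uniform and the layer reduces to global average pooling; as alpha grows it concentrates on the single most salient position, in the spirit of global max pooling, which mirrors the stated goal of combining the advantages of both.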
The conducted experiments show that the method proposed in this thesis outperforms state-of-the-art methods.
dc.description.provenance [en]: Made available in DSpace on 2021-07-11T15:41:42Z (GMT). No. of bitstreams: 1
U0001-0802202119562700.pdf: 4859424 bytes, checksum: 25c14d5f63126935a2f7df03f66e977d (MD5)
Previous issue date: 2021
dc.description.tableofcontents: CONTENTS
Certification by the Oral Examination Committee #
Acknowledgements i
Chinese Abstract ii
ABSTRACT iii
CONTENTS v
LIST OF FIGURES viii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Review 3
1.2.1 Human Detection 3
1.2.2 Person Re-Identification 5
1.2.3 OSNet 5
1.2.4 BDB 6
1.2.5 Pyramid-Net 7
1.2.6 ABD-Net 8
1.2.7 ISP 9
1.2.8 SCSN 9
1.2.9 SCR 10
1.2.10 VA-reID 12
1.3 Contribution 12
1.4 Thesis Organization 13
Chapter 2 Preliminaries 15
2.1 Convolutional Neural Networks (CNNs) 15
2.2 Residual Network 18
2.3 Objective Functions 19
2.3.1 Cross Entropy Loss 19
2.3.2 Triplet Loss with PK Batch and Batch Hard 20
2.3.3 Center Loss 22
2.4 Information Retrieval 22
2.5 Re-ranking Algorithm 24
2.6 Normalization Techniques 25
2.6.1 Batch Normalization 25
2.6.2 Instance Normalization 26
2.6.3 Comparison of Batch and Instance Normalization 27
Chapter 3 Person Re-Identification 29
3.1 System Overview 29
3.2 Knowledge Distillation 30
3.3 Self-Knowledge Distillation Re-ID 31
3.3.1 Knowledge Formulation 31
3.3.2 Knowledge Receiver (KR) 34
3.4 Re-weighted Average Pooling (RAP) 36
3.4.1 Pros and Cons of GAP and GMP 36
3.4.2 The proposed RAP 37
3.4.3 The Gradient Analysis of GAP, GMP and RAP 38
3.5 Objective Functions 42
Chapter 4 Experiments 44
4.1 Configuration 44
4.2 Training Details 44
4.3 Person Re-identification Datasets 45
4.3.1 Market-1501 Dataset 46
4.3.2 DukeMTMC-reID Dataset 47
4.3.3 Evaluation Metrics 48
4.4 Ablation Studies 50
4.4.1 Effectiveness of the proposed method 51
4.4.2 Knowledge distillation temperatures 51
4.4.3 Effectiveness of Knowledge Receiver 52
4.5 Comparison with SOTA 53
4.6 Visualization 54
Chapter 5 Conclusion 56
Chapter 6 Future Works 57
REFERENCES 58
dc.language.iso: zh-TW
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: 資料檢索 (information retrieval) [zh_TW]
dc.subject: 行人重識別 (person re-identification) [zh_TW]
dc.subject: Person re-identification [en]
dc.subject: Deep learning [en]
dc.subject: Information retrieval [en]
dc.title: 透過知識蒸餾學習暨權重平均池化方法於線上行人重識別系統 [zh_TW]
dc.title: Self-Knowledge Distillation and Re-weighted Average Pooling for Person Re-identification [en]
dc.type: Thesis
dc.date.schoolyear: 109-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王鈺強 (Yu-Chiang Frank Wang), 莊永裕 (Yung-Yu Chuang), 黃正民 (Cheng-Ming Huang), 蘇木春 (Mu-Chun Su)
dc.subject.keyword: 深度學習, 資料檢索, 行人重識別 [zh_TW]
dc.subject.keyword: Deep learning, Information retrieval, Person re-identification [en]
dc.relation.page: 62
dc.identifier.doi: 10.6342/NTU202100686
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2021-02-17
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) [zh_TW]
dc.date.embargo-lift: 2024-12-31
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
File: U0001-0802202119562700.pdf (restricted access; not authorized for public release)
Size: 4.75 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
