DSpace

The DSpace institutional repository is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74611
Full metadata record
dc.contributor.advisor: 傅立成 (Li-Chen Fu)
dc.contributor.author: Yu-Hung Liu [en]
dc.contributor.author: 劉宇閎 [zh_TW]
dc.date.accessioned: 2021-06-17T08:45:39Z
dc.date.available: 2022-08-16
dc.date.copyright: 2019-08-16
dc.date.issued: 2019
dc.date.submitted: 2019-08-06
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74611
dc.description.abstract: In recent years, person re-identification systems have attracted considerable attention because of their wide range of applications, such as smart homes, healthcare, and surveillance systems. However, as the viewpoint and camera position change, a person's silhouette and appearance change as well, so tracking pedestrians across different viewpoints remains a challenge.
In addition, in real deployment environments the position and intensity of illumination differ from camera to camera, while current person re-identification models typically learn only from existing datasets, leaving them unable to cope with illumination changes. Finally, after analyzing the state-of-the-art person re-identification literature, we also found that most current studies optimize their models with metric loss functions in order to set a threshold that separates positive from negative samples. Although metric loss functions can distinguish positive and negative samples effectively, their high time complexity makes training lengthy, so that even after data for a new environment has been collected, transferring the model to that environment still requires considerable time for recalibration.
To address illumination change, the most intuitive approach is to collect a large person re-identification dataset covering varied illumination, but doing so is difficult and demands considerable time and manpower. For this reason, this thesis proposes using synthetic data to assist model training so that the model extracts illumination-invariant feature vectors. To address the high time complexity of metric loss functions, this work also proposes a clustering-based loss function that reduces the time complexity while outperforming metric losses. The final experiments show that the proposed method and loss function surpass other person re-identification methods.
[zh_TW]
dc.description.abstract: Person re-identification has recently attracted a great deal of attention in computer vision because of its wide range of applications, including smart homes, elderly care, and surveillance systems. Seen from different viewpoints, the shape of the human body can look completely different, so tracking a person across different cameras remains a challenging problem.
In addition, in real deployments the position and intensity of illumination can differ among cameras. However, existing person re-identification modules are often trained only on the available datasets, which leaves the resulting models unable to cope with illumination changes. Finally, after analyzing the recent literature on person re-identification, we also found that most current studies optimize their models with a metric loss function, using an appropriate threshold to distinguish positive samples from negative ones. Although the metric loss function can make this distinction effectively, it suffers from high time complexity, which makes the training process lengthy.
To solve the problem of illumination change, the most intuitive approach is to collect an even larger person re-identification dataset covering various illumination levels, which, however, is very expensive to collect and label. Therefore, this thesis proposes using synthetic data to assist model training so that the model learns illumination-invariant feature vectors. To remove the shortcoming of the metric loss function's time complexity, we propose a clustering-based loss function that reduces the time complexity, and we also show that it outperforms the metric loss function. The final experiments also show that the proposed method outperforms state-of-the-art methods on person re-identification.
[en]
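The abstract's time-complexity argument can be illustrated with a minimal sketch: a clustering-based loss only needs the centroid of each identity's embeddings, so its cost grows linearly with batch size, whereas pairwise metric losses enumerate sample pairs or triplets. The formulation below is an illustrative center-loss-style stand-in, not the thesis's exact loss (the adaptive weighting and hard-cluster mining described in Chapter 3 are omitted):

```python
def clustering_loss(features, labels):
    """Center-loss-style sketch: pull each embedding toward the centroid
    of its own identity cluster. Cost is O(n) in the batch size, unlike
    pairwise metric losses that enumerate O(n^2) pairs."""
    total = 0.0
    for identity in set(labels):
        cluster = [f for f, y in zip(features, labels) if y == identity]
        dim = len(cluster[0])
        # Centroid of this identity's embeddings.
        center = [sum(f[d] for f in cluster) / len(cluster) for d in range(dim)]
        # Squared distance of each member to its centroid.
        total += sum((f[d] - center[d]) ** 2 for f in cluster for d in range(dim))
    return total / len(features)

# Toy batch: four 2-D embeddings belonging to two identities.
feats = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.2, 1.0]]
ids = [0, 0, 1, 1]
print(clustering_loss(feats, ids))
```

Tight, well-separated identity clusters drive this loss toward zero, the same geometric goal a triplet loss pursues by comparing anchor-positive-negative triplets, but without the combinatorial growth in the number of comparisons.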
dc.description.provenance: Made available in DSpace on 2021-06-17T08:45:39Z (GMT). No. of bitstreams: 1. ntu-108-R06921066-1.pdf: 15125192 bytes, checksum: 3de62dad724b8d2cef62b494185f0ce8 (MD5). Previous issue date: 2019 [en]
dc.description.tableofcontents: Acknowledgements I
Abstract (Chinese) II
ABSTRACT III
TABLE OF CONTENTS V
LIST OF FIGURES VIII
LIST OF TABLES XI
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Review 3
1.2.1 Human Detection 3
1.2.2 Person Re-Identification 7
1.2.3 Domain Adaptation 10
1.3 Contributions 11
1.4 Thesis Organization 12
Chapter 2 Preliminaries 14
2.1 Cluster Analysis and K-means 14
2.2 Convolutional Neural Network 17
2.2.1 Convolutional Layers 18
2.2.2 Residual Network 20
2.3 Real-time Pose Estimation Module 23
2.4 Information Retrieval 24
Chapter 3 Person Re-Identification 26
3.1 Learn an Illumination-Invariant Feature 26
3.1.1 Synthetic Dataset 26
3.1.2 Learn from synthetic data 28
3.1.3 Domain Adaptation by Adversarial Learning 30
3.2 Assist by Clustering 33
3.2.1 Clustering Loss 34
3.2.2 Adaptive Weighted Clustering Loss 36
3.2.3 Hard Clustering Mining 37
Chapter 4 ACL Re-Identification Dataset 39
4.1 Environment setting 39
4.2 Preprocessing 42
Chapter 5 Experiments 46
5.1 Configuration 46
5.2 Implementation Details 47
5.2.1 Network design 47
5.2.2 Training Details 48
5.3 Person Re-Identification Dataset 50
5.3.1 Market-1501 Dataset 50
5.3.2 DukeMTMC-reID Dataset 51
5.3.3 Evaluation Metrics 53
5.4 Cross-Illumination Classification Result 54
5.5 Person Re-Identification Result 55
5.5.1 Ablation study 55
5.5.2 The Result of Market-1501 Dataset 58
5.5.3 The Result of DukeMTMC-reID Dataset 59
5.5.4 The Result of ACL-reID 61
Chapter 6 Conclusion and Future Work 62
REFERENCES 63
dc.language.iso: en
dc.title: 對光線變化具有強健適應的人物重新識別系統輔以基於群聚的損失函數 [zh_TW]
dc.title: Person Re-Identification Robust to Illumination Change with Clustering-based Loss Function [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 黃正民 (Cheng-Ming Huang), 張文中 (Wen-Chung Chang), 王鈺強 (Yu-Chiang Wang), 傅楸善 (Chiou-Shann Fuh)
dc.subject.keyword: 深度學習, 資料檢索, 人物重新識別, 聚合損失函數 [zh_TW]
dc.subject.keyword: Deep learning, Information retrieval, Person re-identification, Clustering-based loss function [en]
dc.relation.page: 67
dc.identifier.doi: 10.6342/NTU201902471
dc.rights.note: Paid authorization (有償授權)
dc.date.accepted: 2019-08-06
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Electrical Engineering (電機工程學研究所) [zh_TW]
Appears in collections: Department of Electrical Engineering (電機工程學系)

Files in this item:
File: ntu-108-1.pdf (currently not authorized for public access)
Size: 14.77 MB
Format: Adobe PDF


Except where otherwise noted, all items in this repository are protected by copyright, with all rights reserved.

Contact information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02) 33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved