Enhanced Person Re-identification Based on Semantic Human Parsing with Attention Mechanism

Chuan Kuo; 郭權

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49198

Title:	Enhanced Person Re-identification Based on Semantic Human Parsing with Attention Mechanism
Authors:	Chuan Kuo 郭權
Advisor:	傅立成(Li-Chen Fu)
Keyword:	Deep learning, Information retrieval, Person re-identification, Human semantic parsing, Attention mechanism
Publication Year :	2020
Degree:	Master's
Abstract:	In recent years, person re-identification has attracted considerable attention in computer vision because of its wide range of applications, including smart homes, elderly care, and surveillance systems. It remains a challenging task because of factors such as different identities with similar appearance, camera viewpoint changes, background clutter, posture variation, and occlusion. These factors make it harder to extract robust and discriminative representations, and so prevent different identities from being reliably distinguished.
Recent studies have attempted to incorporate semantic human parsing results or externally predefined attributes to help capture human parts or important object regions and thereby improve representation learning. Other works introduce attention mechanisms so that the model focuses on the important parts of a human image instead of the background. However, when attention relies only on CNN feature maps and loss function supervision, without any further information, it is hard for it to learn the important features well. Finally, many approaches have shown that part features can further improve performance, but how to divide a global feature into several reasonable part features that each carry rich information still needs further study.
In this thesis, to meet the aforementioned challenges for re-identification, we leverage the strengths of semantic human parsing and the attention mechanism and propose the Parsing-based Attention Block (PAB). Specifically, in order to provide different pieces of human information to the attention mechanism, a semantic human parsing model generates five different parsing masks: Full, Human, Non-human, Upper, and Lower. Under the guidance of these parsing masks, attention can better interpret the information and enhance the features. Next, based on the idea of part features, we propose Space-related Feature Extraction (SFE), which effectively uses the knowledge from semantic human parsing and generates more stable features that preserve spatial information. SFE also helps address the occlusion problem in the re-identification task. Finally, with a well-designed structure and loss function, we integrate PAB and SFE into a backbone model to construct a robust ReID model.
To verify the effectiveness of our approach, we perform a series of experiments on two challenging benchmarks: Market-1501 and DukeMTMC-reID. The experimental results show that the proposed method achieves 95.37% / 89.15% in Rank-1/mAP on Market-1501 and 89.68% / 78.81% on DukeMTMC-reID, outperforming most state-of-the-art methods.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49198
DOI:	10.6342/NTU202003139
Fulltext Rights:	Paid license (有償授權)
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
U0001-1208202018265700.pdf Restricted Access	21.93 MB	Adobe PDF
