Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7539
Title: | Deep Temporal-Contrastive Network for Facial Expression Recognition |
Author: | Zi-Jun Li (黎子駿) |
Advisor: | Li-Chen Fu (傅立成) |
Keywords: | Facial Expression Recognition, Convolutional Neural Network, Contrastive Representation |
Publication Year: | 2018 |
Degree: | Master |
Abstract: | Facial expressions reflect human psychological activity and are therefore a key factor in human-machine interaction. Facial expression recognition is a challenging task, even for humans, mainly because each individual expresses emotions in their own way and with their own intensity. To extract what facial expressions of different individuals have in common, the effect of individual personality on expression recognition must be minimized as much as possible. In this thesis, we construct a video-based facial expression recognition system using a deep temporal-contrastive network (DTCN) that exploits temporal features to remove the personality effect. Appearance and geometry features are extracted from face images and facial landmark coordinates by a convolutional neural network (CNN) and a deep neural network (DNN), respectively. An additional loss function is introduced so that the model extracts similar features from adjacent frames, whose expression category and intensity are similar. The two most representative frames of a video are then selected by comparing the distances among frames in a high-dimensional feature space, and the expression is classified from the so-called contrastive representation between those two key frames. We use joint fine-tuning to combine the two models that take face images and facial landmarks as input; the two models are complementary, and the combination improves the recognition accuracy of the whole system. We conducted experiments on two databases widely used for facial expression recognition (CK+ and Oulu-CASIA). The experimental results show that the proposed method effectively selects key frames and outperforms state-of-the-art methods in recognition accuracy. |
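The pipeline described in the abstract, penalizing feature differences between adjacent frames, selecting the two most-distant frames as key frames, and forming a contrastive representation from them, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the Euclidean distance metric, the squared-difference form of the temporal loss, and the element-wise difference as the contrastive representation are illustrative guesses, not the thesis's actual formulation.

```python
import numpy as np

def temporal_smoothness_loss(features):
    """Assumed form of the additional loss: mean squared distance
    between feature vectors of adjacent frames, encouraging
    similar embeddings for frames with similar expressions."""
    diffs = features[1:] - features[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def select_key_frames(features):
    """Pick the two frames whose feature vectors are farthest
    apart in the embedding space (a hypothetical reading of
    'comparing the distances among frames')."""
    n = len(features)
    best_pair, best_dist = (0, 0), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if d > best_dist:
                best_dist, best_pair = d, (i, j)
    return best_pair

def contrastive_representation(features, pair):
    """Contrastive representation assumed here as the element-wise
    difference between the two key-frame embeddings; this vector
    would feed the final expression classifier."""
    i, j = pair
    return features[j] - features[i]
```

In a full system, `features` would be the per-frame embeddings produced by the CNN (appearance) or DNN (geometry) branch, and the two branch models would be combined by joint fine-tuning as described above.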
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7539 |
DOI: | 10.6342/NTU201802214 |
Full-text License: | Authorized (open access worldwide) |
Electronic Full-text Release Date: | 2023-07-31 |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf | 4.12 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.