Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7539

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 傅立成 | |
| dc.contributor.author | Zi-Jun Li | en |
| dc.contributor.author | 黎子駿 | zh_TW |
| dc.date.accessioned | 2021-05-19T17:45:56Z | - |
| dc.date.available | 2023-07-31 | |
| dc.date.available | 2021-05-19T17:45:56Z | - |
| dc.date.copyright | 2018-07-31 | |
| dc.date.issued | 2018 | |
| dc.date.submitted | 2018-07-30 | |
| dc.identifier.citation | [1] Edwinn Gamborino, Vicente Queiroz, Zih-Yun Chiu, Zi-Jun Li, and L.-C. Fu, 'Interactive Reinforcement Learning based Assistive Robot Action Planner for the Emotional Support of Children,' Under Review, 2018.
[2] Y. Sun, X. Wang, and X. Tang, 'Deep learning face representation from predicting 10,000 classes,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1891-1898.
[3] H. Jung, S. Lee, J. Yim, S. Park, and J. Kim, 'Joint fine-tuning in deep neural networks for facial expression recognition,' in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2983-2991: IEEE.
[4] Y. Kim, B. Yoo, Y. Kwak, C. Choi, and J. Kim, 'Deep generative-contrastive networks for facial expression recognition,' arXiv preprint arXiv:1703.07140, 2017.
[5] X. Zhao et al., 'Peak-piloted deep network for facial expression recognition,' in European Conference on Computer Vision (ECCV), 2016, pp. 425-442: Springer.
[6] S. Ren, X. Cao, Y. Wei, and J. Sun, 'Face alignment at 3000 fps via regressing local binary features,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685-1692.
[7] K. Zhao, W.-S. Chu, and H. Zhang, 'Deep region and multi-label learning for facial action unit detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3391-3399.
[8] P. Ekman and W. V. Friesen, Facial Action Coding System: Investigator's Guide. Consulting Psychologists Press, 1978.
[9] A. Klaser, M. Marszałek, and C. Schmid, 'A spatio-temporal descriptor based on 3d-gradients,' in British Machine Vision Conference (BMVC), 2008, pp. 275:1-10: British Machine Vision Association.
[10] P. Scovanner, S. Ali, and M. Shah, 'A 3-dimensional sift descriptor and its application to action recognition,' in Proceedings of the 15th ACM International Conference on Multimedia, 2007, pp. 357-360: ACM.
[11] G. Zhao and M. Pietikainen, 'Dynamic texture recognition using local binary patterns with an application to facial expressions,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915-928, 2007.
[12] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, 'Facial expression recognition from near-infrared videos,' Image and Vision Computing, vol. 29, no. 9, pp. 607-619, 2011.
[13] Y. Fan, X. Lu, D. Li, and Y. Liu, 'Video-based emotion recognition using CNN-RNN and C3D hybrid networks,' in Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016, pp. 445-450: ACM.
[14] B. Hasani and M. H. Mahoor, 'Spatio-temporal facial expression recognition using convolutional neural networks and conditional random fields,' in Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 790-795: IEEE.
[15] J. Zhou, X. Hong, F. Su, and G. Zhao, 'Recurrent convolutional neural network regression for continuous pain intensity estimation in video,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 84-92.
[16] B. Hasani and M. H. Mahoor, 'Facial expression recognition using enhanced deep 3D convolutional neural networks,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 2278-2288: IEEE.
[17] P. Viola and M. J. Jones, 'Robust real-time face detection,' International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[18] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, 'A convolutional neural network cascade for face detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
[19] S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, and S. Z. Li, 'S3FD: Single Shot Scale-invariant Face Detector,' arXiv preprint arXiv:1708.05237, 2017.
[20] A. Zadeh, Y. C. Lim, T. Baltrusaitis, and L.-P. Morency, 'Convolutional Experts Constrained Local Model for 3D Facial Landmark Detection,' in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 2519-2528.
[21] D. Merget, M. Rock, and G. Rigoll, 'Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 781-790.
[22] M. Liu, S. Shan, R. Wang, and X. Chen, 'Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1749-1756: IEEE.
[23] A. Kacem, M. Daoudi, B. B. Amor, and J. C. Alvarez-Paiva, 'A novel space-time representation on the positive semidefinite cone for facial expression recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3180-3189.
[24] O. M. Parkhi, A. Vedaldi, and A. Zisserman, 'Deep Face Recognition,' in British Machine Vision Conference (BMVC), 2015, vol. 1, no. 3, p. 6.
[25] K. Simonyan and A. Zisserman, 'Very deep convolutional networks for large-scale image recognition,' arXiv preprint arXiv:1409.1556, 2014.
[26] R. Ranjan, V. M. Patel, and R. Chellappa, 'HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition,' IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[27] R. Ranjan, S. Sankaranarayanan, C. D. Castillo, and R. Chellappa, 'An all-in-one convolutional neural network for face analysis,' in Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017, pp. 17-24: IEEE.
[28] W. Li, F. Abtahi, Z. Zhu, and L. Yin, 'EAC-Net: A region-based deep enhancing and cropping approach for facial action unit detection,' in Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017, pp. 103-110: IEEE.
[29] D. E. King, 'Dlib-ml: A machine learning toolkit,' Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1755-1758, 2009.
[30] A.-S. Liu, Z.-J. Li, T.-H. Yeh, Y.-H. Yang, and L.-C. Fu, 'Partially transferred convolution neural network with cross-layer inheriting for posture recognition from top-view depth camera,' in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 4139-4143: IEEE.
[31] S. Ioffe and C. Szegedy, 'Batch normalization: Accelerating deep network training by reducing internal covariate shift,' arXiv preprint arXiv:1502.03167, 2015.
[32] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun, 'Bayesian face revisited: A joint formulation,' in European Conference on Computer Vision, 2012, pp. 566-579: Springer.
[33] T.-W. Hsu, Y.-H. Yang, T.-H. Yeh, A.-S. Liu, L.-C. Fu, and Y.-C. Zeng, 'Privacy free indoor action detection system using top-view depth camera based on key-poses,' in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016, pp. 004058-004063: IEEE.
[34] R. Hadsell, S. Chopra, and Y. LeCun, 'Dimensionality reduction by learning an invariant mapping,' in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 1735-1742: IEEE.
[35] F. Schroff, D. Kalenichenko, and J. Philbin, 'FaceNet: A unified embedding for face recognition and clustering,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815-823.
[36] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, 'A discriminative feature learning approach for deep face recognition,' in European Conference on Computer Vision, 2016, pp. 499-515: Springer.
[37] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, 'The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression,' in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010, pp. 94-101: IEEE.
[38] M. Abadi et al., 'TensorFlow: A system for large-scale machine learning,' in USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, vol. 16, pp. 265-283.
[39] M. Liu, S. Li, S. Shan, R. Wang, and X. Chen, 'Deeply learning deformable facial action parts model for dynamic expression analysis,' in Asian Conference on Computer Vision, 2014, pp. 143-157: Springer. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7539 | - |
| dc.description.abstract | 臉部情緒反映了人類心理活動,因此情緒識別是人機互動的關鍵要素。臉部情緒識別,甚至對於人類來說,也是一個具有挑戰性的任務。這主要是因為每個人都有自己表達情緒的強度和方式。為了從不同的個體裡提取出各種表情的共性,個體的個性造成對情緒判別的影響要盡可能地縮小。
在本論文中,我們提出使用時域對比的深度網絡來實現一個基於視頻的臉部情緒辨識系統。該深度網絡利用時域上的特徵來減少個體個性造成的影響。外表特徵和幾何特徵分別從人臉照片和人臉關鍵點的坐標通過卷積神經網絡(CNN)和深度神經網絡(DNN)提取出來。為了使模型從相鄰幀(情緒類別、強度相似)提取出來的特徵是相似的,我們使用了額外的損失函數。緊接著,我們通過比較視頻幀在高維空間的距離來挑選出一段視頻中最有代表性的兩幀。我們利用那兩幀在高維空間中的對比表達來做情緒分類。 我們使用聯合微調來結合以人臉照片和人臉關鍵點作為輸入的兩個模型。兩個模型相輔相成,使得整個系統得到更好的識別率。 我們在兩個廣泛使用在情緒識別的數據集(CK+和 Oulu-CASIA)進行實驗。實驗結果體現出我們提出的方法能夠有效地提取出關鍵幀,而且在情緒識別準確率上優於現今較好的方法。 | zh_TW |
| dc.description.abstract | Facial expression reflects the psychological activities of humans, and it is a key factor in human-machine interaction. Facial expression recognition is a challenging task even for humans, since individuals express their feelings in their own ways and with different intensities. In order to extract the commonality of facial expressions across different individuals, the effect of individual personality on recognition must be minimized as much as possible.
In this thesis, we construct a video-based facial expression recognition system using a deep temporal-contrastive network (DTCN), which exploits temporal features to remove the personality effect. Appearance and geometry features are extracted by a convolutional neural network (CNN) and a deep neural network (DNN) from the face image and the coordinates of the facial landmarks, respectively. An additional loss function is introduced so that the model extracts similar features from adjacent frames, whose expression class and intensity are similar. The two most representative frames of an image sequence are then selected by comparing the distances among frames in the high-dimensional feature space, and facial expressions are classified from the contrastive representation between those two key frames. We use joint fine-tuning to combine the two models, which take the face image and the facial landmarks as input, respectively; the two models are complementary, and their combination improves the recognition accuracy of the whole system. We conducted experiments on two databases widely used for facial expression recognition (CK+ and Oulu-CASIA). The results show that the proposed method effectively extracts key frames and outperforms state-of-the-art methods in recognition accuracy. | en |
| dc.description.provenance | Made available in DSpace on 2021-05-19T17:45:56Z (GMT). No. of bitstreams: 1 ntu-107-R05921097-1.pdf: 4218329 bytes, checksum: e9b488479dba3c26f15789def2f19742 (MD5) Previous issue date: 2018 | en |
| dc.description.tableofcontents | 口試委員會審定書 #
誌謝 I
摘要 II
ABSTRACT III
TABLE OF CONTENTS IV
LIST OF FIGURES VII
LIST OF TABLES IX
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Review 3
1.2.1 Facial Analysis 3
1.2.2 Facial Expression Recognition and Detection 4
1.2.3 Image-Based and Video-Based Methods for FER 5
1.3 Contribution 9
1.4 Thesis Organization 10
Chapter 2 Preliminaries 11
2.1 Face Detection 11
2.1.1 Introduction to Face Detection 11
2.1.2 Face Detection Based on Hand-Crafted Features 11
2.1.3 Face Detection Based on Deep Learning 13
2.2 Face Alignment 14
2.2.1 Introduction to Face Alignment 14
2.2.2 Facial Landmark Localization 15
2.2.3 Face Alignment and Warping 16
2.3 Convolutional Neural Network 17
2.3.1 Introduction to Convolutional Neural Networks 17
2.3.2 CNN for Facial Analysis 20
Chapter 3 Facial Expression Recognition 22
3.1 Preprocessing 22
3.2 Structure of DTCN 24
3.3 Temporal-Contrastive Appearance Network 25
3.3.1 Transfer Learning for TCAN 25
3.3.2 Contrastive Representation of TCAN 27
3.3.3 Training Process of TCAN 29
3.3.4 Loss Function 30
3.4 Temporal-Contrastive Geometry Network 34
3.4.1 Architecture of TCGN 34
3.4.2 Contrastive Representation of TCGN 35
3.4.3 Training Process of TCGN 37
3.5 Deep Temporal-Contrastive Network 38
3.5.1 DTCN: Combination of TCAN and TCGN 38
3.5.2 Attributes of DTCN 40
Chapter 4 Experiment 41
4.1 Configuration 41
4.2 Description of Datasets and Evaluation 42
4.2.1 The Extended Cohn-Kanade (CK+) 42
4.2.2 Oulu-CASIA 43
4.2.3 Evaluation 44
4.3 Quantitative Results 46
4.3.1 Results on CK+ 46
4.3.2 Results on Oulu-CASIA 50
4.4 Qualitative Analysis 54
Chapter 5 Conclusion and Future Work 57
REFERENCES 58 | |
| dc.language.iso | zh-TW | |
| dc.title | 使用深度時域對比網絡之人臉情緒辨識 | zh_TW |
| dc.title | Deep Temporal-Contrastive Network for Facial Expression Recognition | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 106-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 洪一平,莊永裕,陳祝嵩,李明穗 | |
| dc.subject.keyword | 臉部情緒辨識, 卷積神經網絡, 對比表達 | zh_TW |
| dc.subject.keyword | Facial Expression Recognition, Convolutional Neural Network, Contrastive Representation | en |
| dc.relation.page | 60 | |
| dc.identifier.doi | 10.6342/NTU201802214 | |
| dc.rights.note | Authorized for release (open access worldwide) | |
| dc.date.accepted | 2018-07-30 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2023-07-31 | - |
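
The abstract above outlines the core mechanism of the DTCN: per-frame features are trained to be similar for adjacent frames, the two frames farthest apart in feature space are taken as the key frames, and the contrastive representation between them is classified. The following is a minimal, illustrative Python/NumPy sketch of that idea only; the function names (`adjacent_frame_loss`, `select_key_frames`, `contrastive_representation`) are hypothetical, the element-wise difference used as the contrastive representation is an assumption, and none of this is taken from the thesis implementation.

```python
# Illustrative sketch only (not the thesis implementation). It assumes
# per-frame embeddings were already produced by the appearance (CNN) or
# geometry (DNN) branch, and measures Euclidean distance in feature space.
import numpy as np


def adjacent_frame_loss(features):
    """Mean squared distance between embeddings of adjacent frames.

    One plausible form of the 'additional loss' the abstract describes,
    in the spirit of the contrastive loss of Hadsell et al. [34]: it pulls
    adjacent frames (similar expression class and intensity) together.
    """
    diffs = features[1:] - features[:-1]           # (n_frames - 1, d)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def select_key_frames(features):
    """Pick the two frames whose embeddings are farthest apart.

    The abstract selects 'the two most representative frames' by comparing
    distances among frames; with the loss above, the maximally distant pair
    would tend to be the neutral and peak-expression frames.
    """
    n = features.shape[0]
    best_pair, best_dist = (0, 0), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if d > best_dist:
                best_pair, best_dist = (i, j), d
    return best_pair


def contrastive_representation(features):
    """Element-wise difference between the two key-frame embeddings.

    Using the difference is an assumption about what 'contrastive
    representation' means; the result would be fed to a classifier.
    """
    i, j = select_key_frames(features)
    return features[j] - features[i]


if __name__ == "__main__":
    # Toy usage: 30 frames with 128-dimensional embeddings.
    rng = np.random.default_rng(0)
    frame_features = rng.normal(size=(30, 128))
    print("loss:", adjacent_frame_loss(frame_features))
    print("key frames:", select_key_frames(frame_features))
    print("representation shape:", contrastive_representation(frame_features).shape)
```

The exhaustive pair search is O(n²) in the number of frames, which is negligible for short expression clips such as those in CK+ and Oulu-CASIA; the thesis may well select key frames by a different procedure.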
Appears in Collections: 電機工程學系
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-107-1.pdf | 4.12 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
