Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84437
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 傅立成(Li-Chen Fu) | |
dc.contributor.author | Hsueh-Wei Chen | en |
dc.contributor.author | 陳學韋 | zh_TW |
dc.date.accessioned | 2023-03-19T22:11:35Z | - |
dc.date.copyright | 2022-10-19 | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022-09-23 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84437 | - |
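dc.description.abstract | In recent years, head pose estimation has drawn increasing attention in computer vision and has made remarkable progress through deep convolutional networks. The task is to predict the three-dimensional angle information of human faces in images or videos. For many applications, accurate head pose information is important, for example monitoring the status of passengers and drivers to prevent traffic accidents, or determining the head state in a human-computer interaction system to trigger the corresponding command. Because of the COVID-19 pandemic, people must wear masks in public places, and even in the enclosed space of a vehicle. Prior head pose estimation research remains challenged by face occlusion, so solving the mask occlusion problem has become important. This thesis proposes a solution for head pose estimation under facial mask occlusion. We design a deep learning model based on an end-to-end trained deep convolutional network and add an attention module to enhance the important information in local and global features. In addition, we use a feature interpolation regularization module and a multi-task learning strategy to optimize the learned features and to learn additional information from the facial landmark detection task, improving model performance and robustness. Furthermore, because the original datasets contain few mask-occlusion cases, we use data augmentation to generate masked-face data to assist model learning. To verify the feasibility of this work, we train the model on the public head pose estimation training sets 300W-LP and BIWI and evaluate it on the AFLW2000, BIWI, and MAFA test datasets. We first conduct ablation studies on the designed modules to show that the proposed method improves performance on the target task. We then compare numerically against other state-of-the-art methods; the experimental results show that our method achieves competitive results. | zh_TW |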
dc.description.abstract | In recent years, accurate head pose estimation with deep convolutional networks has gained significant interest in computer vision. The task is to predict the three-dimensional orientation of human faces in images or videos. Accurate head pose information is essential for many applications, such as monitoring the status of passengers and drivers to prevent traffic accidents, and determining the state of a user's head to trigger the appropriate command in human-computer interaction systems. Because of the COVID-19 pandemic, people need to wear facial masks in almost all public places, sometimes even inside a vehicle, and existing head pose estimation methods remain challenged by face occlusion. Handling mask occlusion has therefore become important. In this thesis, we propose a head pose estimation method that is more robust when faces are masked. We design a deep learning model trained end to end and incorporate an attention mechanism to enhance the critical information in local and global features. In addition, we introduce a feature interpolation regularization module and a multi-task learning strategy to optimize the feature embedding for head pose estimation and to learn additional information from the facial landmark detection task, improving performance and robustness. Furthermore, because the original datasets contain few samples with facial masks, we synthesize masked-face samples as data augmentation during training. To validate the proposed approach, the model is trained on the public head pose estimation datasets BIWI and 300W-LP and tested on three datasets: AFLW2000, BIWI, and MAFA. We first evaluate the model in different configurations (an ablation study) to determine whether each proposed component is effective. We then compare our work with previous competitive methods in extensive experiments, in which the proposed method achieves competitive results on these datasets. | en |
dc.description.provenance | Made available in DSpace on 2023-03-19T22:11:35Z (GMT). No. of bitstreams: 1 U0001-0109202217003700.pdf: 4237001 bytes, checksum: 1693eb4707c23bdfc0fbfb465b42e716 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction: 1.1 Background; 1.2 Motivation; 1.3 Objective; 1.4 Related Work; 1.4.1 Head Pose Estimation with Landmark-based Method; 1.4.2 Head Pose Estimation with Landmark-free Method; 1.4.3 Rotation Representation for Head Pose Estimation; 1.4.4 Attention Mechanism for Head Pose Estimation; 1.4.5 Head Pose Estimation with Multi-task Learning; 1.5 Contributions; 1.6 Thesis Organization; Chapter 2 Preliminaries: 2.1 Convolutional Neural Network; 2.1.1 Convolutional Layer; 2.1.2 Pooling Layer; 2.1.3 Activation Function; 2.1.4 Fully Connected Layer; 2.1.5 Loss Functions; 2.1.6 Optimizer; 2.2 Backbone Network in CNN; 2.2.1 VGGNet; 2.2.2 Residual Net; 2.3 Attention Mechanism in Deep Learning; 2.4 Head Pose Estimation; 2.4.1 Rotation Matrix Representation; 2.4.2 6DRepNet; Chapter 3 Methodology: 3.1 Problem Formulation; 3.2 Architecture Overview; 3.3 Network Architecture Design; 3.3.1 Preprocessing; 3.3.2 Backbone Network; 3.3.3 Global-Local Attention Module; 3.3.4 Head Pose Estimation Head; 3.3.5 Feature Interpolation Regularization Module; 3.3.6 Facial Landmark Detection Head; 3.4 Loss Function; Chapter 4 Experiments: 4.1 Datasets; 4.1.1 300W across Large Poses Dataset; 4.1.2 AFLW2000 Dataset; 4.1.3 BIWI Dataset; 4.1.4 MAFA Dataset; 4.2 Evaluation Protocols and Metrics; 4.3 Implementation Details; 4.4 Ablation Study; 4.4.1 The effect of different components; 4.4.2 Analysis of the effect of Facial Mask Synthesis; 4.4.3 Analysis of Structure Component for MTL; 4.4.4 Analysis of Arrangement for GLAM; 4.4.5 Analysis of Attention Modules in GLAM; 4.5 Quantitative Results; 4.6 Qualitative Results; Chapter 5 Conclusion; REFERENCE | |
dc.language.iso | en | |
dc.title | 輔以注意機制和臉部關鍵點偵測之深度卷積網路應用於戴有口罩人臉的頭部姿態估計 | zh_TW |
dc.title | A Deep Convolutional Network for Head Pose Estimation of Humans Wearing Facial Masks Enhanced by Attention Mechanism and Landmark Detection | en |
dc.type | Thesis | |
dc.date.schoolyear | 110-2 | |
dc.description.degree | Master's | |
dc.contributor.coadvisor | 蕭培墉(Pei-Yung Hsiao) | |
dc.contributor.oralexamcommittee | 黃世勳(Shih-Shinh Huang), 方瓊瑤(Chiung-Yao Fang), 林忠緯(Chung-Wei Lin) | |
dc.subject.keyword | 頭部姿態估計, 深度卷積網路, 注意機制, 特徵插值正規化, 臉部關鍵點, 資料擴增 | zh_TW |
dc.subject.keyword | Head Pose Estimation, Deep Convolutional Network, Attention Mechanism, Feature Interpolation Regularization, Facial Landmark, Data Augmentation | en |
dc.relation.page | 77 | |
dc.identifier.doi | 10.6342/NTU202203074 | |
dc.rights.note | Authorization granted (access limited to campus) | |
dc.date.accepted | 2022-09-26 | |
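dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |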
dc.date.embargo-lift | 2025-09-23 | - |
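Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0109202217003700.pdf (not currently authorized for public access) | 4.14 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.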