
DSpace

The DSpace institutional repository system is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easily accessible.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94839
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 傅立成 | zh_TW
dc.contributor.advisor | Li-Chen Fu | en
dc.contributor.author | 劉璟鴻 | zh_TW
dc.contributor.author | Jing-Hong Liu | en
dc.date.accessioned | 2024-08-19T17:23:51Z | -
dc.date.available | 2024-08-29 | -
dc.date.copyright | 2024-08-19 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-06 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94839 | -
dc.description.abstract (zh_TW):
In recent years, technologies for intelligent vehicles have developed rapidly, making the design of driver monitoring systems that can run on vehicles an important topic. These systems analyze the driver's facial information to determine his or her current attention state, helping to assess the driver's ability to control the vehicle.
In this thesis, we propose a lightweight multi-task deep learning model that uses facial images to detect head pose, gaze direction, and eye state simultaneously, designed for integration into driver monitoring systems to determine whether the driver is focused on the road. To enable deployment on embedded in-vehicle devices, we adopt multi-task learning and design suitable task branches, creating an efficient and lightweight system. Each task is supported by a dedicated branch to ensure that the necessary features are extracted, and part-aware supervision strengthens attention to the relevant facial regions. In addition, we introduce clue features that encode part-specific implicit knowledge, providing better initialization of task-specific features and improving overall model performance. We evaluate our model on the AFLW2000, BIWI, Gaze360, and CEW datasets; the results show that our method achieves competitive performance at reduced parameter cost and performs better under similar lightweight conditions, demonstrating its effectiveness in practical applications.
dc.description.abstract (en):
In recent years, technologies related to intelligent vehicles have seen rapid development, making the design of driver monitoring systems that can operate on vehicles a critical area of focus. These systems analyze facial information of drivers to determine their current attention status, helping assess the driver's ability to control the vehicle.
In this thesis, we propose a lightweight, multi-task deep learning model designed to simultaneously detect head pose, gaze direction, and eye state from facial images. This model is specifically designed for integration into driver monitoring systems, aiming to determine whether the driver is focused on the road. To facilitate deployment on embedded vehicle devices, our approach leverages multi-task learning with carefully designed task branches to create an efficient and lightweight system. Each task is supported by dedicated branches to ensure the extraction of necessary features, with part-aware supervision enhancing the focus on relevant facial regions. Furthermore, we introduce clue features to encode part-specific implicit knowledge, providing improved initialization for task-specific features and enhancing overall model performance. We evaluate our model using the AFLW2000, BIWI, Gaze360, and CEW datasets, demonstrating that our method achieves competitive performance with reduced parameter costs. Our results show superior performance under lightweight conditions compared to existing methods, demonstrating the effectiveness of our approach in real-world applications.
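The abstract describes a shared feature extractor feeding separate task branches for head pose, gaze, and eye state. The toy sketch below illustrates only that general data flow: it is a minimal NumPy stand-in, not the thesis's actual network, and every layer size, name, and the linear "backbone" here are hypothetical.

```python
import numpy as np

# Illustrative only: a shared backbone feeding three task-specific heads,
# in the spirit of a multi-task face-analysis model. All dimensions are
# arbitrary placeholders, not values from the thesis.

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Create a randomly initialized linear layer as a (weight, bias) pair."""
    return rng.standard_normal((in_dim, out_dim)) * 0.01, np.zeros(out_dim)

def apply_linear(x, layer):
    w, b = layer
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

# Shared backbone: one hidden layer standing in for a CNN feature extractor.
backbone = linear(128, 64)

# One dedicated head per facial task.
pose_head = linear(64, 3)  # yaw, pitch, roll
gaze_head = linear(64, 2)  # gaze yaw, pitch
eye_head = linear(64, 2)   # logits: eye open vs. closed

def predict(face_features):
    """Run shared features through each task branch and collect outputs."""
    shared = relu(apply_linear(face_features, backbone))
    return {
        "head_pose": apply_linear(shared, pose_head),
        "gaze": apply_linear(shared, gaze_head),
        "eye_state": apply_linear(shared, eye_head),
    }

x = rng.standard_normal(128)  # stand-in for features extracted from a face image
out = predict(x)
print({name: value.shape for name, value in out.items()})
```

The point of the sketch is the structure: one forward pass through the shared extractor serves all three tasks, which is what makes a multi-task design cheaper than running three separate networks.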
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-19T17:23:50Z; No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-08-19T17:23:51Z (GMT); No. of bitstreams: 0 | en
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Contents v
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Background 1
1.1.1 Head Pose Estimation 2
1.1.2 Gaze Estimation 4
1.1.3 Eye State Estimation 5
1.2 Motivation 6
1.3 Challenges 7
1.3.1 Data Difficulties in Integrating Facial Tasks 7
1.3.2 Leveraging Prior Knowledge for Facial Task Modeling 8
1.3.3 Balancing Performance and Efficiency 8
1.4 Objectives 8
1.4.1 Efficient and Effective Facial Task Handling 9
1.4.2 Part-Aware Supervision and Information Fusion 9
1.4.3 Clue Features with Lightweight Feature Interaction 9
1.5 Related Work 10
1.5.1 Head Pose Estimation 10
1.5.2 Gaze Estimation 10
1.5.3 Eye State Estimation 11
1.6 Contributions 12
1.7 Thesis Organization 12
Chapter 2 Preliminaries 14
2.1 Convolutional Neural Network 14
2.1.1 Convolution Layer 15
2.1.2 Depthwise Separable Convolution 16
2.1.3 Pooling Layer 18
2.1.4 Activation Function 19
2.1.5 Loss Function 22
2.2 Backbone Network 23
2.2.1 VGG 23
2.2.2 ResNet 24
2.2.3 MobileNet 25
2.3 Rotation Matrix Representation 26
Chapter 3 Methodology 29
3.1 Architecture Overview 30
3.2 Shared Feature Extractor 31
3.3 Clue Features 33
3.4 Feature Aggregation Module 34
3.5 Clue Interaction Module 36
3.6 Task-Specific Heads 39
3.6.1 Part-Aware Supervision Head 40
3.6.2 Head Pose Estimation Head 40
3.6.3 Gaze Estimation Head 41
3.6.4 Eye State Detect Module 41
3.7 Loss Function 43
Chapter 4 Experiments 46
4.1 Dataset 46
4.1.1 300W-LP Dataset 46
4.1.2 BIWI Dataset 47
4.1.3 AFLW2000 Dataset 47
4.1.4 Gaze360 Dataset 48
4.1.5 MRL Eye Dataset 49
4.1.6 CEW Dataset 51
4.2 Evaluation Metrics 51
4.3 Implementation Details 53
4.4 Head Pose Estimation 54
4.5 Gaze Estimation 55
4.6 Eye State Detection 56
4.7 Ablation Study 56
Chapter 5 Conclusion 59
References 61
dc.language.iso | en | -
dc.subject | 頭部姿態估計 (head pose estimation) | zh_TW
dc.subject | 深度學習 (deep learning) | zh_TW
dc.subject | 開閉眼狀態檢測 (eye open/closed state detection) | zh_TW
dc.subject | 輕量化模型 (lightweight model) | zh_TW
dc.subject | 視線注視估計 (gaze estimation) | zh_TW
dc.subject | Gaze Estimation | en
dc.subject | Lightweight Model | en
dc.subject | Eye State Detection | en
dc.subject | Head Pose Estimation | en
dc.subject | Deep Learning | en
dc.title | 使用多線索交互和部位感知監督增強之輕量級深度卷積網路應用於面部資訊偵測 | zh_TW
dc.title | A Lightweight Deep Convolutional Network for Face Information Detection Enhanced by Multi-Clue Interaction and Part-Aware Supervision | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | 碩士 (Master's) | -
dc.contributor.oralexamcommittee | 蕭培墉;黃世勳;方瓊瑤;傅楸善 | zh_TW
dc.contributor.oralexamcommittee | Pei-Yung Hsiao;Shih-Shinh Huang;Chiung-Yao Fang;Chiou-Shann Fuh | en
dc.subject.keyword | 深度學習, 頭部姿態估計, 視線注視估計, 開閉眼狀態檢測, 輕量化模型 | zh_TW
dc.subject.keyword | Deep Learning, Head Pose Estimation, Gaze Estimation, Eye State Detection, Lightweight Model | en
dc.relation.page | 69 | -
dc.identifier.doi | 10.6342/NTU202403315 | -
dc.rights.note | 同意授權(限校園內公開) (authorized; campus-only access) | -
dc.date.accepted | 2024-08-09 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 資訊工程學系 (Department of Computer Science and Information Engineering) | -
dc.date.embargo-lift | 2029-08-05 | -
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:

File | Size | Format
ntu-112-2.pdf (restricted access; not publicly available) | 12.11 MB | Adobe PDF


Except where otherwise noted in their copyright terms, all items in this repository are protected by copyright, with all rights reserved.

Contact
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw

© NTU Library All Rights Reserved