Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90701
Title: | An Efficient Deep Convolutional Network for Face Information Detection with Multi-Task Learning Enhanced by Task-based Cross-Dimensional Attention Module |
Author: | 賴奕善 Yi-Shan Lai |
Advisor: | 傅立成 Li-Chen Fu |
Keywords: | deep learning, multi-task learning, data augmentation, lightweight model |
Publication Year: | 2023 |
Degree: | Master's |
Abstract: | In recent years, with the development of autonomous driving technology, the performance of autonomous driving systems has steadily improved; nevertheless, the need for the driver to take over control in emergency situations remains a major limitation. Designing a driver monitoring system that can run on the vehicle has therefore become increasingly important. By monitoring the driver's state, such as head pose, gaze direction, and eye open/closed state, the autonomous driving system can determine whether the driver is attending to road conditions and is able to take over the driving task.
This thesis proposes the first deep learning model that takes full-face images as input and simultaneously performs head pose estimation, eye state detection, and gaze estimation, making it applicable to driver monitoring systems that help an autonomous driving system decide whether the driver can take over control. Because in-vehicle autonomous driving systems face hardware constraints, we adopt a multi-task learning approach so that inference runs in a single model, avoiding the large memory footprint of running multiple models in parallel. To let each task branch identify the features it needs, we design a task-based cross-dimensional attention module that enhances and filters the features relevant to each respective task.
In addition, to address the lack of varied head pose angles in eye state datasets, we propose a data augmentation technique that generates augmented data with eye state labels, improving the robustness of eye state detection under different head poses. Finally, extensive experiments show that the proposed approach is competitive with state-of-the-art work on AFLW2000, BIWI, and Gaze360. Furthermore, we employ the public CEW dataset to establish a baseline for eye state detection with full-face images as input. |
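The thesis abstract does not give the attention module's equations, so the following is only a rough illustrative sketch of the general idea it describes: a shared backbone feature map is re-weighted along the channel and spatial dimensions separately for each task branch (head pose, eye state, gaze), so each branch can emphasize its own relevant features. All function names, weights, and tensor shapes below are hypothetical, not taken from the thesis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def task_attention(feat, w_channel, w_spatial):
    """Hypothetical per-task cross-dimensional attention.

    Re-weights a shared backbone feature map along both the channel
    axis and the spatial axes before it enters one task branch.
    feat: (C, H, W) shared feature map; w_channel: (C, C) learned
    channel-mixing weights; w_spatial: scalar spatial gain.
    """
    # Channel attention: gate each channel using a descriptor
    # obtained by global average pooling over the spatial axes.
    chan = sigmoid(w_channel @ feat.mean(axis=(1, 2)))      # (C,)
    # Spatial attention: gate each location using the channel-mean map.
    spat = sigmoid(w_spatial * feat.mean(axis=0))           # (H, W)
    # Broadcast both gates over the shared feature map.
    return feat * chan[:, None, None] * spat[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))  # toy shared feature map

# Each task branch owns its own attention weights, so the same
# shared features are filtered differently per task.
out_pose = task_attention(feat, rng.standard_normal((8, 8)), 1.0)
out_eye = task_attention(feat, rng.standard_normal((8, 8)), 1.0)
print(out_pose.shape)  # (8, 4, 4)
```

Because each branch applies its own gates to one shared feature map, a single backbone forward pass serves all tasks, which matches the abstract's stated goal of avoiding the memory cost of running multiple models.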
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90701 |
DOI: | 10.6342/NTU202303294 |
Full-text license: | Access authorized (restricted to on-campus use) |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not publicly accessible) | 9.68 MB | Adobe PDF | View/Open |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.