Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53994
Title: | Egocentric Activity Recognition by Leveraging Multiple Mid-level Representations (利用多重中階表徵進行第一人稱視角影片動作辨識) |
Author: | Peng-Ju Hsieh (謝朋儒) |
Advisor: | Winston Hsu (徐宏民) |
Keywords: | Activity Recognition, Egocentric Video, Feature Fusion |
Publication Year: | 2015 |
Degree: | Master |
Abstract: | Existing approaches for egocentric activity recognition mainly rely on a single modality (e.g., detecting interacting objects) to infer the activity category. However, due to the inconsistency between the camera angle and the subject's visual field, important objects may be partially occluded or missing from the video frames, which sharply degrades the accuracy of object-detection models. Moreover, where the objects are and how the subject interacts with them are largely ignored in prior work. To resolve these difficulties, we propose leveraging multiple mid-level representations to improve egocentric activity classification accuracy. Specifically, we utilize multimodal representations (e.g., background context, objects manipulated by the user, and motion patterns of the hands) to compensate for the insufficiency of any single modality, and we jointly consider what a subject interacts with, where the interaction takes place, and how it is performed. To evaluate the method, we introduce a new and challenging egocentric activity dataset (ADL+) that contains video and wrist-worn accelerometer data of people performing daily-life activities. Our approach significantly outperforms the state-of-the-art method in classification accuracy on both the public ADL dataset (from 36.8% to 46.7%) and our ADL+ dataset (from 32.5% to 60.0%). In addition, we conduct a series of analyses to explore the relative merits of each modality for egocentric activity recognition. |
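The fusion idea the abstract describes can be illustrated with a minimal sketch: concatenate per-modality feature vectors (background context, manipulated objects, hand motion) into one representation, then classify. All names, dimensions, and the nearest-centroid classifier below are illustrative assumptions, not the thesis's actual models or features.

```python
# Hypothetical early-fusion sketch: three modality feature vectors are
# concatenated and classified with a toy nearest-centroid rule.

def fuse_features(background, objects, hand_motion):
    """Concatenate per-modality feature vectors into one fused vector."""
    return list(background) + list(objects) + list(hand_motion)

def nearest_centroid_predict(fused, centroids):
    """Return the activity label whose centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(fused, centroids[label]))

# Toy example with two (made-up) activity classes over 3-dim fused features.
centroids = {
    "making_tea":  [1.0, 0.0, 1.0],
    "watching_tv": [0.0, 1.0, 0.0],
}
sample = fuse_features([0.9], [0.1], [0.8])  # one scalar feature per modality
print(nearest_centroid_predict(sample, centroids))  # → making_tea
```

In practice each modality would contribute a high-dimensional descriptor and a learned classifier would replace the centroid rule, but the structural point stands: a weak or occluded modality (e.g., a hidden object) can be compensated by the others in the fused vector.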
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53994 |
Full-text License: | Fee-based authorization |
Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
File | Size | Format
---|---|---
ntu-104-1.pdf (currently not authorized for public access) | 8.65 MB | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.