Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76404
Title: | 以全域性空間時間特徵輔以序列式生成機制完成多工之動作辨識及產生光流影像 Using Global Spatiotemporal Features with Sequentially Pooling Mechanism for Multi-tasking of Action Recognition and Optical Flow Estimation |
Author: | Tso-Hsin Yeh 葉佐新 |
Advisor: | Li-Chen Fu (傅立成) |
Keywords: | action recognition, optical flow estimation, sequential mechanism |
Year of Publication: | 2018 |
Degree: | Master |
Abstract: | In this thesis, we propose a brand-new architecture for action recognition. Recently, action recognition has become a rising issue in the computer vision field. Deep learning methods have been widely used and are capable of generating generic models. Most existing methods use either two-stream networks, which take a single RGB image and stacked optical flow images as inputs, or C3D, which concatenates several RGB images as input; both incur a high cost in time and memory. ResFlow, the proposed method, is pre-trained on an optical flow dataset, FlyingChairs, and fine-tuned on the action datasets UCF101 and HMDB51, serving as a high-level feature extractor. With optical flow pre-training in the first stage, spatiotemporal features are encoded in the latent high-dimensional space in the middle of the autoencoder architecture. In the fine-tuning stage, the spatiotemporal features extracted from a set of frames of a video clip are given confidence scores by a designed Sequential Mechanism. This Sequential Mechanism takes the ordered features from the feature set as input and assigns a confidence score to each feature, aggregating the sequential features into a condensed feature that is leveraged for action recognition. This design uses only RGB images as input, yet has temporal information encoded through optical flow pre-training, and efficiently aggregates local spatiotemporal features into a global spatiotemporal feature for action recognition. |
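The confidence-scored aggregation that the abstract attributes to the Sequential Mechanism can be illustrated as softmax-weighted pooling over ordered per-frame features. The sketch below is a minimal NumPy illustration under assumed shapes, with a stand-in linear scorer (`scorer_w` is hypothetical; the thesis learns its scorer as part of the network), not the actual ResFlow implementation:

```python
import numpy as np

def sequential_pool(features, scorer_w):
    """Aggregate ordered local spatiotemporal features into one
    condensed global feature via confidence-weighted pooling.

    features : (T, D) array, one local feature per frame, in order.
    scorer_w : (D,) stand-in weight vector for the learned scorer.
    Returns (global_feature, confidence_scores).
    """
    logits = features @ scorer_w                 # one raw score per frame
    logits = logits - logits.max()               # numerical stability
    scores = np.exp(logits) / np.exp(logits).sum()  # softmax confidences
    global_feature = scores @ features           # weighted sum over time
    return global_feature, scores

# Toy run: 5 frames, each with a 4-dimensional local feature.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))
w = rng.standard_normal(4)
g, s = sequential_pool(feats, w)   # g: (4,) global feature, s: (5,) scores
```

The confidences sum to one, so frames the scorer deems informative dominate the condensed feature while the output dimensionality stays fixed regardless of clip length.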
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76404 |
DOI: | 10.6342/NTU201802828 |
Full-Text Authorization: | Authorized (open access worldwide) |
Electronic Full-Text Release Date: | 2023-08-21 |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-R05921061-1.pdf | 24.32 MB | Adobe PDF | View/Open |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in the item's license terms.