NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76404
Title: 以全域性空間時間特徵輔以序列式生成機制完成多工之動作辨識及產生光流影像
Using Global Spatiotemporal Features with Sequentially Pooling Mechanism for Multi-tasking of Action Recognition and Optical Flow Estimation
Authors: Tso-Hsin Yeh (葉佐新)
Advisor: Li-Chen Fu (傅立成)
Keyword: action recognition, optical flow estimation, sequential mechanism
Publication Year: 2018
Degree: Master's
Abstract: In this thesis, we propose a brand-new deep learning architecture for action recognition. Action recognition has recently become an increasingly important topic in the computer vision field, where deep learning methods are widely used and capable of producing generic models. Most existing methods follow either the two-stream approach, which takes a single RGB image together with a stack of optical flow images as input, or the 3D ConvNet (C3D) approach, which concatenates several RGB frames as input at a considerable cost in time and memory. ResFlow, the proposed method, is pre-trained on an optical flow dataset, FlyingChairs, and fine-tuned on action datasets, UCF101 and HMDB51, so that the model serves as a high-level feature extractor. In the first pre-training stage, with optical flow estimation as the objective, spatiotemporal features are encoded into the latent high-dimensional space at the middle of an autoencoder architecture. In the fine-tuning stage, the local spatiotemporal features extracted from a set of frames in a video clip are assigned confidence scores by a designed Sequential Mechanism. The Sequential Mechanism takes the ordered features as input and gives each one a confidence score, aggregating the sequence into a condensed feature that is then leveraged for action recognition. This design uses only RGB images as input yet encodes temporal information through optical flow pre-training, and it sequentially aggregates local spatiotemporal features into a global spatiotemporal feature with high efficiency.
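The abstract describes the Sequential Mechanism only at a high level: each local spatiotemporal feature receives a confidence score, and the scores weight the aggregation into one global feature. A minimal sketch of that idea, assuming a single learned scoring vector with softmax-normalized confidences (`score_w` and `score_b` are hypothetical parameter names, not from the thesis):

```python
import numpy as np

def sequential_pool(features, score_w, score_b):
    """Confidence-weighted pooling of per-frame features.

    features : (T, D) array of local spatiotemporal features, in frame order.
    score_w, score_b : parameters of a hypothetical linear scoring layer.
    Returns a single (D,) global spatiotemporal feature.
    """
    logits = features @ score_w + score_b      # (T,) raw per-frame scores
    conf = np.exp(logits - logits.max())
    conf /= conf.sum()                         # softmax -> confidence weights
    return conf @ features                     # weighted sum over frames -> (D,)

# Toy usage: 8 frames, 16-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
w = rng.normal(size=16)
g = sequential_pool(feats, w, 0.0)
print(g.shape)  # -> (16,)
```

In the actual thesis the scoring network is learned end-to-end during fine-tuning; the point of the sketch is only the aggregation step, where normalized confidence scores collapse T local features into one condensed feature in a single weighted sum.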
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76404
DOI: 10.6342/NTU201802828
Fulltext Rights: Licensed for worldwide public access (同意授權, 全球公開)
Embargo Lift Date: 2023-08-21
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: ntu-107-R05921061-1.pdf (24.32 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
