Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60178
Title: | Architecture Design of Feature Extraction for Real-time Action Recognition |
Author: | Chun-Ting Yen (顏君庭) |
Advisor: | Liang-Gee Chen (陳良基) |
Keywords: | Video Processing, Action Recognition, Real-time Feature Extraction Architecture, Block-based Features, Feature Extraction Architecture Design |
Publication Year: | 2016 |
Degree: | Master |
Abstract: | Computer vision has been studied for decades, and with the help of machine learning algorithms, electronic devices can learn useful knowledge from large data sources such as the Internet, correcting and improving themselves automatically. The combination of computer vision and machine learning has brought a wide range of applications, making our lives faster and more convenient. The ultimate goal of computer vision is to build an intelligent robot that can perceive and interact with the world as a human does. We believe the first step toward this goal is to enable machines to understand the semantic meaning behind videos.
Compared with still images, videos contain spatio-temporal information that implies richer knowledge. Action recognition therefore becomes one of the most important foundations of robot vision. However, the many variations in videos greatly increase the difficulty of analysis, and many researchers focus on raising recognition accuracy on standard datasets. In past research, the computational complexity of feature extraction from videos has remained too high for real-time operation. In this thesis, we first introduce several computer vision applications and different approaches to feature extraction. After comparing these algorithms and examining the pros and cons of each method, we choose space-time local features for action recognition.
Considering both efficiency and accuracy, we adopt the MoFREAK feature extraction algorithm to generate robust descriptors of action videos. MoFREAK is a feature that combines an appearance model and a motion model independently: it captures static information with FREAK and dynamic information with MIP, and shows good performance on action datasets. We then design a hardware architecture for MoFREAK feature extraction, introducing a block-based feature technique to improve hardware performance, reduce bandwidth, and handle irregularly distributed feature points. After optimization, the synthesis results of the proposed design in a TSMC 40 nm process meet the real-time specification, operating at 200 MHz with full HD (1920×1080) video resolution while requiring only about 1100 K logic gates and 7.9 Kbytes of memory. Furthermore, because the block-based keypoint technique packs neighboring feature points together before processing, the architecture can offer 1.2 K block-based feature points at 120 fps with 417.6 Mbytes/sec bandwidth, or 0.5 K block-based feature points at 240 fps with 835.2 Mbytes/sec bandwidth. These figures assume the worst case, in which all 10 points in a block-based feature are detected as keypoints and must be described; in typical cases, more block-based feature points can be offered at the same frame rate. |
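The abstract describes MoFREAK as an independent combination of an appearance half (FREAK) and a motion half (MIP) in one binary descriptor. A minimal sketch of that concatenation idea, with simplified intensity comparisons standing in for the real FREAK retina pattern and MIP motion test (the `pairs`, sample `points`, and 8-byte half sizes below are hypothetical, not taken from the thesis):

```python
def pack_bits(bits):
    """Pack a list of 0/1 bits (MSB first) into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def appearance_half(patch, pairs):
    """Stand-in for FREAK: bit i = 1 if intensity at p exceeds intensity at q."""
    return pack_bits([1 if patch[p] > patch[q] else 0 for p, q in pairs])

def motion_half(patch_now, patch_prev, points):
    """Stand-in for MIP: bit i = 1 if a sample got brighter since the previous frame."""
    return pack_bits([1 if patch_now[p] > patch_prev[p] else 0 for p in points])

def mofreak_like(patch_now, patch_prev, pairs, points):
    """Concatenate the appearance and motion halves, as the abstract describes."""
    return appearance_half(patch_now, pairs) + motion_half(patch_now, patch_prev, points)

# Toy usage on a flattened 8x8 patch
patch_now = [(i * 37) % 256 for i in range(64)]
patch_prev = [(i * 53) % 256 for i in range(64)]
pairs = [(i, 63 - i) for i in range(64)]   # 64 hypothetical comparison pairs -> 8 bytes
points = list(range(64))                   # 64 hypothetical sample points -> 8 bytes
desc = mofreak_like(patch_now, patch_prev, pairs, points)
assert len(desc) == 16  # 8 appearance bytes + 8 motion bytes
```

Because both halves are plain bit strings, descriptors like this can be matched with Hamming distance, which is one reason binary features suit hardware implementation.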
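The block-based feature technique mentioned above packs neighboring keypoints together so that scattered, irregular points can be fetched and described block by block. A small sketch of the grouping step, assuming hypothetical block dimensions (the thesis packs up to 10 points per block; the 8×8 blocks here are only illustrative):

```python
from collections import defaultdict

def group_into_blocks(keypoints, block_w=8, block_h=8):
    """Group detected (x, y) keypoints by the block they fall into, so one
    regular memory fetch can serve every keypoint inside that block."""
    blocks = defaultdict(list)
    for x, y in keypoints:
        blocks[(x // block_w, y // block_h)].append((x, y))
    return dict(blocks)

# Toy usage: two points share a block, two land in blocks of their own
keypoints = [(3, 5), (6, 2), (17, 4), (100, 200)]
blocks = group_into_blocks(keypoints)
assert blocks[(0, 0)] == [(3, 5), (6, 2)]
assert len(blocks) == 3
```

Grouping turns many single-point accesses into one access per occupied block, which is the bandwidth saving the abstract attributes to block-based features.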
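The two quoted bandwidth figures are consistent with a fixed per-frame data load: 417.6 Mbytes/sec at 120 fps implies 3.48 Mbytes per full-HD frame, and doubling the frame rate to 240 fps doubles the bandwidth to 835.2 Mbytes/sec. A quick arithmetic check of that linear scaling:

```python
def bandwidth_mbytes_per_sec(per_frame_mbytes, fps):
    """Bandwidth grows linearly with frame rate for a fixed per-frame load."""
    return per_frame_mbytes * fps

per_frame = 417.6 / 120  # 3.48 Mbytes per frame, derived from the 120 fps figure
assert abs(bandwidth_mbytes_per_sec(per_frame, 120) - 417.6) < 1e-6
assert abs(bandwidth_mbytes_per_sec(per_frame, 240) - 835.2) < 1e-6
```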
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60178 |
DOI: | 10.6342/NTU201603731 |
Full-text License: | Paid authorization |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 7.06 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.