Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60173
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳良基(Liang-Gee Chen) | |
dc.contributor.author | Ping-Han Chuang | en |
dc.contributor.author | 莊秉翰 | zh_TW |
dc.date.accessioned | 2021-06-16T10:00:55Z | - |
dc.date.available | 2017-02-08 | |
dc.date.copyright | 2017-02-08 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-11-14 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60173 | - |
dc.description.abstract | As traditional digital image processing has matured, another image processing field, one that teaches machines to see the surrounding world the way humans do, has become increasingly popular; this field is called computer vision. In Chapter 1 we introduce many applications of computer vision. Lower-level applications such as object recognition and semantic segmentation enable higher-level applications such as intelligent surveillance systems, self-driving cars, and robots. We find that action recognition is a core technique behind many of these applications: with it, a camera can detect an emergency and notify the authorities, and a self-driving car can estimate a pedestrian's speed and decide whether to accelerate or stop. Because so many applications require action recognition, we want to provide a hardware architecture that solves its problems and moves the field of computer vision forward.
We survey many action recognition methods and divide them into three categories. In the process we find two indicators of a good action recognition system: recognition accuracy and execution speed. Most methods achieve fairly good accuracy but take so long that real-time operation is impossible, even for low-resolution video; Chapters 1 and 2 explain this in detail, and it is why we implement action recognition as an ASIC. Our target is a high-frame-rate feature extraction engine for wearable devices, with the specification defined in Chapter 2; compared with similar works, ours is the highest. After comparing feature extraction methods for action recognition, we adopt HOG3D as our feature, but the original HOG3D algorithm is not suitable for hardware implementation, so we experiment with different parameters and modify the algorithm accordingly in Chapter 3. With these results, we propose our hardware architecture in Chapter 4. The main contribution is removing the algorithm's non-linear operations so that more data can be reused; we also apply parallel computing to achieve real-time operation and share computing resources to reduce hardware area. At the end of Chapter 4 we analyze the trade-off between on-chip memory and system bandwidth and compare the four proposed hardware architectures. | zh_TW |
dc.description.abstract | With traditional digital image processing technologies becoming more and more mature, another image processing field, one that teaches machines to see things in the real world as humans do, has become increasingly popular. This field is called computer vision. In Chapter 1 we introduce many applications of computer vision. Lower-level applications such as object recognition and semantic segmentation make higher-level applications such as intelligent surveillance systems, self-driving cars, and robots realizable. Action recognition is a core technique for these applications: with it, a surveillance camera can detect urgent events and call the authorities, and a self-driving car can estimate the speed of pedestrians to decide whether to accelerate or stop. Because so many compelling applications need action recognition, we decide to provide a hardware architecture that solves the problems in action recognition and advances the field of computer vision.
We have surveyed much related work on recognizing human actions and categorized the methods into three categories. During this survey we identified two critical criteria for judging whether an action recognition system is good enough: the accuracy of the system and its processing time. For most action recognition algorithms, the accuracy is high enough while the processing time cannot meet the real-time requirement, even for low-resolution video sequences. Chapters 1 and 2 state these concepts in detail, and they motivate us to implement action recognition on an ASIC. Our target is a high-frame-rate feature extraction engine for wearable devices. The specification of our hardware is defined in Chapter 2 and is the highest compared with other similar works. After comparing feature extraction methods for action recognition, we choose the HOG3D descriptor as our feature (a minimal sketch of the HOG3D pipeline appears after this metadata table). However, the original HOG3D algorithm is not hardware-friendly, so in Chapter 3 we run experiments to choose parameters and modify the original algorithm to make it more hardware-friendly. Following the results of Chapter 3, we propose our architecture design in Chapter 4. The main contribution comes from removing the non-linear operations in the algorithm, which enables more data reuse; in addition, parallel computing and resource sharing help reach the real-time requirement and reduce chip area. At the end of Chapter 4 we analyze the trade-off between on-chip memory and bus bandwidth and compare the four engine versions we have proposed. | en |
dc.description.provenance | Made available in DSpace on 2021-06-16T10:00:55Z (GMT). No. of bitstreams: 1 ntu-105-R03943049-1.pdf: 3774154 bytes, checksum: 2cd8cabbcdf52da349c79b976316befe (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | The Authorization of Oral Members for Research Dissertation i
Acknowledgement iii
Abstract in Chinese v
Abstract vii
Bibliography ix
1 Introduction 1
1.1 Introduction 1
1.2 The Applications of Computer Vision 2
1.3 Motivation of Action Recognition ASIC 5
1.4 Thesis Organization 10
2 Challenge of Action Recognition and HOG3D 11
2.1 Introduction 11
2.2 Overview of Action Recognition Methods 13
2.2.1 Human model based methods 14
2.2.2 Holistic methods 15
2.2.3 Local feature methods 18
2.3 Overview of HOG3D Engine 19
2.3.1 Comparison between HOG3D and other features 19
2.3.2 Specification Definition 20
2.3.3 Related Works of Hardware Architecture 22
2.4 Conclusion 23
3 Proposed Robust Action Recognition System 25
3.1 Introduction 25
3.2 System Overview 26
3.3 HOG3D Feature Extraction 29
3.3.1 3D Gradient Computation 32
3.3.2 3D Orientation Quantization 32
3.3.3 Cell Histogram Computation 34
3.3.4 HOG3D Descriptor Computation 34
3.4 Experiment Result 34
3.5 Conclusion 42
4 Proposed Architecture Design of HOG3D Descriptor 45
4.1 Introduction 45
4.2 Hardware Architecture of Direct Implementation 48
4.2.1 Data Flow and Data Reuse 48
4.2.2 Direct Mapping from Algorithm to Architecture 52
4.2.3 Report of Direct Implementation 57
4.3 Optimization 58
4.3.1 Another Modification of HOG3D Algorithm 58
4.3.2 Hardware Architecture of Optimized Implementation 60
4.3.3 Report of Optimized Implementation 63
4.4 Analysis of Architectures with Data in DRAM 64
4.4.1 Architecture Modified from Direct Mapping 65
4.4.2 Architecture Modified from Optimized Version 65
4.4.3 Comparison of Proposed Four Architectures 66
4.5 Conclusion 67
5 Conclusion 69
Bibliography 73 | |
dc.language.iso | en | |
dc.title | 動作辨識之三維梯度方向直方圖架構設計 | zh_TW |
dc.title | Architecture Design of Histograms of 3D Gradient Orientations for Action Recognition | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-1 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 黃朝宗(Chao-Tsung Huang),賴永康(Yong-Kang Lai),陳美娟(Mei-Juan Chen),叢培貴(Pei-Kuei Tsung) | |
dc.subject.keyword | HOG3D feature extraction, HOG3D feature engine, high frame rate architecture design, data reuse design, area-efficient architecture | zh_TW |
dc.subject.keyword | HOG3D feature extraction,HOG3D feature engine,High frame rate architecture,Data reuse design,Area efficient architecture, | en |
dc.relation.page | 80 | |
dc.identifier.doi | 10.6342/NTU201603736 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2016-11-14 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
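The abstracts above walk through the HOG3D pipeline in four steps: 3D gradient computation, 3D orientation quantization, cell histogram computation, and descriptor computation. The NumPy sketch below is illustrative only, not the thesis's implementation: the function name `hog3d_descriptor` and the parameters `cell` and `bins` are hypothetical, and the original HOG3D's polyhedron-based quantization (projecting each gradient onto the face normals of a regular polyhedron) is simplified here to six signed axis-aligned bins.

```python
import numpy as np

def hog3d_descriptor(video, cell=(4, 8, 8), bins=6):
    """video: (T, H, W) array of grayscale frames.
    Returns one L2-normalized orientation histogram per spatio-temporal cell.
    Note: bins must stay 6 here, matching the six signed axes below."""
    # Step 1: 3D gradient computation (central differences along t, y, x).
    gt = np.gradient(video, axis=0)
    gy = np.gradient(video, axis=1)
    gx = np.gradient(video, axis=2)
    mag = np.sqrt(gt**2 + gy**2 + gx**2)  # non-linear: square root

    # Step 2: 3D orientation quantization, simplified to picking the dominant
    # signed axis (+t, -t, +y, -y, +x, -x). Real HOG3D instead projects the
    # gradient onto a regular polyhedron's face normals.
    comps = np.stack([gt, -gt, gy, -gy, gx, -gx])  # (6, T, H, W)
    bin_idx = np.argmax(comps, axis=0)             # strongest component wins

    # Step 3: cell histogram computation. Every voxel votes its gradient
    # magnitude into one bin of the histogram of the cell containing it.
    st, sy, sx = cell
    T, H, W = video.shape
    nt, ny, nx = T // st, H // sy, W // sx
    hist = np.zeros((nt, ny, nx, bins))
    for t in range(nt * st):
        for y in range(ny * sy):
            for x in range(nx * sx):
                hist[t // st, y // sy, x // sx, bin_idx[t, y, x]] += mag[t, y, x]

    # Step 4: descriptor computation. L2-normalize each cell histogram
    # (non-linear again); full HOG3D groups cells into blocks first.
    norm = np.linalg.norm(hist, axis=-1, keepdims=True)
    return hist / np.maximum(norm, 1e-9)

# Example: 32 frames of 64x64 video yield an (8, 8, 8, 6) descriptor grid.
demo = np.random.rand(32, 64, 64)
print(hog3d_descriptor(demo).shape)
```

The square root in the magnitude and the per-cell normalization are the kinds of non-linear operations such a pipeline contains; per the abstract, removing them from the inner loop is what allows partial histogram sums to be reused across overlapping cells and blocks in the proposed hardware.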
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 3.69 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.