Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91569
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳良基 | zh_TW |
dc.contributor.advisor | Liang-Gee Chen | en |
dc.contributor.author | 陳菀瑜 | zh_TW |
dc.contributor.author | Wan-Yu Chen | en |
dc.date.accessioned | 2024-01-28T16:34:26Z | - |
dc.date.available | 2024-01-29 | - |
dc.date.copyright | 2024-01-28 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-10 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91569 | - |
dc.description.abstract | 邊緣智慧視覺處理技術在現今許多應用中扮演不可或缺的角色,例如事件偵測、影像辨識、動作辨識、影像品質增強、人機互動與監控等。在本論文中,我們探討邊緣智慧影像處理系統晶片設計的方法與策略。我們首先討論邊緣智慧處理應用的系統需求、挑戰與規格,同時介紹功耗定理、效能模型、面積成本模型、系統頻寬模型與高效能設計方法論,並以人工智慧電腦視覺兩大領域關鍵技術的硬體實作,進行實體範例的實作、分析與驗證。本論文分為兩大部分。在第一部分,我們介紹利用三維特徵偵測演算法開發之高效率人工智慧視覺動作辨識系統架構設計。
此視覺動作辨識系統建立在電腦視覺及機器學習技術之上。我們分析並選用以人類視網膜理論為基礎之MoFREAK特徵擷取,支援高辨識度的動作偵測。此系統利用機器學習演算法達到高準確率的辨識結果與低運算量的效能需求。為了達到即時處理之需求,我們利用大型積體電路設計來實現本系統,並探討硬體設計最佳化之流程及方法,在不影響準確率的情況下,降低硬體資源需求及消耗。我們利用40奈米半導體製程實現此智慧動作特徵擷取系統晶片。此晶片成本為1100K邏輯閘與7.9 Kbytes記憶體,時脈為200MHz,能支援FHD (1920x1080) 120fps動作特徵擷取追蹤。藉由我們所提出的區塊特徵點處理技術,將鄰近的特徵點整合起來運算,以共用記憶體頻寬與運算;並藉由高斯濾波器的水平垂直分解,節省處理單元的需求。對於MoFREAK動作特徵偵測,此晶片在FHD解析度下能夠達到120fps之處理效率,且僅使用417.6 Mbytes/sec的頻寬。 在第二部分中,我們介紹基於深度神經網路技術的邊緣人工智慧視覺處理系統。我們介紹功耗定理與相關的現況研究成果,並解決其高記憶體使用量與高複雜度的挑戰。此外,我們介紹一種用於DNN網路的高效率深度神經網路演算法。我們的框架非常適合具有各種電池與面積限制的邊緣人工智慧應用。設計方法、挑戰、創新與所提出之硬體架構於第五章描述。為了達到即時處理之需求,我們利用大型積體電路設計來實現本系統,並探討硬體架構最佳化之流程及方法,在不影響準確率的情況下,降低硬體面積與功耗。我們利用28奈米半導體製程實現此邊緣人工智慧視覺處理系統。第五章亦從幾個面向介紹實驗結果。此晶片面積為1.02mm2,時脈為370MHz,核心與輸入輸出腳位電壓均為1V。此硬體設計能夠達到3.53-6.7 TOPS/W之功率效率以及207.4 GOPS/mm2之面積效率,最多可同時支援288個乘加運算單元。此邊緣人工智慧視覺處理器能夠提升1.62倍之功率效率以及至少1.79倍之面積效率。對於MobileNet V2人工智慧處理,此晶片在VGA解析度下能夠達到30fps之處理效率,平均功率消耗為31.02-64.38mW。相較於目前文獻中的最佳做法,我們的做法將功率效率提高了5.34倍,面積效率提高了11.58倍。最後,第六章提出結論並總結主要貢獻,第七章則提供未來的研究方向。 | zh_TW |
dc.description.abstract | Artificial intelligence (AI) vision processing nowadays plays an essential role in many applications such as event detection, image recognition, action recognition, image quality enhancement, image synthesis, human-machine interaction, and surveillance. Meanwhile, battery life and package size are critical constraints for AI applications on edge devices, so an efficient hardware architecture is essential to support edge AI applications. This dissertation explores highly efficient and scalable vision-processing hardware architecture concepts and techniques developed together with efficient artificial intelligence algorithms. We investigate the general design methodology and realize the techniques in an efficient vision-processing hardware framework. The dissertation is divided into two parts. In the first part, we present an intelligent vision-based action recognition system built on computer vision and machine learning techniques with dedicated feature extraction. We discuss the system requirements and specifications for vision-based action recognition applications. Highly accurate and efficient action recognition is achieved through a machine learning-based method. We first present an efficient spatial-first 3D HoG algorithm for action recognition tasks, and then analyze the robust spatio-temporal MoFREAK feature-based algorithm, which achieves state-of-the-art accuracy at the cost of higher computation. To meet the real-time specification, we implement the system with VLSI hardware acceleration. Architecture optimization is applied to reduce the hardware cost without significantly degrading the accuracy. We present an intelligent vision SoC implemented in 40 nm CMOS process technology. We design a two-phase architecture to balance the throughput difference between feature detection and feature description. A binary-mask image is adopted to detect feature point locations efficiently. For feature description, a block-based keypoint technique is proposed to reduce the high bandwidth requirement of spatial-temporal MoFREAK features by sharing data among grouped features. Synthesized in TSMC 40 nm technology, the proposed architecture works at 200 MHz with a 1039K gate count and provides 12K feature points at 120 fps. Combining the binary-mask image and block-based keypoints reduces the feature extraction system bandwidth by about 81 percent. The second part explores high-efficiency edge AI processing systems developed with efficient deep neural network (DNN) algorithms to support general vision applications. Unlike the feature extraction-based traditional machine learning method, the general edge AI processor can support several DNN algorithms and applications. We implemented the AI vision processor SoC in a 28 nm CMOS process. Conventional DNN AI processors exploit complex memory pads, dedicated processing element (PE) buffers, and massive shift registers to support data reuse for memory bandwidth reduction. However, such architectures incur significant area overhead and power consumption. This dissertation proposes a novel channel-interleaved memory (CIM) footprint and dual-level memory pad (DLMP) control to enhance memory bandwidth utilization and simplify the memory pad circuit. Interleaved channel data are read from the memory bus with a single access and stored in a ping-pong buffer for reuse. Dynamic power is reduced by replacing the shift-register PE mechanism with simplified mux selection. A hybrid memory buffer reduces on-chip memory use through dynamic memory allocation. Finally, a joint stationary data reuse approach is adopted to process interleaved channel data efficiently. Experimental results demonstrate that the proposed architecture achieves a state-of-the-art area efficiency of 207.4 GOPS/mm2 while maintaining a power efficiency of 3.53-6.7 TOPS/W. The die size is 1.02 mm2, and the average power consumption is 31.02-64.38 mW. The system supports MobileNet V2 computation at up to 640x480 resolution and 30 fps. Compared with the state of the art, it achieves a 5.34x improvement in power efficiency and an 11.58x improvement in area efficiency. (Illustrative sketches of these efficiency figures and data-reuse ideas follow the metadata table below.)
The dissertation is divided into seven chapters. In the first chapter, we introduce edge AI vision processing systems and applications; both feature extraction-based and deep neural network-based systems are covered. Chapter II presents the power theorem and discusses the system requirements, hardware challenges, specifications, and design concepts for edge AI processor architectures. In Chapter III, an efficient feature extraction algorithm and architecture design for real-time action recognition are introduced, addressing the challenges of high memory usage and computational complexity. In Chapter IV, efficient deep neural network algorithms for multi-DNN workloads are discussed; our framework is developed for edge AI applications with various battery and area budgets. The design methodology, innovations, and proposed hardware architecture are described in Chapter V. To meet real-time criteria, we implement the system in VLSI, and architecture optimization is investigated to reduce hardware area and power. Chapter V also presents experimental results from several aspects. Chapter VI concludes the dissertation and summarizes the principal contributions. Finally, Chapter VII provides future directions. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-01-28T16:34:26Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-01-28T16:34:26Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Abstract
1 Introduction
 1.1 Introduction
 1.2 Applications of Computer Vision
 1.3 Cognitive Science
 1.4 Feature Extraction Based Vision Recognition
  1.4.1 Deep Neural Network Based Vision Recognition
 1.5 Dissertation Organization
2 Edge Artificial Intelligence Processing System Design Methodology
 2.1 Design Considerations and Our Contributions
  2.1.1 Edge AI Processor System Architecture Metric
  2.1.2 Power Constraint
  2.1.3 Power Theorem and Power Model
  2.1.4 Dynamic Voltage Frequency Scaling
  2.1.5 Performance Model
  2.1.6 Area Model
  2.1.7 Bandwidth Model
  2.1.8 System Methodology
3 Algorithm Development and Architecture Design for Efficient Edge AI Feature Based Vision Processing Systems
 3.1 Target Algorithm
 3.2 Proposed System
  3.2.1 Resource Sharing
  3.2.2 Keypoint Detector
 3.3 Architecture Optimization
  3.3.1 Implementation Results
4 High Efficiency Deep Neural Network based Edge AI Processor Algorithm and Architecture Consideration
 4.1 Introduction
  4.1.1 Edge AI Hardware Demand and Use Case
 4.2 Introduction
 4.3 Development and History of Essential Deep Neural Networks
  4.3.1 LeNet
  4.3.2 AlexNet
  4.3.3 VGG Net
  4.3.4 ResNet
  4.3.5 GoogLeNet
  4.3.6 MobileNet
 4.4 Related Works
  4.4.1 DNN VLSI Architectures
 4.5 Algorithms for an Efficient DNN AI Processor
  4.5.1 3 × 3 Convolution Filter
  4.5.2 1 × 1 Convolution
  4.5.3 3 × 3 Depthwise Filter
  4.5.4 3 × 3 Maximum Pooling
  4.5.5 Rectified Linear Unit
5 Architecture Design of the DNN based Efficient Vision Processing Architecture
 5.1 Design Methodology and Data Flow of the Proposed DNN Hardware Architecture
  5.1.1 System Methodology
  5.1.2 Challenge and Innovation
  5.1.3 Data Flow
  5.1.4 Channel-Interleaved Memory (CIM) Footprint
  5.1.5 Dual-Level Memory Pad (DLMP) Architecture
  5.1.6 Joint Stationary Data Reuse (JSDR) Architecture
 5.2 Data Reuse Scheme for an Efficient DNN AI Processor
 5.3 Hardware Architecture Deep Dive of an Efficient DNN AI Processor
  5.3.1 Hybrid Memory Buffer (HMB) Architecture
  5.3.2 Input Cropping and Zero Padding
  5.3.3 Channel-Interleaved Memory (CIM) Footprint
  5.3.4 Dual-Level Memory Pad (DLMP) Architecture Detail Description
  5.3.5 Joint Stationary Memory Access Architecture Detail Description
  5.3.6 Other Implementation Details
  5.3.7 Computation Reduction
 5.4 Processing Data Flow Evaluation
  5.4.1 Input Stationary
  5.4.2 Row Stationary
  5.4.3 Weight Stationary
  5.4.4 Output Stationary
  5.4.5 Joint Stationary
 5.5 Experimental Results
  5.5.1 Implementation Results and Comparison
  5.5.2 Scalable Area and Power Efficiency
  5.5.3 Power and Area Efficiency
  5.5.4 Power and Area Reduction Breakdown Analysis
 5.6 Summary
6 Conclusion
7 Future Work
 7.1 Performance, Power, Area and Bandwidth Model Consolidation
 7.2 Edge AI Processor Generalization
 7.3 High Definition Image Quality Enhancement Framework
Reference | - |
dc.language.iso | en | - |
dc.title | 針對高效能邊緣人工智慧視覺處理系統之探討與可調式硬體架構設計 | zh_TW |
dc.title | Exploration and Scalable Hardware Architecture Design of High-Efficiency Edge Artificial Intelligence Vision Processing System | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | 博士 | - |
dc.contributor.oralexamcommittee | 賴永康;蔡宗漢;陳美娟;楊佳玲;簡韶逸;李佩君;黃毓文 | zh_TW |
dc.contributor.oralexamcommittee | Yeong-Kang Lai;Tsung-Han Tsai;Mei-juan Chen;CL Yang;Shao-Yi Chien;Pei-Jun Lee;Yu-Wen Huang | en |
dc.subject.keyword | 邊緣人工智慧,積體電路硬體架構,視覺處理系統,深度神經網路,高效能, | zh_TW |
dc.subject.keyword | Edge AI,VLSI Hardware,Vision Processing System,Deep Neural Network,High Efficiency, | en |
dc.relation.page | 139 | - |
dc.identifier.doi | 10.6342/NTU202301042 | - |
dc.rights.note | 未授權 | - |
dc.date.accepted | 2023-08-11 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 電子工程學研究所 | - |
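Chapter 2 of the table of contents above lists a power theorem, power model, and dynamic voltage-frequency scaling (DVFS). The record itself does not reproduce these models, so the relation below is only the standard first-order CMOS power model that such a discussion typically builds on, not the dissertation's own formulation:

```latex
P_{\mathrm{total}} \;=\; P_{\mathrm{dyn}} + P_{\mathrm{leak}}
\;\approx\; \alpha\, C_{\mathrm{eff}}\, V_{DD}^{2}\, f \;+\; V_{DD}\, I_{\mathrm{leak}}
```

Here α is the switching activity factor, C_eff the effective switched capacitance, V_DD the supply voltage, f the clock frequency, and I_leak the leakage current. Under DVFS the achievable frequency scales roughly with V_DD, so dynamic power falls roughly with the cube of the operating point when voltage and frequency are lowered together.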
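The first part of the abstract credits a block-based keypoint technique with sharing memory bandwidth among neighbouring feature points. The sketch below only illustrates the general idea, assuming a hypothetical block size and ignoring descriptor details (neither is specified in this record): keypoints that fall in the same image block share one block fetch instead of each triggering its own.

```python
from collections import defaultdict

BLOCK = 32  # block size in pixels; illustrative assumption only

def group_keypoints_by_block(keypoints):
    """Bucket (x, y) keypoints by the image block that contains them."""
    groups = defaultdict(list)
    for x, y in keypoints:
        groups[(x // BLOCK, y // BLOCK)].append((x, y))
    return groups

keypoints = [(5, 7), (12, 9), (40, 41), (45, 33), (200, 100)]
groups = group_keypoints_by_block(keypoints)

# One block fetch now serves every keypoint inside that block.
print("fetches without grouping:", len(keypoints))  # 5
print("fetches with grouping   :", len(groups))     # 3
```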
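Chapter 4 of the table of contents lists 3x3 convolution, 1x1 convolution, and 3x3 depthwise filters as the supported operators, and the abstract reports MobileNet V2 as the workload. The following restates only the well-known operation-count argument behind depthwise-separable convolution, using a hypothetical layer shape; it is not data from the dissertation.

```python
def conv_macs(h, w, cin, cout, k):
    """MAC count of a standard k x k convolution producing an h x w x cout output."""
    return h * w * cin * cout * k * k

def dw_separable_macs(h, w, cin, cout, k):
    """MAC count of a k x k depthwise filter followed by a 1 x 1 pointwise convolution."""
    return h * w * cin * k * k + h * w * cin * cout

# Hypothetical layer shape, chosen only for illustration.
h, w, cin, cout, k = 56, 56, 128, 128, 3
std = conv_macs(h, w, cin, cout, k)
sep = dw_separable_macs(h, w, cin, cout, k)
print(f"standard 3x3 convolution : {std / 1e6:6.1f} MMACs")
print(f"depthwise separable      : {sep / 1e6:6.1f} MMACs  ({std / sep:.1f}x fewer)")
```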
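The second part of the abstract states that interleaved channel data are read from the memory bus with a single access and reused from a ping-pong buffer. Below is a purely illustrative sketch of why a channel-interleaved footprint cuts access counts, with an assumed word width of four channels; the actual CIM footprint, DLMP control, and word width are not described in this record.

```python
import numpy as np

G = 4                                    # channels packed per memory word; assumed
C, H, W = 8, 2, 2
fmap = np.arange(C * H * W, dtype=np.int64).reshape(C, H, W)

# Planar (channel-major) layout: one element per access.
planar_accesses = C * H * W

# Channel-interleaved layout: each word holds G consecutive channels of one pixel,
# so a 1x1-convolution-style consumer gets G inputs from a single fetch.
interleaved = fmap.reshape(C // G, G, H, W).transpose(0, 2, 3, 1)  # (C/G, H, W, G)
interleaved_accesses = interleaved.size // G

print("per-element accesses:", planar_accesses)       # 32
print("per-word accesses   :", interleaved_accesses)  # 8, i.e. G-fold fewer
assert interleaved.sum() == fmap.sum()                # repacking loses nothing
```

In a hardware realization, a freshly fetched word would typically land in one half of a ping-pong buffer while the previous word is consumed; this is consistent with, but not identical to, the mechanism the abstract describes.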
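The implementation figures quoted in the abstracts (288 MAC units, 370 MHz, 1.02 mm2 die, 31.02-64.38 mW, 3.53-6.7 TOPS/W, 207.4 GOPS/mm2) can be cross-checked with simple arithmetic. The script below does so under the common, but here assumed, convention that one multiply-accumulate counts as two operations.

```python
MACS = 288                       # parallel multiply-accumulate units (from the abstract)
CLOCK_HZ = 370e6                 # core clock (from the abstract)
DIE_AREA_MM2 = 1.02              # die size (from the abstract)
POWER_W = (31.02e-3, 64.38e-3)   # reported power range
EFF_TOPS_W = (6.7, 3.53)         # reported power-efficiency range

peak_ops = MACS * 2 * CLOCK_HZ   # assume 1 MAC = 2 ops -> about 213 GOPS peak
print(f"peak throughput : {peak_ops / 1e9:.1f} GOPS")
print(f"area efficiency : {peak_ops / 1e9 / DIE_AREA_MM2:.1f} GOPS/mm2 (reported: 207.4)")

for p, eff in zip(POWER_W, EFF_TOPS_W):
    implied_gops = eff * 1e12 * p / 1e9
    print(f"{p * 1e3:.2f} mW at {eff} TOPS/W -> {implied_gops:.0f} GOPS sustained")
```

All three routes land in roughly the 205-230 GOPS range, so the quoted power, area, and efficiency numbers are mutually consistent to first order.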
Appears in Collections: | 電子工程學研究所
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (access currently restricted; not authorized for public use) | 4.23 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.