Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78225
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸 | |
dc.contributor.author | Bin-Syh Yu | en |
dc.contributor.author | 于賓四 | zh_TW |
dc.date.accessioned | 2021-07-11T14:46:44Z | - |
dc.date.available | 2021-07-05 | |
dc.date.copyright | 2016-07-05 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-06-24 | |
dc.identifier.citation | [1] S. S. Farfade, M. J. Saberian, and L.-J. Li, “Multi-view face detection using deep convolutional neural networks,” in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (ICMR ’15), 2015, pp. 643–650.
[2] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 5325–5334.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Information Processing Systems (NIPS 2012), 2012, p. 4.
[4] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” in Proc. International Conference on Learning Representations (ICLR 2014). CBLS, April 2014.
[5] “Net surgery,” https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb, 2015, [Online].
[6] S. Anwar, K. Hwang, and W. Sung, “Fixed point optimization of deep convolutional neural networks for object recognition,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015, pp. 1131–1135.
[7] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. 23rd International Symposium on Field-Programmable Gate Arrays (FPGA), 2015.
[8] S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, pp. 65–76, April 2009.
[9] Xilinx, “Zynq-7000 all programmable SoC overview (DS190 v1.9),” Tech. Rep., January 2016.
[10] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, pp. 137–154, May 2004.
[11] B. Froba and A. Ernst, “Face detection with the modified census transform,” in Proc. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May 2004, pp. 91–96.
[12] H. Jin, Q. Liu, H. Lu, and X. Tong, “Face detection using improved LBP under Bayesian framework,” in Proc. Third International Conference on Image and Graphics (ICIG), December 2004, pp. 306–309.
[13] L. Zhang, R. Chu, S. Xiang, S. Liao, and S. Z. Li, “Face detection based on multi-block LBP representation,” in Proceedings of the 2007 International Conference on Advances in Biometrics (ICB), August 2007, pp. 11–18.
[14] S. Yan, S. Shan, X. Chen, and W. Gao, “Locally assembled binary (LAB) feature with feature-centric cascade for fast and accurate face detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–7.
[15] X. Wang, T. X. Han, and S. Yan, “An HOG-LBP human detector with partial occlusion handling,” in Proc. IEEE 12th International Conference on Computer Vision, September 2009, pp. 32–39.
[16] T. Mita, T. Kaneko, and O. Hori, “Joint Haar-like features for face detection,” in Proc. Tenth IEEE International Conference on Computer Vision (ICCV), vol. 2, October 2005, pp. 1619–1626.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” CoRR, vol. abs/1502.01852, 2015.
[18] A. Karpathy and F. Li, “Deep visual-semantic alignments for generating image descriptions,” CoRR, vol. abs/1412.2306, 2014.
[19] N. Jaitly, V. Vanhoucke, and G. E. Hinton, “Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models,” in Proc. INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14–18, 2014, pp. 1905–1909.
[20] X. Zhang and Y. LeCun, “Text understanding from scratch,” CoRR, vol. abs/1502.01710, 2015.
[21] M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas, “Predicting parameters in deep learning,” CoRR, vol. abs/1306.0543, 2013.
[22] V. Jain and E. Learned-Miller, “FDDB: A benchmark for face detection in unconstrained settings,” University of Massachusetts, Amherst, Tech. Rep. UM-CS-2010-009, 2010.
[23] M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, “Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization,” in Proc. IEEE International Conference on Computer Vision Workshops (ICCV Workshops), October 2011, pp. 2144–2151.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[25] P.-H. Hung and S.-Y. Chien, “Deep neural networks on video sensor networks: Quantized and distributed approaches,” Master’s thesis, National Taiwan University, June 2015.
[26] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” in Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015, pp. 1737–1746.
[27] Y. Bengio and X. Glorot, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of AISTATS 2010, vol. 9, May 2010, pp. 249–256.
[28] J. Cong and B. Xiao, “Minimizing computation in convolutional neural networks,” Artificial Neural Networks and Machine Learning (ICANN), pp. 281–290, 2014.
[29] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, “Polyhedral-based data reuse optimization for configurable computing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2013, pp. 29–38. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78225 | - |
dc.description.abstract | Convolutional neural networks (CNNs) achieve excellent accuracy in image recognition and object detection. However, their massive computation, storage, and data-transfer requirements make them difficult to deploy on mobile and embedded devices.
In this thesis, several optimizations are applied to a CNN cascade for face detection to increase throughput under power constraints, simultaneously reducing computation, storage, and bandwidth requirements. First, the first stage of the CNN cascade is converted into a fully convolutional network, reducing the first stage's computation by 83%. Second, an efficient quantization method converts the model parameters from 32-bit floating point to 2-bit fixed point, saving 93.75% of the parameter storage. Finally, a CNN accelerator is implemented on an FPGA. A systematic analysis identifies the highest-throughput architecture under the lowest bandwidth and FPGA resource consumption, and the earlier quantization further increases the final architecture's computational capability. | zh_TW
dc.description.abstract | Convolutional neural networks (CNNs) have emerged to provide powerful discriminative capability, especially in image recognition and object detection. However, their massive computation requirements, storage, and memory accesses make them difficult to deploy on mobile or embedded systems.
In this thesis, a few optimizations of a CNN cascade architecture for face detection are proposed to increase throughput while minimizing computation, storage, and bandwidth requirements under power constraints. First, the first net of the CNN cascade is converted to a fully convolutional network, reducing the computation of the first stage by 83%. Second, an efficient method is applied to quantize the model parameters: a retraining scheme reduces the word length of the parameters from 32-bit floating point to 2-bit fixed point, resulting in a 93.75% smaller parameter memory. Finally, a CNN accelerator is implemented on a Xilinx Zynq XC7Z020 FPGA board. We quantitatively analyze the computing throughput and required bandwidth using the roofline model, an analytical design scheme, to find the solution with the best performance and the lowest FPGA resource requirement. Furthermore, we show that the quantization yields additional computational capability. | en
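The storage saving claimed in the abstract can be illustrated with a minimal sketch of 2-bit weight quantization. The symmetric four-level codebook and max-abs scaling below are illustrative assumptions, not the thesis's exact quantizer or retraining scheme; only the 32-bit → 2-bit storage arithmetic follows directly from the text.

```python
import numpy as np

def quantize_2bit(weights):
    """Map float weights to the 4 representable levels of a 2-bit code.

    Hypothetical level set {-1, -0.5, 0.5, 1} * scale; the thesis's actual
    quantization levels and retraining procedure may differ.
    """
    scale = np.max(np.abs(weights))
    levels = np.array([-1.0, -0.5, 0.5, 1.0]) * scale
    # Nearest-level assignment: each weight gets a 2-bit code index 0..3.
    codes = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return levels[codes], codes.astype(np.uint8)

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
qw, codes = quantize_2bit(w)

# Storage: 2 bits/parameter instead of 32 bits/parameter.
saving = 1 - 2 / 32
print(f"parameter memory saving: {saving:.2%}")  # 93.75%
```

Note that only the 2-bit codes and a single scale factor need to be stored; the dequantized values are reconstructed on the fly, which is what makes the 93.75% figure achievable in practice.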
dc.description.provenance | Made available in DSpace on 2021-07-11T14:46:44Z (GMT). No. of bitstreams: 1 ntu-105-R02943121-1.pdf: 3035764 bytes, checksum: 13cc288bc845a8baa0b4fd997ed478fb (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Abstract i
List of Figures vii
List of Tables ix
1 Introduction 1
1.1 Face Detection 1
1.2 Deep Learning 2
1.2.1 Introduction 2
1.2.2 Output Hypothesis 4
1.2.3 Feedforward Process 4
1.2.4 Backpropagation 5
1.3 Challenges 6
1.4 Thesis Organization 6
2 Face Detection Architecture and Fully Convolutional Networks 7
2.1 Introduction 7
2.2 A CNN Cascade for Face Detection 7
2.2.1 Overall Framework 8
2.2.2 Calibration Nets 9
2.2.3 Training Process 9
2.3 Fully Convolutional Networks 10
2.3.1 Introduction 10
2.3.2 Example 10
2.3.3 Applications 11
2.4 Fully Convolutional Network Version of CNN Cascade 12
2.4.1 Architecture 12
2.4.2 Computation Reduction 14
2.5 Summary 15
3 Quantizing 17
3.1 Introduction 17
3.2 Hard Quantizing 19
3.3 Stochastic Rounding 19
3.4 Modifying the Cost Function 21
3.5 Network Retraining 22
3.6 Summary 23
4 Hardware Implementation 25
4.1 Introduction 25
4.2 Potential Accelerations of a Typical CNN 25
4.3 The Roofline Model 26
4.4 Loop Tiling 27
4.5 Computation Optimization 28
4.5.1 Loop Unrolling 28
4.5.2 Tile Size Selection 30
4.6 CTC Ratio and Benefits from Quantizing 30
4.7 Optimal Parameters 32
4.8 Block Diagram 32
4.9 Computation Engine 33
4.10 Experiments 34
4.10.1 Development Device 34
4.10.2 Performance Comparison to CPU 36
4.11 Summary 36
5 Conclusion 37
Reference 39 | |
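The roofline analysis named in the contents (Sections 4.3 and 4.6) reduces to one formula: attainable throughput is the minimum of the platform's peak compute rate and the computation-to-communication (CTC) ratio times off-chip bandwidth. The sketch below uses made-up placeholder numbers, not the thesis's measured Zynq figures.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, ctc_ratio):
    """Roofline model: performance is capped either by compute or by
    memory traffic (CTC ratio in FLOP/byte, bandwidth in GB/s)."""
    return min(peak_gflops, ctc_ratio * bandwidth_gbs)

# Hypothetical platform: 80 GFLOP/s peak, 4 GB/s DRAM bandwidth.
peak, bw = 80.0, 4.0

# Below the ridge point (CTC = peak/bw = 20 FLOP/byte) the design is
# bandwidth-bound; above it, compute-bound. Quantizing weights raises
# the CTC ratio (fewer bytes moved per FLOP), pushing designs rightward.
for ctc in (5.0, 20.0, 40.0):
    print(f"CTC {ctc:5.1f} FLOP/byte -> {attainable_gflops(peak, bw, ctc):.1f} GFLOP/s")
```

This is why Section 4.6 ties the CTC ratio to quantization: shrinking parameters from 32 to 2 bits cuts memory traffic, which moves a bandwidth-bound design toward the compute roof.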
dc.language.iso | en | |
dc.title | 針對臉部偵測之卷積神經網路於可程式邏輯裝置之架構設計 | zh_TW |
dc.title | Architecture Design of Convolutional Neural Networks for Face Detection on FPGA Platforms | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 楊明玄,黃朝宗,莊永裕,傅立成 | |
dc.subject.keyword | Face Detection,Convolutional Neural Networks,FPGA,Architecture Design, | zh_TW |
dc.subject.keyword | Architecture Design,Neural Networks,Face Detection,FPGA Platforms, | en |
dc.relation.page | 42 | |
dc.identifier.doi | 10.6342/NTU201600483 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2016-06-26 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-R02943121-1.pdf (currently not authorized for public access) | 2.96 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.