Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78225
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸 | |
dc.contributor.author | Bin-Syh Yu | en |
dc.contributor.author | 于賓四 | zh_TW |
dc.date.accessioned | 2021-07-11T14:46:44Z | - |
dc.date.available | 2021-07-05 | |
dc.date.copyright | 2016-07-05 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-06-24 | |
dc.identifier.citation | [1] S. S. Farfade, M. J. Saberian, and L.-J. Li, “Multi-view face detection using deep convolutional neural networks,” in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (ICMR ’15), 2015, pp. 643–650.
[2] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 5325–5334.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Information Processing Systems (NIPS 2012), 2012, p. 4.
[4] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” in Proc. International Conference on Learning Representations (ICLR 2014). CBLS, April 2014.
[5] “Net surgery,” https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb, 2015, [Online].
[6] S. Anwar, K. Hwang, and W. Sung, “Fixed point optimization of deep convolutional neural networks for object recognition,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015, pp. 1131–1135.
[7] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. 23rd International Symposium on Field-Programmable Gate Arrays (FPGA), 2015.
[8] S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, pp. 65–76, April 2009.
[9] Xilinx, “Zynq-7000 all programmable SoC overview (DS190 v1.9),” Tech. Rep., January 2016.
[10] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, pp. 137–154, May 2004.
[11] B. Froba and A. Ernst, “Face detection with the modified census transform,” in Proc. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May 2004, pp. 91–96.
[12] H. Jin, Q. Liu, H. Lu, and X. Tong, “Face detection using improved LBP under Bayesian framework,” in Proc. Third International Conference on Image and Graphics (ICIG), December 2004, pp. 306–309.
[13] L. Zhang, R. Chu, S. Xiang, S. Liao, and S. Z. Li, “Face detection based on multi-block LBP representation,” in Proceedings of the 2007 International Conference on Advances in Biometrics (ICB), August 2007, pp. 11–18.
[14] S. Yan, S. Shan, X. Chen, and W. Gao, “Locally assembled binary (LAB) feature with feature-centric cascade for fast and accurate face detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–7.
[15] X. Wang, T. X. Han, and S. Yan, “An HOG-LBP human detector with partial occlusion handling,” in Proc. IEEE 12th International Conference on Computer Vision, September 2009, pp. 32–39.
[16] T. Mita, T. Kaneko, and O. Hori, “Joint Haar-like features for face detection,” in Proc. Tenth IEEE International Conference on Computer Vision (ICCV), vol. 2, October 2005, pp. 1619–1626.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” CoRR, vol. abs/1502.01852, 2015.
[18] A. Karpathy and F. Li, “Deep visual-semantic alignments for generating image descriptions,” CoRR, vol. abs/1412.2306, 2014.
[19] N. Jaitly, V. Vanhoucke, and G. E. Hinton, “Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models,” in Proc. INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14–18, 2014, pp. 1905–1909.
[20] X. Zhang and Y. LeCun, “Text understanding from scratch,” CoRR, vol. abs/1502.01710, 2015.
[21] M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas, “Predicting parameters in deep learning,” CoRR, vol. abs/1306.0543, 2013.
[22] V. Jain and E. Learned-Miller, “FDDB: A benchmark for face detection in unconstrained settings,” University of Massachusetts, Amherst, Tech. Rep. UM-CS-2010-009, 2010.
[23] M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, “Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization,” in Proc. IEEE International Conference on Computer Vision Workshops (ICCV Workshops), October 2011, pp. 2144–2151.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[25] P.-H. Hung and S.-Y. Chien, “Deep neural networks on video sensor networks: Quantized and distributed approaches,” Master’s thesis, National Taiwan University, June 2015.
[26] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” in Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015, pp. 1737–1746.
[27] Y. Bengio and X. Glorot, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of AISTATS 2010, vol. 9, May 2010, pp. 249–256.
[28] J. Cong and B. Xiao, “Minimizing computation in convolutional neural networks,” Artificial Neural Networks and Machine Learning (ICANN), pp. 281–290, 2014.
[29] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, “Polyhedral-based data reuse optimization for configurable computing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2013, pp. 29–38. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78225 | - |
dc.description.abstract | Convolutional neural networks (CNNs) achieve excellent accuracy in image recognition and object detection. However, their massive computation, storage, and data-transfer requirements make them difficult to deploy on mobile and embedded devices.
In this thesis, several optimizations are applied to a CNN cascade for face detection to increase throughput under power constraints, simultaneously reducing computation, storage, and bandwidth requirements. First, the first stage of the CNN cascade is converted into a fully convolutional network, reducing the first stage's computation by 83%. Second, an efficient quantization method converts the model parameters from 32-bit floating point to 2-bit fixed point, saving 93.75% of the parameter storage. Finally, a CNN accelerator is implemented on an FPGA. A systematic analysis identifies the highest-throughput architecture under the lowest bandwidth and FPGA resource consumption, and the earlier quantization further increases the final architecture's computational capability. | zh_TW
dc.description.abstract | Convolutional neural networks (CNNs) have emerged to provide powerful discriminative capability, especially in image recognition and object detection. However, their massive computation requirements, storage, and memory accesses make them difficult to deploy on mobile or embedded systems.
In this thesis, a few optimizations of a CNN cascade architecture for face detection are proposed to increase throughput while minimizing computation, storage, and bandwidth requirements under power constraints. First, the first net of the CNN cascade is converted to a fully convolutional network, reducing the computation of the first stage by 83%. Second, an efficient method is applied to quantize the model parameters: a retraining scheme reduces the word length of the parameters from 32-bit floating point to 2-bit fixed point, resulting in a 93.75% smaller parameter memory. Finally, a CNN accelerator is implemented on a Xilinx Zynq XC7Z020 FPGA board. We quantitatively analyze the computing throughput and required bandwidth using the roofline model, an analytical design scheme, to find the solution with the best performance and the lowest FPGA resource requirement. Furthermore, we show that the quantization yields additional computational capability. | en
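The storage saving claimed in the abstract can be illustrated with a minimal sketch of 2-bit weight quantization. The symmetric four-level codebook and max-abs scaling below are illustrative assumptions, not the thesis's exact quantizer or retraining scheme; only the 32-bit → 2-bit storage arithmetic follows directly from the text.

```python
import numpy as np

def quantize_2bit(weights):
    """Map float weights to the 4 representable levels of a 2-bit code.

    Hypothetical level set {-1, -0.5, 0.5, 1} * scale; the thesis's actual
    quantization levels and retraining procedure may differ.
    """
    scale = np.max(np.abs(weights))
    levels = np.array([-1.0, -0.5, 0.5, 1.0]) * scale
    # Nearest-level assignment: each weight gets a 2-bit code index 0..3.
    codes = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return levels[codes], codes.astype(np.uint8)

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
qw, codes = quantize_2bit(w)

# Storage: 2 bits/parameter instead of 32 bits/parameter.
saving = 1 - 2 / 32
print(f"parameter memory saving: {saving:.2%}")  # 93.75%
```

Note that only the 2-bit codes and a single scale factor need to be stored; the dequantized values are reconstructed on the fly, which is what makes the 93.75% figure achievable in practice.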
dc.description.provenance | Made available in DSpace on 2021-07-11T14:46:44Z (GMT). No. of bitstreams: 1 ntu-105-R02943121-1.pdf: 3035764 bytes, checksum: 13cc288bc845a8baa0b4fd997ed478fb (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Abstract i
List of Figures vii
List of Tables ix
1 Introduction 1
1.1 Face Detection 1
1.2 Deep Learning 2
1.2.1 Introduction 2
1.2.2 Output Hypothesis 4
1.2.3 Feedforward Process 4
1.2.4 Backpropagation 5
1.3 Challenges 6
1.4 Thesis Organization 6
2 Face Detection Architecture and Fully Convolutional Networks 7
2.1 Introduction 7
2.2 A CNN Cascade for Face Detection 7
2.2.1 Overall Framework 8
2.2.2 Calibration Nets 9
2.2.3 Training Process 9
2.3 Fully Convolutional Networks 10
2.3.1 Introduction 10
2.3.2 Example 10
2.3.3 Applications 11
2.4 Fully Convolutional Network Version of CNN Cascade 12
2.4.1 Architecture 12
2.4.2 Computation Reduction 14
2.5 Summary 15
3 Quantizing 17
3.1 Introduction 17
3.2 Hard Quantizing 19
3.3 Stochastic Rounding 19
3.4 Modifying the Cost Function 21
3.5 Network Retraining 22
3.6 Summary 23
4 Hardware Implementation 25
4.1 Introduction 25
4.2 Potential Accelerations of a Typical CNN 25
4.3 The Roofline Model 26
4.4 Loop Tiling 27
4.5 Computation Optimization 28
4.5.1 Loop Unrolling 28
4.5.2 Tile Size Selection 30
4.6 CTC Ratio and Benefits from Quantizing 30
4.7 Optimal Parameters 32
4.8 Block Diagram 32
4.9 Computation Engine 33
4.10 Experiments 34
4.10.1 Development Device 34
4.10.2 Performance Comparison to CPU 36
4.11 Summary 36
5 Conclusion 37
Reference 39 | |
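The roofline analysis named in the contents (Sections 4.3 and 4.6) reduces to one formula: attainable throughput is the minimum of the platform's peak compute rate and the computation-to-communication (CTC) ratio times off-chip bandwidth. The sketch below uses made-up placeholder numbers, not the thesis's measured Zynq figures.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, ctc_ratio):
    """Roofline model: performance is capped either by compute or by
    memory traffic (CTC ratio in FLOP/byte, bandwidth in GB/s)."""
    return min(peak_gflops, ctc_ratio * bandwidth_gbs)

# Hypothetical platform: 80 GFLOP/s peak, 4 GB/s DRAM bandwidth.
peak, bw = 80.0, 4.0

# Below the ridge point (CTC = peak/bw = 20 FLOP/byte) the design is
# bandwidth-bound; above it, compute-bound. Quantizing weights raises
# the CTC ratio (fewer bytes moved per FLOP), pushing designs rightward.
for ctc in (5.0, 20.0, 40.0):
    print(f"CTC {ctc:5.1f} FLOP/byte -> {attainable_gflops(peak, bw, ctc):.1f} GFLOP/s")
```

This is why Section 4.6 ties the CTC ratio to quantization: shrinking parameters from 32 to 2 bits cuts memory traffic, which moves a bandwidth-bound design toward the compute roof.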
dc.language.iso | en | |
dc.title | 針對臉部偵測之卷積神經網路於可程式邏輯裝置之架構設計 | zh_TW |
dc.title | Architecture Design of Convolutional Neural Networks for Face Detection on FPGA Platforms | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 楊明玄,黃朝宗,莊永裕,傅立成 | |
dc.subject.keyword | Face Detection,Convolutional Neural Networks,FPGA,Architecture Design, | zh_TW |
dc.subject.keyword | Architecture Design,Neural Networks,Face Detection,FPGA Platforms, | en |
dc.relation.page | 42 | |
dc.identifier.doi | 10.6342/NTU201600483 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2016-06-26 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-R02943121-1.pdf (currently not authorized for public access) | 2.96 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.