Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68543
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 丁建均 | |
dc.contributor.author | Yao-Ren Chang | en |
dc.contributor.author | 張耀仁 | zh_TW |
dc.date.accessioned | 2021-06-17T02:24:42Z | - |
dc.date.available | 2017-08-24 | |
dc.date.copyright | 2017-08-24 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-18 | |
dc.identifier.citation | Object Detection
[1] Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), pp. 154-171.
[2] Arbeláez, P., Pont-Tuset, J., Barron, J. T., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 328-335.
[3] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587.
[4] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), pp. 1627-1645.
[5] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448.
[6] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp. 91-99.
[7] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision, pp. 21-37.
[8] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788.
[9] Girshick, R., Iandola, F., Darrell, T., & Malik, J. (2015). Deformable part models are convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 437-446.
[10] Lu, C., Lu, Y., Chen, H., & Tang, C. K. (2015). Square localization for efficient and accurate object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2560-2568.
[11] Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), pp. 303-338.
[12] Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2016). Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144.
[13] Shrivastava, A., Gupta, A., & Girshick, R. (2016). Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 761-769.
[14] Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Advances in Neural Information Processing Systems, pp. 379-387.
[15] Gidaris, S., & Komodakis, N. (2015). Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1134-1142.
[16] Chen, C., Liu, M. Y., Tuzel, O., & Xiao, J. (2016). R-CNN for small object detection. In Asian Conference on Computer Vision, pp. 214-230. Springer, Cham.
[17] Kim, K. H., Hong, S., Roh, B., Cheon, Y., & Park, M. (2016). PVANET: Deep but lightweight neural networks for real-time object detection. arXiv preprint arXiv:1608.08021.
[18] Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Improving object detection with one line of code. arXiv preprint arXiv:1704.04503.
[19] Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242.
Image Classification
[20] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105.
[21] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
[22] Harris, C., & Stephens, M. (1988). A combined corner and edge detector. In Alvey Vision Conference, Vol. 15, No. 50, pp. 10-5244.
[23] Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Vol. 1, pp. 886-893.
[24] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034.
[25] Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6), pp. 141-142.
[26] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[27] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.
Neural Network
[28] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. arXiv preprint arXiv:1302.4389.
[29] Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, pp. 4278-4284.
[30] Veit, A., Wilber, M. J., & Belongie, S. (2016). Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems, pp. 550-558.
Class Notes
[31] "Digital Visual Effects" class notes by Yung-Yu Chuang, http://www.csie.ntu.edu.tw/~cyy/courses/vfx/14spring/lectures/handouts/lec06_feature.pdf
[32] "Machine Learning and having it deep and structured" class notes by Hung-Yi Lee, http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Deep%20More%20(v2).pdf | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68543 | - |
dc.description.abstract | 近幾年, 電腦視覺(computer vision)領域大量地使用卷積類神經網路(CNNs)。在這份論文中,我們會以CNNs當作物體偵測(Object Detection)的基礎方法。我們會先從單次多框偵測器(Single Shot Multibox Detector, SSD)當作方法的基礎。我們在SSD上加上特徵金字塔網路(Feature Pyramid Networks, FPN),讓每一個位置都有全域資訊與地域的資訊。我們也在後處理(postprocessing)的時候,增加了圍框投票(bounding box voting)的方法,來獲得更好的定位效果。在實驗當中,我們主要使用的資料庫為Pascal VOC 2007 test。而在物體偵測的領域中,我們使用平均準確度(Average Precision, AP)來當作衡量的標準,我們會平均每個類別的AP,得到mean AP(mAP)。在原始的SSD中,我們可以得到77.21% mAP的結果。我們在論文的實驗中,比較我們改進過後的方法與原始的SSD以及其他的物體偵測架構。結果顯示,最終我們可以得到77.85% mAP的結果。我們的方法獲得了更好的偵測成果。 | zh_TW |
dc.description.abstract | In recent years, Convolutional Neural Networks (CNNs) have gained a lot of popularity in computer vision. In this work, we use convolutional neural networks for object detection. We take the Single Shot Multibox Detector (SSD) [7] as our basic framework and impose Feature Pyramid Networks on SSD to combine local and global information. We also adjust the postprocessing with bounding box voting for better localization. For comparison, we test our model on the Pascal VOC 2007 test dataset. During evaluation, we calculate the Average Precision (AP) for each model and class, then average the per-class APs to obtain the mean Average Precision (mAP) as our final evaluation metric. The original SSD achieves 77.21% mAP on the Pascal VOC 2007 test dataset. Our simulation results show that the proposed method outperforms the original SSD, with better performance for object detection: our final model achieves 77.85% mAP. | en |
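The bounding-box-voting postprocessing mentioned in the abstract (following Gidaris & Komodakis [15]) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names and the 0.5 IoU threshold are assumptions for the example, and the actual thesis settings may differ. After NMS selects a box, the candidate boxes that overlap it vote on its final coordinates, weighted by their classification scores.

```python
# Hedged sketch of bounding-box voting: boxes are (x1, y1, x2, y2) tuples,
# candidates are (box, score) pairs. The 0.5 threshold is illustrative.

def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def box_voting(kept, candidates, thresh=0.5):
    """Refine each NMS-kept box by a score-weighted average of the
    candidate boxes that overlap it by at least `thresh` IoU."""
    refined = []
    for box, _score in kept:
        voters = [(b, s) for b, s in candidates if iou(box, b) >= thresh]
        total = sum(s for _, s in voters)
        # Each coordinate becomes the score-weighted mean over the voters;
        # the kept box itself is among the candidates, so total > 0.
        refined.append(tuple(
            sum(s * b[i] for b, s in voters) / total for i in range(4)
        ))
    return refined
```

For example, a kept box at (0, 0, 10, 10) with score 0.9 and a single overlapping candidate at (1, 1, 11, 11) with score 0.3 would be shifted slightly toward the candidate, improving localization when several detections cluster around one object.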
dc.description.provenance | Made available in DSpace on 2021-06-17T02:24:42Z (GMT). No. of bitstreams: 1 ntu-106-R04942127-1.pdf: 3796035 bytes, checksum: 2765fd7cc6d5d51c0590d7b0048335d1 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | 口試委員會審定書 (Certification by the Oral Examination Committee) #
誌謝 (Acknowledgements) i
中文摘要 (Chinese Abstract) ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES ix
Chapter 1 Introduction 1
1.1 Background 1
1.2 Organization 2
Chapter 2 Conventional Feature Based Methods 4
2.1 Feature Extraction 4
2.1.1 Harris Corner Detector [22] 4
2.1.2 Histogram of Oriented Gradients (HOG) 5
2.2 Support Vector Machine (SVM) 6
2.2.1 Latent SVM 7
2.3 Deformable Part Model (DPM) 7
Chapter 3 Neural Network 10
3.1 Neural Network 10
3.1.1 Neuron 10
3.1.2 Deep Neural Network 13
3.1.3 Gradient Descent 14
3.1.4 Advanced Structures for Neural Network 17
3.2 Convolutional Neural Network (CNN) 19
3.2.1 Convolutional Kernel 20
3.2.2 Fully-Connected Layer 20
3.2.3 Pooling 21
Chapter 4 Existing CNN Methods 22
4.1 Intersection over Union (IoU) 22
4.2 Evaluation: Mean Average Precision (mAP) 23
4.3 From R-CNN to Faster R-CNN 24
4.3.1 R-CNN [3] 24
4.3.2 Fast R-CNN [5] 25
4.3.3 Faster R-CNN [6] 26
4.4 Single Shot Multibox Detector [7] 28
4.4.1 Feature Extraction (VGG-16) 29
4.4.2 Multi-Scale Feature Maps 29
4.4.3 Prior Boxes 29
4.4.4 Classification 30
4.4.5 Localization 31
4.4.6 Training 32
Chapter 5 Observation and Experiment 34
5.1 Dataset 34
5.2 Fewer Anchors 35
5.3 Adding More Layers to High-Resolution Feature Maps 36
5.4 Feature Pyramid Network (FPN) 38
5.4.1 FPN Structure 38
5.4.2 FPN on SSD 39
5.5 OHEM (Online Hard Example Mining) [13] + FPN + SSD 41
5.6 Multi-Path 42
5.6.1 Multiple-Size ROIs 43
5.6.2 Works That Use Multi-Path 44
5.6.3 Two-Way Structure of SSD 47
5.7 Non-Maximum Suppression (NMS) 49
5.7.1 Hard NMS 49
5.7.2 Soft NMS [18] 50
5.7.3 Bounding Box Voting [15] 51
5.7.4 Results of Various NMS Techniques 52
Chapter 6 Proposed Method 54
Chapter 7 Future Work 57
7.1 Speed 57
7.2 Relation between Bounding Box Voting and FPN 57
7.3 Soft-NMS 57
7.4 Better Regularization 58
Chapter 8 Conclusion 59
REFERENCES 60 | |
dc.language.iso | en | |
dc.title | 改進單次多框偵測器架構與後處理 | zh_TW |
dc.title | Improving Single Shot Multibox Detector with Feature Pyramid Networks Structure and Postprocessing for Better Object Detection Performance | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 郭景明,王鵬華,夏至賢 | |
dc.subject.keyword | 物體偵測,卷積類神經網路,單次多框偵測器, | zh_TW |
dc.subject.keyword | Object Detection,Convolutional Neural Networks,Single Shot MultiBox Detector | en |
dc.relation.page | 63 | |
dc.identifier.doi | 10.6342/NTU201704027 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2017-08-19 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in collections: | 電信工程學研究所 |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 3.71 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated where specific license terms are named.