Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91467
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 丁建均 | zh_TW |
dc.contributor.advisor | Jian-Jiun Ding | en |
dc.contributor.author | 鞠之浩 | zh_TW |
dc.contributor.author | Chih-Hao Chu | en |
dc.date.accessioned | 2024-01-26T16:38:01Z | - |
dc.date.available | 2024-01-27 | - |
dc.date.copyright | 2024-01-26 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-01-16 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91467 | - |
dc.description.abstract | 行車記錄器對駕駛人來說是非常重要的一項工具,在交通事故調查及證明上扮演著重要的角色,除此之外,隨著深度學習的快速發展,未來也可將行車記錄器進而利用成協助駕駛偵測前車狀況的工具,因此在本論文中,我們聚焦於夜間行車記錄器所拍攝的影像,提出了基於YOLOv7演算法及影像像素分布分析應用的前車辨識方法。
首先為了穩定YOLOv7夜間車輛辨識的準確性及穩定性,預先利用YOLOv7訓練好的權重再針對夜間的行車記錄器影像中的車輛進一步訓練,我們額外收集10組訓練資料,每組資料包含100張training及10張validation的夜間車輛影像進行訓練。 訓練完後利用新生成的YOLOv7權重進行夜間車輛影像辨識後,為了進一步提高辨識的精確性及穩定性,我們採用三種影像處理方法進行後處理,判斷偵測到的車輛是否為前車。本論文使用三種影像像素分布分析方法進行分析,淘汰我們認為YOLOv7偵測到的不是前車的bounding box,而提高模型的效果。 第一個影像像素分布分析方法是分析bounding box中高於門檻值的像素分布,因像素越亮其強度值越高,我們認為前車被偵測到的bounding box其像素強度值大於門檻時的分佈形狀,其寬與高的比例必在一定的門檻之上,若低於門檻代表其bounding box屬狹長型,則不符合真正的車輛的長寬比。 第二個方法是分析bounding box中像素強度值低於門檻時的數量比例,應介於上下限比例門檻之中,若YOLOv7偵測到的前車bounding box其所有像素強度值低於門檻值的數量比例不介於上下限之門檻比例時,可能屬於誤判或路邊未行駛低亮度的車輛或是由對向而來的高亮度車輛,因此將其淘汰。首先因行進間的車輛其後燈一定有亮度,若亮度過暗則不符合行進間車輛的條件,可能屬於誤判或為路邊未發動行駛之車輛,因此低亮度的像素佔比高於門檻時將其淘汰。另因對向來車因頭燈亮的關係,將整個bounding box的亮度皆提高許多,其低亮度像素佔比低於門檻的比例則非常低,而本篇論文重點在於提升夜間行進車輛辨識結果,因此停駛在路邊或是對向車道的車輛皆不符合本論文目的,因此皆需排除。 最後第三種方法是綜合兩個分析方法與原始YOLOv7進行比較,根據我們所使用的方法中,前兩種方法一起使用的結果是表現最好的。我們的研究不只可以增加行車紀錄器的用途,也可以運用在除了車禍以外的行車安全上。之後我們會再更精進提出的方法,並且運用在更多的場景上,像是夜間車輛追蹤或是測速等等的環境。 | zh_TW |
dc.description.abstract | The driving recorder plays an important role in accident investigation, providing invaluable evidence and insight into the sequence of events. Moreover, with the rapid progress of deep learning, the driving recorder has evolved into a multifunctional tool: among its emerging safety features is the ability to give the driver timely warnings about potential dangers posed by the vehicles ahead. In this thesis, we focus on vehicle recognition in videos captured by driving recorders at night and propose three methods for identifying the cars in front, built on an improved YOLOv7 together with two forms of analysis of the image intensity distribution.
To make nighttime vehicle recognition with YOLOv7 precise and stable, we first fine-tune the YOLOv7 pretrained weights on images captured by driving recorders at night. We collected 10 additional sets of training data, each consisting of 100 training images and 10 validation images, so that the model can identify vehicles even in nighttime scenes. After fine-tuning, we use the newly generated weights for nighttime vehicle recognition. To further improve the precision and accuracy of the recognition, we propose three post-processing methods that decide whether a detected bounding box corresponds to a car in front of us. Specifically, we analyze the image intensity distribution to identify and eliminate bounding boxes that the fine-tuned YOLOv7 has misidentified; brighter pixels have higher intensity values. The first method analyzes the distribution of pixels in a bounding box whose intensity exceeds a specified threshold: the width-to-height ratio of the region formed by these bright pixels must be larger than a certain threshold, as it would be for a real vehicle. If the ratio falls below the threshold, the bright region is unusually narrow and does not match a car's expected proportions. The second method analyzes the percentage of pixels within the bounding box whose intensity falls below a threshold; this percentage should lie between an upper and a lower bound. If it does not, the detection may be a false positive, such as a low-intensity vehicle parked on the roadside or a high-intensity vehicle coming from the opposite direction, and we eliminate it. On the one hand, a vehicle moving on the road must have bright rear lights; if the box is too dark, it does not match a moving vehicle in front and is likely a false positive such as a parked car, so we eliminate detections whose low-intensity percentage exceeds the upper bound. On the other hand, the headlights of oncoming vehicles greatly raise the intensity of the entire bounding box, so their percentage of low-intensity pixels tends to fall below the lower bound. Since this thesis focuses on recognizing moving vehicles in front at night, vehicles parked on the roadside or coming from the opposite direction should not be detected, and we exclude them as well. The third method combines the two analyses above with the robust detection capability of YOLOv7, and this combined method achieves the best performance. The proposed methods not only extend the capabilities of driving recorder systems but also show potential to enhance road safety. In the future, we will further optimize the model's performance and explore more application scenarios, such as nighttime vehicle tracking and speed monitoring in driving environments. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-01-26T16:38:01Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-01-26T16:38:01Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | 誌謝 (Acknowledgements) i
摘要 (Chinese Abstract) ii
ABSTRACT iv
CONTENTS vi
LIST OF FIGURES viii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 MOTIVATION 1
1.2 THESIS ORGANIZATION 2
Chapter 2 Related Work 4
2.1 OVERVIEW OF DEEP LEARNING AND IMAGE RECOGNITION 4
2.2 OBJECT DETECTION AND YOLO 10
2.3 EVALUATION METHOD 17
Chapter 3 YOLOv7 Retrained 20
3.1 YOLOV7 20
3.2 DATA SOURCE AND RETRAINING 22
3.3 RETRAINING YOLOV7 25
Chapter 4 Proposed Method 28
4.1 WIDTH-HEIGHT RATIO METHOD 32
4.2 PERCENTAGE OF LOWER INTENSITY PIXELS METHOD 35
4.3 COMBINED METHOD 39
Chapter 5 Experiment 41
5.1 COMPARED WITH YOLOV7 WITH NEW WEIGHTS 41
5.2 COMPARED WITH THE FIRST METHOD – WIDTH-HEIGHT RATIO 44
5.3 COMPARED WITH THE SECOND METHOD – PERCENTAGE OF LOWER INTENSITY PIXELS 48
5.4 COMPARED WITH THE THIRD METHOD – COMBINED METHOD 56
Chapter 6 Conclusion and Future Works 62
6.1 CONCLUSION 62
6.2 FUTURE WORKS 63
REFERENCE 65 | - |
dc.language.iso | en | - |
dc.title | 基於改良式YOLO及亮度分佈分析之應用於夜間行車記錄器的車輛影像辨識 | zh_TW |
dc.title | Improved YOLO Algorithm Based on Intensity Distribution Analysis for Nighttime Vehicle Recognition in Driving Recorder | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-1 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 簡鳳村;許文良;曾易聰 | zh_TW |
dc.contributor.oralexamcommittee | Feng-Tsun Chien;Wen-Liang Hsue;Yi-Chong Zeng | en |
dc.subject.keyword | 行車記錄器,夜間車輛辨識,後處理,亮度分佈分析, | zh_TW |
dc.subject.keyword | Driving Recorder, Nighttime Vehicle Recognition, YOLOv7, Post-processing, Intensity Distribution Analysis | en |
dc.relation.page | 71 | - |
dc.identifier.doi | 10.6342/NTU202400087 | - |
dc.rights.note | 同意授權(限校園內公開) | - |
dc.date.accepted | 2024-01-17 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 電信工程學研究所 | - |
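As a concrete illustration of the post-processing pipeline summarized in the abstract above, the sketch below shows how the two intensity-distribution filters could be applied to a grayscale crop of a YOLOv7 bounding box. This is a minimal sketch under assumed conditions, not the implementation from the thesis; the threshold constants (BRIGHT_THRESHOLD, MIN_WH_RATIO, DARK_THRESHOLD, DARK_RATIO_BOUNDS) and the function names are hypothetical placeholders.

```python
import numpy as np

# Hypothetical thresholds for illustration only; the thesis tunes its own values.
BRIGHT_THRESHOLD = 200           # intensity above which a pixel counts as "bright"
MIN_WH_RATIO = 1.0               # minimum width/height ratio of the bright-pixel region
DARK_THRESHOLD = 60              # intensity below which a pixel counts as "dark"
DARK_RATIO_BOUNDS = (0.2, 0.9)   # acceptable fraction of dark pixels in the box


def passes_width_height_ratio(box_gray: np.ndarray) -> bool:
    """Filter 1: the bright pixels inside the box (e.g. rear lights) should span
    a region whose width-to-height ratio is at least MIN_WH_RATIO; a very narrow
    bright region does not match a real vehicle's proportions."""
    ys, xs = np.nonzero(box_gray >= BRIGHT_THRESHOLD)
    if len(xs) == 0:
        return False  # no bright pixels at all -> not a moving vehicle in front
    width = xs.max() - xs.min() + 1
    height = ys.max() - ys.min() + 1
    return (width / height) >= MIN_WH_RATIO


def passes_dark_pixel_percentage(box_gray: np.ndarray) -> bool:
    """Filter 2: the fraction of dark pixels must lie between the lower and upper
    bounds; too many dark pixels suggests a parked car, too few suggests an
    oncoming car whose headlights brighten the whole box."""
    dark_fraction = np.mean(box_gray < DARK_THRESHOLD)
    lower, upper = DARK_RATIO_BOUNDS
    return lower <= dark_fraction <= upper


def keep_detection(frame_gray: np.ndarray, box_xyxy: tuple) -> bool:
    """Combined post-processing: keep a YOLOv7 detection only if the cropped
    bounding box passes both intensity-distribution filters."""
    x1, y1, x2, y2 = box_xyxy
    crop = frame_gray[y1:y2, x1:x2]
    return passes_width_height_ratio(crop) and passes_dark_pixel_percentage(crop)
```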
Appears in Collections: | 電信工程學研究所
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf (access restricted to NTU campus IPs; off-campus users please use the VPN service) | 3.45 MB | Adobe PDF | View/Open |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.