Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96581
Full metadata record
dc.contributor.advisor: 韓仁毓 (zh_TW)
dc.contributor.advisor: Jen-Yu Han (en)
dc.contributor.author: 陳佳菱 (zh_TW)
dc.contributor.author: Chia-Ling Chen (en)
dc.date.accessioned: 2025-02-19T16:37:31Z
dc.date.available: 2025-02-20
dc.date.copyright: 2025-02-19
dc.date.issued: 2024
dc.date.submitted: 2025-01-16
dc.identifier.citation:
Affes, N., Ktari, J., Ben Amor, N., Frikha, T., and Hamam, H., (2023). Comparison of YOLOv5, YOLOv6, YOLOv7 and YOLOv8 for Intelligent Video Surveillance. Journal of Information Assurance and Security, 18(5).
Agrawal, T., Kirkpatrick, C., Imran, K., and Figus, M., (2020). Automatically Detecting Personal Protective Equipment on Persons in Images Using Amazon Rekognition. Amazon.
Alhassan Gamani, A. R., Arhin, I., and Kyeremateng Asamoah, A., (2024). Performance Evaluation of YOLOv8 Model Configurations for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment. arXiv e-prints, arXiv-2408.
Akcay, S., Kundegorski, M. E., Willcocks, C. G., and Breckon, T. P., (2018). Using deep convolutional neural network architectures for object classification and detection within X-ray baggage security imagery. IEEE Transactions on Information Forensics and Security, 13, 2203–2215.
Bodla, N., Singh, B., Chellappa, R., and Davis, L. S., (2017). Soft-NMS: Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 5561-5569.
Bianchini, M., Simic, M., Ghosh, A., and Shaw, R. N., (2022). Machine Learning for Robotics Applications. Springer, Singapore.
Bochkovskiy, A., Wang, C. Y., and Liao, H. Y. M., (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chen, L., Lin, Z., and Zhang, Z., (2018). Low-light image enhancement for deep learning object detection and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Cao, Z., Simon, T., Wei, S. E., and Sheikh, Y., (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition, 7291-7299.
Chen, G. T., (2022). Human body geometry extraction and motion analysis based on surveillance camera images, Master Thesis, National Taiwan University, Taipei, Taiwan.
Chuang, T. Y., Han, J. Y., Jhan, D. J., and Yang, M. D., (2020). Geometric recognition of pedestrians in monocular rotating imagery using faster R-CNN. Remote Sensing, 12(12), 1908.
Canny, J., (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679-698.
Dalal, N., and Triggs, B., (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Xuan, X., Wang, W., et al., (2020). A Review of Research on Object Detection Based on Deep Learning. Journal of Physics: Conference Series, 1684, 012028.
Forsyth, D. A., and Ponce, J., (2002). Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference.
Gonzalez, R. C., and Woods, R. E., (2008). Digital Image Processing, 3rd Edition. Upper Saddle River, USA: Prentice Hall.
Girshick, R., (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December, 1440–1448.
Golub, G. H., and Van Loan, C. F., (2013). Matrix Computations. JHU Press.
Huang, Z., Wang, Z., Huang, L., Huang, C., Wei, Y., and Liu, W., (2019). DCSP: Densely connected skip paths for crowd counting by contextual patch-based detection. IEEE Transactions on Image Processing, 28(3), 1257-1268.
He, K., Gkioxari, G., Dollár, P., and Girshick, R., (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision, 2961-2969.
He, K., Zhang, X., Ren, S., and Sun, J., (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Hermens, F., (2024). Automatic object detection for behavioural research using YOLOv8. Behavior Research Methods, 1-24.
Hu, W. C., Chen, C. H., Chen, T. Y., Huang, D. Y., and Wu, Z. C., (2015). Pedestrian detection and tracking from video captured by moving camera. Journal of Visual Communication and Image Representation, 30, 164–180.
Jhan, D. J., (2017). Positioning and Tracking of Pedestrians Using Image Sequence from a Single Camera, Master Thesis, National Taiwan University, Taipei, Taiwan.
Jocher, G., Chaurasia, A., Qiu, J., and Ultralytics Team, (2021). YOLO by Ultralytics.
Jain, A. K., (1989). Fundamentals of Digital Image Processing. Prentice-Hall, Inc.
Kim, J. W., Choi, J. Y., Ha, E. J., and Choi, J. H., (2023). Human pose estimation using mediapipe pose and optimization method based on a humanoid model. Applied Sciences, 13(4), 2700.
Kneip, L., Chli, M., and Siegwart, R. Y., (2011). Robust real-time visual odometry with a single camera and an IMU. British Machine Vision Conference, Scotland, UK.
Liu, Y., Shen, F., and van den Hengel, A., (2019). Context-aware crowd counting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(3), 539-555.
Lee, D. S., Kim, J. S., Jeong, S. C., and Kwon, S. K., (2020). Human height estimation by color deep learning and depth 3D conversion. Applied Sciences, 10(16), 5531.
Linder, W., (2009). Digital Photogrammetry. Berlin/Heidelberg, Germany: Springer.
Martinez-Martin, E., del Pobil, A. P., (2017). Object Detection and Recognition for Assistive Robots: Experimentation and Implementation. IEEE Robotics and Automation Magazine 24: 123–138.
Migliore, D., Rigamonti, R., Marzorati, D., Matteucci, M., and Sorrenti, D. G., (2009). Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments. International Workshop on Safe Navigation in Open and Dynamic Environments Application to Autonomous Vehicles, Kobe, Japan, 12-17.
Mikhail, E. M., and Ackermann, F., (1976). Observations and Least Squares. New York: IEP-A Dun-Donnelley.
Mikhail, E. M., and Gracie, G., (1981). Analysis and adjustment of survey measurements. Van Nostrand Reinhold. New York.
Mikhail, E. M., (2001). Introduction to Modern Photogrammetry. John Wiley & Sons.
Ohashi, T., Ikegami, Y., and Nakamura, Y., (2020). Synergetic reconstruction from 2D pose and 3D motion for wide-space multi-person video motion capture in the wild. Image and Vision Computing, 104, 104028.
Park, S., Hwang, J., and Kwak, N., (2016). 3D human pose estimation using convolutional neural networks with 2D pose information. In European Conference on Computer Vision, Springer International Publishing, 156-169.
Redmon, J., and Farhadi, A., (2018). YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A., (2016). You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788.
RangeKing, Jocher, G., (2023). Brief summary of YOLOv8 model structure. GitHub Issue. Retrieved from https://github.com/ultralytics/ultralytics/issues/189.
Rasouli, A., Tsotsos, J. K., (2019). Autonomous Vehicles That Interact with Pedestrians: A Survey of Theory and Practice. IEEE Transactions on Intelligent Transportation Systems, 21, 900–918.
Rogez, G., Weinzaepfel, P., and Schmid, C., (2017). LCR-Net: Localization-classification-regression for human pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3433-3441.
Ren, S., He, K., Girshick, R., and Sun, J., (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 91–99.
Schindler, K. (2015). Mathematical foundations of photogrammetry. In Handbook of geomathematics, Springer, 3087-3103.
Sambolek, S., and Ivasic-Kos, M., (2024). Person Detection and Geolocation Estimation in UAV Aerial Images: An Experimental Approach. In ICPRAM, 785-792.
Tekin, B., Katircioglu, I., Salzmann, M., Lepetit, V., and Fua, P., (2016). Structured prediction of 3d human pose with deep neural networks. arXiv preprint, arXiv:1605.05180.
Tomono, M., (2005). 3-D localization and mapping using a single camera based on structure-from-motion with automatic baseline selection. IEEE International Conference on Robotics and Automation, Barcelona, Spain.
Wang, C. Y., Bochkovskiy, A., and Liao, H. Y. M., (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464-7475.
Wolf, P. R., Dewitt, B. A., and Wilkinson, B. E., (2000). Elements of Photogrammetry with Applications in GIS. McGraw-Hill Education.
Nocedal, J., and Wright, S. J., (2006). Numerical Optimization. Springer, New York.
Yang, M. D., Chao, C. F., Lu, L. Y., Huang, K. S., and Chen, Y. P., (2013). Image-based 3D scene reconstruction and exploration in augmented reality. Automation in Construction, 3, 48–60.
Yang, M. D., Tseng, H. H., Hsu, Y. C., and Tsai, H. P., (2020). Semantic Segmentation Using Deep Learning with Vegetation Indices for Rice Lodging Identification in Multi-date UAV Visible Images. Remote Sensing, 12, 633.
Zhao, Z. Q., Zheng, P., Xu, S. T., and Wu, X., (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212-3232.
Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y., (2016). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 589-597.
Zoph, B., and Le, Q. V., (2017). Neural architecture search with reinforcement learning. In Proceedings of the 2017 International Conference on Learning Representations, Toulon, France, 24–26 April.
Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M. H., and Shao, L., (2022). Learning enriched features for fast image restoration and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 1934-1948.
Zheng, C., Wu, W., Chen, C., Yang, T., Zhu, S., Shen, J., and Shah, M., (2023). Deep learning-based human pose estimation: A survey. ACM Computing Surveys, 56(1), 1-37.
Zou, F. Y., (2010). Analysis of Close-range Photogrammetry by Using Non-metric Camera, Master Thesis, National Chiao Tung University, Hsinchu, Taiwan.
https://docs.ultralytics.com/models/yolov8/#performance-metrics
https://github.com/ultralytics/ultralytics
https://google.github.io/mediapipe/solutions/pose.html
Ultralytics Pose Estimation models
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96581
dc.description.abstract: 近年來,監控攝影機已經隨處可見,但傳統的監控系統受限於固定的位置、影像解析度以及安裝條件,難以利用立體影像進行三維物體坐標的檢測。本研究針對此問題,採用單一攝影機捕捉的動態影像序列,結合深度學習模型YOLOv8-pose進行大規模影像數據處理,於兩個室內環境中驗證行人自動檢測及追蹤,並通過攝影測量的共線性方程計算三維幾何信息。結果顯示,在使用R5 5600X CPU及3070顯示卡的條件下,每幀影像處理的時間大約為1秒,行人身高誤差控制在±3公分以內,相機外方位參數pitch值誤差約為1度。此外,研究加入低頭角度偵測和上衣顏色的過濾條件,RMSE由15.8mm降至13.3mm,提升系統定位精度。本研究方法顯著降低了計算時間和人力成本,為即時行人定位與追蹤提供了一種高效且低成本的替代方案,適用於室內監控場景並增加行人追蹤特徵指標,可延伸運用於提升室內場域管理效率及縮短行人意外告警時間。(zh_TW)
dc.description.abstract: In recent years, surveillance cameras have become ubiquitous, yet conventional surveillance systems are fixed in specific positions and constrained by image resolution and installation conditions, making it difficult to utilize stereo images for three-dimensional object coordinate detection. To address this problem, this study uses dynamic image sequences captured by a single camera, combined with the YOLOv8-pose deep learning model for processing large volumes of image data; the system automatically detects and tracks pedestrians, calculates three-dimensional geometric information from the photogrammetric collinearity equations, and was validated in two indoor environments. The results indicate that, using an R5 5600X CPU and a 3070 GPU, the processing time per image frame was approximately 1 second, with pedestrian height errors controlled within ±3 cm and a pitch error in the camera's exterior orientation parameters of around 1 degree. Additionally, the study incorporates head-down angle detection and clothing color filtering, reducing the root mean square error (RMSE) from 15.8 mm to 13.3 mm and thereby enhancing the system's localization accuracy. This approach significantly reduces computation time and labor costs, providing an efficient, low-cost alternative for real-time pedestrian localization and tracking. It is particularly suitable for indoor surveillance scenarios, adds pedestrian tracking feature indicators, and can be extended to improve indoor site management efficiency and shorten pedestrian accident alert times. (en)
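The abstract above refers to the photogrammetric collinearity equations and to YOLOv8-pose keypoint detection. For readers without access to the restricted full text, the standard textbook form of the collinearity equations (as given, e.g., in Wolf et al., 2000, cited above; not a transcription from the thesis) relates an object point (X, Y, Z) to its image coordinates (x, y) via the principal point (x_0, y_0), the focal length f, the perspective center (X_c, Y_c, Z_c), and the elements r_{ij} of the rotation matrix formed from the exterior orientation angles:

\[
x = x_0 - f\,\frac{r_{11}(X - X_c) + r_{12}(Y - Y_c) + r_{13}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)}, \qquad
y = y_0 - f\,\frac{r_{21}(X - X_c) + r_{22}(Y - Y_c) + r_{23}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)}
\]

Likewise, the per-frame pedestrian keypoint extraction described in the abstract can be sketched with the public Ultralytics YOLOv8-pose API. This is a minimal illustration under stated assumptions, not the thesis's implementation: the weight file, video path, and choice of keypoints are placeholders.

```python
# Minimal sketch: per-frame pedestrian keypoint extraction with YOLOv8-pose.
# Assumptions: "yolov8n-pose.pt" (pretrained COCO pose weights) and
# "corridor.mp4" are illustrative placeholders, not the thesis's data.
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n-pose.pt")
cap = cv2.VideoCapture("corridor.mp4")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]  # pose inference on one frame
    if result.keypoints is None:
        continue
    for person in result.keypoints.xy:  # (17, 2) COCO keypoints per person
        nose = person[0]  # keypoint 0: nose
        l_ankle, r_ankle = person[15], person[16]  # keypoints 15, 16: ankles
        # The head-to-ankle extent in image space is the raw observation;
        # mapping it to metric height requires the collinearity equations
        # above with calibrated interior and exterior orientation.
        print(nose.tolist(), l_ankle.tolist(), r_ankle.tolist())

cap.release()
```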
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-19T16:37:31Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-02-19T16:37:31Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements
Abstract (in Chinese)
Abstract
Chapter 1. Introduction
1.1 Overview
1.2 Motivation and purpose
1.3 Thesis outline
Chapter 2. Literature Review
2.1 Localization and tracking of pedestrians using single-camera images
2.2 Inferring 3D pose information based on 2D pose data and deep learning
2.3 Detecting human skeleton and height using deep learning
2.4 Human detection and analyzing 3D human pose using deep learning
2.5 Summary
Chapter 3. Methodology
3.1 Calibration of camera internal orientation parameters
3.2 Calculation of external orientation parameters based on collinearity equations and adjustment
3.3 Testing and analysis of deep learning models
3.3.1 Mask R-CNN testing and analysis
3.3.2 Testing image brightness enhancement with MIRNet model
3.3.3 Detecting humans using YOLOv8-pose deep learning method
3.3.4 Accuracy evaluation metrics for human detection by YOLOv8-pose model
3.4 Location tracking from single-camera images
3.4.1 Calculation of human geometric information
3.4.2 Incorporating head-down angle for specific pedestrian location tracking
3.4.3 Incorporating clothing color for specific pedestrian location tracking
3.4.4 Quality assessment for localization
3.4.5 Summary
Chapter 4. Numerical Results and Analysis
4.1 Camera specifications and research sites
4.2 Calibration results of camera internal orientation parameters
4.3 Calculation results of camera external orientation parameters
4.4 Detecting humans using YOLOv8-pose models and extracting image coordinates
4.4.1 Analysis of human detection results
4.4.2 Real-world environment testing
4.4.3 Analysis of image coordinate extraction results
4.4.4 Analysis of the relationship between camera configuration and localization accuracy
4.5 Testing the processing time of YOLOv8-pose models on a laptop
4.6 Human location tracking
4.6.1 Analysis of localization results without incorporating human pose
4.6.2 Analysis of localization results incorporating head-down angle
4.6.3 Analysis of localization results incorporating clothing color
4.6.4 Integrated analysis of localization results incorporating head-down angle and clothing color
4.7 Discussion
Chapter 5. Conclusion and Future Work
5.1 Conclusion
5.2 Future work
References
Appendix
dc.language.iso: en
dc.subject: 深度學習 (zh_TW)
dc.subject: 行人追蹤特徵指標 (zh_TW)
dc.subject: 單相機定位 (zh_TW)
dc.subject: YOLOv8-pose (zh_TW)
dc.subject: 即時行人偵測 (zh_TW)
dc.subject: Pedestrian tracking feature indicators (en)
dc.subject: Real-time Pedestrian Detection (en)
dc.subject: YOLOv8-pose (en)
dc.subject: Location Tracking From Single-Camera Images (en)
dc.subject: Deep Learning (en)
dc.title: 基於深度學習以及幾何資訊進行單相機影像目標偵測與空間定位追蹤 (zh_TW)
dc.title: Object Detection and Location Tracking From Single-Camera Images Based on Deep Learning and Geometric Information (en)
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 博士 (Doctoral)
dc.contributor.oralexamcommittee: 高書屏; 甯方璽; 郭重言; 蘇文瑞; 蔡亞倫 (zh_TW)
dc.contributor.oralexamcommittee: Shu-Ping Kao; Fang-Si Ning; Chong-Yan Kuo; Wun-Ruei Su; Ya-Lun Tsai (en)
dc.subject.keyword: 即時行人偵測, YOLOv8-pose, 單相機定位, 深度學習, 行人追蹤特徵指標 (zh_TW)
dc.subject.keyword: Real-time Pedestrian Detection, YOLOv8-pose, Location Tracking From Single-Camera Images, Deep Learning, Pedestrian tracking feature indicators (en)
dc.relation.page: 123
dc.identifier.doi: 10.6342/NTU202500115
dc.rights.note: 同意授權(限校園內公開) (authorization granted; campus access only)
dc.date.accepted: 2025-01-16
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 土木工程學系 (Department of Civil Engineering)
dc.date.embargo-lift: 2026-01-14
Appears in Collections: 土木工程學系 (Department of Civil Engineering)

Files in This Item:
ntu-113-1.pdf (25.98 MB, Adobe PDF, Restricted Access)

