Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96581
Full metadata record
dc.contributor.advisor: 韓仁毓 (zh_TW)
dc.contributor.advisor: Jen-Yu Han (en)
dc.contributor.author: 陳佳菱 (zh_TW)
dc.contributor.author: Chia-Ling Chen (en)
dc.date.accessioned: 2025-02-19T16:37:31Z
dc.date.available: 2025-02-20
dc.date.copyright: 2025-02-19
dc.date.issued: 2024
dc.date.submitted: 2025-01-16
dc.identifier.citation:
Affes, N., Ktari, J., Ben Amor, N., Frikha, T., and Hamam, H., (2023). Comparison of YOLOv5, YOLOv6, YOLOv7 and YOLOv8 for Intelligent Video Surveillance. Journal of Information Assurance and Security, 18(5).
Agrawal, T., Kirkpatrick, C., Imran, K., and Figus, M., (2020). Automatically Detecting Personal Protective Equipment on Persons in Images Using Amazon Rekognition. Amazon.
Alhassan Gamani, A. R., Arhin, I., and Kyeremateng Asamoah, A., (2024). Performance Evaluation of YOLOv8 Model Configurations for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment. arXiv e-prints, arXiv-2408.
Akcay, S., Kundegorski, M. E., Willcocks, C. G., and Breckon, T. P., (2018). Using deep convolutional neural network architectures for object classification and detection within X-ray baggage security imagery. IEEE Transactions on Information Forensics and Security, 13, 2203–2215.
Bodla, N., Singh, B., Chellappa, R., and Davis, L. S., (2017). Soft-NMS: Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 5561-5569.
Bianchini, M., Simic, M., Ghosh, A., and Shaw, R. N., (2022). Machine Learning for Robotics Applications. Springer, Singapore.
Bochkovskiy, A., Wang, C. Y., and Liao, H. Y. M., (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chen, L., Lin, Z., and Zhang, Z., (2018). Low-light image enhancement for deep learning object detection and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Cao, Z., Simon, T., Wei, S. E., and Sheikh, Y., (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition, 7291-7299.
Chen, G. T., (2022). Human body geometry extraction and motion analysis based on surveillance camera images, Master Thesis, National Taiwan University, Taipei, Taiwan.
Chuang, T. Y., Han, J. Y., Jhan, D. J., and Yang, M. D., (2020). Geometric recognition of pedestrians in monocular rotating imagery using faster R-CNN. Remote Sensing, 12(12), 1908.
Canny, J., (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679-698.
Dalal, N., and Triggs, B., (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Xuan, X., Wang, W., et al., (2020). A Review of Research on Object Detection Based on Deep Learning. Journal of Physics: Conference Series, 1684, 012028.
Forsyth, D. A., and Ponce, J., (2002). Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference.
Gonzalez, R. C., and Woods, R. E., (2008). Digital Image Processing, 3rd Edition. Upper Saddle River, USA: Prentice Hall.
Girshick, R., (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December, 1440–1448.
Golub, G. H., and Van Loan, C. F., (2013). Matrix Computations. JHU Press.
Huang, Z., Wang, Z., Huang, L., Huang, C., Wei, Y., and Liu, W., (2019). DCSP: Densely connected skip paths for crowd counting by contextual patch-based detection. IEEE Transactions on Image Processing, 28(3), 1257-1268.
He, K., Gkioxari, G., Dollár, P., and Girshick, R., (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision, 2961-2969.
He, K., Zhang, X., Ren, S., and Sun, J., (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Hermens, F., (2024). Automatic object detection for behavioural research using YOLOv8. Behavior Research Methods, 1-24.
Hu, W. C., Chen, C. H., Chen, T. Y., Huang, D. Y., and Wu, Z. C., (2015). Pedestrian detection and tracking from video captured by moving camera. Journal of Visual Communication and Image Representation, 30, 164–180.
Jhan, D. J., (2017). Positioning and Tracking of Pedestrians Using Image Sequence from a Single Camera, Master Thesis, National Taiwan University, Taipei, Taiwan.
Jocher, G., Chaurasia, A., Qiu, J., and Ultralytics Team, (2021). YOLO by Ultralytics.
Jain, A. K., (1989). Fundamentals of Digital Image Processing. Prentice-Hall, Inc.
Kim, J. W., Choi, J. Y., Ha, E. J., and Choi, J. H., (2023). Human pose estimation using mediapipe pose and optimization method based on a humanoid model. Applied Sciences, 13(4), 2700.
Kneip, L., Chli, M., and Siegwart, R. Y., (2011). Robust real-time visual odometry with a single camera and an IMU. British Machine Vision Conference, Scotland, UK.
Liu, Y., Shen, F., and van den Hengel, A., (2019). Context-aware crowd counting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(3), 539-555.
Lee, D. S., Kim, J. S., Jeong, S. C., and Kwon, S. K., (2020). Human height estimation by color deep learning and depth 3D conversion. Applied Sciences, 10(16), 5531.
Linder, W., (2009). Digital Photogrammetry. Berlin/Heidelberg, Germany: Springer.
Martinez-Martin, E., del Pobil, A. P., (2017). Object Detection and Recognition for Assistive Robots: Experimentation and Implementation. IEEE Robotics and Automation Magazine 24: 123–138.
Migliore, D., Rigamonti, R., Marzorati, D., Matteucci, M., and Sorrenti, D. G., (2009). Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments. International Workshop on Safe Navigation in Open and Dynamic Environments Application to Autonomous Vehicles, Kobe, Japan, 12-17.
Mikhail, E. M., and Ackermann, F., (1976). Observations and Least Squares. New York: IEP-A Dun-Donnelley.
Mikhail, E. M., and Gracie, G., (1981). Analysis and adjustment of survey measurements. Van Nostrand Reinhold. New York.
Mikhail, E. M., (2001). Introduction to Modern Photogrammetry. John Wiley & Sons.
Ohashi, T., Ikegami, Y., and Nakamura, Y., (2020). Synergetic reconstruction from 2D pose and 3D motion for wide-space multi-person video motion capture in the wild. Image and Vision Computing, 104, 104028.
Park, S., Hwang, J., and Kwak, N., (2016). 3D human pose estimation using convolutional neural networks with 2D pose information. In European Conference on Computer Vision, Springer International Publishing, 156-169.
Redmon, J., and Farhadi, A., (2018). YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A., (2016). You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788.
RangeKing, Jocher, G., (2023). Brief summary of YOLOv8 model structure. GitHub Issue. Retrieved from https://github.com/ultralytics/ultralytics/issues/189.
Rasouli, A., Tsotsos, J. K., (2019). Autonomous Vehicles That Interact with Pedestrians: A Survey of Theory and Practice. IEEE Transactions on Intelligent Transportation Systems, 21, 900–918.
Rogez, G., Weinzaepfel, P., and Schmid, C., (2017). LCR-Net: Localization-classification-regression for human pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3433-3441.
Ren, S., He, K., Girshick, R., and Sun, J., (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 91–99.
Schindler, K. (2015). Mathematical foundations of photogrammetry. In Handbook of geomathematics, Springer, 3087-3103.
Sambolek, S., and Ivasic-Kos, M., (2024). Person Detection and Geolocation Estimation in UAV Aerial Images: An Experimental Approach. In ICPRAM, 785-792.
Tekin, B., Katircioglu, I., Salzmann, M., Lepetit, V., and Fua, P., (2016). Structured prediction of 3d human pose with deep neural networks. arXiv preprint, arXiv:1605.05180.
Tomono, M., (2005). 3-D localization and mapping using a single camera based on structure-from-motion with automatic baseline selection. IEEE International Conference on Robotics and Automation, Barcelona, Spain.
Wang, C. Y., Bochkovskiy, A., and Liao, H. Y. M., (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464-7475.
Wolf, P. R., Dewitt, B. A., and Wilkinson, B. E., (2000). Elements of Photogrammetry with Applications in GIS. McGraw-Hill Education.
Nocedal, J., and Wright, S. J., (2006). Numerical Optimization. Springer, New York.
Yang, M. D., Chao, C. F., Lu, L. Y., Huang, K. S., and Chen, Y. P., (2013). Image-based 3D scene reconstruction and exploration in augmented reality. Automation in Construction, 3, 48–60.
Yang, M. D., Tseng, H. H., Hsu, Y. C., and Tsai, H. P., (2020). Semantic Segmentation Using Deep Learning with Vegetation Indices for Rice Lodging Identification in Multi-date UAV Visible Images. Remote Sensing, 12, 633.
Zhao, Z. Q., Zheng, P., Xu, S. T., and Wu, X., (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212-3232.
Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y., (2016). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 589-597.
Zoph, B., and Le, Q. V., (2017). Neural architecture search with reinforcement learning. In Proceedings of the 2017 International Conference on Learning Representations, Toulon, France, 24–26 April.
Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M. H., and Shao, L., (2022). Learning enriched features for fast image restoration and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 1934-1948.
Zheng, C., Wu, W., Chen, C., Yang, T., Zhu, S., Shen, J., and Shah, M., (2023). Deep learning-based human pose estimation: A survey. ACM Computing Surveys, 56(1), 1-37.
Zou, F. Y., (2010). Analysis of Close-range Photogrammetry by Using Non-metric Camera, Master Thesis, National Chiao Tung University, Hsinchu, Taiwan.
https://docs.ultralytics.com/models/yolov8/#performance-metrics
https://github.com/ultralytics/ultralytics
https://google.github.io/mediapipe/solutions/pose.html
Ultralytics Pose Estimation models
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96581
dc.description.abstract: 近年來,監控攝影機已經隨處可見,但傳統的監控系統受限於固定的位置、影像解析度以及安裝條件,難以利用立體影像進行三維物體坐標的檢測。本研究針對此問題,採用單一攝影機捕捉的動態影像序列,結合深度學習模型YOLOv8-pose進行大規模影像數據處理,於兩個室內環境中驗證行人自動檢測及追蹤,並通過攝影測量的共線性方程計算三維幾何信息。結果顯示,在使用R5 5600X CPU及3070顯示卡的條件下,每幀影像處理的時間大約為1秒,行人身高誤差控制在±3公分以內,相機外方位參數pitch值誤差約為1度。此外,研究加入低頭角度偵測和上衣顏色的過濾條件,RMSE由15.8mm降至13.3mm,提升系統定位精度。本研究方法顯著降低了計算時間和人力成本,為即時行人定位與追蹤提供了一種高效且低成本的替代方案,適用於室內監控場景並增加行人追蹤特徵指標,可延伸運用於提升室內場域管理效率及縮短行人意外告警時間。(zh_TW)
dc.description.abstract: In recent years, surveillance cameras have become ubiquitous, yet conventional surveillance systems are fixed in specific positions and constrained by image resolution and installation conditions, making it difficult to utilize stereo images for three-dimensional object coordinate detection. To address this problem, this study uses dynamic image sequences captured by a single camera, combined with the YOLOv8-pose deep learning model for processing large volumes of image data; the system automatically detects and tracks pedestrians, calculates three-dimensional geometric information from the photogrammetric collinearity equations, and was validated in two indoor environments. The results indicate that, using an R5 5600X CPU and a 3070 GPU, the processing time per image frame was approximately 1 second, with pedestrian height errors controlled within ±3 cm and a pitch error in the camera's exterior orientation parameters of around 1 degree. Additionally, the study incorporates head-down angle detection and clothing color filtering, reducing the root mean square error (RMSE) from 15.8 mm to 13.3 mm and thereby enhancing the system's localization accuracy. This approach significantly reduces computation time and labor costs, providing an efficient, low-cost alternative for real-time pedestrian localization and tracking. It is particularly suitable for indoor surveillance scenarios, adds pedestrian tracking feature indicators, and can be extended to improve indoor site management efficiency and shorten pedestrian accident alert times. (en)
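The abstract above refers to the photogrammetric collinearity equations and to YOLOv8-pose keypoint detection. For readers without access to the restricted full text, the standard textbook form of the collinearity equations (as given, e.g., in Wolf et al., 2000, cited above; not a transcription from the thesis) relates an object point (X, Y, Z) to its image coordinates (x, y) via the principal point (x_0, y_0), the focal length f, the perspective center (X_c, Y_c, Z_c), and the elements r_{ij} of the rotation matrix formed from the exterior orientation angles:

\[
x = x_0 - f\,\frac{r_{11}(X - X_c) + r_{12}(Y - Y_c) + r_{13}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)}, \qquad
y = y_0 - f\,\frac{r_{21}(X - X_c) + r_{22}(Y - Y_c) + r_{23}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)}
\]

Likewise, the per-frame pedestrian keypoint extraction described in the abstract can be sketched with the public Ultralytics YOLOv8-pose API. This is a minimal illustration under stated assumptions, not the thesis's implementation: the weight file, video path, and choice of keypoints are placeholders.

```python
# Minimal sketch: per-frame pedestrian keypoint extraction with YOLOv8-pose.
# Assumptions: "yolov8n-pose.pt" (pretrained COCO pose weights) and
# "corridor.mp4" are illustrative placeholders, not the thesis's data.
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n-pose.pt")
cap = cv2.VideoCapture("corridor.mp4")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]  # pose inference on one frame
    if result.keypoints is None:
        continue
    for person in result.keypoints.xy:  # (17, 2) COCO keypoints per person
        nose = person[0]  # keypoint 0: nose
        l_ankle, r_ankle = person[15], person[16]  # keypoints 15, 16: ankles
        # The head-to-ankle extent in image space is the raw observation;
        # mapping it to metric height requires the collinearity equations
        # above with calibrated interior and exterior orientation.
        print(nose.tolist(), l_ankle.tolist(), r_ankle.tolist())

cap.release()
```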
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-19T16:37:31Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-02-19T16:37:31Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements
Abstract (in Chinese)
Abstract
Chapter 1. Introduction
1.1 Overview
1.2 Motivation and purpose
1.3 Thesis outline
Chapter 2. Literature Review
2.1 Localization and tracking of pedestrians using single-camera images
2.2 Inferring 3D pose information based on 2D pose data and deep learning
2.3 Detecting human skeleton and height using deep learning
2.4 Human detection and analyzing 3D human pose using deep learning
2.5 Summary
Chapter 3. Methodology
3.1 Calibration of camera internal orientation parameters
3.2 Calculation of external orientation parameters based on collinearity equations and adjustment
3.3 Testing and analysis of deep learning models
3.3.1 Mask R-CNN testing and analysis
3.3.2 Testing image brightness enhancement with MIRNet model
3.3.3 Detecting humans using YOLOv8-pose deep learning method
3.3.4 Accuracy evaluation metrics for human detection by YOLOv8-pose model
3.4 Location tracking from single-camera images
3.4.1 Calculation of human geometric information
3.4.2 Incorporating head-down angle for specific pedestrian location tracking
3.4.3 Incorporating clothing color for specific pedestrian location tracking
3.4.4 Quality assessment for localization
3.4.5 Summary
Chapter 4. Numerical Results and Analysis
4.1 Camera specifications and research sites
4.2 Calibration results of camera internal orientation parameters
4.3 Calculation results of camera external orientation parameters
4.4 Detecting humans using YOLOv8-pose models and extracting image coordinates
4.4.1 Analysis of human detection results
4.4.2 Real-world environment testing
4.4.3 Analysis of image coordinate extraction results
4.4.4 Analysis of the relationship between camera configuration and localization accuracy
4.5 Testing the processing time of YOLOv8-pose models on a laptop
4.6 Human location tracking
4.6.1 Analysis of localization results without incorporating human pose
4.6.2 Analysis of localization results incorporating head-down angle
4.6.3 Analysis of localization results incorporating clothing color
4.6.4 Integrated analysis of localization results incorporating head-down angle and clothing color
4.7 Discussion
Chapter 5. Conclusion and Future Work
5.1 Conclusion
5.2 Future work
References
Appendix
dc.language.iso: en
dc.subject: 深度學習 (zh_TW)
dc.subject: 行人追蹤特徵指標 (zh_TW)
dc.subject: 單相機定位 (zh_TW)
dc.subject: YOLOv8-pose (zh_TW)
dc.subject: 即時行人偵測 (zh_TW)
dc.subject: Pedestrian tracking feature indicators (en)
dc.subject: Real-time Pedestrian Detection (en)
dc.subject: YOLOv8-pose (en)
dc.subject: Location Tracking From Single-Camera Images (en)
dc.subject: Deep Learning (en)
dc.title: 基於深度學習以及幾何資訊進行單相機影像目標偵測與空間定位追蹤 (zh_TW)
dc.title: Object Detection and Location Tracking From Single-Camera Images Based on Deep Learning and Geometric Information (en)
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 博士 (Doctoral)
dc.contributor.oralexamcommittee: 高書屏; 甯方璽; 郭重言; 蘇文瑞; 蔡亞倫 (zh_TW)
dc.contributor.oralexamcommittee: Shu-Ping Kao; Fang-Si Ning; Chong-Yan Kuo; Wun-Ruei Su; Ya-Lun Tsai (en)
dc.subject.keyword: 即時行人偵測, YOLOv8-pose, 單相機定位, 深度學習, 行人追蹤特徵指標 (zh_TW)
dc.subject.keyword: Real-time Pedestrian Detection, YOLOv8-pose, Location Tracking From Single-Camera Images, Deep Learning, Pedestrian tracking feature indicators (en)
dc.relation.page: 123
dc.identifier.doi: 10.6342/NTU202500115
dc.rights.note: 同意授權(限校園內公開) (authorization granted; campus access only)
dc.date.accepted: 2025-01-16
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 土木工程學系 (Department of Civil Engineering)
dc.date.embargo-lift: 2026-01-14
Appears in Collections: 土木工程學系 (Department of Civil Engineering)

Files in This Item:
ntu-113-1.pdf (25.98 MB, Adobe PDF, Restricted Access)

