Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94493

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 連豊力 | zh_TW |
| dc.contributor.advisor | Feng-Li Lian | en |
| dc.contributor.author | 邱偉銘 | zh_TW |
| dc.contributor.author | Wei-Ming Chiu | en |
| dc.date.accessioned | 2024-08-16T16:21:19Z | - |
| dc.date.available | 2024-08-17 | - |
| dc.date.copyright | 2024-08-16 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-09 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94493 | - |
| dc.description.abstract | With the demand for human-machine interaction growing year by year, 3D human pose estimation has received increasing attention in recent years. For robots, machine tools, and even electronic devices to sense and analyze, then learn and understand, and finally react automatically to human body movements, a fast and sufficiently accurate estimation method is exactly what is needed.
Although vision-based estimation does not reach the accuracy of marker-based optical methods, using only monocular cameras brings the advantages of low cost and easy setup, so it is considered the approach better suited to real-world applications. With the growth of computing power and the development of machine learning techniques, many vision-based estimation methods adopting deep learning have been proposed and studied. However, unlike 2D estimation, which already achieves precise real-time computation, 3D estimation still cannot resolve the dilemma between computation speed and estimation accuracy. In addition, the plausibility of 3D estimates and their reprojection error back onto the 2D images have not received much attention; the former is a key issue for applications requiring human motion analysis, while the latter relates to the generalizability of an estimation method.
This thesis proposes a computationally fast and easily deployable multi-view 3D human pose estimation method. The method uses multiple monocular cameras at different viewpoints, takes the 2D human pose estimates obtained from the camera images as input to keep computation fast, and produces smooth, continuous 3D human pose estimates through a two-stage estimation system and a final post-processing step. The first stage performs optimization and takes the aforementioned reprojection error into account, guaranteeing the generalizability of the method to some extent. The second stage introduces a motion-model-based filter whose model incorporates basic kinematic conditions and relations, so that the estimated poses conform to basic kinematics. Furthermore, the proposed two-stage architecture combines the two estimation approaches so that their respective strengths and weaknesses complement each other, and nonlinear and linear constraints are incorporated into the pose estimation in a simple way. Then, to obtain smoother continuous motion, a data post-processing step is added after the system. Finally, the plausibility of the estimation results is also analyzed and discussed in this thesis from kinematic and geometric perspectives. | zh_TW |
| dc.description.abstract | Techniques for 3D human pose estimation have gained much attention in recent years due to the growing demand for human-robot and human-machine interaction. In order to sense, analyze, learn, understand, and eventually react automatically to human actions, robots, machines, and electronic devices all need a fast and adequately accurate 3D human pose estimation method.
Despite not being as accurate as marker-based optoelectronic methods, vision-based methods using only monocular cameras have the advantages of low cost and easy implementation, and are therefore considered more likely to be applied in real-world settings. With increasing computational power and well-developed machine learning techniques, many vision-based estimation methods based on deep learning have been proposed and studied. Yet unlike 2D estimation, which achieves real-time computation with good precision, 3D pose estimation still struggles to balance computation speed against estimation accuracy. Moreover, the rationality of the estimated motion and the reprojection error back onto the 2D images have not gained much attention; the former is a crucial part of applications requiring motion analysis, and the latter is related to the generalizability of estimation methods.
This thesis proposes a multi-view 3D human pose estimation method that uses multiple monocular cameras for easy implementation and fast computation, applying optimization over multi-view real-time 2D estimates followed by model-based filtering. In the first stage of the system, the reprojection error of the estimates is taken into consideration within the optimization process to assure the generalizability of the method to some extent. The basic kinematic relations of the human body are then included in the dynamic model of the model-based filter, which constitutes the second stage of the system. With this two-stage structure, the disadvantages of each stage are compensated by the other, and the nonlinear and linear relations and constraints are easily incorporated into the estimation results (an illustrative sketch of this two-stage idea appears below the metadata table). Next, a post-processing step is applied to further smooth the motion of the acquired estimates after the two-stage estimation. Last but not least, the rationality of the estimated motion is also analyzed in this thesis with respect to the values of kinematic and geometric parameters. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T16:21:19Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-16T16:21:19Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
Abstract (in Chinese) ii
ABSTRACT iv
CONTENTS vii
LIST OF FIGURES xi
LIST OF TABLES xiii
Chapter 1 Introduction 1
1.1 Motivation and Applications of Human Pose Estimation 1
1.2 Problem Formulation 6
1.2.1 Single Target Tracking 7
1.2.2 Multi-Target Tracking 7
1.3 Contributions 8
1.4 Chapter Organization 10
Chapter 2 Background and Literature Survey 11
2.1 2D Vision-Based Human Pose Estimation 11
2.1.1 Single Pose Image-based 2D Human Pose Estimation 12
2.1.2 Single Pose Video-based 2D Human Pose Estimation 13
2.1.3 Multi-pose Image-based 2D Human Pose Estimation 14
2.1.4 Multi-pose Video-based 2D Human Pose Estimation 15
2.2 3D Vision-Based Human Pose Estimation 16
2.2.1 Single Pose Monocular Image-based 3D Human Pose Estimation 17
2.2.2 Multi-Pose Monocular Image-based 3D Human Pose Estimation 18
2.2.3 Monocular Video-based 3D Human Pose Estimation 19
2.2.4 Multi-View 3D Human Pose Estimation 20
2.3 Other 3D Human Pose Estimation Methods 21
Chapter 3 Related Works and Algorithms 25
3.1 AlphaPose 25
3.2 Pinhole Camera Model 26
3.2.1 Extrinsic Parameters of Pinhole Camera Model 26
3.2.2 Intrinsic Parameters of Pinhole Camera Model 28
3.2.3 Linear Transformation of Pinhole Camera Model 29
3.3 Zhang's Camera Calibration Method 30
3.4 Model-based Filters 31
3.4.1 Kalman Filter in Linear Time-invariant Systems 32
3.4.2 Information Filter in Linear Time-invariant Systems 36
3.4.3 Information-weighted Consensus Filter in Linear Time-invariant Systems 38
3.5 Tracking with Linear Quadratic Regulator 42
Chapter 4 Proposed 3D Human Pose Estimation Method 45
4.1 Preprocessing for 3D Human Estimation 45
4.1.1 Camera Calibration for Camera System 46
4.1.2 Camera Image Undistortion 49
4.1.3 2D Human Pose Estimation 50
4.2 Single-stage Estimation by Optimization 51
4.2.1 Bounding Box Optimization Method 51
4.2.2 Information-weighted Consensus Filter on the Ground 56
4.2.3 Scattered Bounding Box Optimization Method 59
4.3 Two-stage Estimation by Optimization and Model-based Filtering 62
4.3.1 Bounding Box Optimization Method with Information Filter 62
4.3.2 Scattered Bounding Box Optimization Method with Information Filter 65
4.4 Optimization Method with Length Constraints 67
4.4.1 Stepwise Optimization with Length Constraint 68
4.4.2 Length Estimation with Information Filter 70
4.4.3 Ratio Scheduler for Optimization with Length Constraint 76
4.5 LQR Motion Smoothing 79
4.5.1 Motion Smoothing with LQR Tracking 79
4.5.2 Single-step LQR Motion Smoothing 81
Chapter 5 Experimental Results and Analysis 83
5.1 Experiment Setups 83
5.1.1 Camera System Setups 83
5.1.2 Tested Cases of Motions 88
5.1.3 Parameter Settings 90
5.2 Single-stage Estimation Results 96
5.2.1 Average Velocity of Single-stage Estimates 96
5.2.2 Length of Limbs of Single-stage Estimates 102
5.2.3 Smoothness of Single-stage Estimates 106
5.3 Two-stage Estimation Results 112
5.3.1 Average Velocity of Two-stage Estimates 112
5.3.2 Length of Limbs of Two-stage Estimates 117
5.3.3 Smoothness of Two-stage Estimates 120
5.4 Estimation Results with Length Constraints 126
5.4.1 Estimated Length of the Length IF 126
5.4.2 Two-stage Estimation Results with Length Constraints 128
5.4.3 Ratio Scheduler Results for Length Constraints 130
5.5 Estimation Results with LQR Smoothing 135
5.5.1 Smoothness of Estimates after LQR Smoothing 135
5.5.2 Average Velocity and Calculated Length of Estimates after LQR Smoothing 143
5.6 Analysis of Computation Time 148
5.7 Analysis on Experiment Results with Mismatches 151
Chapter 6 Conclusions and Future Works 159
6.1 Conclusions 159
6.2 Future Works 160
References 163
Appendix A 3D Triangular Reconstruction 179
A.1 Epipolar Geometry 179 | - |
| dc.language.iso | en | - |
| dc.subject | Human pose estimation | zh_TW |
| dc.subject | Optimization | zh_TW |
| dc.subject | Model-based filter | zh_TW |
| dc.subject | Optimization | en |
| dc.subject | Human pose estimation | en |
| dc.subject | Model-based filter | en |
| dc.title | A Two-stage 3D Human Pose Estimation System Based on Optimization and Information Filter | zh_TW |
| dc.title | A Two-stage Vision-based Multi-view 3D Human Pose Estimation System with Optimization and Information Filter | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 李後燦;黃正民;江明理 | zh_TW |
| dc.contributor.oralexamcommittee | Hou-Tsan Li;Cheng-Ming Huang;Ming-Li Chiang | en |
| dc.subject.keyword | Human pose estimation,Model-based filter,Optimization, | zh_TW |
| dc.subject.keyword | Human pose estimation,Model-based filter,Optimization, | en |
| dc.relation.page | 181 | - |
| dc.identifier.doi | 10.6342/NTU202403639 | - |
| dc.rights.note | Authorized (restricted to on-campus access) | - |
| dc.date.accepted | 2024-08-12 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Electrical Engineering | - |
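
The two-stage pipeline summarized in the abstracts above (stage 1: optimization over multi-view 2D estimates that accounts for reprojection error; stage 2: model-based filtering with a kinematic dynamic model) can be illustrated with a small sketch. The code below is a minimal, hypothetical reconstruction, not the thesis implementation: the helper names (`reproject`, `triangulate`, `ConstantVelocityKF`), the camera matrices, the noise levels, and the synthetic joint trajectory are all assumed values, and a plain constant-velocity Kalman filter stands in for the information filter used in the thesis (the two are algebraically equivalent for linear-Gaussian models).

```python
# Sketch of a two-stage 3D joint estimator: per-frame reprojection-error
# minimization (stage 1) followed by kinematic model-based filtering (stage 2).
import numpy as np
from scipy.optimize import least_squares

def reproject(P, X):
    """Project a 3D point X (3,) through a 3x4 camera matrix P; returns (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, uvs, X0):
    """Stage 1: find the 3D point minimizing the stacked reprojection residuals."""
    def residuals(X):
        return np.concatenate([reproject(P, X) - uv for P, uv in zip(Ps, uvs)])
    return least_squares(residuals, X0).x

class ConstantVelocityKF:
    """Stage 2: constant-velocity filter on one joint, state (x, y, z, vx, vy, vz)."""
    def __init__(self, dt=1 / 30, q=1e-2, r=1e-3):
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)  # kinematic model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])    # observe position only
        self.Q = q * np.eye(6); self.R = r * np.eye(3)
        self.x = np.zeros(6); self.P = np.eye(6)

    def step(self, z):
        # Predict with the kinematic model, then update with the stage-1 estimate z.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic cameras; intrinsics/extrinsics are assumed, not calibrated.
    K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
    P1 = K @ np.hstack([np.eye(3), [[0.], [0.], [5.]]])
    R2 = np.array([[0., 0., 1.], [0., 1., 0.], [-1., 0., 0.]])
    P2 = K @ np.hstack([R2, [[0.], [0.], [5.]]])
    kf, X_true = ConstantVelocityKF(), np.array([0.1, 0.2, 0.3])
    for t in range(10):
        X_true = X_true + np.array([0.01, 0.0, 0.0])              # moving joint
        uvs = [reproject(P, X_true) + rng.normal(0, 1.0, 2) for P in (P1, P2)]
        X1 = triangulate([P1, P2], uvs, X0=np.zeros(3))           # stage 1
        X2 = kf.step(X1)                                          # stage 2
        print(t, np.round(X1, 3), np.round(X2, 3))
```

The division of labor mirrors the abstract's argument: the nonlinear projection geometry is handled by the per-frame optimizer, while the linear kinematic constraints live in the filter's process model; a post-processing smoother (LQR tracking in the thesis) would then run over the filtered trajectory.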
| Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (access restricted to NTU campus IPs; use the NTU VPN service for off-campus access) | 21.52 MB | Adobe PDF |
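
For reference, the stage-1 objective described in the abstract (optimization that takes the reprojection error into account, built on the pinhole camera model of Section 3.2) can be written in the standard multi-view form below. This is the generic per-joint reprojection cost under common pinhole assumptions, not the exact cost of the thesis, which involves bounding-box optimization and length constraints (Sections 4.2-4.4).

```latex
% Generic per-joint multi-view reprojection objective, assuming C calibrated
% pinhole cameras with intrinsics K_c and extrinsics (R_c, t_c); illustrative only.
\[
  \hat{\mathbf{x}}_c(\mathbf{X}) =
    \pi\!\left( K_c \,[\, R_c \mid \mathbf{t}_c \,]
    \begin{bmatrix} \mathbf{X} \\ 1 \end{bmatrix} \right),
  \qquad
  \pi\!\left(\begin{bmatrix} u \\ v \\ w \end{bmatrix}\right)
    = \begin{bmatrix} u/w \\ v/w \end{bmatrix},
\]
\[
  \mathbf{X}^{\star} =
    \operatorname*{arg\,min}_{\mathbf{X} \in \mathbb{R}^{3}}
    \sum_{c=1}^{C}
    \bigl\lVert \hat{\mathbf{x}}_c(\mathbf{X}) - \mathbf{x}_c \bigr\rVert_2^2,
\]
% where x_c is the 2D keypoint detected in view c; the second stage then treats
% X* as the measurement in its filter update.
```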