Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96069

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 | zh_TW |
| dc.contributor.advisor | Yung-Jen Hsu | en |
| dc.contributor.author | 蕭昀豪 | zh_TW |
| dc.contributor.author | Yun-Hao Hsiao | en |
| dc.date.accessioned | 2024-10-11T16:05:29Z | - |
| dc.date.available | 2024-10-12 | - |
| dc.date.copyright | 2024-10-11 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-10-04 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96069 | - |
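| dc.description.abstract | In recent years, domain adaptive pose estimation (DAPE) has attracted growing attention. Among the methods for this task, semi-supervised learning (SSL) methods are widely used because they share a similar goal with unsupervised domain adaptation (UDA). However, although SSL and UDA both aim to use unlabeled data to improve a model trained on labeled data, the two tasks make different assumptions about the data distribution, so SSL methods do not fit UDA perfectly. In view of this, we propose Phase2Phase, a strategy that integrates three methods: Adaptive Mean Teacher, T-VAT-based UniMatch, and Mixup Augmentation.
Traditional Mean Teacher methods usually adopt a large smoothing coefficient, but in UDA tasks a large smoothing coefficient tends to prevent the teacher model from quickly reaching the student model's performance. To overcome this problem, we propose Adaptive Mean Teacher. By adding a buffer phase at the start of domain adaptation and using a lower smoothing coefficient during that phase, we let the teacher model catch up with the student model early on while retaining the stability of Mean Teacher. Mean Teacher has been shown to be effective in UDA settings. Besides model-centric methods such as Mean Teacher, which generate two outputs from different models for consistency regularization, SSL also has methods such as FixMatch and UniMatch that introduce perturbations at the input and feature levels. Although these strategies can in theory complement each other, experiments show that naively combining the two degrades performance. Phase2Phase therefore introduces an additional phase after the traditional Mean Teacher training phase, in which T-VAT-based UniMatch is used to further improve the model's performance. Finally, we apply Mixup throughout training to improve the model's robustness and overall performance. Experimental results show that the proposed methods do improve DAPE performance. Notably, Phase2Phase surpasses other DAPE methods that can use labeled data, and its performance approaches that of SFDAHPE, the current state-of-the-art DAPE method that performs well without using labeled data. This highlights the practicality of Phase2Phase in real-world scenarios. | zh_TW |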
| dc.description.abstract | Domain Adaptive Pose Estimation (DAPE) has received increasing attention in recent years. Many methods from semi-supervised learning (SSL) are now applied to DAPE because SSL shares a similar goal with unsupervised domain adaptation (UDA). Although both SSL and UDA aim to improve a model trained on labeled data by using unlabeled data, they make different assumptions about the data distribution, which prevents SSL methods from transferring seamlessly to DAPE. With this mismatch in mind, we introduce Phase2Phase, a novel approach that integrates three core strategies: Adaptive Mean Teacher, T-VAT-based UniMatch, and Mixup augmentation.
Traditional Mean Teacher methods typically employ a large smoothing coefficient. In UDA tasks, however, a large smoothing coefficient can keep the teacher model from quickly reaching the performance of the student model. To overcome this issue, we propose the Adaptive Mean Teacher. By introducing a ramp-up phase at the start of domain adaptation, during which a reduced smoothing coefficient is applied, we enable the teacher model to catch up with the student model quickly while preserving the stability of the traditional Mean Teacher framework. Mean Teacher has been shown to be effective in UDA. Besides model-centered approaches like Mean Teacher, which generate two outputs from different models for consistency regularization, SSL also includes methods like FixMatch and UniMatch that introduce perturbations at the input and feature levels. Although these strategies complement each other in theory, combining them directly can degrade performance. Our thesis therefore introduces an additional training phase for T-VAT-based UniMatch to facilitate the integration. Finally, we incorporate Mixup augmentation to boost the model's robustness, further improving overall performance. The experimental results show that all the proposed methods improve DAPE performance. Notably, Phase2Phase surpasses previous source-dependent DAPE approaches and achieves results comparable to SFDAHPE, the current state of the art in source-free DAPE. This underscores Phase2Phase's practical effectiveness in real-world scenarios. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-10-11T16:05:29Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-10-11T16:05:29Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee I
Acknowledgements III
Abstract (Chinese) V
Abstract VII
Contents IX
List of Figures XIII
List of Tables XV
List of Algorithms XVII
Symbols XIX
Chapter 1 Introduction 1; 1.1 Background 1; 1.2 Motivation 3; 1.3 Thesis Organization 3
Chapter 2 Literature Review 5; 2.1 2D Pose Estimation 5; 2.2 Unsupervised Domain Adaptation (UDA) and Semi-Supervised Learning (SSL) 6; 2.2.1 Methods with Greater Specificity to UDA 7; 2.2.2 Methods Shared between UDA and SSL 8; 2.3 Consistency Regularization 10; 2.4 Domain Adaptive Pose Estimation 11
Chapter 3 Methodology 15; 3.1 Problem Statement 15; 3.2 Preliminaries 16; 3.3 Pipeline of Our Methods 17; 3.3.1 Adaptive Mean Teacher 17; 3.3.2 Two-Phase Training Framework 18; 3.3.3 T-VAT-based UniMatch 22; 3.3.4 Mixup Augmentation 26; 3.4 Algorithm 27
Chapter 4 Experiment 31; 4.1 Dataset 31; 4.1.1 Rendered Hand Pose (RHD) 32; 4.1.2 Hand-3D-Studio (H3D) 32; 4.2 Evaluation Metrics 33; 4.3 Experiments Setup 34; 4.4 Evaluation and Results 35; 4.4.1 Quantitative Results 35; 4.4.2 Qualitative Results 36; 4.5 Ablation Study on Framework 36; 4.6 Sensitive Analysis 38; 4.6.1 Sensitive Analysis on the Adaptive Mean Teacher 38; 4.6.2 Sensitive Analysis on the Mixup Augmentation 41; 4.6.3 Sensitive Analysis of the Augmentation in ABEP 42
Chapter 5 Conclusion 43; 5.1 Contribution 44; 5.1.1 Addressing Slow Update Challenges in Traditional Mean Teacher Framework with Adaptive Mean Teacher 44; 5.1.2 Integrating Weak-to-Strong Augmentation-Based Consistency Regularization into the Mean Teacher Framework 44; 5.2 Limitation and Future Work 45; 5.2.1 Probing into the Underperformance of the Teacher Model Relative to the Student Model 45; 5.2.2 Expanding Experiments Across Various Tasks 45; 5.2.3 Exploring the Practicality in Source-Free DAPE 46
References 47
Appendix A: Analogous Explanation of the Sudden Learning Acceleration at the Start of Domain Adaptation Using the Human Learning Curve 53 | - |
| dc.language.iso | en | - |
| dc.title | 循序漸進:用分階段訓練提升半監督學習方法在領域自適應姿態估計中的適用性 | zh_TW |
| dc.title | Phase2Phase: Enhancing the Applicability of Semi-Supervised Learning Methods in Domain Adaptive Pose Estimation through Phased Training | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | Master's (碩士) | - |
| dc.contributor.coadvisor | 鄭文皇 | zh_TW |
| dc.contributor.coadvisor | Wen-Huang Cheng | en |
| dc.contributor.oralexamcommittee | 陳駿丞;楊智淵 | zh_TW |
| dc.contributor.oralexamcommittee | Jun-Cheng Chen;Chih-Yuan Yang | en |
| dc.subject.keyword | 領域自適應,姿態估計,關鍵點檢測,半監督式學習 | zh_TW |
| dc.subject.keyword | Domain Adaptation, Pose Estimation, Keypoint Detection, Semi-supervised Learning | en |
| dc.relation.page | 54 | - |
| dc.identifier.doi | 10.6342/NTU202404439 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2024-10-04 | - |
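| dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | - |
| dc.contributor.author-dept | Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所) | - |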
| dc.date.embargo-lift | 2025-03-06 | - |
| Appears in Collections: | Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所) | |
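
The abstract describes the Adaptive Mean Teacher as using a reduced smoothing coefficient during an initial ramp-up phase so that the teacher catches up with the student quickly. The record does not give the actual schedule or values, so the following is only a minimal sketch under stated assumptions: a two-level step schedule, the hypothetical names `smoothing_coefficient`, `ema_update`, and `track`, and the illustrative values `alpha_low=0.9` and `alpha_high=0.999` are all placeholders, not the thesis's settings.

```python
def smoothing_coefficient(step, ramp_up_steps, alpha_low=0.9, alpha_high=0.999):
    """Return the EMA smoothing coefficient for a given training step.

    Assumed schedule (not taken from the thesis): a lower coefficient
    during the ramp-up phase, the usual large coefficient afterwards.
    """
    return alpha_low if step < ramp_up_steps else alpha_high


def ema_update(teacher, student, alpha):
    """Standard Mean Teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def track(student_value, steps, ramp_up_steps):
    """Toy experiment: a one-parameter teacher starts at 0 and follows a fixed student."""
    teacher = [0.0]
    for step in range(steps):
        alpha = smoothing_coefficient(step, ramp_up_steps)
        teacher = ema_update(teacher, [student_value], alpha)
    return teacher[0]


if __name__ == "__main__":
    # ramp_up_steps=0 reproduces a fixed large coefficient (plain Mean Teacher).
    print("fixed coefficient:   ", track(1.0, steps=100, ramp_up_steps=0))
    print("with ramp-up phase:  ", track(1.0, steps=100, ramp_up_steps=50))
```

In this toy setting the teacher with a ramp-up phase ends much closer to the student after the same number of steps, which is the behavior the abstract attributes to the lower early coefficient; how the thesis chooses the phase length and coefficients is not stated in this record.
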
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-1.pdf | 5.24 MB | Adobe PDF | View/Open |
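All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically stated otherwise.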
