NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96330

Full metadata record (DC field: value (language)):
dc.contributor.advisor: 許永真 (zh_TW)
dc.contributor.advisor: Jane Yung-jen Hsu (en)
dc.contributor.author: 呂兆凱 (zh_TW)
dc.contributor.author: Zhao-Kai Lu (en)
dc.date.accessioned: 2024-12-24T16:23:09Z
dc.date.available: 2024-12-25
dc.date.copyright: 2024-12-24
dc.date.issued: 2024
dc.date.submitted: 2024-11-25
dc.identifier.citation[1] Thi Thi Zin, Ye Htet, Yuya Akagi, Hiroki Tamura, Kazuhiro Kondo, Sanae Araki, and Etsuo Chosa. Real-time action recognition system for elderly people using stereo depth camera. Sensors, 21(17):5895, 2021.
[2] Han Sun and Yu Chen. Real-time elderly monitoring for senior safety by lightweight human action recognition. In 2022 IEEE 16th International Symposium on Medical Information and Communication Technology (ISMICT), pages 1–6. IEEE, 2022.
[3] Sharath Chandra Akkaladevi and Christoph Heindl. Action recognition for human robot interaction in industrial applications. In 2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS), pages 94–99. IEEE, 2015.
[4] Yan Jin, Mengke Li, Yang Lu, Yiu-ming Cheung, and Hanzi Wang. Longtailed visual recognition via self-heterogeneous integration with knowledge excavation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23695–23704, 2023.
[5] Ahmet Iscen, André Araujo, Boqing Gong, and Cordelia Schmid. Class-balanced distillation for long-tailed visual recognition. arXiv preprint arXiv:2104.05279, 2021.
[6] Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, and Larry S Davis. Videolt: Large-scale long-tailed video recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7960–7969, 2021.
[7] Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, and Dima Damen. Use your head: Improving long-tail video recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2415– 2425, 2023.
[8] Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, and Yujun Shen. Balancing logit variation for long-tailed semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19561–19573, 2023.
[9] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.
[10] Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 4489–4497, 2015.
[11] Karen Simonyan and Andrew Zisserman. Two-stream convolutional networks for action recognition in videos. Advances in neural information processing systems, 27, 2014.
[12] Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European conference on computer vision, pages 20–36. Springer, 2016.
[13] Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
[14] Lei Shi, Yifan Zhang, Jian Cheng, and Hanqing Lu. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12026– 12035, 2019.
[15] Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, and Bo Dai. Revisiting skeletonbased action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2969–2978, 2022.
[16] T Lin. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017.
[17] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9268–9277, 2019.
[18] Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217, 2019.
[19] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
[20] Mengke Li, HU Zhikai, Yang Lu, Weichao Lan, Yiu-ming Cheung, and Hui Huang. Feature fusion from head to tail for long-tailed visual recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 13581–13589, 2024.
[21] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
[22] Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, and Yoshua Bengio. Manifold mixup: Better representations by interpolating hidden states. In International conference on machine learning, pages 6438–6447. PMLR, 2019.
[23] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6023–6032, 2019.
[24] AFM Uddin, Mst Monira, Wheemyung Shin, TaeChoong Chung, Sung-Ho Bae, et al. Saliencymix: A saliency guided data augmentation strategy for better regularization. arXiv preprint arXiv:2006.01791, 2020.
[25] Hanchao Liu, Yuhe Liu, Tai-Jiang Mu, Xiaolei Huang, and Shi-Min Hu. Skeletoncutmix: Mixing up skeleton with probabilistic bone exchange for supervised domain adaptation. IEEE Transactions on Image Processing, 2023.
[26] Kailin Xu, Fanfan Ye, Qiaoyong Zhong, and Di Xie. Topology-aware convolutional neural network for efficient skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2866–2874, 2022.
[27] Zhan Chen, Hong Liu, Tianyu Guo, Zhengyan Chen, Pinhao Song, and Hao Tang. Contrastive learning from spatio-temporal mixed skeleton sequences for selfsupervised skeleton-based action recognition. arXiv preprint arXiv:2207.03065, 2022.
[28] Hongda Liu, Yunlong Wang, Min Ren, Junxing Hu, Zhengquan Luo, Guangqi Hou, and Zhenan Sun. Balanced representation learning for long-tailed skeleton-based action recognition. arXiv preprint arXiv:2308.14024, 2023.
[29] Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. Ntu rgb+ d: A large scale dataset for 3d human activity analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1010–1019, 2016.
[30] Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C Kot. Ntu rgb+ d 120: A large-scale benchmark for 3d human activity understanding. IEEE transactions on pattern analysis and machine intelligence, 42(10):2684–2701, 2019.
[31] Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, and Song-Chun Zhu. Cross-view action modeling, learning and recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2649–2656, 2014.
[32] Haodong Duan, Jiaqi Wang, Kai Chen, and Dahua Lin. Pyskl: Towards good practices for skeleton action recognition. In Proceedings of the 30th ACM International Conference on Multimedia, pages 7351–7354, 2022.
[33] Ke Cheng, Yifan Zhang, Xiangyu He, Weihan Chen, Jian Cheng, and Hanqing Lu. Skeleton-based action recognition with shift graph convolutional network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 183–192, 2020.
[34] Hyung-gun Chi, Myoung Hoon Ha, Seunggeun Chi, Sang Wan Lee, Qixing Huang, and Karthik Ramani. Infogcn: Representation learning for human skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20186–20196, 2022.
[35] Huanyu Zhou, Qingjie Liu, and Yunhong Wang. Learning discriminative representations for skeleton based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10608–10617, 2023.
[36] Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, and Wanli Ouyang. Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 143–152, 2020.
[37] Yuxin Chen, Ziqi Zhang, Chunfeng Yuan, Bing Li, Ying Deng, and Weiming Hu. Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 13359–13368, 2021.
[38] Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning deep representation for imbalanced classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5375–5384, 2016.
[39] Jason Van Hulse, Taghi M Khoshgoftaar, and Amri Napolitano. Experimental perspectives on learning from imbalanced data. In Proceedings of the 24th international conference on Machine learning, pages 935–942, 2007.
[40] Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, and Da-Cheng Juan. Remix: rebalanced mixup. In Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16, pages 95–110. Springer, 2020.
[41] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in neural information processing systems, 32, 2019.
[42] Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, and Junjie Yan. Equalization loss for long-tailed object recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11662–11671, 2020.
[43] Jiawei Ren, Cunjun Yu, Xiao Ma, Haiyu Zhao, Shuai Yi, et al. Balanced meta-softmax for long-tailed visual recognition. Advances in neural information processing systems, 33:4175–4186, 2020.
[44] Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, and Shan Luo. Long-tailed instance segmentation using gumbel optimized loss. In European Conference on Computer Vision, pages 353–369. Springer, 2022.
[45] Mengke Li, Yiu-Ming Cheung, and Zhikai Hu. Key point sensitive loss for longtailed visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4812–4825, 2022.
[46] Shaoli Huang, Xinchao Wang, and Dacheng Tao. Snapmix: Semantically proportional mixing for augmenting fine-grained data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 1628–1636, 2021.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96330
dc.description.abstract (zh_TW): 基於骨架資料的動作辨識由於其計算效率高以及在動態環境中的穩定表現,近年來逐漸受到關注。然而,現有大部分方法仍依賴於平衡的資料集,即每個類別都擁有相近數量的樣本,卻忽視了現實生活中的資料分佈往往是不均衡的現象,這樣的長尾分佈問題會顯著降低模型的效能。本論文針對長尾分佈的骨架動作辨識提出了新的解決方案,旨在強化代表性不足的樣本類別的表現,從而克服資料不平衡導致的模型偏差問題。
為此,我們提出了一種結合動態混合技術與新型軟標籤計算的資料增強方法,專注於提升尾部類別的辨識準確率。我們的方法包含一個專為少數類別設計的選擇策略,並透過新型軟標籤計算,確保混合樣本與其對應混合標籤之間的關聯性,進一步提高增強資料的品質。
我們在長尾的 NTU RGB+D 資料集、N-UCLA 資料集,以及我們實驗室收集的長尾 AIMS 嬰兒動作資料集上進行了實驗,結果顯示我們的方法在整體動作辨識表現上優於當前最先進的方法,特別是在尾部類別的辨識上取得了顯著的提升,展現了其應對現實世界資料集的潛力。

dc.description.abstract (en): Skeleton-based action recognition has gained increasing attention in recent years due to its computational efficiency and robust performance in dynamic environments. However, most existing methods still rely on balanced datasets, where each class has a similar number of samples, ignoring the fact that data in real-world scenarios is often imbalanced. This long-tailed distribution significantly reduces a model's effectiveness. In this thesis, we propose a novel solution to the challenge of long-tailed skeleton-based action recognition by enhancing the performance of underrepresented classes, thereby mitigating the bias introduced by data imbalance.
To this end, we introduce a data augmentation approach that combines dynamic mixup techniques with a new soft label calculation method, specifically focused on improving the recognition accuracy of tail classes. Our method includes a selection strategy designed for minority classes, and the soft label calculation ensures the correlation between mixed samples and their corresponding mixed labels, further enhancing the quality of the augmented data.
We conducted experiments on the long-tailed NTU RGB+D dataset, the N-UCLA dataset, and the long-tailed AIMS infant action dataset collected in our lab. The results demonstrate that our method outperforms state-of-the-art approaches in overall action recognition performance, with particularly significant improvements in the recognition of tail classes, showing its potential for tackling real-world datasets.
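The abstract describes mixing samples toward minority classes and assigning soft labels that reflect the mixing ratio. As a rough illustration of that general idea only — not the thesis's actual algorithm — the sketch below assumes hypothetical helpers `class_weights` (a simple inverse-frequency weighting) and `mixup_with_soft_labels` (standard mixup-style blending with a two-hot soft label):

```python
import numpy as np

def class_weights(counts, alpha=0.5):
    """Hypothetical inverse-frequency weighting: rarer classes get larger
    sampling weight, so tail classes are mixed in more often."""
    w = (1.0 / np.asarray(counts, dtype=float)) ** alpha
    return w / w.sum()  # normalize to a probability distribution

def mixup_with_soft_labels(x_a, y_a, x_b, y_b, num_classes, lam=None, rng=None):
    """Blend two skeleton sequences and emit a soft label that mirrors
    the mixing coefficient, in the spirit of vanilla mixup."""
    rng = rng or np.random.default_rng()
    if lam is None:
        lam = rng.beta(0.2, 0.2)       # mixup draws lambda from Beta(alpha, alpha)
    x = lam * x_a + (1.0 - lam) * x_b  # convex combination of the inputs
    y = np.zeros(num_classes)
    y[y_a] += lam                      # soft label: lam on class a,
    y[y_b] += 1.0 - lam                # (1 - lam) on class b
    return x, y

# Usage: sample a tail-class partner with probability proportional to its weight.
w = class_weights([100, 10, 1])        # class 2 is the tail class
x, y = mixup_with_soft_labels(np.ones((3, 2)), 0, np.zeros((3, 2)), 2,
                              num_classes=3, lam=0.7)
```

The thesis's salient soft labels additionally weight the label by how much each body part contributes to the action; the plain ratio-based label above is only the mixup baseline that the method builds on.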
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-12-24T16:23:08Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-12-24T16:23:09Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Proposed Method
  1.3 Outline of the Thesis
Chapter 2 Related Work
  2.1 Action Recognition
    2.1.1 RGB-Based
    2.1.2 Optical Flow-Based
    2.1.3 Skeleton-Based
  2.2 Long-tailed Learning
    2.2.1 Re-weighting
    2.2.2 Classifier Re-training (cRT)
    2.2.3 Data Augmentation
  2.3 Mixup
    2.3.1 Advanced Variants and Implementations
  2.4 Advances in Skeleton-Based Action Recognition through Data Augmentation
Chapter 3 Problem Definition
Chapter 4 Methodology
  4.1 The Limitations of Current Long-tailed Skeleton-based Action Recognition Methods
  4.2 Class Weight Calculation
    4.2.1 Motivation
    4.2.2 Sample Weight
    4.2.3 Performance Weight
    4.2.4 Class Weight
    4.2.5 Updating Weight
  4.3 Soft Label Calculator
    4.3.1 Motivation
    4.3.2 Contribution of Body Parts to the Action
    4.3.3 Soft Label Calculation
  4.4 Selecting Strategy
    4.4.1 Motivation
    4.4.2 Selecting Strategy Steps
Chapter 5 Experiments
  5.1 Experiment Setup
    5.1.1 Datasets and Scenarios
    5.1.2 Evaluation Scenarios
    5.1.3 Competitor Methods
  5.2 Implementation Details
    5.2.1 Baseline Method
    5.2.2 Proposed Method
  5.3 Experiment Results
  5.4 Ablation Study
    5.4.1 Effectiveness of Two Components
  5.5 Analysis
    5.5.1 Sensitivity of Weighting Factor α
    5.5.2 Improvement of Tail Classes
    5.5.3 Feature Alignment and Class Similarity Evaluation
    5.5.4 Analysis of Mixup and Our Method Compared to Vanilla
    5.5.5 Performance Analysis of Models on Challenging Classes
Chapter 6 Conclusion
  6.1 Contributions
  6.2 Limitations and Future Work
References
dc.language.iso: en
dc.subject: 動作識別 (zh_TW)
dc.subject: 骨架資料 (zh_TW)
dc.subject: 資料增強 (zh_TW)
dc.subject: 長尾學習 (zh_TW)
dc.subject: 資料混合 (zh_TW)
dc.subject: Long-tailed Learning (en)
dc.subject: Mixup (en)
dc.subject: Data Augmentation (en)
dc.subject: Skeleton Data (en)
dc.subject: Action Recognition (en)
dc.title: 使用顯著軟標籤的動態混合以應對長尾骨架動作識別 (zh_TW)
dc.title: Dynamic Mixup with Salient Soft Labels for Long-tailed Skeleton-based Action Recognition (en)
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 傅立成 (zh_TW)
dc.contributor.coadvisor: Li-Chen Fu (en)
dc.contributor.oralexamcommittee: 鄭素芳; 郭彥伶 (zh_TW)
dc.contributor.oralexamcommittee: Suh-Fang Jeng; Yen-Ling Kuo (en)
dc.subject.keyword: 動作識別, 長尾學習, 骨架資料, 資料增強, 資料混合 (zh_TW)
dc.subject.keyword: Action Recognition, Long-tailed Learning, Skeleton Data, Data Augmentation, Mixup (en)
dc.relation.page: 62
dc.identifier.doi: 10.6342/NTU202404430
dc.rights.note: 同意授權(全球公開) (Authorized for worldwide open access)
dc.date.accepted: 2024-11-26
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: 2029-11-25
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
ntu-113-1.pdf — 3.18 MB, Adobe PDF (publicly available online after 2029-11-25)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
