NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81225
Full metadata record

DC Field: Value (Language)
dc.contributor.advisor: 許永真 (Yung-Chen HSU)
dc.contributor.author: Nien-Tse Lin (en)
dc.contributor.author: 林念澤 (zh_TW)
dc.date.accessioned: 2022-11-24T03:37:14Z
dc.date.available: 2021-08-11
dc.date.available: 2022-11-24T03:37:14Z
dc.date.copyright: 2021-08-11
dc.date.issued: 2021
dc.date.submitted: 2021-08-01
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81225
dc.description.abstract: Action recognition has wide applications in video content annotation, surveillance systems, and human-computer interaction. With the advance of deep learning, algorithms based on 3D convolutional networks (3D ConvNets) can achieve high accuracy on large-scale datasets. However, such algorithms rely heavily on large amounts of training data to induce complex classification rules; as the dataset shrinks, both their accuracy and stability suffer severely. We argue that in the typical training scheme, supervising the model with action class labels alone cannot effectively guide a 3D ConvNet to extract features useful for action recognition. To address this problem, this study proposes a multi-task learning framework with an auxiliary task that learns the regions of human body movement, thereby strengthening the supervision of the model's feature extraction (see the illustrative sketch after this record). Experiments on the NTU RGB+D 60 dataset show that the proposed multi-task learning method can be successfully applied to three different 3D ConvNets and substantially improves their accuracy on small-scale datasets without adding inference time. (zh_TW)
dc.description.provenance: Made available in DSpace on 2022-11-24T03:37:14Z (GMT). No. of bitstreams: 1
U0001-2907202113081200.pdf: 7785344 bytes, checksum: dbbfb6feb69a08f7d0b14797a0c5caca (MD5)
Previous issue date: 2021 (en)
dc.description.tableofcontents:
1 Introduction...1
1.1 Background...1
1.2 Motivation...2
1.3 Proposed Method...3
1.4 Outline of the Thesis...4
2 Literature Review...5
2.1 Deep Learning for Action Recognition...5
2.1.1 Multi-stream Network Based Method...5
2.1.2 3D Convolutional Network Based Method...6
2.1.3 Skeleton Based Method...7
2.2 Human Pose Estimation...8
2.2.1 2D Single-Person Pose Estimation...8
2.2.2 2D Multi-Person Pose Estimation...8
2.3 Multi-task Learning for Action Recognition...9
3 Problem Definition...11
3.1 Action Recognition...12
3.2 Action Recognition on Small-Scale Datasets...13
4 Methodology...14
4.1 Algorithm of Joint Movement Pattern Identification...15
4.1.1 Full Image Joint Heatmap Evaluation...16
4.1.2 Joint Movement Pattern Identification...19
4.2 The Multi-task Learning Model...22
5 Experiments...25
5.1 Experiment Setup...25
5.1.1 The Dataset...25
5.1.2 Data Augmentation...26
5.1.3 Evaluation Metrics...27
5.2 Training Details...27
5.3 Experiment Results...27
5.3.1 Effect and Generalization Ability on Different 3D ConvNets...27
5.3.2 Generalization Ability to Sub-datasets of Different Sizes...36
5.4 Discussion...39
5.4.1 The Weight for the Joint Movement Pattern Loss...39
5.4.2 The Stability of 3D ConvNets on Small-Scale Datasets...40
5.4.3 Failure Cases...41
6 Conclusion...43
6.1 Summary and Contribution...43
6.2 Future Study...44
Bibliography...45
dc.language.iso: en
dc.subject: 動作識別 (Action Recognition) (zh_TW)
dc.subject: 小型資料集 (Small-scale Dataset) (zh_TW)
dc.subject: 三維卷積網路 (3D Convolutional Network) (zh_TW)
dc.subject: 多任務學習 (Multi-task Learning) (zh_TW)
dc.subject: Small-scale Dataset (en)
dc.subject: Action Recognition (en)
dc.subject: Multi-task Learning (en)
dc.subject: 3D Convolutional Neural Network (en)
dc.title: 利用多任務學習以關節移動圖像引導動作識別 (zh_TW)
dc.title: Joint Movement Pattern Guided Action Recognition Using Multi-task Learning (en)
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王鈺強 (Hsin-Tsai Liu), 李明穗 (Chih-Yang Tseng), 鄭素芳, 陳維超
dc.subject.keyword: 動作識別, 多任務學習, 三維卷積網路, 小型資料集 (zh_TW)
dc.subject.keyword: Action Recognition, Multi-task Learning, 3D Convolutional Neural Network, Small-scale Dataset (en)
dc.relation.page: 50
dc.identifier.doi: 10.6342/NTU202101889
dc.rights.note: 同意授權(限校園內公開) (Authorized; access restricted to campus)
dc.date.accepted: 2021-08-03
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) (zh_TW)
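The abstract above describes the approach only in prose, and this record contains no implementation. The following is a minimal, hypothetical PyTorch sketch of how such a multi-task objective could be wired: a 3D-ConvNet feature map feeds both an action classifier and an auxiliary head that predicts joint-movement regions, with the auxiliary loss weighted by a coefficient (the thesis discusses such a weight in Section 5.4.1). All names (MultiTaskHead, multitask_loss, lambda_jmp) and tensor shapes are illustrative assumptions, not the thesis's actual code.

```python
# Hedged sketch of a multi-task objective for 3D-ConvNet action recognition.
# Assumptions (not from the thesis): the backbone returns a (N, C, T, H, W)
# feature map, and the auxiliary target is a per-cell movement-region mask.
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Action classifier plus auxiliary joint-movement-pattern predictor."""
    def __init__(self, feat_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.cls = nn.Linear(feat_channels, num_classes)
        # 1x1x1 conv predicts a movement-region logit per spatio-temporal cell.
        self.jmp = nn.Conv3d(feat_channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (N, C, T, H, W) from any 3D ConvNet backbone.
        logits = self.cls(self.pool(feats).flatten(1))
        jmp_map = self.jmp(feats)  # (N, 1, T, H, W) movement-region logits
        return logits, jmp_map

def multitask_loss(logits, jmp_map, labels, jmp_target, lambda_jmp=0.5):
    # Auxiliary supervision is used only at training time; inference reads
    # `logits` alone, so no extra inference cost is added.
    ce = nn.functional.cross_entropy(logits, labels)
    bce = nn.functional.binary_cross_entropy_with_logits(jmp_map, jmp_target)
    return ce + lambda_jmp * bce

# Usage example with made-up shapes: 512-channel features, 60 action classes.
head = MultiTaskHead(feat_channels=512, num_classes=60)
feats = torch.randn(2, 512, 8, 7, 7)
logits, jmp_map = head(feats)
loss = multitask_loss(logits, jmp_map,
                      labels=torch.randint(0, 60, (2,)),
                      jmp_target=torch.rand(2, 1, 8, 7, 7))
```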
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: U0001-2907202113081200.pdf (7.6 MB, Adobe PDF)
Access restricted to NTU campus IP addresses (use the library's VPN service for off-campus access).


Unless their copyright terms state otherwise, all items in this repository are protected by copyright, with all rights reserved.

© NTU Library All Rights Reserved