Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81225

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 (Yung-Chen Hsu) | |
| dc.contributor.author | Nien-Tse Lin | en |
| dc.contributor.author | 林念澤 | zh_TW |
| dc.date.accessioned | 2022-11-24T03:37:14Z | - |
| dc.date.available | 2021-08-11 | |
| dc.date.available | 2022-11-24T03:37:14Z | - |
| dc.date.copyright | 2021-08-11 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-08-01 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81225 | - |
| dc.description.abstract | Action recognition has wide applications in video content annotation, surveillance systems, and human–computer interaction. With the advance of deep learning, algorithms based on 3D convolutional networks (3D ConvNets) can achieve high accuracy on large-scale datasets. However, such algorithms rely heavily on large amounts of training data to induce complex classification rules; as the dataset shrinks, both their accuracy and stability degrade severely. We argue that in the typical training scheme, supervising the model with action class labels alone cannot effectively guide a 3D ConvNet to extract features useful for action recognition. To address this problem, this work proposes a multi-task learning framework with an auxiliary task that learns the regions of human body movement, thereby strengthening the supervision of the model's feature extraction (an illustrative sketch follows this metadata table). Experiments on the NTU RGB+D 60 dataset show that the proposed multi-task learning method can be applied to three different 3D ConvNets and substantially improves their accuracy on small-scale datasets without adding any inference time. | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-24T03:37:14Z (GMT). No. of bitstreams: 1 U0001-2907202113081200.pdf: 7785344 bytes, checksum: dbbfb6feb69a08f7d0b14797a0c5caca (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | 1 Introduction...1 1.1 Background...1 1.2 Motivation...2 1.3 Proposed Method...3 1.4 Outline of the Thesis...4 2 Literature Review...5 2.1 Deep Learning for Action Recognition...5 2.1.1 Multi-stream Network Based Method...5 2.1.2 3D Convolutional Network Based Method...6 2.1.3 Skeleton Based Method...7 2.2 Human Pose Estimation...8 2.2.1 2D Single-Person Pose Estimation...8 2.2.2 2D Multi-Person Pose Estimation...8 2.3 Multi-task Learning for Action Recognition...9 3 Problem Definition...11 3.1 Action Recognition...12 3.2 Action Recognition on Small-Scale Datasets...13 4 Methodology...14 4.1 Algorithm of Joint Movement Pattern Identification...15 4.1.1 Full Image Joint Heatmap Evaluation...16 4.1.2 Joint Movement Pattern Identification...19 4.2 The Multi-task Learning Model...22 5 Experiments...25 5.1 Experiment Setup...25 5.1.1 The Dataset...25 5.1.2 Data Augmentation...26 5.1.3 Evaluation Metrics...27 5.2 Training Details...27 5.3 Experiment Results...27 5.3.1 Effect and Generalization Ability on Different 3D ConvNets...27 5.3.2 Generalization Ability to Different Sizes of Sub-datasets...36 5.4 Discussion...39 5.4.1 The Weight for the Joint Movement Pattern Loss...39 5.4.2 The Stability of 3D ConvNets on Small-Scale Datasets...40 5.4.3 Failure Cases...41 6 Conclusion...43 6.1 Summary and Contribution...43 6.2 Future Study...44 Bibliography...45 | |
| dc.language.iso | en | |
| dc.subject | 動作識別 | zh_TW |
| dc.subject | 小型資料集 | zh_TW |
| dc.subject | 三維卷積網路 | zh_TW |
| dc.subject | 多任務學習 | zh_TW |
| dc.subject | Small-scale Dataset | en |
| dc.subject | Action Recognition | en |
| dc.subject | Multi-task Learning | en |
| dc.subject | 3D Convolutional Neural Network | en |
| dc.title | 利用多任務學習以關節移動圖像引導動作識別 | zh_TW |
| dc.title | Joint Movement Pattern Guided Action Recognition Using Multi-task Learning | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 王鈺強(Hsin-Tsai Liu),李明穗(Chih-Yang Tseng),鄭素芳,陳維超 | |
| dc.subject.keyword | 動作識別, 多任務學習, 三維卷積網路, 小型資料集 | zh_TW |
| dc.subject.keyword | Action Recognition, Multi-task Learning, 3D Convolutional Neural Network, Small-scale Dataset | en |
| dc.relation.page | 50 | |
| dc.identifier.doi | 10.6342/NTU202101889 | |
| dc.rights.note | Authorized (access restricted to campus) | |
| dc.date.accepted | 2021-08-03 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |
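The abstract describes a multi-task framework in which an auxiliary task, predicting the regions where body joints move, supervises the feature extraction of a shared 3D ConvNet. Below is a minimal PyTorch sketch of that idea, not the thesis's actual code: the toy backbone, the head names, the loss weight `lambda_jmp`, and all tensor shapes are illustrative assumptions.

```python
# Minimal sketch (not the thesis's code) of a multi-task 3D ConvNet:
# a shared backbone, an action-classification head, and an auxiliary
# head regressing joint-movement maps. All names and shapes here
# (ToyMultiTaskNet, lambda_jmp, 56x56 maps) are illustrative assumptions.
import torch
import torch.nn as nn


class ToyMultiTaskNet(nn.Module):
    def __init__(self, num_classes: int = 60):
        super().__init__()
        # Shared 3D-convolutional backbone (stand-in for R3D/I3D-style nets).
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Head 1: action classification from globally pooled features.
        self.classifier = nn.Linear(64, num_classes)
        # Head 2 (auxiliary, used only during training): per-frame
        # joint-movement maps with the same spatial layout as the features.
        self.movement_head = nn.Conv3d(64, 1, kernel_size=1)

    def forward(self, clip):                 # clip: (B, 3, T, H, W)
        feat = self.backbone(clip)           # (B, 64, T, H, W)
        logits = self.classifier(feat.mean(dim=(2, 3, 4)))
        movement = self.movement_head(feat)  # (B, 1, T, H, W)
        return logits, movement


model = ToyMultiTaskNet()
cls_loss, jmp_loss = nn.CrossEntropyLoss(), nn.MSELoss()
lambda_jmp = 0.5  # auxiliary-loss weight; the thesis discusses it in Sec. 5.4.1

clip = torch.randn(2, 3, 8, 56, 56)        # toy batch of RGB clips
labels = torch.randint(0, 60, (2,))        # action labels (e.g., NTU RGB+D 60)
target_maps = torch.rand(2, 1, 8, 56, 56)  # precomputed joint-movement maps

logits, movement = model(clip)
loss = cls_loss(logits, labels) + lambda_jmp * jmp_loss(movement, target_maps)
loss.backward()
```

Because only `logits` is consumed at test time, the auxiliary head can be skipped or removed at inference, which is consistent with the abstract's claim that the method adds no inference time.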
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-2907202113081200.pdf (access restricted to NTU campus IPs; use the VPN service for off-campus access) | 7.6 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
