NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80744
Full metadata record
dc.contributor.advisor: 徐宏民 (Winston H. Hsu)
dc.contributor.author: Yan-Ru Wang [en]
dc.contributor.author: 王彥茹 [zh_TW]
dc.date.accessioned: 2022-11-24T03:14:58Z
dc.date.available: 2021-11-06
dc.date.available: 2022-11-24T03:14:58Z
dc.date.copyright: 2021-11-06
dc.date.issued: 2021
dc.date.submitted: 2021-10-14
dc.identifier.citation:
S. Agethen, H.-C. Lee, and W. H. Hsu. Anticipation of human actions with pose-based fine-grained representations. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In CVPR, pages 4724–4733, 2017.
A. Chadha, G. Arora, and N. Kaloty. iPerceive: Applying common-sense reasoning to multi-modal dense video captioning and video question answering. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–13, 2021.
G. Chen, J. Li, J. Lu, and J. Zhou. Human trajectory prediction via counterfactual analysis. In ICCV, 2021.
D. Epstein, B. Chen, and C. Vondrick. Oops! Predicting unintentional action in video. In CVPR, June 2020.
Y. A. Farha and J. Gall. Uncertainty-aware anticipation of activities. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 1197–1204, 2019.
Y. A. Farha, A. Richard, and J. Gall. When will you do what? Anticipating temporal occurrences of activities. In CVPR, pages 5343–5352, 2018.
C. Feichtenhofer, H. Fan, J. Malik, and K. He. SlowFast networks for video recognition. In ICCV, October 2019.
A. Furnari, S. Battiato, and G. M. Farinella. Leveraging uncertainty to rethink loss functions and evaluation measures for egocentric action anticipation. In ECCV Workshops, September 2018.
A. Furnari and G. M. Farinella. What would you expect? Anticipating egocentric actions with rolling-unrolling LSTMs and modality attention. In ICCV, 2019.
H. Gammulle, S. Denman, S. Sridharan, and C. Fookes. Forecasting future action sequences with neural memory networks. In BMVC, 2019.
H. Gammulle, S. Denman, S. Sridharan, and C. Fookes. Predicting the future: A jointly learnt model for action anticipation. In ICCV, October 2019.
J. Gao, Z. Yang, and R. Nevatia. RED: Reinforced encoder-decoder networks for action anticipation. In BMVC, 2017.
R. Girdhar and K. Grauman. Anticipative Video Transformer. In ICCV, 2021.
Q. Ke, M. Fritz, and B. Schiele. Time-conditioned action anticipation in one shot. In CVPR, June 2019.
H. Kuehne, A. B. Arslan, and T. Serre. The language of actions: Recovering the syntax and semantics of goal-directed human activities. In CVPR, 2014.
T. Lan, T.-C. Chen, and S. Savarese. A hierarchical representation for future action prediction. In ECCV, pages 689–704, 2014.
C. Li, S. H. Chan, and Y.-T. Chen. Who make drivers stop? Towards driver-centric risk assessment: Risk object identification via causal inference. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10711–10718, 2020.
M. Liu, S. Tang, Y. Li, and J. M. Rehg. Forecasting human-object interaction: Joint prediction of motor attention and actions in first person video. In ECCV, 2020.
T. Mahmud, M. Hasan, and A. K. Roy-Chowdhury. Joint prediction of activity labels and starting times in untrimmed videos. In ICCV, pages 5784–5793, 2017.
A. Miech, I. Laptev, J. Sivic, H. Wang, L. Torresani, and D. Tran. Leveraging the present to anticipate the future in videos. In CVPR Workshops, June 2019.
R. Morais, V. Le, T. Tran, and S. Venkatesh. Learning to abstract and predict human actions. In BMVC, 2020.
G. Nan, R. Qiao, Y. Xiao, J. Liu, S. Leng, H. Zhang, and W. Lu. Interventional video grounding with dual contrastive learning. In CVPR, 2021.
L. Neumann, A. Zisserman, and A. Vedaldi. Future event prediction: If and when. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2935–2943, 2019.
Y. Ng and B. Fernando. Forecasting future action sequences with attention: A new approach to weakly supervised action forecasting. IEEE Transactions on Image Processing, 2020.
J. Pearl, M. Glymour, and N. P. Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
J. Pearl and D. Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618–626, 2017.
F. Sener, D. Singhania, and A. Yao. Temporal aggregate representations for long-range video understanding. In ECCV, pages 154–171. Springer, 2020.
F. Sener and A. Yao. Zero-shot anticipation for instructional activities. In ICCV, pages 862–871, 2019.
C. Sun, A. Shrivastava, C. Vondrick, R. Sukthankar, K. Murphy, and C. Schmid. Relational action forecasting. In CVPR, June 2019.
D. Surís, R. Liu, and C. Vondrick. Learning the predictability of the future. In CVPR, 2021.
C. Vondrick, H. Pirsiavash, and A. Torralba. Anticipating visual representations from unlabeled video. In CVPR, pages 98–106, June 2016.
T. Wang, J. Huang, H. Zhang, and Q. Sun. Visual commonsense R-CNN. In CVPR, pages 10760–10770, 2020.
Y. Wu, L. Zhu, X. Wang, Y. Yang, and F. Wu. Learning to anticipate egocentric actions by imagination. IEEE Transactions on Image Processing, 30:1143–1152, 2021.
K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
X. Yang, F. Feng, W. Ji, M. Wang, and T.-S. Chua. Deconfounded video moment retrieval with causal intervention. In SIGIR '21, 2021.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80744
dc.description.abstract: Action anticipation asks a model to infer the upcoming action from an observed video clip, a capability that matters for many intelligent applications such as autonomous driving and assistive robots. Most current approaches build anticipation models on the features extracted by action recognition models. However, we find that when the anticipation task is learned directly from an action recognition model, the network relies too heavily on the action currently in progress and ignores other important cues in the scene, such as which objects are present. According to the causal theory proposed by Judea Pearl, inferring the causal relation between input and output purely from passively observed correlations can be misled by confounders. We therefore actively intervene in the model's original learning process: before making a prediction, the model must first weigh the likelihood of every possible action, which reduces its over-reliance on the ongoing action at the expense of other information in the video. Experimental results show that our proposed comprehenser mitigates this problem and can be applied on top of different action recognition architectures, in every case yielding better performance. (A notational sketch of this intervention follows the metadata record below.) [zh_TW]
dc.description.provenance: Made available in DSpace on 2022-11-24T03:14:58Z (GMT). No. of bitstreams: 1. U0001-1410202111421700.pdf: 2655556 bytes, checksum: 36a85a87de3d7cd244c133186c046bf4 (MD5). Previous issue date: 2021. [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Action Anticipation
2.2 Causality in Vision
Chapter 3 Approach
3.1 Problem Definition
3.2 Causal Intervention
3.3 Comprehenser Module
Chapter 4 Experiments
4.1 Dataset
4.2 Implementation Details
Chapter 5 Results
5.1 Quantitative Analysis
5.2 Qualitative Analysis
Chapter 6 Conclusion
References
Appendix A — Breakfast Full Results
dc.language.iso: en
dc.subject: 因果推斷 (Causal Inference) [zh_TW]
dc.subject: 動作預測 (Action Anticipation) [zh_TW]
dc.subject: Causal Intervention [en]
dc.subject: Action Anticipation [en]
dc.title: 去除動作預測中的混雜因素 (Deconfounded Action Anticipation) [zh_TW]
dc.title: Deconfounded Action Anticipation [en]
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳文進 (Hsin-Tsai Liu), 葉梅珍 (Chih-Yang Tseng), 陳奕廷, 余能豪
dc.subject.keyword: 動作預測 (Action Anticipation), 因果推斷 (Causal Inference) [zh_TW]
dc.subject.keyword: Action Anticipation, Causal Intervention [en]
dc.relation.page: 24
dc.identifier.doi: 10.6342/NTU202103717
dc.rights.note: 同意授權(限校園內公開) (Authorization granted; access restricted to campus)
dc.date.accepted: 2021-10-15
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
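
The causal intervention summarized in the abstract follows Pearl's do-calculus. As a minimal notational sketch, assuming the thesis applies the standard backdoor adjustment (the record itself does not reproduce the formula), intervention replaces the passively observed conditional P(Y | X) with

P(Y | do(X)) = \sum_z P(Y | X, z) P(z)

where X is the observed video, Y is the anticipated future action, and z ranges over the confounder, here plausibly the category of the ongoing action. Averaging over z with its prior P(z), rather than with its correlation-driven posterior, blocks the backdoor path through the confounder and forces the model to weigh every candidate action before predicting.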
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: U0001-1410202111421700.pdf
Size: 2.59 MB
Format: Adobe PDF
Access: Restricted to NTU campus IP addresses (off-campus users should connect through the library VPN service)