Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15506

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李明穗(Ming-Sui Lee) | |
| dc.contributor.author | Shao-Fu Lien | en |
| dc.contributor.author | 連少甫 | zh_TW |
| dc.date.accessioned | 2021-06-07T17:41:22Z | - |
| dc.date.copyright | 2020-07-17 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-07-14 | |
| dc.identifier.citation | [1] A. M. Dellinger and J. A. Stevens, “The injury problem among older adults: Mortality, morbidity and costs,” Journal of Safety Research, 2006. [2] M. Peden, K. Oyegbite, J. Ozanne-Smith, A. A. Hyder, C. Branche, A. F. Rahman, F. Rivara, and K. Bartolomeos, “World Report on Child Injury Prevention,” World Health Organization, 2008. [3] T. R. Nansel, N. Weaver, M. Donlin, H. Jacobsen, M. W. Kreuter, and B. Simons-Morton, “Baby, Be Safe: The effect of tailored communications for pediatric injury prevention provided in a primary care setting,” Patient Education and Counseling, vol. 46, no. 3, pp. 175–190, Mar. 2002. [4] M. G. Scheidler, B. L. Shultz, L. Schall, A. Vyas, and E. M. Barksdale, “Falling televisions: The hidden danger for children,” Journal of Pediatric Surgery, 2002. [5] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, 2014. [6] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. [7] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013. [8] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [9] C. Feichtenhofer, H. Fan, J. Malik, and K. He, “SlowFast networks for video recognition,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. [10] C. Feichtenhofer, A. Pinz, and R. P. Wildes, “Spatiotemporal multiplier networks for video action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [11] I. Saleemi, K. Shafique, and M. Shah, “Probabilistic modeling of scene dynamics for applications in visual surveillance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. [12] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis, “Learning temporal regularity in video sequences,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [13] Y. S. Chong and Y. H. Tay, “Abnormal event detection in videos using spatiotemporal autoencoder,” in Lecture Notes in Computer Science, 2017. [14] W. Liu, W. Luo, D. Lian, and S. Gao, “Future frame prediction for anomaly detection – A new baseline,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [15] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [16] A. Vemula, K. Muelling, and J. Oh, “Social Attention: Modeling attention in human crowds,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2018. [17] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [18] H. Manh and G. Alaghband, “Scene-LSTM: A model for human trajectory prediction,” 2018. [19] C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating visual representations from unlabeled video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 98–106. [20] Y. Kong, S. Gao, B. Sun, and Y. Fu, “Action prediction from videos via memorizing hard-to-predict samples,” in 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. [21] S. Qi, S. Huang, P. Wei, and S. C. Zhu, “Predicting human activities using stochastic grammar,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. [22] F. H. Chan, Y. T. Chen, Y. Xiang, and M. Sun, “Anticipating accidents in dashcam videos,” Lecture Notes in Computer Science, 2017. [23] T. Suzuki, H. Kataoka, Y. Aoki, and Y. Satoh, “Anticipating traffic accidents with adaptive loss and large-scale incident DB,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [24] A. Jain, A. Singh, H. S. Koppula, S. Soh, and A. Saxena, “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. [25] P. Wang, S. Lien, and M. Lee, “A learning-based prediction model for baby accidents,” in 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 629–633. [26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015. [27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [28] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” in Lecture Notes in Computer Science, 2014. [29] G. Ning and H. Huang, “LightTrack: A generic framework for online top-down human pose tracking,” in Proceedings of CVPRW 2020 on Towards Human-Centric Image/Video Synthesis and the 4th Look Into Person (LIP) Challenge, 2020. [30] A. Jain, A. Singh, H. S. Koppula, S. Soh, and A. Saxena, “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. [31] F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, “ActivityNet: A large-scale video benchmark for human activity understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [32] W. Kay et al., “The Kinetics human action video dataset,” CoRR, vol. abs/1705.0, 2017. [33] Y. Yoshikawa, J. Lin, and A. Takeuchi, “STAIR Actions: A video dataset of everyday home actions,” arXiv preprint arXiv:1804.04326, 2018. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15506 | - |
| dc.description.abstract | 嬰幼兒的安全照護需要大量且密集的人力介入,對於照顧者來說不但費力而且勞心。大部分的嬰幼兒意外往往是由於照顧者的經驗缺乏或疏失而造成無法彌補的傷害,此外,若是照護員不當管教甚至虐待嬰幼兒,即使已經加裝監視系統,也只能事後透過人工方式調閱還原真相,並無法降低傷害或減輕父母的擔憂。為了能更即時反應嬰幼兒的活動安全狀況,本論文利用電腦視覺技術結合深度學習,分析嬰幼兒的行為、所在場域中的人物與物件,對嬰幼兒的活動狀態進行危險值的估計,以即時預測並預防意外事件的發生。我們提出了基於遞迴神經網路的嬰幼兒意外預測模型,稱之為 BabyNet。首先每一幀會進入三個特徵提取模組,取得對應的特徵後,接著透過遞迴神經網路預測出潛在的危險值。為了解決不同樣本中可預測性的不同,本文所提出的自適性損失函數根據影片的光流估計出影片的運動能量,以此作為調整權重的依據。為了實驗及訓練,我們搜集並建立了嬰兒意外的資料集,總共包含 1200 部的影片。根據實驗結果,BabyNet 預測的平均準確率高達 88.6%,而在 AUC 的表現也高達 0.92,除此之外還能夠在 3.671 秒前成功預測意外發生。 | zh_TW |
| dc.description.abstract | Child injuries are a large and growing worldwide problem: up to 830,000 infants and toddlers die from accidental injuries every year. Beyond fatalities, many accidents leave lasting sequelae, straining medical resources and placing a heavy financial burden on parents. Since prior studies report that about 90% of such accidents happen at home, prevention strategies are necessary. Many existing monitoring systems are designed specifically for infants and toddlers, but most of their functions are relatively passive, so their ability to reduce accidents is limited. In this paper, we propose a baby accident prediction model based on recurrent neural networks, called BabyNet. First, to extract embedding features, each frame is sent into three feature extraction modules: an action module, an object module, and a pose module. The action module extracts two-scale features based on human detection. The object module assigns each object an attention value and uses it to fuse the object's features. The pose module processes the extracted skeleton features, making it easier for the network to learn the relationship between pose and accidents. The probability of an accident is then decoded by the RNN-based model. To account for the diverse predictability across accidents, an adaptive loss function is proposed that estimates the instability of a video from its optical flow. For training and evaluation, we built the Baby Video Dataset, which contains about 1,200 videos. According to the experimental results, BabyNet with the adaptive loss function achieves 0.92 AUC and 0.886 AP, and can successfully predict accidents 3.6 seconds in advance. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-07T17:41:22Z (GMT). No. of bitstreams: 1 U0001-1307202018115200.pdf: 7812264 bytes, checksum: 6745d699a3d8b293d718dc4162c72d03 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Thesis Committee Certification # Acknowledgements i 中文摘要 ii ABSTRACT iii CONTENTS iv LIST OF FIGURES vi LIST OF TABLES vii 1 Introduction 1 1.1 Background and Motivation 1 1.2 Contributions 2 2 Related Work 4 2.1 Action Recognition 4 2.2 Anomaly Detection 5 2.3 Trajectory Prediction 6 2.4 Action Anticipation 8 3 Method 10 3.1 Network Overview 10 3.2 Action Module 11 3.3 Object Module 11 3.4 Pose Module 13 3.5 Adaptive Loss Function 15 4 Experiments 17 4.1 Baby Video Dataset (BVD 2020) 17 4.2 Evaluation Metrics 19 4.3 Performance Comparison 20 4.4 Qualitative Results 23 5 Conclusion 25 6 REFERENCE 26 | |
| dc.language.iso | en | |
| dc.subject | 遞迴神經網路 | zh_TW |
| dc.subject | 適性損失函數 | zh_TW |
| dc.subject | 人體骨架特徵 | zh_TW |
| dc.subject | 意外預測 | zh_TW |
| dc.subject | 物件注意力機制 | zh_TW |
| dc.subject | RNN-based model | en |
| dc.subject | baby video dataset | en |
| dc.subject | adaptive loss function | en |
| dc.subject | rich visual feature | en |
| dc.subject | accident prediction | en |
| dc.title | 嬰幼童危險預測系統 | zh_TW |
| dc.title | BabyNet: An Intelligent Baby Monitoring System using RNN-based Model and Adaptive Loss Function | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 周承復(Cheng-Fu Chou),梁容輝(Rung-Huei Liang) | |
| dc.subject.keyword | 意外預測,遞迴神經網路,物件注意力機制,人體骨架特徵,適性損失函數 | zh_TW |
| dc.subject.keyword | accident prediction, RNN-based model, rich visual feature, baby video dataset, adaptive loss function | en |
| dc.relation.page | 29 | |
| dc.identifier.doi | 10.6342/NTU202001482 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2020-07-15 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
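The abstracts above describe a pipeline in which per-frame features from three modules (action, object, pose) are concatenated and decoded by a recurrent network into a per-frame risk value. The thesis implementation is not included in this record; purely as an illustrative sketch, here is a minimal NumPy version of that decoding step. All dimensions, weights, and feature values are placeholder assumptions, and a plain tanh RNN cell stands in for whatever recurrent unit the thesis actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (assumptions, not from the thesis):
# action, object, and pose module outputs, plus the RNN hidden size.
D_ACTION, D_OBJECT, D_POSE, D_HIDDEN = 8, 8, 6, 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRNN:
    """A minimal vanilla RNN cell standing in for the RNN-based decoder."""

    def __init__(self, d_in, d_hid, seed=0):
        r = np.random.default_rng(seed)
        self.W_x = r.normal(scale=0.1, size=(d_hid, d_in))
        self.W_h = r.normal(scale=0.1, size=(d_hid, d_hid))
        self.w_out = r.normal(scale=0.1, size=d_hid)

    def risk_sequence(self, frames):
        """Map per-frame feature vectors to a per-frame risk value in (0, 1)."""
        h = np.zeros(self.W_h.shape[0])
        risks = []
        for x in frames:
            h = np.tanh(self.W_x @ x + self.W_h @ h)  # recurrent update
            risks.append(sigmoid(self.w_out @ h))     # decode risk probability
        return risks

def frame_features(n_frames):
    # Concatenate the (here random, placeholder) outputs of the three modules.
    return [np.concatenate([rng.normal(size=D_ACTION),   # action module
                            rng.normal(size=D_OBJECT),   # object module
                            rng.normal(size=D_POSE)])    # pose module
            for _ in range(n_frames)]

model = TinyRNN(D_ACTION + D_OBJECT + D_POSE, D_HIDDEN)
risks = model.risk_sequence(frame_features(30))  # one risk value per frame
```

In deployment, an alarm could be raised whenever the risk value crosses a threshold, which is how such a model predicts an accident seconds before it happens.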
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1307202018115200.pdf (Restricted access) | 7.63 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
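The adaptive loss function described in the abstracts weights each training clip by a motion energy estimated from optical flow, so that clips with more predictable, high-motion dynamics are weighted differently. The record gives no exact formula; the sketch below is only an assumption that combines a motion-energy weight with an exponential anticipation loss in the spirit of references [22] and [24]. The names `motion_energy` and `adaptive_anticipation_loss` and the decay constant `k` are hypothetical.

```python
import numpy as np

def motion_energy(flow):
    """Mean optical-flow magnitude over a clip; flow has shape (T, H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow, axis=-1)))

def adaptive_anticipation_loss(risks, accident_frame, energy, k=0.1):
    """Exponentially weighted cross-entropy for a positive (accident) clip.

    Frames closer to the accident are penalized more for low risk scores,
    and the whole clip is scaled by its motion energy (the adaptive weight).
    """
    risks = np.clip(np.asarray(risks, dtype=float), 1e-7, 1 - 1e-7)
    t = np.arange(len(risks))
    frame_w = np.exp(-k * np.maximum(accident_frame - t, 0))
    return energy * float(np.mean(-frame_w * np.log(risks)))

# Toy usage: a calm clip and an active clip with the same risk curve.
flow_calm = np.full((4, 8, 8, 2), 0.2)    # small displacements
flow_active = np.full((4, 8, 8, 2), 1.0)  # large displacements
risks = [0.2, 0.4, 0.6, 0.8]              # risk rising toward frame 3
loss_calm = adaptive_anticipation_loss(risks, 3, motion_energy(flow_calm))
loss_active = adaptive_anticipation_loss(risks, 3, motion_energy(flow_active))
```

Under this assumed form, a clip with higher motion energy contributes a proportionally larger loss, pushing the model to fit the more dynamic, harder samples.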
