Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15506

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李明穗(Ming-Sui Lee) | |
| dc.contributor.author | Shao-Fu Lien | en |
| dc.contributor.author | 連少甫 | zh_TW |
| dc.date.accessioned | 2021-06-07T17:41:22Z | - |
| dc.date.copyright | 2020-07-17 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-07-14 | |
| dc.identifier.citation | [1] A. M. Dellinger and J. A. Stevens, “The injury problem among older adults: Mortality, morbidity and costs,” Journal of Safety Research, 2006. [2] M. Peden, K. Oyegbite, J. Ozanne-Smith, A. A. Hyder, C. Branche, A. F. Rahman, F. Rivara, and K. Bartolomeos, “World Report on Child Injury Prevention,” World Health Organization, 2008. [3] T. R. Nansel, N. Weaver, M. Donlin, H. Jacobsen, M. W. Kreuter, and B. Simons-Morton, “Baby, Be Safe: The effect of tailored communications for pediatric injury prevention provided in a primary care setting,” Patient Education and Counseling, vol. 46, no. 3, pp. 175–190, Mar. 2002. [4] M. G. Scheidler, B. L. Shultz, L. Schall, A. Vyas, and E. M. Barksdale, “Falling televisions: The hidden danger for children,” Journal of Pediatric Surgery, 2002. [5] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, 2014. [6] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. [7] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013. [8] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [9] C. Feichtenhofer, H. Fan, J. Malik, and K. He, “SlowFast networks for video recognition,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. [10] C. Feichtenhofer, A. Pinz, and R. P. Wildes, “Spatiotemporal multiplier networks for video action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [11] I. Saleemi, K. Shafique, and M. Shah, “Probabilistic modeling of scene dynamics for applications in visual surveillance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. [12] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis, “Learning temporal regularity in video sequences,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [13] Y. S. Chong and Y. H. Tay, “Abnormal event detection in videos using spatiotemporal autoencoder,” in Lecture Notes in Computer Science, 2017. [14] W. Liu, W. Luo, D. Lian, and S. Gao, “Future frame prediction for anomaly detection – A new baseline,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [15] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [16] A. Vemula, K. Muelling, and J. Oh, “Social Attention: Modeling attention in human crowds,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2018. [17] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [18] H. Manh and G. Alaghband, “Scene-LSTM: A model for human trajectory prediction,” 2018. [19] C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating visual representations from unlabeled video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 98–106. [20] Y. Kong, S. Gao, B. Sun, and Y. Fu, “Action prediction from videos via memorizing hard-to-predict samples,” in 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. [21] S. Qi, S. Huang, P. Wei, and S. C. Zhu, “Predicting human activities using stochastic grammar,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. [22] F. H. Chan, Y. T. Chen, Y. Xiang, and M. Sun, “Anticipating accidents in dashcam videos,” Lecture Notes in Computer Science, 2017. [23] T. Suzuki, H. Kataoka, Y. Aoki, and Y. Satoh, “Anticipating traffic accidents with adaptive loss and large-scale incident DB,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [24] A. Jain, A. Singh, H. S. Koppula, S. Soh, and A. Saxena, “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. [25] P. Wang, S. Lien, and M. Lee, “A learning-based prediction model for baby accidents,” in 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 629–633. [26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015. [27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [28] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” in Lecture Notes in Computer Science, 2014. [29] G. Ning and H. Huang, “LightTrack: A generic framework for online top-down human pose tracking,” in Proceedings of CVPRW 2020 on Towards Human-Centric Image/Video Synthesis and the 4th Look Into Person (LIP) Challenge, 2020. [30] A. Jain, A. Singh, H. S. Koppula, S. Soh, and A. Saxena, “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016. [31] F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, “ActivityNet: A large-scale video benchmark for human activity understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [32] W. Kay et al., “The Kinetics human action video dataset,” CoRR, vol. abs/1705.0, 2017. [33] Y. Yoshikawa, J. Lin, and A. Takeuchi, “STAIR Actions: A video dataset of everyday home actions,” arXiv preprint arXiv:1804.04326, 2018. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15506 | - |
| dc.description.abstract | 嬰幼兒的安全照護需要大量且密集的人力介入,對於照顧者來說不但費力而且勞心。大部分的嬰幼兒意外往往是由於照顧者的經驗缺乏或疏失而造成無法彌補的傷害,此外,若是照護員不當管教甚至虐待嬰幼兒,即使已經加裝監視系統,也只能事後透過人工方式調閱還原真相,並無法降低傷害或減輕父母的擔憂。為了能更即時反應嬰幼兒的活動安全狀況,本論文利用電腦視覺技術結合深度學習,分析嬰幼兒的行為、所在場域中的人物與物件,對嬰幼兒的活動狀態進行危險值的估計,以即時預測並預防意外事件的發生。我們提出了基於遞迴神經網路的嬰幼兒意外預測模型,稱之為 BabyNet。首先每一幀會進入三個特徵提取模組,取得對應的特徵後,接著透過遞迴神經網路預測出潛在的危險值。為了解決不同樣本中可預測性的不同,本文所提出的自適性損失函數根據影片的光流估計出影片的運動能量,以此作為調整權重的依據。為了實驗及訓練,我們搜集並建立了嬰兒意外的資料集,總共包含 1200 部的影片。根據實驗結果,BabyNet 預測的平均準確率高達 88.6%,而在 AUC 的表現也高達 0.92,除此之外還能夠在 3.671 秒前成功預測意外發生。 | zh_TW |
| dc.description.abstract | Child injuries are a large and growing worldwide problem: up to 830,000 infants and toddlers die from accidental injuries every year. Beyond fatalities, many accidents leave lasting sequelae, straining medical resources and placing a heavy financial burden on parents. Since prior studies report that about 90% of such accidents happen at home, prevention strategies are necessary. Many existing monitoring systems are designed specifically for infants and toddlers, but most of their functions are relatively passive, so their ability to reduce accidents is limited. In this paper, we propose a baby accident prediction model based on recurrent neural networks, called BabyNet. First, to extract embedding features, each frame is sent into three feature extraction modules: an action module, an object module, and a pose module. The action module extracts two-scale features based on human detection. The object module assigns each object an attention value and uses it to fuse the object's features. The pose module processes the extracted skeleton features, making it easier for the network to learn the relationship between pose and accidents. The probability of an accident is then decoded by the RNN-based model. To account for the diverse predictability across accidents, an adaptive loss function is proposed that estimates the instability of a video from its optical flow. For training and evaluation, we built the Baby Video Dataset, which contains about 1,200 videos. According to the experimental results, BabyNet with the adaptive loss function achieves 0.92 AUC and 0.886 AP, and can successfully predict accidents 3.6 seconds in advance. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-07T17:41:22Z (GMT). No. of bitstreams: 1 U0001-1307202018115200.pdf: 7812264 bytes, checksum: 6745d699a3d8b293d718dc4162c72d03 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Thesis Committee Certification # Acknowledgements i 中文摘要 ii ABSTRACT iii CONTENTS iv LIST OF FIGURES vi LIST OF TABLES vii 1 Introduction 1 1.1 Background and Motivation 1 1.2 Contributions 2 2 Related Work 4 2.1 Action Recognition 4 2.2 Anomaly Detection 5 2.3 Trajectory Prediction 6 2.4 Action Anticipation 8 3 Method 10 3.1 Network Overview 10 3.2 Action Module 11 3.3 Object Module 11 3.4 Pose Module 13 3.5 Adaptive Loss Function 15 4 Experiments 17 4.1 Baby Video Dataset (BVD 2020) 17 4.2 Evaluation Metrics 19 4.3 Performance Comparison 20 4.4 Qualitative Results 23 5 Conclusion 25 6 REFERENCE 26 | |
| dc.language.iso | en | |
| dc.subject | 遞迴神經網路 | zh_TW |
| dc.subject | 適性損失函數 | zh_TW |
| dc.subject | 人體骨架特徵 | zh_TW |
| dc.subject | 意外預測 | zh_TW |
| dc.subject | 物件注意力機制 | zh_TW |
| dc.subject | RNN-based model | en |
| dc.subject | baby video dataset | en |
| dc.subject | adaptive loss function | en |
| dc.subject | rich visual feature | en |
| dc.subject | accident prediction | en |
| dc.title | 嬰幼童危險預測系統 | zh_TW |
| dc.title | BabyNet: An Intelligent Baby Monitoring System using RNN-based Model and Adaptive Loss Function | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 周承復(Cheng-Fu Chou),梁容輝(Rung-Huei Liang) | |
| dc.subject.keyword | 意外預測,遞迴神經網路,物件注意力機制,人體骨架特徵,適性損失函數 | zh_TW |
| dc.subject.keyword | accident prediction, RNN-based model, rich visual feature, baby video dataset, adaptive loss function | en |
| dc.relation.page | 29 | |
| dc.identifier.doi | 10.6342/NTU202001482 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2020-07-15 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
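The abstracts above describe a pipeline in which per-frame features from three modules (action, object, pose) are concatenated and decoded by a recurrent network into a per-frame risk value. The thesis implementation is not included in this record; purely as an illustrative sketch, here is a minimal NumPy version of that decoding step. All dimensions, weights, and feature values are placeholder assumptions, and a plain tanh RNN cell stands in for whatever recurrent unit the thesis actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (assumptions, not from the thesis):
# action, object, and pose module outputs, plus the RNN hidden size.
D_ACTION, D_OBJECT, D_POSE, D_HIDDEN = 8, 8, 6, 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRNN:
    """A minimal vanilla RNN cell standing in for the RNN-based decoder."""

    def __init__(self, d_in, d_hid, seed=0):
        r = np.random.default_rng(seed)
        self.W_x = r.normal(scale=0.1, size=(d_hid, d_in))
        self.W_h = r.normal(scale=0.1, size=(d_hid, d_hid))
        self.w_out = r.normal(scale=0.1, size=d_hid)

    def risk_sequence(self, frames):
        """Map per-frame feature vectors to a per-frame risk value in (0, 1)."""
        h = np.zeros(self.W_h.shape[0])
        risks = []
        for x in frames:
            h = np.tanh(self.W_x @ x + self.W_h @ h)  # recurrent update
            risks.append(sigmoid(self.w_out @ h))     # decode risk probability
        return risks

def frame_features(n_frames):
    # Concatenate the (here random, placeholder) outputs of the three modules.
    return [np.concatenate([rng.normal(size=D_ACTION),   # action module
                            rng.normal(size=D_OBJECT),   # object module
                            rng.normal(size=D_POSE)])    # pose module
            for _ in range(n_frames)]

model = TinyRNN(D_ACTION + D_OBJECT + D_POSE, D_HIDDEN)
risks = model.risk_sequence(frame_features(30))  # one risk value per frame
```

In deployment, an alarm could be raised whenever the risk value crosses a threshold, which is how such a model predicts an accident seconds before it happens.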
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1307202018115200.pdf (Restricted access) | 7.63 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
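The adaptive loss function described in the abstracts weights each training clip by a motion energy estimated from optical flow, so that clips with more predictable, high-motion dynamics are weighted differently. The record gives no exact formula; the sketch below is only an assumption that combines a motion-energy weight with an exponential anticipation loss in the spirit of references [22] and [24]. The names `motion_energy` and `adaptive_anticipation_loss` and the decay constant `k` are hypothetical.

```python
import numpy as np

def motion_energy(flow):
    """Mean optical-flow magnitude over a clip; flow has shape (T, H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow, axis=-1)))

def adaptive_anticipation_loss(risks, accident_frame, energy, k=0.1):
    """Exponentially weighted cross-entropy for a positive (accident) clip.

    Frames closer to the accident are penalized more for low risk scores,
    and the whole clip is scaled by its motion energy (the adaptive weight).
    """
    risks = np.clip(np.asarray(risks, dtype=float), 1e-7, 1 - 1e-7)
    t = np.arange(len(risks))
    frame_w = np.exp(-k * np.maximum(accident_frame - t, 0))
    return energy * float(np.mean(-frame_w * np.log(risks)))

# Toy usage: a calm clip and an active clip with the same risk curve.
flow_calm = np.full((4, 8, 8, 2), 0.2)    # small displacements
flow_active = np.full((4, 8, 8, 2), 1.0)  # large displacements
risks = [0.2, 0.4, 0.6, 0.8]              # risk rising toward frame 3
loss_calm = adaptive_anticipation_loss(risks, 3, motion_energy(flow_calm))
loss_active = adaptive_anticipation_loss(risks, 3, motion_energy(flow_active))
```

Under this assumed form, a clip with higher motion energy contributes a proportionally larger loss, pushing the model to fit the more dynamic, harder samples.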
