NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92981
Full metadata record
dc.contributor.advisor (zh_TW): 傅楸善
dc.contributor.advisor (en): Chiou-Shann Fuh
dc.contributor.author (zh_TW): 李詠億
dc.contributor.author (en): Yong-Yi Li
dc.date.accessioned: 2024-07-12T16:06:35Z
dc.date.available: 2024-07-13
dc.date.copyright: 2024-07-12
dc.date.issued: 2024
dc.date.submitted: 2024-06-11
dc.identifier.citation:
[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks," Advances in Neural Information Processing Systems, https://arxiv.org/pdf/1406.2661, 2014.
[2] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," Proceedings of the International Conference on Learning Representations, https://arxiv.org/pdf/1511.06434, 2016.
[3] J. Y. Zhu, et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," Proceedings of the IEEE International Conference on Computer Vision, https://arxiv.org/pdf/1703.10593, 2017.
[4] D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," arXiv preprint arXiv:1312.6114, https://arxiv.org/pdf/1312.6114, 2013.
[5] J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models," Advances in Neural Information Processing Systems 33, https://arxiv.org/pdf/2006.11239, 2020.
[6] J. Song, C. Meng, and S. Ermon, "Denoising Diffusion Implicit Models," arXiv preprint arXiv:2010.02502, https://arxiv.org/pdf/2010.02502, 2020.
[7] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-Shot Text-to-Image Generation," International Conference on Machine Learning, PMLR, https://arxiv.org/pdf/2102.12092, 2021.
[8] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical Text-Conditional Image Generation with CLIP Latents," arXiv preprint arXiv:2204.06125, https://arxiv.org/abs/2204.06125, 2022.
[9] J. Betker, et al., "Improving Image Generation with Better Captions," OpenAI technical report, https://cdn.openai.com/papers/dall-e-3.pdf, 2023.
[10] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi, "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding," Advances in Neural Information Processing Systems 35, https://arxiv.org/pdf/2205.11487, 2022.
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-Resolution Image Synthesis with Latent Diffusion Models," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2112.10752, 2022.
[12] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems 28, https://arxiv.org/pdf/1506.01497, 2015.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," Computer Vision–ECCV 2016, Springer International Publishing, 2016.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/1506.02640, 2016.
[15] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," European Conference on Computer Vision, Springer International Publishing, https://arxiv.org/pdf/2005.12872, 2020.
[16] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-Efficient Transfer Learning for NLP," International Conference on Machine Learning, PMLR, https://arxiv.org/pdf/1902.00751, 2019.
[17] X. L. Li and P. Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation," arXiv preprint arXiv:2101.00190, https://arxiv.org/pdf/2101.00190, 2021.
[18] E. J. Hu, Y. Shen, P. Wallis, Z. A. Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint arXiv:2106.09685, https://arxiv.org/pdf/2106.09685, 2021.
[19] S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning both Weights and Connections for Efficient Neural Networks," Advances in Neural Information Processing Systems 28, https://arxiv.org/pdf/1506.02626, 2015.
[20] G. Fang, X. Ma, M. Song, M. B. Mi, and X. Wang, "DepGraph: Towards Any Structural Pruning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2301.12900, 2023.
[21] G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, https://arxiv.org/pdf/1503.02531, 2015.
[22] C. Y. Wang, A. Bochkovskiy, and H. Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2207.02696, 2023.
[23] S. Lin, A. Wang, and X. Yang, "SDXL-Lightning: Progressive Adversarial Diffusion Distillation," arXiv preprint arXiv:2402.13929, https://arxiv.org/pdf/2402.13929, 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92981
dc.description.abstract (zh_TW):
隨著家庭中寵物數量的增加,它們的健康和福利已成為現代社會的重要關注點。實時行為識別技術的發展,特別是其在嵌入式系統中的應用,為監控和理解寵物行為提供了新的途徑。本文探討了如何通過生成技術和模型優化來實現在嵌入式系統中的實時寵物行為辨識,並研究了這項技術在寵物照護中的有效應用。

本文首先通過使用Stable Diffusion技術生成各種貓行為的高品質影像,以解決訓練數據集不足的問題。接著,選用YOLOv7-tiny模型來進行貓行為辨識,並使用修剪技術進一步優化模型,減少其大小和計算需求。最後,將優化後的模型部署到Realtek AMB82-mini晶片上,實現高效的實時推斷。

本研究展示了利用生成技術擴充數據並修剪網路來優化模型以適應資源受限環境的有效方法,並驗證了其在實時貓行為辨識中的實用性和高效性。

關鍵字:寵物行為識別、嵌入式系統、生成模型、Stable Diffusion、YOLOv7-tiny、模型剪枝
dc.description.abstract (en):
With the rising number of pets in households, their health and welfare have become significant concerns in modern society. The development of real-time behavior recognition technology, especially its application in embedded systems, offers new ways to monitor and understand pet behaviors. This study explores how to implement real-time pet behavior recognition on embedded systems through generative techniques, network pruning, and model optimization, and investigates the effective application of this technology in pet care.

First, high-quality images of various cat behaviors were generated using Stable Diffusion to address the issue of insufficient training datasets. Next, the YOLOv7-tiny model was employed for cat behavior recognition, further optimized using pruning techniques to reduce its size and computational requirements. Finally, the optimized model was deployed on the Realtek AMB82-mini chip, achieving efficient real-time inference.
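The synthetic-data step can be sketched at the prompt level. The behavior classes and prompt template below are illustrative assumptions, not the thesis's actual prompt set; the resulting strings would then be fed to a text-to-image pipeline such as Stable Diffusion.

```python
from itertools import product

# Illustrative cat-behavior classes and imaging conditions (assumptions,
# not the prompts actually used in the thesis).
BEHAVIORS = ["eating", "drinking", "grooming", "scratching", "sleeping"]
CONDITIONS = ["indoor lighting", "natural daylight"]

def build_prompts(behaviors, conditions):
    """Compose one text-to-image prompt per (behavior, condition) pair."""
    return [f"a photo of a cat {b}, {c}"
            for b, c in product(behaviors, conditions)]

prompts = build_prompts(BEHAVIORS, CONDITIONS)
# 5 behaviors x 2 conditions -> 10 prompts
```

Systematically covering the (behavior, condition) grid like this is one simple way to keep the generated dataset balanced across classes.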

This research demonstrates the effectiveness of using generative techniques to augment datasets and optimize models for resource-constrained environments, validating its practicality and efficiency in real-time cat behavior recognition.

Keywords: pet behavior recognition, embedded systems, generative models, Stable Diffusion, YOLOv7-tiny, model pruning
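The pruning step mentioned in the abstract shrinks YOLOv7-tiny to fit the AMB82-mini. As a hedged illustration of the underlying selection principle only (the thesis applies structured pruning, whose configuration is not given here), a minimal magnitude-pruning sketch:

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights.

    Unstructured magnitude pruning in the spirit of [19]; the thesis prunes
    whole channels (structured pruning, as in DepGraph [20]) so the network
    actually shrinks, but the selection criterion is analogous.
    """
    ranked = sorted(abs(w) for w in weights)
    k = int(len(weights) * sparsity)   # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = ranked[k - 1]          # k-th smallest magnitude
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.5, -0.05, 1.2, 0.01, -0.8, 0.3], sparsity=0.5)
# the three smallest magnitudes (0.01, 0.05, 0.3) are zeroed
```

Structured variants remove entire filters instead of individual weights, which is what actually reduces FLOPs and memory on embedded hardware.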
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-12T16:06:35Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-07-12T16:06:35Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
ABSTRACT iv
LIST OF FIGURES x
LIST OF TABLES xii
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Data Collection Challenges 2
1.3 Embedded System 3
1.4 Thesis Organization 4
Chapter 2 Related Works 5
2.1 Generative Artificial Intelligence Model 5
2.1.1 Generative Adversarial Networks (GANs) 5
2.1.2 Variational Auto-Encoders (VAEs) 7
2.1.3 Diffusion Models 8
2.2 Object Detection with Deep Learning 12
2.2.1 Faster R-CNN (Region-based Convolutional Neural Networks) 12
2.2.2 Single-Shot Multi-Box Detector (SSD) 13
2.2.3 You Only Look Once (YOLO) 14
2.2.4 Detection Transformer (DETR) 15
2.3 Parameter-Efficient Fine-Tuning (PEFT) 16
2.4 Model Optimization in Embedded Systems 18
2.4.1 Model Pruning 19
2.4.2 Knowledge Distillation 20
Chapter 3 Background 22
3.1 Stable Diffusion 22
3.2 Low-Rank Adaptation (LoRA) [18] 25
3.3 YOLOv7 [22] 28
3.4 Torch Pruning 30
Chapter 4 Methodology 32
4.1 Overview 32
4.2 Synthetic Data Generation 34
4.3 Pet Behavior Recognition 37
Chapter 5 Experimental Results 41
5.1 Evaluation Metric 41
5.1.1 Precision and Recall 41
5.1.2 Intersection over Union (IoU) 42
5.1.3 Average Precision (AP) and Mean Average Precision (mAP) 42
5.1.4 Floating Point Operations (FLOPs) 44
5.2 Effectiveness of Synthetic Data Generation 44
5.2.1 Addressing Limitations of SDXL-Lightning 45
5.2.2 Improved Training Dataset 45
5.2.3 Impact on Model Performance 47
5.3 Impact of Model Pruning 48
5.3.1 Size, Computational Efficiency, and Speed 48
5.3.2 Performance Metrics 49
5.4 Test Image Result 50
Chapter 6 Conclusion and Future Works 59
6.1 Implications and Contributions 59
6.2 Future Work 60
6.3 Conclusion 61
References 62
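The IoU metric listed in Section 5.1.2 follows the standard definition for axis-aligned boxes; a minimal self-contained sketch (illustrative, not the thesis's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))  # intersection 25, union 175
```

A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), which feeds into the precision, recall, and mAP figures of Section 5.1.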
dc.language.iso: en
dc.subject (zh_TW): 寵物行為識別; 嵌入式系統; 生成模型; Stable Diffusion; YOLOv7-tiny; 模型剪枝
dc.subject (en): generative models; pet behavior recognition; model pruning; YOLOv7-tiny; Stable Diffusion; embedded systems
dc.title (zh_TW): 李辨識:寵物行為實時辨識於嵌入式系統
dc.title (en): LiRecognition: Real-time Pet Behavior Recognition on Embedded System
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee (zh_TW): 方瓊瑤; 巫宗昇
dc.contributor.oralexamcommittee (en): Chiung-Yao Fang; Zong-Sheng Wu
dc.subject.keyword (zh_TW): 寵物行為識別, 嵌入式系統, 生成模型, Stable Diffusion, YOLOv7-tiny, 模型剪枝
dc.subject.keyword (en): pet behavior recognition, embedded systems, generative models, Stable Diffusion, YOLOv7-tiny, model pruning
dc.relation.page: 66
dc.identifier.doi: 10.6342/NTU202401113
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2024-06-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: ntu-112-2.pdf, 3.72 MB, Adobe PDF (not authorized for public access)

