NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92981
Full metadata record
dc.contributor.advisor (zh_TW): 傅楸善
dc.contributor.advisor (en): Chiou-Shann Fuh
dc.contributor.author (zh_TW): 李詠億
dc.contributor.author (en): Yong-Yi Li
dc.date.accessioned: 2024-07-12T16:06:35Z
dc.date.available: 2024-07-13
dc.date.copyright: 2024-07-12
dc.date.issued: 2024
dc.date.submitted: 2024-06-11
dc.identifier.citation:
[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks," Advances in Neural Information Processing Systems, https://arxiv.org/pdf/1406.2661, 2014.
[2] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," Proceedings of the International Conference on Learning Representations, https://arxiv.org/pdf/1511.06434, 2016.
[3] J. Y. Zhu, et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," Proceedings of the IEEE International Conference on Computer Vision, https://arxiv.org/pdf/1703.10593, 2017.
[4] D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," arXiv preprint arXiv:1312.6114, https://arxiv.org/pdf/1312.6114, 2013.
[5] J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models," Advances in Neural Information Processing Systems 33, https://arxiv.org/pdf/2006.11239, 2020.
[6] J. Song, C. Meng, and S. Ermon, "Denoising Diffusion Implicit Models," arXiv preprint arXiv:2010.02502, https://arxiv.org/pdf/2010.02502, 2020.
[7] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-Shot Text-to-Image Generation," International Conference on Machine Learning, PMLR, https://arxiv.org/pdf/2102.12092, 2021.
[8] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical Text-Conditional Image Generation with CLIP Latents," arXiv preprint arXiv:2204.06125, https://arxiv.org/abs/2204.06125, 2022.
[9] J. Betker, et al., "Improving Image Generation with Better Captions," OpenAI technical report, https://cdn.openai.com/papers/dall-e-3.pdf, 2023.
[10] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi, "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding," Advances in Neural Information Processing Systems 35, https://arxiv.org/pdf/2205.11487, 2022.
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-Resolution Image Synthesis with Latent Diffusion Models," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2112.10752, 2022.
[12] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems 28, https://arxiv.org/pdf/1506.01497, 2015.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," Computer Vision–ECCV 2016, Springer International Publishing, 2016.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/1506.02640, 2016.
[15] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," European Conference on Computer Vision, Springer International Publishing, https://arxiv.org/pdf/2005.12872, 2020.
[16] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-Efficient Transfer Learning for NLP," International Conference on Machine Learning, PMLR, https://arxiv.org/pdf/1902.00751, 2019.
[17] X. L. Li and P. Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation," arXiv preprint arXiv:2101.00190, https://arxiv.org/pdf/2101.00190, 2021.
[18] E. J. Hu, Y. Shen, P. Wallis, Z. A. Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint arXiv:2106.09685, https://arxiv.org/pdf/2106.09685, 2021.
[19] S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning both Weights and Connections for Efficient Neural Networks," Advances in Neural Information Processing Systems 28, https://arxiv.org/pdf/1506.02626, 2015.
[20] G. Fang, X. Ma, M. Song, M. B. Mi, and X. Wang, "DepGraph: Towards Any Structural Pruning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2301.12900, 2023.
[21] G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, https://arxiv.org/pdf/1503.02531, 2015.
[22] C. Y. Wang, A. Bochkovskiy, and H. Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://arxiv.org/pdf/2207.02696, 2023.
[23] S. Lin, A. Wang, and X. Yang, "SDXL-Lightning: Progressive Adversarial Diffusion Distillation," arXiv preprint arXiv:2402.13929, https://arxiv.org/pdf/2402.13929, 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92981
dc.description.abstract (zh_TW):
隨著家庭中寵物數量的增加,它們的健康和福利已成為現代社會的重要關注點。實時行為識別技術的發展,特別是其在嵌入式系統中的應用,為監控和理解寵物行為提供了新的途徑。本文探討了如何通過生成技術和模型優化來實現在嵌入式系統中的實時寵物行為辨識,並研究了這項技術在寵物照護中的有效應用。

本文首先通過使用Stable Diffusion技術生成各種貓行為的高品質影像,以解決訓練數據集不足的問題。接著,選用YOLOv7-tiny模型來進行貓行為辨識,並使用修剪技術進一步優化模型,減少其大小和計算需求。最後,將優化後的模型部署到Realtek AMB82-mini晶片上,實現高效的實時推斷。

本研究展示了利用生成技術擴充數據並修剪網路來優化模型以適應資源受限環境的有效方法,並驗證了其在實時貓行為辨識中的實用性和高效性。

關鍵字:寵物行為識別、嵌入式系統、生成模型、Stable Diffusion、YOLOv7-tiny、模型剪枝
dc.description.abstract (en):
With the rising number of pets in households, their health and welfare have become significant concerns in modern society. The development of real-time behavior recognition technology, especially its application in embedded systems, offers new ways to monitor and understand pet behaviors. This study explores how to implement real-time pet behavior recognition on embedded systems through generative techniques, network pruning, and model optimization, and investigates the effective application of this technology in pet care.

First, high-quality images of various cat behaviors were generated using Stable Diffusion to address the issue of insufficient training datasets. Next, the YOLOv7-tiny model was employed for cat behavior recognition, further optimized using pruning techniques to reduce its size and computational requirements. Finally, the optimized model was deployed on the Realtek AMB82-mini chip, achieving efficient real-time inference.
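The synthetic-data step can be sketched at the prompt level. The behavior classes and prompt template below are illustrative assumptions, not the thesis's actual prompt set; the resulting strings would then be fed to a text-to-image pipeline such as Stable Diffusion.

```python
from itertools import product

# Illustrative cat-behavior classes and imaging conditions (assumptions,
# not the prompts actually used in the thesis).
BEHAVIORS = ["eating", "drinking", "grooming", "scratching", "sleeping"]
CONDITIONS = ["indoor lighting", "natural daylight"]

def build_prompts(behaviors, conditions):
    """Compose one text-to-image prompt per (behavior, condition) pair."""
    return [f"a photo of a cat {b}, {c}"
            for b, c in product(behaviors, conditions)]

prompts = build_prompts(BEHAVIORS, CONDITIONS)
# 5 behaviors x 2 conditions -> 10 prompts
```

Systematically covering the (behavior, condition) grid like this is one simple way to keep the generated dataset balanced across classes.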

This research demonstrates the effectiveness of using generative techniques to augment datasets and optimize models for resource-constrained environments, validating its practicality and efficiency in real-time cat behavior recognition.

Keywords: pet behavior recognition, embedded systems, generative models, Stable Diffusion, YOLOv7-tiny, model pruning
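The pruning step mentioned in the abstract shrinks YOLOv7-tiny to fit the AMB82-mini. As a hedged illustration of the underlying selection principle only (the thesis applies structured pruning, whose configuration is not given here), a minimal magnitude-pruning sketch:

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights.

    Unstructured magnitude pruning in the spirit of [19]; the thesis prunes
    whole channels (structured pruning, as in DepGraph [20]) so the network
    actually shrinks, but the selection criterion is analogous.
    """
    ranked = sorted(abs(w) for w in weights)
    k = int(len(weights) * sparsity)   # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = ranked[k - 1]          # k-th smallest magnitude
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.5, -0.05, 1.2, 0.01, -0.8, 0.3], sparsity=0.5)
# the three smallest magnitudes (0.01, 0.05, 0.3) are zeroed
```

Structured variants remove entire filters instead of individual weights, which is what actually reduces FLOPs and memory on embedded hardware.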
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-12T16:06:35Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-07-12T16:06:35Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
ABSTRACT iv
LIST OF FIGURES x
LIST OF TABLES xii
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Data Collection Challenges 2
1.3 Embedded System 3
1.4 Thesis Organization 4
Chapter 2 Related Works 5
2.1 Generative Artificial Intelligence Model 5
2.1.1 Generative Adversarial Networks (GANs) 5
2.1.2 Variational Auto-Encoders (VAEs) 7
2.1.3 Diffusion Models 8
2.2 Object Detection with Deep Learning 12
2.2.1 Faster R-CNN (Region-based Convolutional Neural Networks) 12
2.2.2 Single-Shot Multi-Box Detector (SSD) 13
2.2.3 You Only Look Once (YOLO) 14
2.2.4 Detection Transformer (DETR) 15
2.3 Parameter-Efficient Fine-Tuning (PEFT) 16
2.4 Model Optimization in Embedded Systems 18
2.4.1 Model Pruning 19
2.4.2 Knowledge Distillation 20
Chapter 3 Background 22
3.1 Stable Diffusion 22
3.2 Low-Rank Adaptation (LoRA) [18] 25
3.3 YOLOv7 [22] 28
3.4 Torch Pruning 30
Chapter 4 Methodology 32
4.1 Overview 32
4.2 Synthetic Data Generation 34
4.3 Pet Behavior Recognition 37
Chapter 5 Experimental Results 41
5.1 Evaluation Metric 41
5.1.1 Precision and Recall 41
5.1.2 Intersection over Union (IoU) 42
5.1.3 Average Precision (AP) and Mean Average Precision (mAP) 42
5.1.4 Floating Point Operations (FLOPs) 44
5.2 Effectiveness of Synthetic Data Generation 44
5.2.1 Addressing Limitations of SDXL-Lightning 45
5.2.2 Improved Training Dataset 45
5.2.3 Impact on Model Performance 47
5.3 Impact of Model Pruning 48
5.3.1 Size, Computational Efficiency, and Speed 48
5.3.2 Performance Metrics 49
5.4 Test Image Result 50
Chapter 6 Conclusion and Future Works 59
6.1 Implications and Contributions 59
6.2 Future Work 60
6.3 Conclusion 61
References 62
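The IoU metric listed in Section 5.1.2 follows the standard definition for axis-aligned boxes; a minimal self-contained sketch (illustrative, not the thesis's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))  # intersection 25, union 175
```

A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), which feeds into the precision, recall, and mAP figures of Section 5.1.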
dc.language.iso: en
dc.subject (zh_TW): 寵物行為識別; 嵌入式系統; 生成模型; Stable Diffusion; YOLOv7-tiny; 模型剪枝
dc.subject (en): generative models; pet behavior recognition; model pruning; YOLOv7-tiny; Stable Diffusion; embedded systems
dc.title (zh_TW): 李辨識:寵物行為實時辨識於嵌入式系統
dc.title (en): LiRecognition: Real-time Pet Behavior Recognition on Embedded System
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee (zh_TW): 方瓊瑤; 巫宗昇
dc.contributor.oralexamcommittee (en): Chiung-Yao Fang; Zong-Sheng Wu
dc.subject.keyword (zh_TW): 寵物行為識別, 嵌入式系統, 生成模型, Stable Diffusion, YOLOv7-tiny, 模型剪枝
dc.subject.keyword (en): pet behavior recognition, embedded systems, generative models, Stable Diffusion, YOLOv7-tiny, model pruning
dc.relation.page: 66
dc.identifier.doi: 10.6342/NTU202401113
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2024-06-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: ntu-112-2.pdf, 3.72 MB, Adobe PDF (not authorized for public access)

