Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98394
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 黃瀅瑛 | zh_TW
dc.contributor.advisor | Ying-Yin Huang | en
dc.contributor.author | 陳沛君 | zh_TW
dc.contributor.author | Pei-Chun Chen | en
dc.date.accessioned | 2025-08-05T16:11:56Z | -
dc.date.available | 2025-08-06 | -
dc.date.copyright | 2025-08-05 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-07-30 | -
dc.identifier.citation | [1] Zaccardi, S., Frantz, T., Beckwée, D., Swinnen, E., & Jansen, B. (2023). On-device execution of deep learning models on HoloLens 2 for real-time augmented reality medical applications. Sensors, 23(21), 8698.
[2] Ghasemi, Y., Jeong, H., Choi, S. H., Park, K. B., & Lee, J. Y. (2022). Deep learning-based object detection in augmented reality: A systematic review. Computers in Industry, 139, 103661.
[3] Łysakowski, M., Żywanowski, K., Banaszczyk, A., Nowicki, M. R., Skrzypczyński, P., & Tadeja, S. K. (2023, July). Real-time onboard object detection for augmented reality: Enhancing head-mounted display with YOLOv8. In 2023 IEEE International Conference on Edge Computing and Communications (EDGE) (pp. 364-371). IEEE.
[4] Stanescu, A., Mohr, P., Kozinski, M., Mori, S., Schmalstieg, D., & Kalkofen, D. (2023, October). State-aware configuration detection for augmented reality step-by-step tutorials. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 157-166). IEEE.
[5] Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., & Han, J. (2024). YOLOv10: Real-time end-to-end object detection. Advances in Neural Information Processing Systems, 37, 107984-108011.
[6] Khanam, R., & Hussain, M. (2024). YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725.
[7] Tian, Y., Ye, Q., & Doermann, D. (2025). YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524.
[8] Microsoft. (2021, October 19). Hologram stability. Microsoft Learn. https://learn.microsoft.com/en-us/windows/mixed-reality/develop/advanced-concepts/hologram-stability
[9] Khan, T., Zhu, T. S., Downes, T., Cheng, L., Kass, N. M., Andrews, E. G., & Biehl, J. T. (2023). Understanding effects of visual feedback delay in AR on fine motor surgical tasks. IEEE Transactions on Visualization and Computer Graphics, 29(11), 4697-4707.
[10] Palumbo, A. (2022). Microsoft HoloLens 2 in medical and healthcare context: state of the art and future prospects. Sensors, 22(20), 7709.
[11] Zari, G., Condino, S., Cutolo, F., & Ferrari, V. (2023). Magic Leap 1 versus Microsoft HoloLens 2 for the visualization of 3D content obtained from radiological images. Sensors, 23(6), 3040.
[12] Microsoft. (2023, March 13). HoloLens 2 hardware. Microsoft Learn. https://learn.microsoft.com/en-us/hololens/hololens2-hardware
[13] Qin, Z., Wang, W., Dammer, K. H., Guo, L., & Cao, Z. (2021). Ag-YOLO: A real-time low-cost detector for precise spraying with case study of palms. Frontiers in Plant Science, 12, 753603.
[14] Ultralytics. (2024, May 25). YOLOv10 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov10/
[15] Wang, C. Y., Bochkovskiy, A., & Liao, H. Y. M. (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7464-7475).
[16] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[17] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149.
[18] Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).
[19] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
[20] Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
[21] Ultralytics. (n.d.). YOLOv5 overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov5/
[22] Wang, C. Y., Yeh, I. H., & Mark Liao, H. Y. (2024, September). YOLOv9: Learning what you want to learn using programmable gradient information. In European conference on computer vision (pp. 1-21). Cham: Springer Nature Switzerland.
[23] Ultralytics. (2023, November 12). YOLOv8 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov8/#can-i-benchmark-yolov8-models-for-performance
[24] Ultralytics. (2024, September 30). YOLOv11 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolo11/
[25] Ultralytics. (2025, February 20). YOLO12: Attention‑Centric Real‑Time Object Detectors [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolo12/
[26] Wong, A., Famuori, M., Shafiee, M. J., Li, F., Chwyl, B., & Chung, J. (2019, December). YOLO nano: A highly compact you only look once convolutional neural network for object detection. In 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS) (pp. 22-25). IEEE.
[27] Bumbálek, R., Umurungi, S. N., Ufitikirezi, J. D. D. M., Zoubek, T., Kuneš, R., Stehlík, R., Lin, H.-I., & Bartoš, P. (2025). Deep learning in poultry farming: Comparative analysis of YOLOv8, YOLOv9, YOLOv10, and YOLOv11 for dead chickens detection. Poultry Science, 105, 105440.
[28] Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., ... & Murphy, K. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7310-7311).
[29] Unity Technologies. (n.d.). Introduction to Barracuda [Documentation]. Unity Barracuda. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.barracuda@1.0/manual/index.html
[30] Unity Technologies. (2023, October 18). What’s new in Sentis 1.2 [Documentation]. Unity Sentis. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.sentis@1.2/manual/whats-new.html
[31] Unity Technologies. (2023, October 18). IWorker interface: core of the engine [Documentation]. Unity Barracuda. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.barracuda@1.0/manual/Worker.html
[32] Microsoft. (2021, December 30). Select an execution device [Documentation]. Windows Machine Learning tutorials. Retrieved June 19, 2025, from https://learn.microsoft.com/en-us/windows/ai/windows-ml/tutorials/advanced-tutorial-execution-device
[33] von Atzigen, M., Liebmann, F., Hoch, A., Bauer, D. E., Snedeker, J. G., Farshad, M., & Fürnstahl, P. (2021). HoloYolo: A proof‐of‐concept study for marker‐less surgical navigation of spinal rod implants with augmented reality and on‐device machine learning. The International Journal of Medical Robotics and Computer Assisted Surgery, 17(1), 1-10.
[34] Fischer, J., Neff, M., Freudenstein, D., & Bartz, D. (2004, June). Medical Augmented Reality based on Commercial Image Guided Surgery. In EGVE (pp. 83-86).
[35] Carmack, J. (2013, February 22). Latency mitigation strategies. Dan Luu. https://danluu.com/latency-mitigation/
[36] Chen, K., Li, T., Kim, H. S., Culler, D. E., & Katz, R. H. (2018, November). Marvel: Enabling mobile augmented reality with low energy and low latency. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (pp. 292-304).
[37] Liu, L., Li, H., & Gruteser, M. (2019, August). Edge assisted real-time object detection for mobile augmented reality. In The 25th annual international conference on mobile computing and networking (pp. 1-16).
[38] Gruen, R., Ofek, E., Steed, A., Gal, R., Sinclair, M., & Gonzalez-Franco, M. (2020, March). Measuring system visual latency through cognitive latency on video see-through AR devices. In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (pp. 791-799). IEEE.
[39] Davis, J., Hsieh, Y. H., & Lee, H. C. (2015). Humans perceive flicker artifacts at 500 Hz. Scientific Reports, 5(1), 7861.
[40] Ellis, S. R., Young, M. J., Adelstein, B. D., & Ehrlich, S. M. (1999, September). Discrimination of changes of latency during voluntary hand movement of virtual objects. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 43, No. 22, pp. 1182-1186). Sage CA: Los Angeles, CA: SAGE Publications.
[41] Adelstein, B. D., Lee, T. G., & Ellis, S. R. (2003, October). Head tracking latency in virtual environments: psychophysics and a model. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 47, No. 20, pp. 2083-2087). Sage CA: Los Angeles, CA: SAGE Publications.
[42] Feldstein, I. T., & Ellis, S. R. (2020). A simple video-based technique for measuring latency in virtual reality or teleoperation. IEEE Transactions on Visualization and Computer Graphics, 27(9), 3611-3625.
[43] Jerald, J., & Whitton, M. (2009, March). Relating scene-motion thresholds to latency thresholds for head-mounted displays. In 2009 IEEE virtual reality conference (pp. 211-218). IEEE.
[44] Ultralytics. (2023). Configuration [Documentation]. Ultralytics Docs. Retrieved June 23, 2025, from https://docs.ultralytics.com/zh/usage/cfg/
[45] Sahu, D., Nidhi, Prakash, S., Pandey, V. K., Yang, T., Rathore, R. S., & Wang, L. (2025). Edge assisted energy optimization for mobile AR applications for enhanced battery life and performance. Scientific Reports, 15(1), 10034.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98394 | -
dc.description.abstract | 本研究旨在探討四款輕量化 YOLO 模型(YOLOv8n、YOLOv10n、YOLOv11n、YOLOv12n)部署在微軟 HoloLens 2 頭戴式裝置上進行即時物件辨識的效能與效率平衡。隨著擴增實境(Augmented Reality, AR)與深度學習技術結合日益普及,頭戴式裝置如 HoloLens 2 的應用需求也逐步增加;然而由於裝置端計算資源有限,如何在其上有效部署高效能的物件辨識模型仍是重要研究議題。本研究以 LEGO® 積木組裝流程為任務情境,比較四種 nano-scale YOLO 模型部署在 HoloLens 2 上即時辨識組裝狀態的幀率(frame rate)、延遲(latency)與辨識信心度(confidence)。實測結果顯示,四款模型在 HoloLens 2 上的辨識表現存在明顯差異。在固定輸入解析度為 160×160 pixels 並透過 Unity Sentis 進行推論時,YOLOv11n 的表現最佳:幀率達到約 10–11 FPS,端到端延遲約為 90–100 ms,且辨識信心度高達 0.935,代表 YOLOv11n 能夠有效滿足大部分即時互動的要求,提供較為流暢與可靠的使用體驗。其次,YOLOv8n 與 YOLOv12n 的表現稍遜於 YOLOv11n:YOLOv8n 的幀率為 9–10 FPS,延遲約 100–111 ms,信心度約 0.912;YOLOv12n 幀率約 8–10 FPS,延遲稍高至 111–143 ms,但辨識信心度(約 0.919)略高於 YOLOv8n。相較之下,YOLOv10n 在三個指標中明顯落後:幀率僅約 5 FPS,延遲高達 200–250 ms,信心度也最低(0.887),較不適合即時應用,僅建議用於對即時性要求較低的離線情境。進一步分析各模型的辨識細節後,發現四款模型在特定步驟(例如 LEGO® 組裝流程的 Step 5 與 Step 6)均出現信心度明顯下降的情形,推測此現象源於這些步驟中 LEGO® 零件的視覺特徵較不明顯、難以區分;建議未來研究可透過增強訓練資料的多樣性與應用多尺度特徵融合技術改善此問題。此外,本研究亦驗證了以 Unity Sentis 作為裝置端推論引擎的可行性:實測結果顯示所有模型均能在此框架上穩定運行,表示此推論引擎已具備足夠的穩定性與相容性,可適用於頭戴式 AR 裝置。綜合而言,本研究之結果驗證 nano-scale YOLO 模型在計算資源受限之頭戴式 AR 裝置上具有實際部署之可行性,其中 YOLOv11n 達到最為平衡的即時辨識效能與穩定性。本研究成果不僅為未來 AR 應用的模型選擇提供明確指引,亦提出資料多樣性、功耗管理及使用者體驗之改善建議,期望未來研究能進一步提升裝置端物件辨識效能與使用者的整體互動體驗。 | zh_TW
dc.description.abstract | This study investigates the balance between performance and efficiency of four nano-scale YOLO models (YOLOv8n, YOLOv10n, YOLOv11n, and YOLOv12n) deployed on Microsoft HoloLens 2 for real-time object detection in augmented reality (AR) scenarios. With the increasing integration of AR and deep learning technologies, head-mounted devices such as HoloLens 2 are gaining prominence; however, the limited computational resources of such edge devices make deploying effective object detection models challenging. This research addresses the issue by evaluating the frame rate (frames per second, FPS), latency, and detection confidence of the four lightweight YOLO models in a LEGO® assembly scenario. Experimental results reveal notable performance differences among the four models under identical conditions (input resolution: 160×160 pixels, inference via Unity Sentis). YOLOv11n achieved the best overall performance, with approximately 10–11 FPS, an end-to-end latency of about 90–100 ms, and an average detection confidence of 0.935, indicating that it meets the requirements of most real-time interactions and offers a smooth, reliable user experience. YOLOv8n and YOLOv12n performed slightly worse: YOLOv8n achieved about 9–10 FPS, a latency of around 100–111 ms, and a confidence score of 0.912, while YOLOv12n reached approximately 8–10 FPS with a higher latency of 111–143 ms but a slightly better confidence score (0.919) than YOLOv8n. In contrast, YOLOv10n lagged significantly in all three metrics, delivering around 5 FPS, a latency of 200–250 ms, and the lowest detection confidence (0.887), making it unsuitable for real-time applications; it is recommended only for offline scenarios with relaxed timing requirements. Furthermore, detailed analysis revealed a noticeable confidence drop in certain LEGO® assembly steps (particularly Steps 5 and 6) across all models, likely because the visual features in these steps are less distinguishable. Future work could address this by increasing dataset diversity and incorporating multi-scale feature fusion. Additionally, this study validated Unity Sentis as a feasible on-device inference framework: all models ran on the platform without memory overflow or system crashes, demonstrating sufficient stability and compatibility for wearable AR devices. In conclusion, this research confirms the practicality of deploying nano-scale YOLO models on resource-constrained AR head-mounted devices and identifies YOLOv11n as the model that best balances real-time detection performance and stability. The findings provide clear guidance for model selection in future AR applications and offer recommendations on dataset diversity, power management, and user experience, aiming to further improve on-device detection performance and overall user interaction. | en
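As a hedged illustration of the export step implied by the abstract and by section 3.4.2 of the table of contents (ONNX export for Unity Sentis), a minimal Ultralytics sketch might look like the following. The checkpoint file names and the opset value are assumptions for illustration, not details taken from the thesis:

    # Minimal sketch: assumes Ultralytics is installed and that the four nano
    # checkpoints have been fine-tuned on the LEGO assembly dataset.
    from ultralytics import YOLO

    # Hypothetical checkpoint names for the four nano-scale models compared.
    CHECKPOINTS = ["yolov8n.pt", "yolov10n.pt", "yolo11n.pt", "yolo12n.pt"]

    for ckpt in CHECKPOINTS:
        model = YOLO(ckpt)
        # Export at the fixed 160x160 input resolution used in the experiments;
        # the opset choice is an assumption and may need adjusting for Sentis.
        model.export(format="onnx", imgsz=160, opset=15)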
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:11:56Z; No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-05T16:11:56Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Thesis Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents vii
List of Figures x
List of Tables xi
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Research Motivation and Objectives 2
1.3 Research Process and Chapter Organization 4
Chapter 2 Literature Review 6
2.1 Augmented Reality and On-device Inference 6
2.1.1 Latency and Frame-rate Requirements for AR Interaction 7
2.1.2 Advantages of On-device Inference and Hardware Challenges 8
2.1.3 HoloLens 2 Hardware Specifications and Computational Bottlenecks 10
2.2 Object Detection: From YOLO to Nano-scale Models 11
2.2.1 The YOLO Family of Architectures 11
2.2.2 Nano-scale Models 14
2.3 On-device Inference and Performance Optimization 15
2.3.1 Comparison of Unity Inference Engines (Barracuda vs. Sentis) 16
2.3.2 Detection Resolution and Model Size 17
2.4 Performance Evaluation Metrics for AR Devices 19
2.4.1 Frame Rate 19
2.4.2 Latency 19
2.4.3 Confidence 20
2.4.4 Overall Comparison 21
Chapter 3 Methodology 23
3.1 Experimental Design 23
3.2 Experimental Equipment 24
3.3 Dataset Construction 27
3.3.1 LEGO® Assembly Process Images 27
3.3.2 Data Annotation and Augmentation 29
3.4 YOLO Nano Model Training and Deployment 30
3.4.1 Training Setup and Hyperparameters 30
3.4.2 ONNX Export and Unity Integration 31
3.4.3 HoloLens 2 Object Detection Program Architecture 32
3.5 Experimental Procedure 34
3.6 Data Collection and Evaluation Metrics 36
3.6.1 Input Images 36
3.6.2 Frame Rate, Latency, and Confidence 37
3.7 Research Objectives 37
Chapter 4 Results 39
4.1 On-device Detection Results on HoloLens 2 39
4.2 Frame-rate Performance of Each Model 40
4.2.1 Frame Rate of Nano-scale YOLO Models Deployed on HoloLens 2 40
4.2.2 Summary 41
4.3 Latency Performance of Each Model 42
4.3.1 Latency of Nano-scale YOLO Models Deployed on HoloLens 2 42
4.3.2 Summary 44
4.4 Per-class Detection Confidence 44
4.4.1 Confidence of Nano-scale YOLO Models Deployed on HoloLens 2 45
4.4.2 Summary 46
Chapter 5 Results and Discussion 48
5.1 Key Findings and Implications 48
5.2 Research Limitations and Future Directions 50
References 51
Appendix 57 | -
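The table of contents above points to per-frame measurements of frame rate, latency, and confidence (sections 3.6.2 and 4.2–4.4). As a rough sketch of how such logs could be summarized offline, assuming a hypothetical CSV format with one row per frame and columns model, latency_ms, and confidence (the thesis's actual logging schema is not given in this record):

    # Sketch under an assumed log format; the column names and the file name
    # below are hypothetical, not taken from the thesis.
    import csv
    from collections import defaultdict
    from statistics import mean

    def summarize(path):
        stats = defaultdict(lambda: {"latency": [], "conf": []})
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                stats[row["model"]]["latency"].append(float(row["latency_ms"]))
                stats[row["model"]]["conf"].append(float(row["confidence"]))
        for model, s in stats.items():
            avg_ms = mean(s["latency"])
            fps = 1000.0 / avg_ms  # mean FPS from mean per-frame latency (ms)
            print(f"{model}: {fps:.1f} FPS, {avg_ms:.0f} ms, "
                  f"mean confidence {mean(s['conf']):.3f}")

    summarize("hololens_runs.csv")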
dc.language.iso | zh_TW | -
dc.subject | HoloLens 2 | zh_TW
dc.subject | 擴增實境(AR) | zh_TW
dc.subject | Nano-scale YOLO | zh_TW
dc.subject | 裝置端推論(on-device inference) | zh_TW
dc.subject | 即時物件辨識 | zh_TW
dc.subject | Nano-scale YOLO | en
dc.subject | HoloLens 2 | en
dc.subject | Real-time Object Detection | en
dc.subject | On-device Inference | en
dc.subject | Augmented Reality (AR) | en
dc.title | 在 HoloLens 2 上部署 YOLO 模型:平衡擴增實境應用中的效能與效率 | zh_TW
dc.title | Deploying YOLO Models on HoloLens 2: Balancing Performance and Efficiency in AR Applications | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | Master | -
dc.contributor.oralexamcommittee | 瞿志行;李昀儒 | zh_TW
dc.contributor.oralexamcommittee | Chih-Hsing Chu;Yun-Ju Lee | en
dc.subject.keyword | HoloLens 2, 擴增實境(AR), Nano-scale YOLO, 裝置端推論(on-device inference), 即時物件辨識 | zh_TW
dc.subject.keyword | HoloLens 2, Augmented Reality (AR), Nano-scale YOLO, On-device Inference, Real-time Object Detection | en
dc.relation.page | 66 | -
dc.identifier.doi | 10.6342/NTU202502914 | -
dc.rights.note | Authorized for release (open access worldwide) | -
dc.date.accepted | 2025-08-01 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Department of Mechanical Engineering | -
dc.date.embargo-lift | 2025-08-06 | -
Appears in Collections: Department of Mechanical Engineering

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 4.18 MB | Adobe PDF