Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98394
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 黃瀅瑛 | zh_TW
dc.contributor.advisor | Ying-Yin Huang | en
dc.contributor.author | 陳沛君 | zh_TW
dc.contributor.author | Pei-Chun Chen | en
dc.date.accessioned | 2025-08-05T16:11:56Z | -
dc.date.available | 2025-08-06 | -
dc.date.copyright | 2025-08-05 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-07-30 | -
dc.identifier.citation | [1] Zaccardi, S., Frantz, T., Beckwée, D., Swinnen, E., & Jansen, B. (2023). On-device execution of deep learning models on HoloLens 2 for real-time augmented reality medical applications. Sensors, 23(21), 8698.
[2] Ghasemi, Y., Jeong, H., Choi, S. H., Park, K. B., & Lee, J. Y. (2022). Deep learning-based object detection in augmented reality: A systematic review. Computers in Industry, 139, 103661.
[3] Łysakowski, M., Żywanowski, K., Banaszczyk, A., Nowicki, M. R., Skrzypczyński, P., & Tadeja, S. K. (2023, July). Real-time onboard object detection for augmented reality: Enhancing head-mounted display with YOLOv8. In 2023 IEEE International Conference on Edge Computing and Communications (EDGE) (pp. 364-371). IEEE.
[4] Stanescu, A., Mohr, P., Kozinski, M., Mori, S., Schmalstieg, D., & Kalkofen, D. (2023, October). State-aware configuration detection for augmented reality step-by-step tutorials. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 157-166). IEEE.
[5] Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., & Han, J. (2024). YOLOv10: Real-time end-to-end object detection. Advances in Neural Information Processing Systems, 37, 107984-108011.
[6] Khanam, R., & Hussain, M. (2024). YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725.
[7] Tian, Y., Ye, Q., & Doermann, D. (2025). YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524.
[8] Microsoft. (2021, October 19). Hologram stability. Microsoft Learn. https://learn.microsoft.com/en-us/windows/mixed-reality/develop/advanced-concepts/hologram-stability
[9] Khan, T., Zhu, T. S., Downes, T., Cheng, L., Kass, N. M., Andrews, E. G., & Biehl, J. T. (2023). Understanding effects of visual feedback delay in AR on fine motor surgical tasks. IEEE Transactions on Visualization and Computer Graphics, 29(11), 4697-4707.
[10] Palumbo, A. (2022). Microsoft HoloLens 2 in medical and healthcare context: state of the art and future prospects. Sensors, 22(20), 7709.
[11] Zari, G., Condino, S., Cutolo, F., & Ferrari, V. (2023). Magic Leap 1 versus Microsoft HoloLens 2 for the visualization of 3D content obtained from radiological images. Sensors, 23(6), 3040.
[12] Microsoft. (2023, March 13). HoloLens 2 hardware. Microsoft Learn. https://learn.microsoft.com/en-us/hololens/hololens2-hardware
[13] Qin, Z., Wang, W., Dammer, K. H., Guo, L., & Cao, Z. (2021). Ag-YOLO: A real-time low-cost detector for precise spraying with case study of palms. Frontiers in Plant Science, 12, 753603.
[14] Ultralytics. (2024, May 25). YOLOv10 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov10/
[15] Wang, C. Y., Bochkovskiy, A., & Liao, H. Y. M. (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7464-7475).
[16] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[17] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149.
[18] Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).
[19] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
[20] Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
[21] Ultralytics. (n.d.). YOLOv5 overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov5/
[22] Wang, C. Y., Yeh, I. H., & Mark Liao, H. Y. (2024, September). YOLOv9: Learning what you want to learn using programmable gradient information. In European conference on computer vision (pp. 1-21). Cham: Springer Nature Switzerland.
[23] Ultralytics. (2023, November 12). YOLOv8 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolov8/#can-i-benchmark-yolov8-models-for-performance
[24] Ultralytics. (2024, September 30). YOLOv11 model overview [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolo11/
[25] Ultralytics. (2025, February 20). YOLO12: Attention‑Centric Real‑Time Object Detectors [Documentation]. Ultralytics Docs. https://docs.ultralytics.com/zh/models/yolo12/
[26] Wong, A., Famuori, M., Shafiee, M. J., Li, F., Chwyl, B., & Chung, J. (2019, December). YOLO nano: A highly compact you only look once convolutional neural network for object detection. In 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS) (pp. 22-25). IEEE.
[27] Bumbálek, R., Umurungi, S. N., Ufitikirezi, J. D. D. M., Zoubek, T., Kuneš, R., Stehlík, R., Lin, H.-I., & Bartoš, P. (2025). Deep learning in poultry farming: Comparative analysis of YOLOv8, YOLOv9, YOLOv10, and YOLOv11 for dead chickens detection. Poultry Science, 105, 105440.
[28] Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., ... & Murphy, K. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7310-7311).
[29] Unity Technologies. (n.d.). Introduction to Barracuda [Documentation]. Unity Barracuda. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.barracuda@1.0/manual/index.html
[30] Unity Technologies. (2023, October 18). What’s new in Sentis 1.2 [Documentation]. Unity Sentis. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.sentis@1.2/manual/whats-new.html
[31] Unity Technologies. (2023, October 18). IWorker interface: core of the engine [Documentation]. Unity Barracuda. Retrieved June 19, 2025, from https://docs.unity3d.com/Packages/com.unity.barracuda@1.0/manual/Worker.html
[32] Microsoft. (2021, December 30). Select an execution device [Documentation]. Windows Machine Learning tutorials. Retrieved June 19, 2025, from https://learn.microsoft.com/en-us/windows/ai/windows-ml/tutorials/advanced-tutorial-execution-device
[33] von Atzigen, M., Liebmann, F., Hoch, A., Bauer, D. E., Snedeker, J. G., Farshad, M., & Fürnstahl, P. (2021). HoloYolo: A proof‐of‐concept study for marker‐less surgical navigation of spinal rod implants with augmented reality and on‐device machine learning. The International Journal of Medical Robotics and Computer Assisted Surgery, 17(1), 1-10.
[34] Fischer, J., Neff, M., Freudenstein, D., & Bartz, D. (2004, June). Medical Augmented Reality based on Commercial Image Guided Surgery. In EGVE (pp. 83-86).
[35] Carmack, J. (2013, February 22). Latency mitigation strategies. Dan Luu. https://danluu.com/latency-mitigation/
[36] Chen, K., Li, T., Kim, H. S., Culler, D. E., & Katz, R. H. (2018, November). Marvel: Enabling mobile augmented reality with low energy and low latency. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (pp. 292-304).
[37] Liu, L., Li, H., & Gruteser, M. (2019, August). Edge assisted real-time object detection for mobile augmented reality. In The 25th annual international conference on mobile computing and networking (pp. 1-16).
[38] Gruen, R., Ofek, E., Steed, A., Gal, R., Sinclair, M., & Gonzalez-Franco, M. (2020, March). Measuring system visual latency through cognitive latency on video see-through AR devices. In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (pp. 791-799). IEEE.
[39] Davis, J., Hsieh, Y. H., & Lee, H. C. (2015). Humans perceive flicker artifacts at 500 Hz. Scientific Reports, 5(1), 7861.
[40] Ellis, S. R., Young, M. J., Adelstein, B. D., & Ehrlich, S. M. (1999, September). Discrimination of changes of latency during voluntary hand movement of virtual objects. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 43, No. 22, pp. 1182-1186). Sage CA: Los Angeles, CA: SAGE Publications.
[41] Adelstein, B. D., Lee, T. G., & Ellis, S. R. (2003, October). Head tracking latency in virtual environments: psychophysics and a model. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 47, No. 20, pp. 2083-2087). Sage CA: Los Angeles, CA: SAGE Publications.
[42] Feldstein, I. T., & Ellis, S. R. (2020). A simple video-based technique for measuring latency in virtual reality or teleoperation. IEEE Transactions on Visualization and Computer Graphics, 27(9), 3611-3625.
[43] Jerald, J., & Whitton, M. (2009, March). Relating scene-motion thresholds to latency thresholds for head-mounted displays. In 2009 IEEE virtual reality conference (pp. 211-218). IEEE.
[44] Ultralytics. (2023). Configuration [Documentation]. Ultralytics Docs. Retrieved June 23, 2025, from https://docs.ultralytics.com/zh/usage/cfg/
[45] Sahu, D., Nidhi, Prakash, S., Pandey, V. K., Yang, T., Rathore, R. S., & Wang, L. (2025). Edge assisted energy optimization for mobile AR applications for enhanced battery life and performance. Scientific Reports, 15(1), 10034.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98394 | -
dc.description.abstract | 本研究旨在探討四款輕量化 YOLO 模型(YOLOv8n、YOLOv10n、YOLOv11n、YOLOv12n)部署在微軟 HoloLens 2 頭戴式裝置上進行即時物件辨識的效能與效率平衡。隨著擴增實境(Augmented Reality, AR)與深度學習技術結合日益普及,頭戴式裝置如 HoloLens 2 的應用需求也逐步增加;然而由於裝置端計算資源有限,如何在其上有效部署高效能的物件辨識模型仍是重要研究議題。本研究以 LEGO® 積木組裝流程為任務情境,比較四種 nano-scale YOLO 模型部署在 HoloLens 2 上即時辨識組裝狀態的幀率(frame rate)、延遲(latency)與辨識信心度(confidence)。實測結果顯示,四款模型在 HoloLens 2 上的辨識表現存在明顯差異。在固定輸入解析度為 160×160 pixels 並透過 Unity Sentis 進行推論時,YOLOv11n 的表現最佳:幀率達到約 10–11 FPS,端到端延遲約為 90–100 ms,且辨識信心度高達 0.935,代表 YOLOv11n 能夠有效滿足大部分即時互動的要求,提供較為流暢與可靠的使用體驗。其次,YOLOv8n 與 YOLOv12n 的表現稍遜於 YOLOv11n:YOLOv8n 的幀率為 9–10 FPS,延遲約 100–111 ms,信心度約 0.912;YOLOv12n 幀率約 8–10 FPS,延遲稍高至 111–143 ms,但辨識信心度(約 0.919)略高於 YOLOv8n。相較之下,YOLOv10n 在三個指標中明顯落後:幀率僅約 5 FPS,延遲高達 200–250 ms,信心度也最低(0.887),較不適合即時應用,僅建議用於對即時性要求較低的離線情境。進一步分析各模型的辨識細節後,發現四款模型在特定步驟(例如 LEGO® 組裝流程的 Step 5 與 Step 6)均出現信心度明顯下降的情形,推測此現象源於這些步驟中 LEGO® 零件的視覺特徵較不明顯、難以區分;建議未來研究可透過增強訓練資料的多樣性與應用多尺度特徵融合技術改善此問題。此外,本研究亦驗證了以 Unity Sentis 作為裝置端推論引擎的可行性:實測結果顯示所有模型均能在此框架上穩定運行,表示此推論引擎已具備足夠的穩定性與相容性,可適用於頭戴式 AR 裝置。綜合而言,本研究之結果驗證 nano-scale YOLO 模型在計算資源受限之頭戴式 AR 裝置上具有實際部署之可行性,其中 YOLOv11n 達到最為平衡的即時辨識效能與穩定性。本研究成果不僅為未來 AR 應用的模型選擇提供明確指引,亦提出資料多樣性、功耗管理及使用者體驗之改善建議,期望未來研究能進一步提升裝置端物件辨識效能與使用者的整體互動體驗。 | zh_TW
dc.description.abstract | This study investigates the balance between performance and efficiency of four nano-scale YOLO models (YOLOv8n, YOLOv10n, YOLOv11n, and YOLOv12n) deployed on Microsoft HoloLens 2 for real-time object detection in augmented reality (AR) scenarios. With the increasing integration of AR and deep learning technologies, head-mounted devices such as HoloLens 2 are gaining prominence; however, the limited computational resources of such edge devices make deploying effective object detection models challenging. This research addresses the issue by evaluating the frame rate (frames per second, FPS), latency, and detection confidence of the four lightweight YOLO models in a LEGO® assembly scenario. Experimental results reveal notable performance differences among the four models under identical conditions (input resolution: 160×160 pixels, inference via Unity Sentis). YOLOv11n achieved the best overall performance, with approximately 10–11 FPS, an end-to-end latency of about 90–100 ms, and an average detection confidence of 0.935, indicating that it meets the requirements of most real-time interactions and offers a smooth, reliable user experience. YOLOv8n and YOLOv12n performed slightly worse: YOLOv8n achieved about 9–10 FPS, a latency of around 100–111 ms, and a confidence score of 0.912, while YOLOv12n reached approximately 8–10 FPS with a higher latency of 111–143 ms but a slightly better confidence score (0.919) than YOLOv8n. In contrast, YOLOv10n lagged significantly in all three metrics, delivering around 5 FPS, a latency of 200–250 ms, and the lowest detection confidence (0.887), making it unsuitable for real-time applications; it is recommended only for offline scenarios with relaxed timing requirements. Furthermore, detailed analysis revealed a noticeable confidence drop in certain LEGO® assembly steps (particularly Steps 5 and 6) across all models, likely because the visual features in these steps are less distinguishable. Future work could address this by increasing dataset diversity and incorporating multi-scale feature fusion. Additionally, this study validated Unity Sentis as a feasible on-device inference framework: all models ran on the platform without memory overflow or system crashes, demonstrating sufficient stability and compatibility for wearable AR devices. In conclusion, this research confirms the practicality of deploying nano-scale YOLO models on resource-constrained AR head-mounted devices and identifies YOLOv11n as the model that best balances real-time detection performance and stability. The findings provide clear guidance for model selection in future AR applications and offer recommendations on dataset diversity, power management, and user experience, aiming to further improve on-device detection performance and overall user interaction. | en
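As a hedged illustration of the export step implied by the abstract and by section 3.4.2 of the table of contents (ONNX export for Unity Sentis), a minimal Ultralytics sketch might look like the following. The checkpoint file names and the opset value are assumptions for illustration, not details taken from the thesis:

    # Minimal sketch: assumes Ultralytics is installed and that the four nano
    # checkpoints have been fine-tuned on the LEGO assembly dataset.
    from ultralytics import YOLO

    # Hypothetical checkpoint names for the four nano-scale models compared.
    CHECKPOINTS = ["yolov8n.pt", "yolov10n.pt", "yolo11n.pt", "yolo12n.pt"]

    for ckpt in CHECKPOINTS:
        model = YOLO(ckpt)
        # Export at the fixed 160x160 input resolution used in the experiments;
        # the opset choice is an assumption and may need adjusting for Sentis.
        model.export(format="onnx", imgsz=160, opset=15)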
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:11:56Z; No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-05T16:11:56Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Thesis Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents vii
List of Figures x
List of Tables xi
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Research Motivation and Objectives 2
1.3 Research Process and Chapter Organization 4
Chapter 2 Literature Review 6
2.1 Augmented Reality and On-device Inference 6
2.1.1 Latency and Frame-rate Requirements for AR Interaction 7
2.1.2 Advantages of On-device Inference and Hardware Challenges 8
2.1.3 HoloLens 2 Hardware Specifications and Computational Bottlenecks 10
2.2 Object Detection: From YOLO to Nano-scale Models 11
2.2.1 The YOLO Family of Architectures 11
2.2.2 Nano-scale Models 14
2.3 On-device Inference and Performance Optimization 15
2.3.1 Comparison of Unity Inference Engines (Barracuda vs. Sentis) 16
2.3.2 Detection Resolution and Model Size 17
2.4 Performance Evaluation Metrics for AR Devices 19
2.4.1 Frame Rate 19
2.4.2 Latency 19
2.4.3 Confidence 20
2.4.4 Overall Comparison 21
Chapter 3 Methodology 23
3.1 Experimental Design 23
3.2 Experimental Equipment 24
3.3 Dataset Construction 27
3.3.1 LEGO® Assembly Process Images 27
3.3.2 Data Annotation and Augmentation 29
3.4 YOLO Nano Model Training and Deployment 30
3.4.1 Training Setup and Hyperparameters 30
3.4.2 ONNX Export and Unity Integration 31
3.4.3 HoloLens 2 Object Detection Program Architecture 32
3.5 Experimental Procedure 34
3.6 Data Collection and Evaluation Metrics 36
3.6.1 Input Images 36
3.6.2 Frame Rate, Latency, and Confidence 37
3.7 Research Objectives 37
Chapter 4 Results 39
4.1 On-device Detection Results on HoloLens 2 39
4.2 Frame-rate Performance of Each Model 40
4.2.1 Frame Rate of Nano-scale YOLO Models Deployed on HoloLens 2 40
4.2.2 Summary 41
4.3 Latency Performance of Each Model 42
4.3.1 Latency of Nano-scale YOLO Models Deployed on HoloLens 2 42
4.3.2 Summary 44
4.4 Per-class Detection Confidence 44
4.4.1 Confidence of Nano-scale YOLO Models Deployed on HoloLens 2 45
4.4.2 Summary 46
Chapter 5 Results and Discussion 48
5.1 Key Findings and Implications 48
5.2 Research Limitations and Future Directions 50
References 51
Appendix 57 | -
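The table of contents above points to per-frame measurements of frame rate, latency, and confidence (sections 3.6.2 and 4.2–4.4). As a rough sketch of how such logs could be summarized offline, assuming a hypothetical CSV format with one row per frame and columns model, latency_ms, and confidence (the thesis's actual logging schema is not given in this record):

    # Sketch under an assumed log format; the column names and the file name
    # below are hypothetical, not taken from the thesis.
    import csv
    from collections import defaultdict
    from statistics import mean

    def summarize(path):
        stats = defaultdict(lambda: {"latency": [], "conf": []})
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                stats[row["model"]]["latency"].append(float(row["latency_ms"]))
                stats[row["model"]]["conf"].append(float(row["confidence"]))
        for model, s in stats.items():
            avg_ms = mean(s["latency"])
            fps = 1000.0 / avg_ms  # mean FPS from mean per-frame latency (ms)
            print(f"{model}: {fps:.1f} FPS, {avg_ms:.0f} ms, "
                  f"mean confidence {mean(s['conf']):.3f}")

    summarize("hololens_runs.csv")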
dc.language.iso | zh_TW | -
dc.subject | HoloLens 2 | zh_TW
dc.subject | 擴增實境(AR) | zh_TW
dc.subject | Nano-scale YOLO | zh_TW
dc.subject | 裝置端推論(on-device inference) | zh_TW
dc.subject | 即時物件辨識 | zh_TW
dc.subject | Nano-scale YOLO | en
dc.subject | HoloLens 2 | en
dc.subject | Real-time Object Detection | en
dc.subject | On-device Inference | en
dc.subject | Augmented Reality (AR) | en
dc.title | 在 HoloLens 2 上部署 YOLO 模型:平衡擴增實境應用中的效能與效率 | zh_TW
dc.title | Deploying YOLO Models on HoloLens 2: Balancing Performance and Efficiency in AR Applications | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | Master | -
dc.contributor.oralexamcommittee | 瞿志行;李昀儒 | zh_TW
dc.contributor.oralexamcommittee | Chih-Hsing Chu;Yun-Ju Lee | en
dc.subject.keyword | HoloLens 2, 擴增實境(AR), Nano-scale YOLO, 裝置端推論(on-device inference), 即時物件辨識 | zh_TW
dc.subject.keyword | HoloLens 2, Augmented Reality (AR), Nano-scale YOLO, On-device Inference, Real-time Object Detection | en
dc.relation.page | 66 | -
dc.identifier.doi | 10.6342/NTU202502914 | -
dc.rights.note | Authorized for release (open access worldwide) | -
dc.date.accepted | 2025-08-01 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Department of Mechanical Engineering | -
dc.date.embargo-lift | 2025-08-06 | -
Appears in Collections: Department of Mechanical Engineering

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 4.18 MB | Adobe PDF