Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102235

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 陽毅平 | zh_TW |
| dc.contributor.advisor | Yee-Pien Yang | en |
| dc.contributor.author | 林宥成 | zh_TW |
| dc.contributor.author | Yu-Cheng Lin | en |
| dc.date.accessioned | 2026-04-08T16:31:06Z | - |
| dc.date.available | 2026-04-09 | - |
| dc.date.copyright | 2026-04-08 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-02-25 | - |
| dc.identifier.citation | [1] Y. H. Chen, “以反向傳播神經網路適應性控制實現動力輪馬達電動雙人輪椅舞,” Master’s thesis, National Taiwan University, Taipei, Taiwan, 2024.
[2] World Health Organization and UNICEF, “Global report on assistive technology,” Geneva, Switzerland, 2022, accessed: 2025-11-09. [Online]. Available: https://apps.who.int/iris/handle/10665/354357
[3] Ministry of the Interior, Taiwan, “2023 statistical yearbook of interior,” Official Statistics Report, 2023, accessed: 2025-11-09. [Online]. Available: https://www.moi.gov.tw/
[4] E. Freiberger, C. Sieber, and R. Kob, “Mobility in older community-dwelling persons: A narrative review,” Frontiers in Physiology, vol. 11, p. 881, 2020.
[5] A.-K. Welmer, D. Rizzuto, and Q.-L. Qi, “Association of cardiovascular burden with mobility limitation among elderly people,” PLoS ONE, vol. 8, no. 5, p. e65615, 2013.
[6] M. Kalu, S.-Y. Lu, et al., “Cognitive, psychological and social factors associated with mobility in older adults,” Psychogeriatrics, vol. 23, no. 5, pp. 723–734, 2023.
[7] S. A. Conger et al., “The compendium of wheelchair physical activities: An update,” Journal of Physical Activity and Health, vol. 18, no. 1, pp. S1–S36, 2021.
[8] X. Liu, P. Clarke, et al., “Incidence and dynamics of mobility device use among community-dwelling older adults in the United States,” Innovation in Aging, vol. 7, no. 2, p. igad023, 2023.
[9] F. Andrade and M. Campos, “Factors associated with use of assistive walking devices among older adults in Brazil,” Cadernos Saúde Coletiva, vol. 30, pp. 314–324, 2022.
[10] W. Cheng and L. Zhang, “Need for assistive walking devices among older adults in Shanghai,” Bioscience Trends, vol. 17, no. 3, pp. 227–235, 2023.
[11] P. Clarke and A. Chan, “The use of mobility devices among institutionalized older adults,” Journal of Aging and Health, vol. 21, no. 4, pp. 611–626, 2009.
[12] J. Leaman and H. M. La, “A survey of smart wheelchairs in the context of human robot interaction,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 4, pp. 486–499, 2017.
[13] H. Grewal et al., “Autonomous wheelchair navigation: A systematic review,” IET Cyber-Systems and Robotics, 2018, early access.
[14] K. Narumi et al., “Implementation of ballroom dance robot for wheelchair users,” in 2018 IEEE/SICE International Symposium on System Integration (SII), 2018, pp. 568–573.
[15] R. Li et al., “AI choreographer: Music-conditioned 3D dance generation with AIST++,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[16] L. Swartz et al., “Inclusion and physical activity: A review of the literature,” Disability and Rehabilitation, 2020.
[17] OpenAI, “Whisper: Robust speech recognition via large-scale weak supervision,” Technical Report, 2023, accessed: 2025-11-09. [Online]. Available: https://openai.com/research/whisper
[18] P. Grosche, M. Müller, and F. Kurth, “Cyclic tempogram—a mid-level tempo representation for music signals,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2010, pp. 5522–5525.
[19] Q. Kong, Y. Cao, T. Iqbal, Y. Wang, M. D. Plumbley, and W. Wang, “PANNs: Large-scale pretrained audio neural networks for audio pattern recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2880–2894, 2020.
[20] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
[21] P. W. Wu, “雙人動力輪椅追蹤與互動式輪椅舞,” Master’s thesis, National Taiwan University, Taipei, Taiwan, 2025.
[22] C. H. Chang, “電動輪椅舞蹈控制系統:基於姿態識別的互動與同步策略,” Master’s thesis, National Taiwan University, Taipei, Taiwan, 2025.
[23] A. Radford et al., “Robust speech recognition via large-scale weak supervision,” OpenAI Technical Report, 2023.
[24] B. McFee et al., “librosa: Audio and music signal analysis in Python,” in Proceedings of the 14th Python in Science Conference, 2015.
[25] D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed., 2023, draft. [Online]. Available: https://web.stanford.edu/~jurafsky/slp3/
[26] J. Wei, X. Wang, D. Schuurmans, et al., “Chain-of-thought prompting elicits reasoning in large language models,” in Advances in Neural Information Processing Systems (NeurIPS), 2022. [Online]. Available: https://arxiv.org/abs/2201.11903
[27] L. Reynolds and K. McDonell, “Prompt programming for large language models: Beyond the few-shot paradigm,” arXiv preprint arXiv:2102.07350, 2021. [Online]. Available: https://arxiv.org/abs/2102.07350
[28] P. Liu, W. Yuan, J. Fu, et al., “Prompt engineering for large language models: Beyond the basics,” in Proceedings of ACL 2023 Tutorial, 2023. [Online]. Available: https://arxiv.org/abs/2304.14068
[29] D. Fox, “KLD-sampling: Adaptive particle filters,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 14, 2001, pp. 713–720.
[30] T. Flash and N. Hogan, “The coordination of arm movements: An experimentally confirmed mathematical model,” Journal of Neuroscience, vol. 5, no. 7, pp. 1688–1703, 1985.
[31] S. Balasubramanian, A. Melendez-Calderon, and E. Burdet, “A robust and sensitive metric for quantifying movement smoothness,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 8, pp. 2126–2136, 2012.
[32] K. Miura, M. Morisawa, S. Nakaoka, F. Kanehiro, K. Harada, S. Kajita, and H. Hirukawa, “Robot motion generation method for dancing robot based on music,” in 2006 IEEE International Conference on Robotics and Automation. IEEE, 2006, pp. 2943–2948.
[33] K. Murata, K. Nakadai, K. Yoshii, M. Goto, and H. G. Okuno, “A beat-tracking robot for human-robot interaction and its evaluation,” in 2008 IEEE-RAS International Conference on Humanoid Robots. IEEE, 2008, pp. 79–84.
[34] A. Camurri, I. Lagerlöf, and G. Volpe, “Recognizing emotion from dance movement: Comparison of spectator recognition and automated techniques,” International Journal of Human-Computer Studies, vol. 59, no. 1–2, pp. 213–225, 2003.
[35] R. Li, S. Yang, D. A. Ross, and A. Kanazawa, “AI choreographer: Music conditioned 3D dance generation with AIST++,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 13401–13412.
[36] L. Siyao, W. Yu, T. Gu, C. Lin, Q. Wang, C. Qian, C. C. Loy, and Z. Liu, “Bailando: 3D dance generation by actor-critic GPT with choreographic memory,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 11050–11059.
[37] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 5026–5033.
[38] N. Koenig and A. Howard, “Design and use paradigms for Gazebo, an open-source multi-robot simulator,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004, pp. 2149–2154.
[39] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA: MIT Press, 2005.
[40] R. Siegwart and I. Nourbakhsh, Introduction to Autonomous Mobile Robots. Cambridge, MA: MIT Press, 2004.
[41] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with Rao-Blackwellized particle filters,” IEEE Transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007.
[42] A. Doucet, N. de Freitas, K. Murphy, and S. Russell, “Rao-Blackwellised particle filtering for dynamic Bayesian networks,” in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2000, pp. 176–183. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102235 | - |
| dc.description.abstract | 本研究旨在建構一套結合語音理解 (automatic speech recognition, ASR)、音樂節奏分析 (music information retrieval, MIR) 與智慧控制之電動輪椅舞蹈系統,使使用者可透過自然語言與音樂輸入,實現具節奏性與創造性的移動表演。系統以 ROS2 為核心通訊框架,整合語音辨識、自然語意解析 (large language model, LLM)、音樂拍點分析與即時控制模組,形成「語音—語意—音樂—動作—控制」之跨模態閉環架構。
在輸入層,系統以 Whisper 模型進行語音轉錄,並透過 LLM 分析語意與情緒描述以生成舞蹈規範 (dance specification)。音樂層採用節拍追蹤與節奏強度時序分析,萃取拍點與速度資訊,作為動作時間軸對齊依據。控制層方面,以 ROS2 Nav2 架構結合 AMCL (adaptive Monte Carlo localization) 定位與里程計感測融合,確保舞步執行之平滑與安全性;最終生成之輪椅車輪速度命令 (v, ω) 會由上位控制器 Arduino Mega 2560 (以下簡稱 Arduino) 透過差動訊號傳送至數位訊號處理驅動模組,完成雙輪閉迴路控制。

本研究亦對辨識層與控制層進行量化驗證。節拍偵測之平均延遲約 4–5 ms,小節動態對齊率 (measure dynamic alignment, MDA) 於不同曲風中達 87–93%,顯示系統能精準貼齊節奏結構。控制層採用速度控制與加速度平滑化策略,使速度抖動率 (velocity jitter ratio, VJR) 相較傳統純速度控制降低約 40–60%,並有效抑制瞬時加速度尖峰。定位精度方面,AMCL 融合可將長距離位姿誤差由 0.8–1.0 m 降至 0.05–0.13 m,提升約 90% 的軌跡穩定性。在障礙物環境測試中,本研究架構達成跳舞兼顧避障之功能,並能保持舞步節奏一致性,展現優於傳統純速度控制之環境適應力。

綜合而言,本研究實現具創造性與互動性的電動輪椅舞蹈控制平台,不僅可作為輔助科技與藝術表演之跨域應用原型,亦為未來智慧輔具結合生成式人工智慧 (artificial intelligence, AI) 與人機共演提供新方向。 | zh_TW |
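The abstract above reports a 40–60% reduction in the velocity jitter ratio (VJR), but the record does not reproduce the thesis's exact definition (given in its §4.7.1). The sketch below is therefore a hypothetical illustration only, assuming VJR is the RMS of the finite-difference acceleration normalized by the mean speed; the function name `velocity_jitter_ratio` and both test signals are invented for this example.

```python
import numpy as np

def velocity_jitter_ratio(v, dt):
    """Hypothetical VJR: RMS finite-difference acceleration divided by
    mean absolute speed. An assumed definition, not the thesis's formula."""
    v = np.asarray(v, dtype=float)
    accel = np.diff(v) / dt                    # sample-to-sample acceleration
    rms_jitter = np.sqrt(np.mean(accel ** 2))  # penalizes abrupt velocity changes
    mean_speed = np.mean(np.abs(v))
    return rms_jitter / mean_speed if mean_speed > 0 else 0.0

# A smoothed velocity ramp should score lower than an abrupt step.
t = np.arange(0.0, 2.0, 0.05)
stepped = np.where(t < 1.0, 0.2, 0.6)                # abrupt 0.2 -> 0.6 m/s step
smoothed = 0.2 + 0.4 / (1 + np.exp(-(t - 1.0) * 8))  # sigmoid-smoothed ramp
print(velocity_jitter_ratio(stepped, 0.05) > velocity_jitter_ratio(smoothed, 0.05))
```

Under any reasonable jitter definition of this shape, acceleration smoothing lowers the score, which is the qualitative behavior the abstract attributes to the control layer.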
| dc.description.abstract | This thesis aims to develop an electric wheelchair dance system that integrates speech understanding (automatic speech recognition, ASR), musical rhythm analysis (music information retrieval, MIR), and intelligent control, enabling users to perform rhythmic and creative movements through natural language and music inputs. The system is built upon the ROS2 communication framework, incorporating automatic speech recognition, semantic interpretation using large language models (LLMs), beat analysis, and real-time control modules, forming a cross-modal closed-loop architecture of “Speech–Semantics–Music–Motion–Control.”

In the input layer, the Whisper model is used for speech transcription, and an LLM interprets semantic and emotional descriptions to generate a dance specification. The music layer applies beat tracking and tempogram analysis to extract tempo and onset features as temporal references for motion alignment. In the control layer, the ROS2 Nav2 framework combines adaptive Monte Carlo localization (AMCL) with odometry sensor fusion to ensure smooth and safe execution of dance trajectories. The final velocity commands (v, ω) are transmitted via RS-485 from the upper controller (Arduino Mega 2560) to the DSP motor driver, enabling dual-wheel closed-loop control.

This thesis also conducts quantitative evaluations of the perception and control modules. The average beat detection latency is approximately 4–5 ms, while the measure dynamic alignment (MDA) reaches 87–93% across different musical styles, demonstrating precise alignment with rhythmic structures. Velocity control with acceleration smoothing reduces the velocity jitter ratio (VJR) by approximately 40–60% compared with traditional velocity-only control, effectively suppressing instantaneous acceleration spikes. For localization accuracy, AMCL reduces long-distance pose drift from 0.8–1.0 m to 0.05–0.13 m, an improvement of nearly 90%. In obstacle-rich environments, the proposed system performs dance movements while avoiding collisions, maintaining rhythmic consistency and demonstrating adaptability superior to traditional velocity-only control.

In conclusion, this thesis presents an intelligent and interactive electric wheelchair dance control platform that serves as a cross-domain prototype integrating assistive technology and performing arts. It also points toward future assistive devices that combine generative artificial intelligence (AI) with human–robot co-performance. | en |
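The abstract above describes final velocity commands (v, ω) being converted into dual-wheel closed-loop control. The standard differential-drive inverse kinematics implied by such a command can be sketched as follows; `track_width` and `wheel_radius` are illustrative values, not the wheelchair's actual parameters.

```python
import math

def twist_to_wheel_speeds(v, omega, track_width=0.55, wheel_radius=0.15):
    """Convert a body twist (v [m/s], omega [rad/s]) into left/right wheel
    angular speeds [rad/s] for a differential-drive base.
    track_width and wheel_radius are illustrative, not the thesis's values."""
    v_right = v + omega * track_width / 2.0  # right wheel linear speed
    v_left = v - omega * track_width / 2.0   # left wheel linear speed
    return v_left / wheel_radius, v_right / wheel_radius

# Straight motion: both wheels turn at the same speed.
wl, wr = twist_to_wheel_speeds(0.3, 0.0)
print(wl == wr)  # True

# Spin in place: wheels turn at equal and opposite speeds.
wl, wr = twist_to_wheel_speeds(0.0, math.pi / 4)
print(math.isclose(wl, -wr))  # True
```

The per-wheel angular speeds produced this way are what a lower-level closed-loop controller (the DSP motor driver in this system) would regulate against encoder feedback.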
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-08T16:31:06Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-04-08T16:31:06Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員審定書 …… i
致謝 …… ii
摘要 …… iv
Abstract …… vi
目次 …… ix
圖次 …… xv
表次 …… xix
符號列表 …… xxi
第一章 緒論 …… 1
1.1 研究背景 …… 1
1.1.1 高齡化社會與輔助科技需求 …… 1
1.1.2 高齡者使用行動輔具之國際比較 …… 2
1.2 政策與社會推動 …… 3
1.3 智慧輪椅與互動舞蹈研究趨勢 …… 3
1.3.1 智慧輪椅技術發展:AI、感測融合、人機介面 …… 3
1.3.2 互動舞蹈與輪椅舞研究 …… 4
1.4 研究動機與目標 …… 5
1.5 研究方法概述 …… 7
1.6 文獻回顧 …… 9
1.6.1 語音與語意理解 …… 10
1.6.2 音樂資訊檢索與節拍分析 …… 10
1.6.3 導航與感測融合技術 …… 11
1.7 本研究貢獻 …… 11
1.8 章節安排 …… 13
第二章 電動輪椅整車系統與硬體架構 …… 15
2.1 過往輪椅舞系統回顧 …… 15
2.1.1 第一代:基於音樂節奏與肢體計畫之電動輪椅舞 …… 15
2.1.2 第二代:基於姿態識別之電動輪椅舞 …… 16
2.1.3 本研究與過往兩代系統比較 …… 17
2.2 電動輪椅平台與模組配置說明 …… 18
2.3 整車硬體架構與模組配置 …… 19
2.4 硬體規格總覽 …… 20
2.5 本研究電動輪椅舞系統概述 …… 20
2.6 主控模組(ROS2 運算平台) …… 22
2.6.1 主控模組之系統角色與設計定位 …… 22
2.6.2 作業系統與通訊架構選擇(Ubuntu 22.04 與 ROS2 Humble) …… 23
2.6.3 上位與下位控制器之資料流關係 …… 23
2.6.4 主控模組之系統架構與通訊說明 …… 23
2.7 上位控制器(Arduino Mega 2560) …… 24
2.8 下位控制器(FOC 馬達驅動器) …… 29
2.8.1 硬體驅動電路系統 …… 30
2.8.2 FOC 控制流程與功能模組 …… 31
2.8.2.1 座標轉換數學模型 …… 32
2.8.2.2 雙閉迴路控制架構 …… 33
2.8.2.3 SVPWM 與諧波注入技術 …… 34
2.8.3 TMS320F28069 DSP 規格與特性 …… 36
2.9 動力模組 …… 37
2.9.1 永磁同步馬達電氣方程式 …… 38
2.9.2 永磁同步馬達機械方程式 …… 40
2.9.3 永磁同步馬達簡化模型 …… 42
2.10 感測模組 …… 44
2.10.1 LiDAR、編碼器與手動搖桿模組 …… 44
(1) LiDAR – Richbeam LDK-360 …… 45
(2) 動力輪編碼器(Encoder) …… 46
(3) 顯示控制器(手動控制介面) …… 47
2.11 電源模組 …… 48
2.12 輪椅整車動態方程式 …… 49
2.12.1 輪轂動力學 …… 49
2.12.2 控制用動力學建立 …… 50
2.12.3 含摩擦之修正控制用方程 …… 52
2.12.4 外加負載與整車推動方程圖 …… 53
2.13 小結 …… 54
第三章 自動編舞系統使用者介面 …… 57
3.1 PyQt5 圖形化介面設計與研究角色 …… 57
3.2 系統整體架構概述 …… 58
3.2.1 音樂特徵分析 …… 62
3.3 使用流程與系統階段說明 …… 64
3.3.1 Stage 1:語音輸入與語意理解 …… 65
3.3.2 Stage 2:音樂特徵分析 …… 68
3.3.3 Stage 3:舞蹈規劃生成與互動 …… 72
3.3.4 Stage 4:動作序列生成與節拍對齊 …… 76
3.3.5 Stage 5:導航控制與實體執行 …… 81
第四章 研究方法 …… 85
4.1 系統整體架構 …… 85
4.2 語音辨識與語意理解 …… 86
4.2.1 語音辨識模型 …… 86
4.2.2 語意解析與關鍵詞抽取 …… 87
4.3 音樂特徵分析 …… 91
4.3.1 節拍追蹤與節奏建模 …… 91
4.3.2 音樂風格分類 …… 94
4.4 編舞規劃生成與 LLM 模組 …… 96
4.4.1 舞蹈規劃之條件生成模型 …… 97
4.4.2 迭代式指令規劃(iterative prompting) …… 98
4.4.3 生成結果之角色定位 …… 99
4.5 動作序列生成與節拍對齊 …… 100
4.5.1 固定拍數分組之節奏對齊策略 …… 103
4.6 編舞執行層:ROS2 導航與感測融合 …… 107
4.6.1 動態視窗(DWB)局部規劃與運動控制 …… 108
4.6.2 定位與感測融合 …… 109
4.6.3 速度限制與安全控制 …… 112
4.7 舞蹈執行性能量化指標 …… 115
4.7.1 速度抖動率(VJR) …… 115
4.7.2 事件與節拍同步誤差(EBA) …… 118
4.7.3 小節動態對齊率(MDA) …… 119
4.7.4 節奏反應指數(RRI) …… 121
第五章 實驗與研究成果 …… 125
5.1 多模擬環境動力學驗證與軌跡分析 …… 125
5.1.1 URDF 建模與座標系統驗證 …… 127
5.1.2 MuJoCo:動力學 Joint State 行為驗證 …… 129
5.1.3 Gazebo:完整模擬環境與行為分析 …… 131
5.1.4 軟體模擬與實體應用落差(sim-to-real gap)與對策 …… 132
5.2 里程計驗證實驗 …… 134
5.3 SLAM 建圖與環境驗證 …… 135
5.4 AMCL 定位與 TF 精度 …… 136
5.5 障礙物環境下之舞蹈軌跡比較實驗 …… 143
5.6 舞蹈表現指標之實體應用與量測設定 …… 145
5.6.1 速度抖動行為之實驗量測(VJR) …… 145
5.6.2 動作事件與節拍同步之實驗分析(EBA) …… 146
5.6.3 小節層級動態變化之對齊評估(MDA) …… 146
5.6.4 整體節奏響應度之量化分析(RRI) …… 147
5.7 實驗結果比較與討論 …… 147
5.7.1 音樂同步性與表現力分析 …… 148
第六章 總結與未來展望 …… 153
6.1 總結 …… 153
6.2 未來展望 …… 154
參考文獻 …… 155
附錄 A 使用者介面操作手冊 …… 161
A.1 使用者介面部署位置與操作角色 …… 161
A.2 遠端連線方式與使用工具 …… 161
A.2.1 SSH 終端機連線(指令操作) …… 162
A.2.2 遠端桌面(圖形化操作) …… 162
A.2.3 實際展示建議方式 …… 163
A.3 系統整體運作架構 …… 163
A.4 系統啟動流程 …… 163
A.5 操作注意事項與系統限制 …… 164 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 電動輪椅 | - |
| dc.subject | ROS2 | - |
| dc.subject | 語音辨識 | - |
| dc.subject | 音樂資訊檢索 | - |
| dc.subject | 跨模態控制 | - |
| dc.subject | 生成式人工智慧 | - |
| dc.subject | Electric wheelchair | - |
| dc.subject | ROS2 | - |
| dc.subject | Speech recognition | - |
| dc.subject | Music information retrieval | - |
| dc.subject | Cross-modal control | - |
| dc.subject | Generative AI | - |
| dc.title | 多模態人機共創框架:自動輪椅舞蹈生成系統 | zh_TW |
| dc.title | A Multimodal Human-Robot Co-Creation Framework for Autonomous Wheelchair Dancing | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 楊士進;陳冠任 | zh_TW |
| dc.contributor.oralexamcommittee | Shih-Chin Yang;Guan-Ren Chen | en |
| dc.subject.keyword | 電動輪椅,ROS2,語音辨識,音樂資訊檢索,跨模態控制,生成式人工智慧 | zh_TW |
| dc.subject.keyword | Electric wheelchair,ROS2,Speech recognition,Music information retrieval,Cross-modal control,Generative AI | en |
| dc.relation.page | 164 | - |
| dc.identifier.doi | 10.6342/NTU202600280 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2026-02-25 | - |
| dc.contributor.author-college | 工學院 | - |
| dc.contributor.author-dept | 機械工程學系 | - |
| dc.date.embargo-lift | N/A | - |
| Appears in Collections: | 機械工程學系 (Department of Mechanical Engineering) | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-2.pdf (Restricted Access) | 23.07 MB | Adobe PDF | |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
