NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97922
Title: 可撓式壓阻感測器之無聲語音辨識系統設計與無線氣動手掌控制之應用
Design of a Silent Speech Recognition System Based on Flexible Piezoresistive Sensors and Its Application in Wireless Pneumatic Hand Control
Authors: 吳世瑜
Shih-Yu Wu
Advisor: 劉建豪
Chien-Hao Liu
Keyword: 無聲語音辨識, 可撓式, 壓阻式感測器, 單晶矽, 微機電系統, 機器學習, 氣動手掌
Silent Speech Recognition, Flexible, Piezoresistive Sensor, Single-Crystal Silicon, Micro-Electro-Mechanical Systems (MEMS), Machine Learning, Pneumatic Hand
Publication Year: 2025
Degree: 碩士 (Master's)
Abstract: 近年來,無聲語音辨識系統 (Silent speech recognition) 因其在特殊溝通場景下的潛力而備受關注。然而,現有技術多依賴肌電訊號或影像辨識,常面臨侵入性、設備體積龐大或易受環境干擾等挑戰,限制了其實際應用。與此同時,多數研究致力於辨識大規模詞彙,導致系統複雜度與運算負擔俱增,難以實現即時控制。為此,本研究另闢蹊徑,採取一種目標導向的精簡化策略,專注於辨識一組有限但關鍵的指令,旨在開發一套反應快速、低延遲且高度可靠的無聲語音辨識系統,並以無線氣動仿生手掌之即時控制作為系統效能的最終驗證。
本系統硬體部分包含可撓式單晶矽壓阻感測器、惠斯通電橋放大電路、資料擷取器以及用於訊號處理的電腦。系統針對六種目標指令嘴型,擷取因無聲發音引致臉部多處微小變形所對應的時變電阻訊號。這些原始訊號經過前處理、特徵萃取與選擇後,採用隨機森林 (Random forest) 模型進行分類模型的建立。為提升系統穩定性並降低誤觸,訓練資料中特別納入了佔總資料量50%的「空白指令」。此類別不僅包含日常口部動作,更刻意選用發音嘴型與目標指令相似的混淆詞彙進行訓練。本研究招募三位受試者,將八通道感測陣列對稱貼附於其臉頰及下顎區域,以擷取多通道的表面形變訊號。此外,為有效過濾非指令動作,我們透過分類最大正確率閾值的後處理,此舉顯著提升了系統在真實應用中的可靠性。最終,系統將辨識出的無聲指令透過藍牙即時無線傳輸至以Arduino微控制器為核心的氣動仿生手掌控制系統,成功實現了從無聲語音輸入到具體手勢動作輸出的完整應用流程。
本論文的主要研究成果包含: (1) 提出一套精簡化、可穿戴的無聲語音辨識系統,應用於無線控制氣動仿生手掌。經由三位受試者及總計1008筆資料的驗證,系統平均辨識準確率達到91.1%,macro-F1分數為0.91。 (2) 所開發之單晶矽壓阻感測器在30%拉伸應變條件下,其靈敏度 (Gauge factor) 達到5.5,展現了優異的力學感測性能與可靠性。 (3) 透過特徵索引優先載入等處理流程優化,單次指令分類的平均處理時間從3184毫秒顯著縮短至1164毫秒。這些成果成功展現了本系統在輔助溝通與人機互動介面領域,具備快速處理能力與實際應用的潛力。
Silent Speech Recognition (SSR) has garnered considerable attention in recent years due to its potential in specialized communication scenarios. However, existing technologies, which often rely on surface electromyography (sEMG) or image-based recognition, face challenges such as invasiveness, bulky equipment, and susceptibility to environmental interference, limiting their practical application. Meanwhile, many studies target large-vocabulary recognition, which increases system complexity and computational load and hinders real-time control. To address this, this research takes a different path, adopting a target-oriented, streamlined strategy that focuses on recognizing a limited yet critical set of commands. The aim is to develop a responsive, low-latency, and highly reliable SSR system, with its performance ultimately validated through real-time control of a wireless pneumatic bionic hand.
The system hardware comprises flexible single-crystal silicon piezoresistive sensors, a Wheatstone-bridge amplification circuit, a data acquisition (DAQ) unit, and a computer for signal processing. The system targets six command-specific mouth shapes, capturing the time-varying resistance signals produced by the minute facial deformations of silent articulation. These raw signals undergo preprocessing, feature extraction, and feature selection, after which a Random Forest model is trained for classification. To enhance stability and reduce false triggers, the training dataset incorporates "blank commands" amounting to 50% of the total data; this category includes not only everyday oral movements but also deliberately chosen confounding words whose mouth shapes resemble the target commands. Three participants were recruited, with an eight-channel sensor array symmetrically attached to the cheek and jaw regions to capture multi-channel surface deformation signals. In addition, a post-processing step that thresholds the classifier's maximum prediction confidence was implemented to filter out non-command actions, significantly improving reliability in real-world use. Finally, the recognized silent commands are transmitted in real time via Bluetooth to a pneumatic bionic hand control system built around an Arduino microcontroller, completing the application pipeline from silent speech input to concrete gesture output.
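As a concrete illustration of the recognition-and-control pipeline summarized above, the following is a minimal sketch assuming a Random Forest classifier, a confidence-threshold rejection step for non-command activity, and a Bluetooth serial (SPP) link toward the Arduino side; the command labels, feature set, threshold value, and serial port name are illustrative assumptions rather than details taken from the thesis.

```python
# Minimal, illustrative sketch of the classification-and-rejection stage described
# above. Label names, feature choices, the 0.6 threshold, and the serial port are
# assumptions for demonstration, not details taken from the thesis.
import numpy as np
import serial  # pyserial; a Bluetooth SPP link typically appears as a serial port
from sklearn.ensemble import RandomForestClassifier

COMMANDS = ["open", "fist", "point", "pinch", "ok", "release"]  # hypothetical 6 commands
BLANK = "blank"                                                 # non-command class
CONF_THRESHOLD = 0.6                                            # assumed rejection threshold


def extract_features(window: np.ndarray) -> np.ndarray:
    """Toy per-channel features for an (n_samples, 8) window of resistance readings."""
    return np.concatenate([
        window.mean(axis=0),                      # mean level per channel
        window.std(axis=0),                       # variability per channel
        window.max(axis=0) - window.min(axis=0),  # peak-to-peak deformation per channel
    ])


# Placeholder training data so the sketch runs end to end; real feature vectors would
# come from labeled, preprocessed sensor windows (including the 50% "blank" class).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 24))
y_train = rng.choice(COMMANDS + [BLANK], size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)


def classify_and_send(window: np.ndarray, link) -> str:
    """Classify one window; forward only confident, non-blank commands over Bluetooth."""
    feats = extract_features(window).reshape(1, -1)
    proba = clf.predict_proba(feats)[0]
    label = clf.classes_[int(np.argmax(proba))]
    if label == BLANK or proba.max() < CONF_THRESHOLD:
        return "rejected"                      # filtered by the post-processing threshold
    link.write((label + "\n").encode())        # Arduino side parses the command string
    return label


# Usage (port name and baud rate are assumptions):
# with serial.Serial("/dev/rfcomm0", 9600, timeout=1) as link:
#     classify_and_send(latest_window, link)
```

The design point mirrored here is that a prediction is forwarded to the hand only when it is both non-blank and sufficiently confident, which is what keeps everyday mouth movements from triggering gestures.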
The main contributions of this thesis are: (1) A streamlined, wearable SSR system for wireless control of a pneumatic bionic hand. Validated with three participants and a total of 1008 data samples, the system achieved an average recognition accuracy of 91.1% and a macro-F1 score of 0.91. (2) The developed single-crystal silicon piezoresistive sensor exhibited a gauge factor (GF) of 5.5 under 30% tensile strain, demonstrating excellent mechanical sensing performance and reliability. (3) Through processing-flow optimizations such as prioritized loading of feature indices, the average single-command classification time was reduced from 3184 ms to 1164 ms. These results demonstrate the system's fast processing and its practical potential in assistive communication and human-machine interaction.
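For context on contribution (2), the gauge factor is conventionally defined as the relative resistance change per unit applied strain; under that standard definition (an assumption here, since the thesis may report the characterization differently), the cited operating point works out as follows:

\[
\mathrm{GF} = \frac{\Delta R / R_0}{\varepsilon}, \qquad
\varepsilon = 0.30,\ \mathrm{GF} = 5.5 \;\Rightarrow\; \frac{\Delta R}{R_0} = 5.5 \times 0.30 = 1.65,
\]

i.e., roughly a 165% relative resistance change at 30% tensile strain, which is the scale of signal that the Wheatstone-bridge stage converts into a measurable voltage.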
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97922
DOI: 10.6342/NTU202501758
Fulltext Rights: 同意授權 (authorized for worldwide open access)
Embargo Lift Date: 2025-07-24
Appears in Collections: 機械工程學系 (Department of Mechanical Engineering)

Files in This Item:
File: ntu-113-2.pdf (30.71 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
