Blind Adversarial Attack Based on Time-Frequency Model for Speech Recognition API

Yi-An Chen (陳怡安)

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96307

Title:	Blind Adversarial Attack Based on Time-Frequency Model for Speech Recognition API
Authors:	Yi-An Chen (陳怡安)
Advisor:	Jian-Jiun Ding (丁建均)
Keywords:	Automatic Speech Recognition, Adversarial Attack, Adversarial Example, Short-Time Fourier Transform, Gabor Transform
Publication Year:	2024
Degree:	Master's
Abstract:	As smart devices become increasingly prevalent in daily life, our exposure to privacy risks has grown accordingly. To support privacy protection and the robustness testing of these devices, Ian J. Goodfellow et al. introduced the concept of adversarial attacks in 2014. Adversarial attacks were initially applied mainly to image recognition tasks. However, owing to the distinctive characteristics of audio data, most contemporary research on adversarial attacks still concentrates on images, and these attacks rely primarily on additive perturbations. In this study, we introduce eleven distinct noise-adding methods and three precision-reducing techniques for generating adversarial examples against automatic speech recognition (ASR) systems. Notably, the three precision-reducing methods consistently outperform the eleven noise-adding techniques in attack effectiveness. Our proposed approach leverages audio features extracted through filtering and time-frequency transformations. The adversarial examples generated by our methods remain intelligible to human listeners and retain relatively high speech quality, while achieving a 100% success rate in blind attacks against ASR systems whose architectures and parameters are unknown. Furthermore, to illustrate the differing characteristics of these adversarial examples, we conduct a comparative analysis of their time-frequency representations using the Short-Time Fourier Transform (STFT) and the Gabor transform. This comparison aims to elucidate the distinct impact and effectiveness of our proposed methods in adversarial attacks on audio data.
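The record does not say what the three precision-reducing techniques are. Purely as a hedged illustration of the general idea, the sketch below shows one generic way to reduce the precision of a waveform, uniform bit-depth quantization, plus a signal-to-noise ratio (SNR) helper for gauging the distortion it introduces. The function names `reduce_bit_depth` and `snr_db`, the [-1, 1] amplitude range, and the test tone are assumptions made for this example; none of them is taken from the thesis.

```python
import numpy as np


def reduce_bit_depth(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize a waveform in [-1, 1] to `bits` bits.

    Hypothetical example of a precision-reducing operation; not the thesis's method.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** (bits - 1)  # quantization steps per unit of amplitude
    x = np.clip(x, -1.0, 1.0)
    # Round to the nearest step, then clip so exactly 2**bits integer codes
    # (-levels .. levels - 1, as in signed PCM) can occur.
    codes = np.clip(np.round(x * levels), -levels, levels - 1)
    return codes / levels


def snr_db(clean: np.ndarray, processed: np.ndarray) -> float:
    """SNR in dB of `processed` relative to `clean` (higher means less distortion)."""
    noise_energy = np.sum((clean - processed) ** 2)
    if noise_energy == 0:
        return float("inf")
    return float(10 * np.log10(np.sum(clean ** 2) / noise_energy))


if __name__ == "__main__":
    sr = 16000                     # assumed sample rate (Hz)
    t = np.arange(sr) / sr         # one second of time stamps
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    for bits in (4, 8, 12):
        degraded = reduce_bit_depth(tone, bits)
        print(f"{bits:2d} bits: SNR = {snr_db(tone, degraded):.1f} dB")
```

Fewer bits means a coarser step and therefore a lower SNR; a real attack would additionally need to check that the ASR output changes while the speech stays intelligible, which this sketch does not attempt.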
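The Gabor transform is an STFT whose analysis window is a Gaussian, the window shape that attains the lower bound of the time-frequency uncertainty principle. The minimal sketch below, assuming NumPy and a discretized, truncated Gaussian window, computes a magnitude spectrogram with a Hann window (an ordinary STFT) and with a Gaussian window (a discrete approximation of the Gabor transform), so the two representations can be compared. The helper names (`stft_magnitude`, `gaussian_window`), the 256-sample window, the 64-sample hop, and `sigma = n / 6` are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, window: np.ndarray, hop: int) -> np.ndarray:
    """Magnitude STFT: slide `window` over `x` in steps of `hop` samples.

    Returns an array of shape (n_frames, len(window) // 2 + 1); row i is the
    one-sided magnitude spectrum of frame i.
    """
    n = len(window)
    if len(x) < n:
        raise ValueError("signal is shorter than the window")
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop : i * hop + n] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def gaussian_window(n: int, sigma: float) -> np.ndarray:
    """Gaussian window of n samples centered at the middle; sigma in samples.

    Truncating the Gaussian to n samples makes this an approximation of the
    continuous Gabor kernel.
    """
    m = np.arange(n) - (n - 1) / 2
    return np.exp(-0.5 * (m / sigma) ** 2)


if __name__ == "__main__":
    sr = 8000                                  # assumed sample rate (Hz)
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
    n, hop = 256, 64
    S = stft_magnitude(x, np.hanning(n), hop)               # ordinary STFT (Hann)
    G = stft_magnitude(x, gaussian_window(n, n / 6), hop)   # Gabor-style STFT
    print(S.shape, G.shape)  # (122, 129) (122, 129)
    # Bin spacing is sr / n = 31.25 Hz, so the 1 kHz tone peaks at bin 32.
    print(S[0].argmax() * sr / n, G[0].argmax() * sr / n)
```

Changing `sigma` trades time resolution for frequency resolution: a narrower Gaussian localizes events such as added noise bursts more sharply in time, while a wider one separates nearby frequency components better.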
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96307
DOI:	10.6342/NTU202404690
Full-text Rights:	Not authorized
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
ntu-113-1.pdf (Restricted Access)	87.09 MB	Adobe PDF
