Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88819

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蘇柏青 | zh_TW |
| dc.contributor.advisor | Borching Su | en |
| dc.contributor.author | 丁文淵 | zh_TW |
| dc.contributor.author | Wen-Yuan Ting | en |
| dc.date.accessioned | 2023-08-15T17:55:01Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-15 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-06 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88819 | - |
| dc.description.abstract | 波束成形技術時常用於許多多通道語音增強系統中,以抑制降低語音理解度的指向性干擾訊號。傳統波束成形器通常基於「波達方向」(direction-of-arrival, DOA)、「能量頻譜密度」(power spectral density, PSD)、「相對轉移函數」(relative transfer function, RTF)、共變異數矩陣等參數的精準估計值來進行最佳化。但精準估計這一些參數有時候是一件很不容易的任務。在這一本論文中,我們提出了一個新的波束成形框架,此框架是基於一個能預測訊號的「短時客觀理解度」(short-time objective intelligibility, STOI) 的預訓練模型:STOI-Net 來提升吵雜語音訊號的理解度。該方法稱作「具理解度意識的零點控制波束成形技術」(intelligibility-aware null-steering beamforming, IANS)。吵雜語音訊號會先送進一群零點控制波束成形器來產生一序列的訊號。這一些訊號會再送進STOI-Net 來決定何者具有最高的理解度。實驗結果顯示我們可以利用一個雙麥克風陣列搭配我們提出的方法在多個情境中提升語音訊號的理解度。其STOI 增強效果類似於在已知目標以及干擾訊號之DOA 的狀況下所產生的波束成形結果。 | zh_TW |
| dc.description.abstract | Beamforming is commonly used in multi-channel speech enhancement systems to suppress directional interfering signals that degrade speech intelligibility. Traditional beamformers are usually optimized based on accurate estimates of parameters such as the direction-of-arrival (DOA), power spectral densities, relative transfer functions, and covariance matrices. However, accurately estimating these parameters can be challenging. In this thesis, a novel beamforming framework is proposed to enhance the intelligibility of noisy speech signals based on a pre-trained short-time objective intelligibility (STOI) prediction model, STOI-Net. This framework is referred to as intelligibility-aware null-steering beamforming (IANS). The noisy speech signal is first sent into a set of null-steering beamformers to generate a set of candidate signals. These signals are then sent into STOI-Net, which determines the signal with the highest predicted intelligibility. Experimental results show that the proposed method, using a two-channel microphone array, is capable of generating intelligibility-enhanced speech signals in multiple scenarios. These signals have STOI scores similar to those generated by beamforming methods that are given the DOAs of the speech and interfering signals. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:55:01Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-15T17:55:01Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Signal Model; 2.2 Filter-and-Sum Beamforming; 2.3 Conventional MVDR/MPDR Beamforming; 2.4 Limitations of Conventional MVDR/MPDR Techniques; 2.4.1 Conventional DOA Estimation Algorithms; 2.4.2 Conventional Covariance Matrix Estimation; 2.4.3 Estimation of Rxx[n, k]; 2.4.4 Estimation of Rii[n, k]; 2.4.5 Estimation of Rss[n, k]; 2.5 Null-Steering Beamforming; 2.5.1 Definition; 2.5.2 Examples; 2.5.3 Practical Limitations; 2.5.3.1 Effect of Finite Signals; 2.5.3.2 Effect of Non-Free-Field Environments; 2.6 STOI-Net; Chapter 3 IANS Optimization; 3.1 Optimization Problem; 3.2 Optimization Algorithm; 3.2.1 Stage 1: NSBF Stage; 3.2.2 Stage 2: STOI-Net Stage; 3.3 Sidelobe Signal Enhancement; Chapter 4 Experimental Setup and Result Analysis; 4.1 Experimental Settings; 4.1.1 Scenario Settings; 4.1.2 Signal Settings; 4.1.3 IANS Parameter Settings; 4.1.4 Methods for Comparison; 4.2 Experimental Results (I); 4.2.1 θs = 45° (free field); 4.2.2 θs = 45° (RT60 = 150 ms); 4.2.3 θs = 90° (free field); 4.2.4 θs = 90° (RT60 = 150 ms); 4.3 Experimental Results (II); 4.4 Experimental Results (III); 4.5 Experimental Results (IV); Chapter 5 Conclusion; References | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 短時客觀理解度 | zh_TW |
| dc.subject | 波束成形 | zh_TW |
| dc.subject | STOI-Net | zh_TW |
| dc.subject | 零點控制 | zh_TW |
| dc.subject | beamforming | en |
| dc.subject | STOI-Net | en |
| dc.subject | null-steering | en |
| dc.subject | STOI | en |
| dc.title | 具有預測理解度之預訓練模型和零點控制波束成形技術的雙通道語音增強系統 | zh_TW |
| dc.title | A Two-channel Speech Enhancement System with a Pre-trained Intelligibility Prediction Model and Null-steering Beamforming | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 曹昱;劉俊麟;彭盛裕 | zh_TW |
| dc.contributor.oralexamcommittee | Yu Tsao;Chun-Lin Liu;Sheng-Yu Peng | en |
| dc.subject.keyword | 波束成形, 零點控制, 短時客觀理解度, STOI-Net | zh_TW |
| dc.subject.keyword | beamforming, null-steering, STOI, STOI-Net | en |
| dc.relation.page | 61 | - |
| dc.identifier.doi | 10.6342/NTU202302638 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2023-08-09 | - |
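| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
| Appears in Collections: | Graduate Institute of Communication Engineering | |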
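
The selection procedure described in the abstract (a bank of null-steering beamformers whose outputs are ranked by an intelligibility predictor) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes a far-field, free-field, two-microphone array and a simple delay-and-subtract beamformer, and it uses white-noise stand-ins for the speech and interfering signals. The names `delay`, `null_steer`, `select_most_intelligible`, and `oracle_score` are hypothetical. In particular, `oracle_score` is only a placeholder for STOI-Net: it correlates each output with the clean target, whereas STOI-Net is a pre-trained model that needs no clean reference.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def delay(x, tau, fs):
    """Circularly delay x by tau seconds (fractional delays allowed) via the FFT."""
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau), n=len(x))


def null_steer(x1, x2, theta_deg, d, fs):
    """Delay-and-subtract beamformer that places a null at theta_deg."""
    tau = d * np.cos(np.deg2rad(theta_deg)) / C  # inter-microphone delay for theta_deg
    return x2 - delay(x1, tau, fs)


def select_most_intelligible(x1, x2, candidates, d, fs, score_fn):
    """Run every candidate beamformer and keep the output that scores highest."""
    outputs = [null_steer(x1, x2, th, d, fs) for th in candidates]
    scores = [score_fn(y) for y in outputs]
    best = int(np.argmax(scores))
    return candidates[best], outputs[best], scores


# Toy scene: target at 90 degrees (broadside), stronger interferer at 0 degrees.
fs, d, n = 16000, 0.02, 16000          # sample rate (Hz), mic spacing (m), samples
rng = np.random.default_rng(0)
s = rng.standard_normal(n)             # stand-in for the target speech
i = 2.0 * rng.standard_normal(n)       # stand-in for the interfering signal
tau_s = d * np.cos(np.deg2rad(90)) / C
tau_i = d * np.cos(np.deg2rad(0)) / C
x1 = s + i                             # microphone 1 (reference)
x2 = delay(s, tau_s, fs) + delay(i, tau_i, fs)  # microphone 2

# Hypothetical scorer standing in for STOI-Net: correlation with the clean target.
oracle_score = lambda y: np.corrcoef(y, s)[0, 1]

best_theta, enhanced, scores = select_most_intelligible(
    x1, x2, [0, 45, 90, 135, 180], d, fs, oracle_score
)
print(best_theta, np.round(scores, 2))
```

In this toy scene the highest-scoring candidate is the one whose null points at the interferer, even though the procedure is never told either DOA. Swapping `oracle_score` for a non-intrusive predictor is what would make the search usable without a clean reference.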
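Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf | 7.32 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.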
