Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21702
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 丁肇隆 | |
dc.contributor.author | Chun-Cheng Mai | en |
dc.contributor.author | 麥鈞程 | zh_TW |
dc.date.accessioned | 2021-06-08T03:43:11Z | - |
dc.date.copyright | 2019-07-02 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-06-09 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21702 | - |
dc.description.abstract | Driven by today's needs for security and convenience, many different kinds of biometric systems have emerged. Biometric technology uses the biological traits unique to each individual, such as fingerprints or the iris, as the basis for identification, and speaker recognition is one such biometric. If, one day, a lock system could verify a phone's owner from the speaker's voice together with the spoken password, daily life would certainly become more convenient.
The Go match between AlphaGo and Lee Sedol suddenly made deep learning a prominent subject, and applying neural networks to problems in various fields became a topic researchers raced to study; convolutional neural networks are one important branch of this development. This thesis proposes a vocal password system based on convolutional neural networks: grayscale images generated from the speaker's voice signal are fed into a convolutional neural network to produce a classification result, which is combined with recognition of the password the speaker utters to realize vocal password recognition. | zh_TW |
dc.description.abstract | Many different types of biometric systems have been developed to meet the need for security and convenience. Biometric technology is based on the biological characteristics unique to each individual, such as fingerprints or the iris, and speaker recognition is one such biometric characteristic. One day, people may unlock their cellphones simply by speaking to them, which would make life more convenient.
Deep learning became one of the most popular research topics after AlphaGo, and applying it to a wide variety of problems is now actively studied; convolutional neural networks are an important area in this development. This research proposes a vocal password recognition system based on a convolutional neural network: a grayscale image generated from the speaker's voice signal is used as the input to the network, and the resulting classification is used to build the vocal password recognition system. (A minimal illustrative code sketch of this pipeline appears at the end of the metadata record below.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T03:43:11Z (GMT). No. of bitstreams: 1 ntu-108-R06525054-1.pdf: 2352377 bytes, checksum: ed72ce4e67f31aba777eb3261eb651a5 (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Table of Contents
Thesis Committee Certification #
Acknowledgements i
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Speech Preprocessing 2
1.2.1 Endpoint Detection 2
1.2.2 Pre-emphasis 2
1.2.3 Framing and Windowing 2
1.2.4 Mel-Frequency Cepstral Coefficients 3
1.3 Thesis Organization 6
Chapter 2 Literature Review 7
2.1 Artificial Neural Networks 7
2.1.1 Artificial Neurons 7
2.1.2 Multilayer Perceptron 8
2.1.3 Convolutional Neural Networks 9
2.1.4 Transfer Learning 14
2.2 Speech Recognition 14
2.2.1 Hidden Markov Models 14
2.3 Speaker Recognition 16
2.3.1 GMM-UBM Model 18
2.3.2 Neural Networks and Speaker Recognition 19
Chapter 3 Methodology and Experimental Procedure 23
3.1 Generation of Grayscale Images from MFCC Blocks 23
3.1.1 Feature Extraction 23
3.1.2 Normalization 26
3.1.3 Input Data Generation 27
3.1.4 Neural Network Architecture 28
3.2 Generation of Spectrogram Inputs 29
3.2.1 Spectrogram Feature Extraction 29
3.2.2 Normalization 29
3.2.3 Neural Network Architecture 30
Chapter 4 Experimental Results and Discussion 32
4.1 Vocal Password System 32
4.2 Data Collection 36
4.3 Device Differences 36
4.4 Environment Differences 37
4.5 Comparison of Recognition Results 38
Chapter 5 Conclusion 40
References 41 | |
dc.language.iso | zh-TW | |
dc.title | 卷積神經網路之語音密碼系統 | zh_TW |
dc.title | Convolutional Neural Networks for Vocal Password Recognition System | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 張瑞益,張恆華,謝傳璋 | |
dc.subject.keyword | 卷積神經網路,說話人識別,生物辨識,語音密碼,深度學習 | zh_TW |
dc.subject.keyword | convolutional neural network, biometric, speaker recognition, vocal password, deep learning | en |
dc.relation.page | 45 | |
dc.identifier.doi | 10.6342/NTU201900871 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2019-06-10 | |
dc.contributor.author-college | 工學院 | zh_TW |
dc.contributor.author-dept | 工程科學及海洋工程學研究所 | zh_TW |
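
The abstracts above describe the recognition pipeline only in prose: MFCC features are extracted from the speaker's voice signal, normalized into a grayscale image block, and classified by a convolutional neural network. The sketch below is a minimal illustration of that idea, not the thesis's actual implementation: librosa and PyTorch are assumed stand-ins for the feature extraction and the network, and the block size, layer widths, speaker count, and file name are placeholder assumptions.

```python
# Minimal sketch of the pipeline described in the abstract (illustrative assumptions only):
# voice signal -> MFCC features -> min-max normalized grayscale block -> small CNN classifier.
import numpy as np
import librosa
import torch
import torch.nn as nn

def mfcc_grayscale_block(wav_path, n_mfcc=32, n_frames=32):
    """Extract MFCCs and min-max normalize them into a [0, 1] grayscale block."""
    signal, sr = librosa.load(wav_path, sr=16000)                  # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)    # shape: (n_mfcc, frames)
    mfcc = mfcc[:, :n_frames]                                      # crop to a fixed block (pad short clips in practice)
    mfcc = (mfcc - mfcc.min()) / (mfcc.max() - mfcc.min() + 1e-8)  # min-max normalize to [0, 1]
    return mfcc.astype(np.float32)                                 # treated as a grayscale image

class SpeakerCNN(nn.Module):
    """Toy CNN over the 32x32 grayscale MFCC block; all sizes are assumptions."""
    def __init__(self, n_speakers):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                    # dropout to reduce overfitting
            nn.Linear(32 * 8 * 8, n_speakers),  # speaker logits; softmax is applied by the loss
        )

    def forward(self, x):                       # x: (batch, 1, 32, 32)
        return self.classifier(self.features(x))

# Hypothetical usage: classify which enrolled speaker an utterance belongs to.
# block = mfcc_grayscale_block("utterance.wav")            # "utterance.wav" is a placeholder
# x = torch.from_numpy(block).unsqueeze(0).unsqueeze(0)    # (1, 1, 32, 32)
# speaker_id = SpeakerCNN(n_speakers=10)(x).argmax(dim=1)
```

In the thesis, the speaker decision would additionally be combined with a check of the spoken password, as described in the abstract; that step is omitted from this sketch.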
Appears in Collections: | 工程科學及海洋工程學系
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf Currently not authorized for public access | 2.3 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.