Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77684

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 張智星 | |
| dc.contributor.author | Wan-Jung Chen | en |
| dc.contributor.author | 陳婉容 | zh_TW |
| dc.date.accessioned | 2021-07-10T22:15:46Z | - |
| dc.date.available | 2021-07-10T22:15:46Z | - |
| dc.date.copyright | 2017-08-31 | |
| dc.date.issued | 2017 | |
| dc.date.submitted | 2017-08-17 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77684 | - |
| dc.description.abstract | In recent years, neural networks have regained popularity with the rapid development of parallel-processing techniques and hardware, and more and more audio processing takes advantage of them. This thesis uses neural networks to separate the singing voice from background music, in the real and complex domains respectively. Because the estimates produced by deep neural networks still leave room for improvement, we propose new voice-recovery post-processing methods to raise the GSIR score and improve the overall pitch accuracy. | zh_TW |
| dc.description.abstract | Recently, neural networks have prevailed again thanks to the progress of fast hardware with parallel-processing capability, and more and more audio processing tasks take advantage of them to achieve better performance. This thesis uses deep neural networks (DNNs) to separate the singing voice from background music, in the real and complex domains respectively. Because the outputs of the DNNs are still not good enough, we propose two new voice-recovery methods that improve the GSIR of vocal separation; the proposed methods also achieve better accuracy for vocal pitch extraction. (A minimal illustrative sketch of this mask-based separation follows the metadata table.) | en |
| dc.description.provenance | Made available in DSpace on 2021-07-10T22:15:46Z (GMT). No. of bitstreams: 1 ntu-106-R02922168-1.pdf: 3690719 bytes, checksum: f340fe48849dc88b5287322f2f5554be (MD5) Previous issue date: 2017 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 Introduction
1.1 Research Motive and Purpose
1.2 Background of Research
1.2.1 Source Separation
1.2.2 Artificial Neural Network
1.2.3 Pitch Extraction
1.3 Research Conceptual Framework
2 Related Work
2.1 Research of Source Separation
2.2 Research of Deep Neural Network
2.2.1 Initial Weight of Deep Neural Network
2.2.2 Architecture of Deep Neural Network
2.3 Research of Pitch Tracking
3 Proposed Methods
3.1 Deep Neural Network Architecture
3.1.1 DNN Architecture of IRM
3.1.2 DNN Architecture of cIRM
3.2 Training Objectives
3.2.1 Training Objective of IRM
3.2.2 Training Objective of cIRM
3.3 Voice Recovery
3.4 Pitch Extracting
4 Implementation and Result
4.1 Dataset and System Setup
4.2 Evaluation Criteria
4.3 IRM and cIRM Distribution
4.4 Voice Recovery
4.5 Experiment Result
4.6 Discussion
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
6 References | |
| dc.language.iso | en | |
| dc.subject | audio melody extraction | zh_TW |
| dc.subject | singing pitch extraction | zh_TW |
| dc.subject | singing voice separation | zh_TW |
| dc.subject | music analysis | zh_TW |
| dc.subject | music information retrieval | zh_TW |
| dc.subject | neural networks | zh_TW |
| dc.subject | music analysis | en |
| dc.subject | deep neural networks | en |
| dc.subject | Audio melody extraction | en |
| dc.subject | singing pitch extraction | en |
| dc.subject | music information retrieval | en |
| dc.subject | singing voice separation | en |
| dc.title | Singing Voice Separation and Pitch Extraction Using Deep Neural Network Models | zh_TW |
| dc.title | Singing Voice Separation and Pitch Extraction from Monaural Polyphonic Audio Music via DNN | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 105-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 王新民,鄭士康 | |
| dc.subject.keyword | audio melody extraction, singing pitch extraction, singing voice separation, music analysis, music information retrieval, neural networks | zh_TW |
| dc.subject.keyword | Audio melody extraction, singing pitch extraction, singing voice separation, music analysis, music information retrieval, deep neural networks | en |
| dc.relation.page | 56 | |
| dc.identifier.doi | 10.6342/NTU201703697 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2017-08-18 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering |
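
The abstracts above describe mask-based separation: a DNN is trained to estimate an ideal ratio mask (IRM), or its complex-domain counterpart (cIRM), which is then applied to the mixture spectrogram to recover the vocal. The following Python snippet is a minimal sketch of that masking idea, not code from the thesis; the function name, the squared-magnitude IRM definition, and the SciPy STFT pipeline are illustrative assumptions. It computes an oracle IRM from isolated vocal and accompaniment stems and applies it to their mixture:

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_irm_separation(vocal, accomp, fs=16000, n_fft=1024, eps=1e-8):
    """Oracle ideal-ratio-mask separation (illustrative sketch only).

    `vocal` and `accomp` are 1-D waveforms of equal length; the mixture
    is their sum. Returns the vocal estimate obtained by masking the
    mixture STFT and inverting it.
    """
    # STFTs of the two stems and of their mixture
    _, _, V = stft(vocal, fs=fs, nperseg=n_fft)
    _, _, A = stft(accomp, fs=fs, nperseg=n_fft)
    _, _, X = stft(vocal + accomp, fs=fs, nperseg=n_fft)

    # IRM: per time-frequency bin, the fraction of energy due to the
    # vocal. (One common definition; the thesis may use another exponent.)
    irm = np.abs(V) ** 2 / (np.abs(V) ** 2 + np.abs(A) ** 2 + eps)

    # Apply the real-valued mask to the complex mixture spectrogram;
    # the mixture phase is kept unchanged (a cIRM would alter phase too).
    _, vocal_est = istft(irm * X, fs=fs, nperseg=n_fft)
    return vocal_est
```

At test time the isolated stems are unavailable, so the DNN would be trained to predict the mask from mixture features alone; GSIR (the global signal-to-interference ratio) then measures how much accompaniment leaks into the recovered vocal.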
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-106-R02922168-1.pdf (Restricted Access) | 3.6 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
