Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80035

Full metadata record

DC Field: Value (Language)
dc.contributor.advisor: 徐宏民 (Winston Hsu)
dc.contributor.author: Po-Yu Wu (en)
dc.contributor.author: 吳柏鋙 (zh_TW)
dc.date.accessioned: 2022-11-23T09:22:21Z
dc.date.available: 2021-08-23
dc.date.available: 2022-11-23T09:22:21Z
dc.date.copyright: 2021-08-23
dc.date.issued: 2021
dc.date.submitted: 2021-08-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80035
dc.description.abstract: In this thesis we present a practical, flexible, and effective method for long-segment audio inpainting. The framework, called SLAIN, is based on conditional generative adversarial networks and can restore corrupted regions of audio, including various sound effects and instrument recordings. We adapt an architecture derived from style transfer with carefully designed modifications, so that the method operates on undeformed audio spectrograms and is evaluated against human acoustic perception. Moreover, integration with a state-of-the-art neural vocoder yields output audio of considerably higher quality than the classic Griffin-Lim algorithm. Besides the reconstruction loss and the adversarial loss, the pretrained vocoder provides an additional acoustic loss to guide the model. In experiments on two challenging datasets, human evaluation with the mean opinion score (MOS) shows that our method handles corruptions of flexible length and can inpaint gaps of up to 1 second within 1.5-second audio samples at 44.1 kHz (a common sampling rate). The generated audio scores above 4 out of a maximum of 5 on MOS on average, indicating that our method achieves the best performance compared with existing long audio inpainting approaches. (zh_TW)
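The abstract compares SLAIN's vocoder-based synthesis against the classic Griffin-Lim algorithm for recovering a waveform from a magnitude spectrogram. As background, here is a minimal NumPy sketch of Griffin-Lim phase reconstruction (illustrative only; this is not the thesis's implementation, and the STFT parameters are arbitrary choices):

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    # Hann-windowed short-time Fourier transform (one row per frame).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, n_fft=256, hop=64):
    # Inverse STFT via windowed overlap-add with window-sum normalization.
    win = np.hanning(n_fft)
    n = (S.shape[0] - 1) * hop + n_fft
    x = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(np.fft.irfft(S, n=n_fft, axis=1)):
        x[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=256, hop=64):
    # Estimate a waveform whose STFT magnitude matches `mag` by alternately
    # imposing the target magnitude and projecting onto the set of
    # spectrograms consistent with some time-domain signal.
    phase = np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

Because the iteration only enforces magnitude consistency, the recovered phase is approximate; this is the quality gap that neural vocoders such as the one integrated into SLAIN aim to close.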
dc.description.provenance: Made available in DSpace on 2022-11-23T09:22:21Z (GMT). No. of bitstreams: 1. U0001-1607202118463800.pdf: 3482085 bytes, checksum: ca6fbf0f9af3758efa34d6637fec0dbc (MD5). Previous issue date: 2021 (en)
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 4
Chapter 3 Proposed Methods 8
3.1 Audio analysis 10
Chapter 4 Experiments 11
4.1 Results 13
Chapter 5 Conclusion 16
References 17
Appendix A - Further Details 25
A.1 Deformation of the Mel spectrogram 25
A.2 Discussion of anomaly detection 25
A.3 Free-form mask comparison 26
A.3.1 Additional LJ-Speech dataset 28
A.4 Failure samples 28
A.5 Training curves 30
dc.language.iso: en
dc.subject: Mean Opinion Score (zh_TW)
dc.subject: audio inpainting (zh_TW)
dc.subject: conditional adversarial networks (zh_TW)
dc.subject: vocoder (zh_TW)
dc.subject: acoustics (zh_TW)
dc.subject: cGANs (en)
dc.subject: MOS (en)
dc.subject: Acoustic (en)
dc.subject: Vocoder (en)
dc.subject: Audio Inpainting (en)
dc.title: Long-Segment Audio Inpainting with Conditional Adversarial Networks (zh_TW)
dc.title: SLAIN: A Second Long Audio Inpainting with Conditional GAN (en)
dc.date.schoolyear: 109-2
dc.description.degree: Master's
dc.contributor.coadvisor: 陳文進 (Wen-Chin Chen)
dc.contributor.oralexamcommittee: 余能豪 (Hsin-Tsai Liu), 葉梅珍 (Chih-Yang Tseng), 陳奕廷
dc.subject.keyword: audio inpainting, conditional adversarial networks, vocoder, acoustics, Mean Opinion Score (zh_TW)
dc.subject.keyword: Audio Inpainting, cGANs, Vocoder, Acoustic, MOS (en)
dc.relation.page: 30
dc.identifier.doi: 10.6342/NTU202101523
dc.rights.note: Authorized (open access worldwide)
dc.date.accepted: 2021-08-13
dc.contributor.author-college: College of Electrical Engineering and Computer Science (zh_TW)
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (zh_TW)
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: U0001-1607202118463800.pdf, Size: 3.4 MB, Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
