Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99289
Full metadata record
dc.contributor.advisor: 張智星 (zh_TW)
dc.contributor.advisor: Jyh-Shing Roger Jang (en)
dc.contributor.author: 陳宥華 (zh_TW)
dc.contributor.author: Yu-Hua Chen (en)
dc.date.accessioned: 2025-08-21T17:08:32Z
dc.date.available: 2025-08-22
dc.date.copyright: 2025-08-21
dc.date.issued: 2025
dc.date.submitted: 2025-08-02
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99289
dc.description.abstract: 電吉他是現代音樂的重要元素,其與原聲吉他及鋼琴的最大區別,在於其音色高度依賴經由擴大器與效果器處理後所產生的各種變化。然而,相較於以鋼琴為主的研究,吉他導向的音樂資訊檢索(Music Information Retrieval, MIR)發展相對落後,主因包括資料集稀缺(受限於版權與蒐集困難)以及效果處理音訊所需的特殊表示需求。本論文致力於推進吉他導向的 MIR,提出兩項核心貢獻:首先,建立並釋出兩個新資料集──Electric Guitar Database(EGDB)及其擴充版本 EGDB-PG;其次,針對兩項關鍵表示轉換任務設計創新深度學習演算法:(1)透過電吉他自動轉譜,實現由效果處理音訊轉換為符號樂譜的 audio-to-score 轉換;(2)透過電吉他音箱音色建模,實現由乾淨音色轉換為帶有效果的音色的 clean-to-wet 轉換,並探索無監督與零樣本等深度學習場景以重建多樣音色效果。透過實驗驗證,本研究有效提升吉他音訊、樂譜與音色的表示能力,有效解決電吉他效果處理所帶來的挑戰,為更穩健的基於深度學習的吉他音樂分析與吉他音箱模擬奠定基礎。 (zh_TW)
dc.description.abstract: Electric guitars, central to modern music, are distinguished from acoustic guitars and pianos by their reliance on effects processing through amplifiers and pedals, which introduces complex tonal variations. However, guitar-oriented music information retrieval (MIR) lags behind piano-oriented research due to scarce datasets, constrained by copyright and collection challenges, and the unique representational demands of effect-laden audio. This thesis advances guitar-oriented MIR by curating novel datasets, the Electric Guitar Database (EGDB) and its expanded version (EGDB-PG), and proposing original deep learning algorithms for two key representation transformations: (1) audio-to-score transformation via electric guitar transcription, leveraging EGDB and EGDB-PG to map effect-processed audio to symbolic notation; and (2) clean-to-wet audio transformation through electric guitar amplifier tone modeling, exploring unsupervised and zero-shot paradigms to replicate diverse effects. Validated through empirical evaluations, these contributions enhance audio, score, and tone representations in guitar-oriented MIR, addressing the distinct challenges posed by electric guitar effects and paving the way for robust deep-learning-based electric guitar music analysis and effect modeling systems. (en)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T17:08:32Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-21T17:08:32Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Guitar in Music Information Retrieval
1.2 Representation and Transformation in Guitar Music
1.3 Contributions
1.4 Thesis Structure
Chapter 2 Background
2.1 Acoustic Guitar
2.2 Electric Guitar
2.3 Guitar Amplifiers
2.3.1 How Amplifiers Shape Tone
2.3.2 Effects and Amplifiers in MIR
Chapter 3 Dataset
3.1 Introduction
3.2 Related Datasets
3.2.1 IDMT-SMT-GUITAR Database
3.2.2 GuitarSet
3.2.3 Guitar Playing Techniques Database
3.2.4 SynthTab
3.3 EGDB
3.3.1 Introduction
3.3.2 Audio Collection Process
3.3.3 Annotation Process
3.3.4 Timbre Re-render Process
Chapter 4 Electric Guitar Transcription
4.1 Introduction
4.2 Related Work
4.2.1 Proposed Transcription Model
4.2.1.1 Transformer Encoder
4.2.1.2 Transformer Decoder
4.2.1.3 Implementation Details
4.3 Experiments
4.3.1 Evaluation on DI Recordings
4.3.2 Evaluation on Unseen Timbres Rendered with Amps
4.3.3 Evaluation on Real-world Recordings
4.4 Summary
Chapter 5 Unsupervised Amplifier Modeling
5.1 Tone Modeling in Commercial Products
5.2 Neural Virtual Analog Modeling
5.3 Related Work
5.3.1 Neural Amplifier Modeling
5.3.2 Generative Adversarial Networks
5.3.3 Backbone Model for Generator
5.3.4 Discriminators for GAN Training
5.3.5 Clean Audio from Existing Datasets
5.4 Methods
5.4.1 Generator
5.4.2 Discriminator
5.4.3 GAN Loss
5.5 Experimental Setup
5.5.1 Dataset
5.5.2 Metrics
5.5.3 Implementation Details
5.5.4 Evaluation Settings
5.6 Experimental Results
5.6.1 Comparison with Baseline Methods
5.6.2 Clean Audio Combination
5.7 Discussion
5.7.1 Benefit of the GAN-based Approach for VA Modeling
5.7.2 Artifacts Generated by the Proposed Model
5.8 Summary
Chapter 6 One-to-Many and Zero-Shot Amplifier Modeling
6.1 Introduction
6.2 Background
6.3 Multi-Tone Amplifier Modeling
6.3.1 Tone Embedding Encoder
6.3.2 Conditional Generator
6.3.3 Source of the Reference Audio Signal
6.3.4 Zero-Shot Tone Transfer
6.3.5 Implementation Details
6.4 Experimental Results
6.4.1 Tone Embedding Visualization
6.4.2 Efficacy of One-to-Many Neural Amp Modeling
6.4.3 Zero-Shot Learning on Unseen Amplifiers
6.4.4 Case Study on Zero-Shot Amp Tone Transfer
6.5 Plugin Version in DAW
6.5.1 Implementation Details
6.5.2 Dataset
6.5.3 Real-Time Plugin
6.6 Summary
Chapter 7 Electric Guitar Transcription with Large-Scale Datasets
7.1 Tone and Presets
7.2 Motivation
7.3 Challenges for Amplifier-Rendered Automatic Guitar Transcription
7.3.1 Limited Data Quantity in Datasets
7.3.2 Lack of Tone Diversity
7.3.3 Complexities of Tablature Format from Guitar Fretboard Design
7.3.4 Expressive Playing Techniques
7.3.5 Current Efforts and Remaining Gaps
7.4 Tone-Informed MIR
7.5 EGDB-PG
7.6 Amplifier-Rendered Transcription Approaches
7.6.1 Tone Representation
7.7 Transcription System
7.7.1 hFT-Transformer
7.7.2 Architecture of Tone-Informed Transformer
7.7.3 Content Augmentation
7.7.4 Audio Normalization
7.8 Experiments
7.8.1 Impact of Content Augmentation and Audio Normalization on Transcription Performance
7.8.2 Baseline Performance with Audio Normalization
7.8.3 Effect of Content Augmentation
7.8.4 Impact of Audio Normalization under Content Augmentation
7.8.5 Performance without Tone Embeddings
7.8.6 Comparison with Other Transcription Models
7.8.7 Evaluating Generalizability to Tone and Content Variations
7.8.8 Impact of Tone Augmentation (Number of Tones)
7.8.9 Impact of Content Augmentation
7.9 Summary
Chapter 8 Conclusions and Future Work
8.1 Conclusions
8.2 Future Work
References
dc.language.iso: en
dc.subject: 電吉他 (zh_TW)
dc.subject: 虛擬類比建模 (zh_TW)
dc.subject: 音訊效果建模 (zh_TW)
dc.subject: 音樂資訊檢索 (zh_TW)
dc.subject: 轉譜 (zh_TW)
dc.subject: Transcription (en)
dc.subject: Music Information Retrieval (en)
dc.subject: Electric Guitar (en)
dc.subject: Effect Modeling (en)
dc.subject: Virtual Analog Modeling (en)
dc.title: 從音訊到樂譜與音色:探討以吉他為核心的音樂資訊檢索中的表示法與轉換 (zh_TW)
dc.title: From Audio to Score and Tone: Exploring Representations and Transformations in Guitar-Oriented Music Information Retrieval (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 博士 (Doctoral)
dc.contributor.coadvisor: 楊奕軒 (zh_TW)
dc.contributor.coadvisor: Yi-Hsuan Yang (en)
dc.contributor.oralexamcommittee: 李宏毅;蘇黎;王新民 (zh_TW)
dc.contributor.oralexamcommittee: Hung-Yi Lee; Li Su; Hsin-Min Wang (en)
dc.subject.keyword: 音樂資訊檢索, 電吉他, 轉譜, 虛擬類比建模, 音訊效果建模 (zh_TW)
dc.subject.keyword: Music Information Retrieval, Electric Guitar, Transcription, Virtual Analog Modeling, Effect Modeling (en)
dc.relation.page: 123
dc.identifier.doi: 10.6342/NTU202502953
dc.rights.note: 未授權 (not authorized)
dc.date.accepted: 2025-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
dc.date.embargo-lift: N/A
Appears in collections: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in this item:
ntu-113-2.pdf (9.53 MB, Adobe PDF), not authorized for public access