Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99289

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 張智星 | zh_TW |
| dc.contributor.advisor | Jyh-Shing Roger Jang | en |
| dc.contributor.author | 陳宥華 | zh_TW |
| dc.contributor.author | Yu-Hua Chen | en |
| dc.date.accessioned | 2025-08-21T17:08:32Z | - |
| dc.date.available | 2025-08-22 | - |
| dc.date.copyright | 2025-08-21 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-02 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99289 | - |
| dc.description.abstract | 電吉他是現代音樂的重要元素,其與原聲吉他及鋼琴的最大區別,在於其音色高度依賴經由擴大器與效果器處理後所產生的各種變化。然而,相較於以鋼琴為主的研究,吉他導向的音樂資訊檢索(Music Information Retrieval, MIR)發展相對落後,主因包括資料集稀缺(受限於版權與蒐集困難)以及效果處理音訊所需的特殊表示需求。本論文致力於推進吉他導向的 MIR,提出兩項核心貢獻:首先,建立並釋出兩個新資料集──Electric Guitar Database(EGDB)及其擴充版本 EGDB-PG;其次,針對兩項關鍵表示轉換任務設計創新深度學習演算法:(1)透過電吉他自動轉譜,實現由效果處理音訊轉換為符號樂譜的 audio-to-score 轉換;(2)透過電吉他音箱音色建模,實現由乾淨音色轉換為帶有效果的音色的 clean-to-wet 轉換,並探索無監督與零樣本等深度學習場景以重建多樣音色效果。透過實驗驗證,本研究有效提升吉他音訊、樂譜與音色的表示能力,有效解決電吉他效果處理所帶來的挑戰,為更穩健的基於深度學習的吉他音樂分析與吉他音箱模擬奠定基礎。 | zh_TW |
| dc.description.abstract | Electric guitars, central to modern music, are distinguished from acoustic guitars and pianos by their reliance on effects processing through amplifiers and pedals, which introduces complex tonal variations. However, guitar-oriented music information retrieval (MIR) lags behind piano-oriented research due to scarce datasets (constrained by copyright and collection challenges) and the unique representational demands of effect-laden audio. This thesis advances guitar-oriented MIR by curating novel datasets, the Electric Guitar Database (EGDB) and its expanded version (EGDB-PG), and proposing original deep learning algorithms for two key representation transformations: (1) audio-to-score transformation via electric guitar transcription, leveraging EGDB and EGDB-PG to map effect-processed audio to symbolic notation; and (2) clean-to-wet audio transformation through electric guitar amplifier tone modeling, exploring unsupervised and zero-shot paradigms to replicate diverse effects. Validated through empirical evaluations, these contributions enhance audio, score, and tone representations in guitar-oriented MIR, addressing the distinct challenges posed by electric guitar effects and paving the way for robust deep-learning-based electric guitar music analysis and effect modeling systems. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T17:08:32Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-21T17:08:32Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Guitar in Music Information Retrieval
1.2 Representation and Transformation in Guitar Music
1.3 Contributions
1.4 Thesis Structure
Chapter 2 Background
2.1 Acoustic Guitar
2.2 Electric Guitar
2.3 Guitar Amplifiers
2.3.1 How Amplifiers Shape Tone
2.3.2 Effects and Amplifiers in MIR
Chapter 3 Dataset
3.1 Introduction
3.2 Related Datasets
3.2.1 IDMT-SMT-GUITAR Database
3.2.2 GuitarSet
3.2.3 Guitar Playing Techniques Database
3.2.4 SynthTab
3.3 EGDB
3.3.1 Introduction
3.3.2 Audio Collection Process
3.3.3 Annotation Process
3.3.4 Timbre Re-render Process
Chapter 4 Electric Guitar Transcription
4.1 Introduction
4.2 Related Work
4.2.1 Proposed Transcription Model
4.2.1.1 Transformer Encoder
4.2.1.2 Transformer Decoder
4.2.1.3 Implementation Details
4.3 Experiments
4.3.1 Evaluation on DI Recordings
4.3.2 Evaluation on Unseen Timbres Rendered with Amps
4.3.3 Evaluation on Real-world Recordings
4.4 Summary
Chapter 5 Unsupervised Amplifier Modeling
5.1 Tone Modeling in Commercial Products
5.2 Neural Virtual Analog Modeling
5.3 Related Work
5.3.1 Neural Amplifier Modeling
5.3.2 Generative Adversarial Networks
5.3.3 Backbone Model for Generator
5.3.4 Discriminators for GANs Training
5.3.5 Clean Audio from Existing Datasets
5.4 Methods
5.4.1 Generator
5.4.2 Discriminator
5.4.3 GAN Loss
5.5 Experimental Setup
5.5.1 Dataset
5.5.2 Metrics
5.5.3 Implementation Details
5.5.4 Evaluation Settings
5.6 Experimental Result
5.6.1 Comparison with Baseline Methods
5.6.2 Clean Audio Combination
5.7 Discussion
5.7.1 Benefit of the GAN-based Approach for VA Modeling
5.7.2 Artifacts Generated by the Proposed Model
5.8 Summary
Chapter 6 One-to-Many and Zero-Shot Amplifier Modeling
6.1 Introduction
6.2 Background
6.3 Multi-Tone Amplifier Modeling
6.3.1 Tone Embedding Encoder
6.3.2 Conditional Generator
6.3.3 Source of the Reference Audio Signal
6.3.4 Zero-Shot Tone Transfer
6.3.5 Implementation Details
6.4 Experimental Results
6.4.1 Tone Embedding Visualization
6.4.2 Efficacy of One-to-Many Neural Amp Modeling
6.4.3 Zero-Shot Learning on Unseen Amplifiers
6.4.4 Case Study on Zero-Shot Amp Tone Transfer
6.5 Plugin Version in DAW
6.5.1 Implementation Details
6.5.2 Dataset
6.5.3 Real-Time Plugin
6.6 Summary
Chapter 7 Electric Guitar Transcription with Large-Scale Datasets
7.1 Tone and Presets
7.2 Motivation
7.3 Challenges for Amplifier-Rendered Automatic Guitar Transcription
7.3.1 Limited Data Quantity in Datasets
7.3.2 Lack of Tone Diversity
7.3.3 Complexities of Tablature Format from Guitar Fretboard Design
7.3.4 Expressive Playing Techniques
7.3.5 Current Efforts and Remaining Gaps
7.4 Tone-Informed MIR
7.5 EGDB-PG
7.6 Amplifier-Rendered Transcription Approaches
7.6.1 Tone Representation
7.7 Transcription System
7.7.1 hFT-Transformer
7.7.2 Architecture of Tone-informed Transformer
7.7.3 Content Augmentation
7.7.4 Audio Normalization
7.8 Experiments
7.8.1 Impact of Content Augmentation and Audio Normalization on Transcription Performance
7.8.2 Baseline Performance with Audio Normalization
7.8.3 Effect of Content Augmentation
7.8.4 Impact of Audio Normalization under Content Augmentation
7.8.5 Performance without Tone Embeddings
7.8.6 Comparison with Other Transcription Models
7.8.7 Evaluating Generalizability to Tone and Content Variations
7.8.8 Impact of Tone Augmentation (# of Tones)
7.8.9 Impact of Content Augmentation
7.9 Summary
Chapter 8 Conclusions and Future Work
8.1 Conclusions
8.2 Future Work
References | - |
| dc.language.iso | en | - |
| dc.subject | 電吉他 | zh_TW |
| dc.subject | 虛擬類比建模 | zh_TW |
| dc.subject | 音訊效果建模 | zh_TW |
| dc.subject | 音樂資訊檢索 | zh_TW |
| dc.subject | 轉譜 | zh_TW |
| dc.subject | Transcription | en |
| dc.subject | Music Information Retrieval | en |
| dc.subject | Electric Guitar | en |
| dc.subject | Effect Modeling | en |
| dc.subject | Virtual Analog Modeling | en |
| dc.title | 從音訊到樂譜與音色:探討以吉他為核心的音樂資訊檢索中的表示法與轉換 | zh_TW |
| dc.title | From Audio to Score and Tone: Exploring Representations and Transformations in Guitar-Oriented Music Information Retrieval | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Doctoral | - |
| dc.contributor.coadvisor | 楊奕軒 | zh_TW |
| dc.contributor.coadvisor | Yi-Hsuan Yang | en |
| dc.contributor.oralexamcommittee | 李宏毅;蘇黎;王新民 | zh_TW |
| dc.contributor.oralexamcommittee | Hung-Yi Lee;Li Su;Hsin-Min Wang | en |
| dc.subject.keyword | 音樂資訊檢索,電吉他,轉譜,虛擬類比建模,音訊效果建模 | zh_TW |
| dc.subject.keyword | Music Information Retrieval, Electric Guitar, Transcription, Virtual Analog Modeling, Effect Modeling | en |
| dc.relation.page | 123 | - |
| dc.identifier.doi | 10.6342/NTU202502953 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2025-08-06 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
| dc.date.embargo-lift | N/A | - |
Appears in Collections: Graduate Institute of Networking and Multimedia
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted access) | 9.53 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
