NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66127

Full metadata record (DC field: value [language]):
dc.contributor.advisor: 李琳山 (Lin-shan Lee)
dc.contributor.author: Tsung-Han Hsieh [en]
dc.contributor.author: 謝宗翰 [zh_TW]
dc.date.accessioned: 2021-06-17T00:22:45Z
dc.date.available: 2020-02-18
dc.date.copyright: 2020-02-18
dc.date.issued: 2019
dc.date.submitted: 2020-02-11
dc.identifier.citation:
[1] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
[2] D. Basaran, S. Essid, and G. Peeters. Main melody extraction with source-filter NMF and CRNN. In Proc. ISMIR, 2018.
[3] Bernhard Lehner, Gerhard Widmer, and Reinhard Sonnleitner. On the reduction of false positives in singing voice detection. In Proc. ICASSP, pages 7480–7484, 2014.
[4] R. M. Bittner et al. Deep salience representations for f0 estimation in polyphonic music. In Proc. ISMIR, pages 63–70, 2017.
[5] R. M. Bittner et al. Deep salience representations for f0 estimation in polyphonic music. In Proc. ISMIR, 2017. [Online] https://github.com/rabitt/ismir2017-deepsalience.
[6] Carl Southall, Ryan Stables, and Jason Hockman. Improving peak-picking using multiple time-step loss. In Proc. ISMIR, pages 313–320, 2018.
[7] Chao-Ling Hsu and Jyh-Shing Roger Jang. On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset. IEEE Trans. Audio, Speech, and Lang. Proc., 18(2):310–319, 2010. [Online] https://sites.google.com/site/unvoicedsoundseparation/.
[8] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611, 2018.
[9] Daniel Stoller, Sebastian Ewert, and Simon Dixon. Jointly detecting and separating singing voice: A multi-task approach. In Proc. Latent Variable Analysis and Signal Separation, pages 329–339, 2018.
[10] Hao-Wen Dong and Yi-Hsuan Yang. Convolutional generative adversarial networks with binary neurons for polyphonic music generation. In Proc. ISMIR, 2018.
[11] H. Indefrey, W. Hess, and G. Seeser. Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain – preliminary results. In Proc. ICASSP, 1985.
[12] J.-L. Durrieu, G. Richard, B. David, and C. Févotte. Source/filter model for unsupervised main melody extraction from polyphonic audio signals. IEEE Transactions on Audio, Speech, and Language Processing, pages 564–575, 2010.
[13] Jan Schlüter and Thomas Grill. Exploring data augmentation for improved singing voice detection with neural networks. In Proc. ISMIR, 2015.
[14] Justin Salamon, Emilia Gómez, Daniel P. W. Ellis, and Gaël Richard. Melody extraction from polyphonic music signals: Approaches, applications, and challenges. IEEE Signal Processing Magazine, 31(2):118–134, 2014.
[15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[16] G. Klambauer et al. Self-normalizing neural networks. arXiv preprint arXiv:1706.02515, 2017.
[17] A. Klapuri. Multipitch analysis of polyphonic music and speech signals using an auditory model. IEEE Trans. Audio, Speech, Lang. Proc., 16(2):255–266, 2008.
[18] T. Kobayashi and S. Imai. Spectral analysis using generalized cepstrum. IEEE Trans. Acoust., Speech, Signal Proc., 32(5):1087–1089, 1984.
[19] S. Kum, C. Oh, and J. Nam. Melody extraction on vocal segments using multi-column deep neural networks. In Proc. ISMIR, pages 819–825, 2016.
[20] T.-Y. Lin et al. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017.
[21] W.-T. Lu and L. Su. Vocal melody extraction with semantic segmentation and audio-symbolic domain transfer learning. In Proc. ISMIR, pages 521–528, 2018. [Online] https://github.com/s603122001/Vocal-Melody-Extraction.
[22] Nadine Kroher and Emilia Gómez. Automatic transcription of flamenco singing from polyphonic music recordings. IEEE/ACM Trans. Audio, Speech, and Language Processing, 24(5):901–913, 2016.
[23] G. Peeters. Music pitch representation by periodicity measures based on combined temporal and spectral representations. In Proc. IEEE ICASSP, 2006.
[24] Rachel Bittner, Justin Salamon, Mike Tierney, Matthias Mauch, Chris Cannam, and Juan Bello. MedleyDB: A multitrack dataset for annotation-intensive MIR research. In Proc. ISMIR, 2014. [Online] http://medleydb.weebly.com/.
[25] Rachel M. Bittner, Justin Salamon, Juan J. Bosch, and Juan Pablo Bello. Pitch contours as a mid-level representation for music informatics. In AES Int. Conf. Semantic Audio, 2017.
[26] Colin Raffel, Brian McFee, Eric J. Humphrey, Justin Salamon, Oriol Nieto, Dawen Liang, and Daniel P. W. Ellis. mir_eval: A transparent implementation of common MIR metrics. In Proc. ISMIR, 2014. [Online] https://github.com/craffel/mir_eval.
[27] F. Rigaud and M. Radenen. Singing voice melody transcription using deep neural networks. In Proc. ISMIR, pages 737–743, 2016.
[28] Scott Beveridge and Don Knox. Popular music and the role of vocal melody in perceived emotion. Psychology of Music, 46(3):411–423, 2018.
[29] L. Su. Between homomorphic signal processing and deep neural networks: Constructing deep algorithms for polyphonic music transcription. In Proc. APSIPA ASC, 2017.
[30] L. Su. Vocal melody extraction using patch-based CNN. In Proc. ICASSP, 2018.
[31] L. Su and Y.-H. Yang. Combining spectral and temporal representations for multipitch estimation of polyphonic music. IEEE Trans. Audio, Speech, and Language Processing, 23(10):1600–1612, 2015.
[32] Tero Tolonen and Matti Karjalainen. A computationally efficient multipitch analysis model. IEEE Trans. Speech and Audio Processing, 8(6):708–716, 2000.
[33] K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai. Mel-generalized cepstral analysis: A unified approach to speech spectral estimation. In Proc. Int. Conf. Spoken Language Processing, 1994.
[34] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.
[35] Y.-T. Wu, B. Chen, and L. Su. Automatic music transcription leveraging generalized cepstral features and deep learning. In Proc. ICASSP, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66127
dc.description.abstract: Melody extraction has long been an important task in music signal processing. In this thesis, we propose a streamlined encoder/decoder network designed for it. We make two technical contributions. First, inspired by a state-of-the-art model for semantic pixel-wise segmentation, we localize the melody in frequency by passing the pooling indices from the down-pooling layers to the un-pooling layers; with fewer convolutional layers and simpler convolution modules we reach results close to the state of the art. Second, we propose using the network's bottleneck layer to predict whether a melody is present in each frame, which removes the need for thresholding: the final result can be obtained with a simple argmax. Our experiments on both vocal melody extraction and general melody extraction validate the model's effectiveness. [zh_TW]
dc.description.abstract: Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass the pooling indices between the pooling and un-pooling layers to localize the melody in frequency. We can achieve results close to the state of the art with much fewer convolutional layers and simpler convolution modules. Second, we propose a way to use the bottleneck layer of the network to estimate the existence of a melody line for each time frame, making it possible to use a simple argmax function instead of ad-hoc thresholding to get the final estimate of the melody line. Our experiments on both vocal melody extraction and general melody extraction validate the effectiveness of the proposed model. [en]
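The abstract describes the two mechanisms concretely enough to sketch in code. The PyTorch fragment below (the thesis cites PyTorch [1]) is a minimal illustration, not the author's actual network: the layer counts, channel sizes, 360-bin frequency axis, and the names `TinyMelodyNet` and `freq_bins` are all assumptions made for this example. It shows (1) SegNet-style un-pooling that reuses the encoder's max-pooling indices to restore frequency localization, and (2) a bottleneck-derived non-melody score stacked under the salience map so a per-frame argmax directly yields either a pitch bin or "no melody".

```python
import torch
import torch.nn as nn

class TinyMelodyNet(nn.Module):
    """Illustrative sketch only; not the thesis's exact architecture."""
    def __init__(self, freq_bins=360):
        super().__init__()
        self.enc = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        # return_indices=True keeps the argmax position of each pooling
        # window so the decoder can place activations back exactly there.
        self.pool = nn.MaxPool2d(kernel_size=(4, 1), return_indices=True)
        self.unpool = nn.MaxUnpool2d(kernel_size=(4, 1))
        self.dec = nn.Conv2d(16, 1, kernel_size=3, padding=1)
        # Bottleneck head: one "non-melody" logit per time frame.
        self.nonmelody = nn.Conv2d(16, 1, kernel_size=(freq_bins // 4, 1))

    def forward(self, x):                         # x: (batch, 1, freq, time)
        h = torch.relu(self.enc(x))
        z, idx = self.pool(h)                     # pool along frequency only
        salience = self.dec(self.unpool(z, idx))  # (batch, 1, freq, time)
        silence = self.nonmelody(z)               # (batch, 1, 1, time)
        # Stack the non-melody score below the salience map; argmax over the
        # frequency axis then returns index freq_bins exactly on the frames
        # where "no melody" wins, so no threshold is needed.
        logits = torch.cat([salience, silence], dim=2).squeeze(1)
        return logits                             # (batch, freq_bins + 1, time)

# Usage: per-frame melody decision via argmax, with no ad-hoc thresholding.
net = TinyMelodyNet()
spec = torch.randn(2, 1, 360, 100)   # dummy spectrogram batch
frames = net(spec).argmax(dim=1)     # shape (2, 100); value 360 = "no melody"
```

At training time one would apply a loss to the logits rather than the argmax; the table of contents below indicates the thesis actually uses a weighted BCE loss with a melody/non-melody ratio (Section 3.5), so the plain classification head here is a simplification.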
dc.description.provenance: Made available in DSpace on 2021-06-17T00:22:45Z (GMT). No. of bitstreams: 1. ntu-108-R06946013-1.pdf: 2116393 bytes, checksum: 922b7edaa2984f588014460102073238 (MD5). Previous issue date: 2019. [en]
dc.description.tableofcontents:
Contents
Abstract ii
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
Chapter 2 Related Works 7
2.1 Deep Salience Model 9
2.2 SF-NMF-CRNN 10
2.3 Lu and Su's Model 11
Chapter 3 Proposed Model 12
3.1 Model Input 14
3.2 Encoder and Decoder 15
3.3 Max-pooling with Rectangle Kernel 17
3.4 Non-melody Detector and ArgMax Layer 18
3.5 Weighted BCELoss with Melody/Non-melody Ratio 20
3.6 Model Update 21
Chapter 4 Experiments 26
4.1 Baseline Methods and Evaluation Metrics 27
4.2 Vocal Melody Extraction 28
4.2.1 Datasets 28
4.2.2 Experiment Results 28
4.3 General Melody Extraction 30
4.3.1 Datasets 30
4.3.2 Experiment Results 31
4.4 Experiments with Different Input Representations 31
4.5 Experimental Details 32
4.6 Case Study 33
Chapter 5 Conclusions and Future Work 37
Bibliography 39
dc.language.iso: en
dc.title: 專為旋律提取設計的流線型編碼器/解碼器架構 [zh_TW]
dc.title: A streamlined encoder/decoder architecture for melody extraction [en]
dc.type: Thesis
dc.date.schoolyear: 108-1
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 楊奕軒 (Yi-Hsuan Yang)
dc.contributor.oralexamcommittee: 劉奕汶 (Yi-Wen Liu), 蔡偉和 (Wei-Ho Tsai), 陳冠宇 (Kuan-Yu Chen), 尤信程 (Shing-Chern You)
dc.subject.keyword: 旋律提取, 編碼/解碼器 [zh_TW]
dc.subject.keyword: melody extraction, encoder/decoder [en]
dc.relation.page: 44
dc.identifier.doi: 10.6342/NTU202000419
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-02-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資料科學學位學程 (Data Science Degree Program) [zh_TW]
Appears in collections: 資料科學學位學程 (Data Science Degree Program)

Files in this item:
File | Size | Format
ntu-108-1.pdf (currently not authorized for public access) | 2.07 MB | Adobe PDF