NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73892
Title: 標準漢語歌聲中歌詞辨識與一般語音辨識差異之研究 (Research on the Difference between Lyric Recognition and Speech Recognition in Mandarin Songs)
Authors: Chu-An Yu (游筑安)
Advisor: Chien-Kang Huang (黃乾綱)
Keyword: Speech recognition, Lyric recognition, Music Information Retrieval, Convolutional Neural Network, Accompanied singing voice, A cappella signal, Mandarin pinyin
Publication Year: 2019
Degree: Master
Abstract: Song retrieval is an indispensable part of everyday life: even without knowing the title or the singer, a user can find the desired song simply by singing or humming a few lines. Today, most song-search websites and mobile applications rely on melody matching. This approach, however, is inconvenient for users who cannot reproduce the melody accurately; when the hummed melody is off, the correct result cannot be retrieved. If the syllables sung in a song could be recognized instead, the accuracy of song retrieval would improve substantially.
The purpose of this study is to compare singing with read speech and to use the findings to improve the accuracy of lyric recognition in singing. The research workflow consists of four parts. The first part observes and compares speech recognition on singing and on read speech, and the observations determine the direction of the experiments. The second part preprocesses the singing signal, removing noise and background accompaniment and keeping only the vocals. The third part extracts features: after pre-emphasis, windowing, and related processing, the audio is converted into spectrograms that serve as input images for training the singing model. The fourth part trains the acoustic model with an end-to-end convolutional neural network (CNN) combined with connectionist temporal classification (CTC) to recognize the syllables sung in a song.
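Below is a minimal sketch of the feature-extraction step described above (pre-emphasis, windowing, spectrogram). It assumes the vocal track has already been separated in the preprocessing step and that NumPy and librosa are available; the sampling rate, filter coefficient, and STFT parameters are illustrative assumptions, not values taken from the thesis.

import numpy as np
import librosa

def audio_to_spectrogram(path, sr=16000, alpha=0.97, n_fft=512, hop=160):
    # Load the (already separated) vocal track at a fixed sampling rate.
    y, _ = librosa.load(path, sr=sr, mono=True)
    # Pre-emphasis filter y[t] - alpha * y[t-1] to boost high frequencies.
    y = np.append(y[0], y[1:] - alpha * y[:-1])
    # Windowed short-time Fourier transform (Hann window).
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop, window="hann")
    # Log-magnitude spectrogram, used as the input "image" for the CNN.
    return librosa.amplitude_to_db(np.abs(stft), ref=np.max)

spec = audio_to_spectrogram("vocal.wav")  # shape: (frequency bins, time frames)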
Song retrieval is an indispensable part of modern life: one expects to find a song simply by singing a few words or humming part of it. Most websites and mobile apps today use melody features for song retrieval. However, this search method is inconvenient for users who cannot sing in accurate pitch, and an inaccurate melody will not yield the correct result. Therefore, if the words in a song can be recognized, the accuracy of song search will improve greatly.
This research is divided into four parts. The first part compares singing audio with read speech. The second part preprocesses the songs, removing background noise and accompaniment and keeping only the vocals. The third part extracts features, converting the audio into spectrograms that serve as input images for model training. The fourth part uses a convolutional neural network (CNN) with connectionist temporal classification (CTC) to train the acoustic model.
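Below is a minimal sketch of a CNN acoustic model trained with CTC loss, as described above. PyTorch, the layer sizes, the spectrogram dimensions, and the size of the pinyin output vocabulary are illustrative assumptions; the exact end-to-end architecture used in the thesis is not reproduced here.

import torch
import torch.nn as nn

class CNNCTC(nn.Module):
    def __init__(self, n_freq=257, n_classes=410):  # n_classes: assumed pinyin syllables + CTC blank
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.fc = nn.Linear(64 * (n_freq // 4), n_classes)

    def forward(self, spec):                  # spec: (batch, 1, freq, time)
        x = self.conv(spec)                   # pooling halves the frequency axis twice
        x = x.permute(3, 0, 1, 2).flatten(2)  # -> (time, batch, channels * freq)
        return self.fc(x).log_softmax(-1)     # per-frame log-probabilities for CTC

model = CNNCTC()
ctc = nn.CTCLoss(blank=0)
spec = torch.randn(4, 1, 257, 200)            # dummy batch of spectrogram "images"
targets = torch.randint(1, 410, (4, 20))      # dummy pinyin label sequences
log_probs = model(spec)
loss = ctc(log_probs, targets,
           torch.full((4,), 200, dtype=torch.long),   # input (frame) lengths
           torch.full((4,), 20, dtype=torch.long))    # target (label) lengths
loss.backward()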
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73892
DOI: 10.6342/NTU201903515
Fulltext Rights: Paid authorization (有償授權)
Appears in Collections: Department of Engineering Science and Ocean Engineering (工程科學及海洋工程學系)

Files in This Item:
File: ntu-108-1.pdf (Restricted Access)
Size: 4.34 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
