NTU Theses and Dissertations Repository › 電機資訊學院 › 資訊工程學系
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48330
Full metadata record (DC field [language]: value)
dc.contributor.advisor: 李明穗 (Ming-Sui Lee)
dc.contributor.author [en]: Cheng-Te Lee
dc.contributor.author [zh_TW]: 李政德
dc.date.accessioned: 2021-06-15T06:52:47Z
dc.date.available: 2012-09-18
dc.date.copyright: 2011-09-18
dc.date.issued: 2011
dc.date.submitted: 2011-08-30
dc.identifier.citation: [1] M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-based music information retrieval: current directions and future challenges,” Proc. IEEE, vol. 96, no. 4, pp. 668–696, Apr. 2008.
[2] MIREX: Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/.
[3] M. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: State-of-the-art and challenges,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, 2006.
[4] A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press, 1990.
[5] H. F. Olson, Music, Physics and Engineering. Dover Publications, 1967.
[6] A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Amer., vol. 111, pp. 1917–1930, 2002.
[7] G. Peeters, “Music pitch representation by periodicity measures based on combined temporal and spectral representations,” in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, pp. 53–56, May 2006.
[8] J. A. Moorer, “On the transcription of musical sound by computer,” Comput. Music J., vol. 1, no. 4, pp. 32–38, 1977.
[9] M. Marolt, “A connectionist approach to automatic transcription of polyphonic piano music,” IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439–449, 2004.
[10] J. Bello, L. Daudet, and M. Sandler, “Automatic piano transcription using frequency and time-domain information,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, pp. 2242–2251, Nov. 2006.
[11] G. Poliner and D. Ellis, “A discriminative model for polyphonic piano transcription,” EURASIP J. Advances Signal Process., vol. 8, pp. 1–9, 2007.
[12] V. Emiya, R. Badeau, and B. David, “Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1643–1654, Aug. 2010.
[13] A. Klapuri, “Multiple fundamental frequency estimation based on harmonicity and spectral smoothness,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 804–816, Nov. 2003.
[14] A. Klapuri, “Multiple fundamental frequency estimation by summing harmonic amplitudes,” in Proc. Int. Conf. Music Inform. Retrieval, Victoria, Canada, pp. 216–221, Oct. 2006.
[15] C. Yeh and A. Roebel. (2010). Multiple-F0 estimation for MIREX 2010. Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf
[16] C. Yeh, A. Roebel, and X. Rodet, “Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1116–1126, Aug. 2010.
[17] Z. Duan, B. Pardo, and C. Zhang, “Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2121–2133, Nov. 2010.
[18] P. Smaragdis and J. C. Brown, “Non-negative matrix factorization for polyphonic music transcription,” in Proc. Workshop Applicat. Signal Process. Audio Acoust., New Paltz, NY, USA, pp. 177–180, Oct. 2003.
[19] S. A. Abdallah and M. D. Plumbley, “Polyphonic transcription by non-negative sparse coding of power spectra,” in Proc. Int. Conf. Music Inform. Retrieval, Barcelona, Spain, pp. 318–325, Oct. 2004.
[20] T. Blumensath and M. Davies, “Sparse and shift-invariant representations of music,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 50–57, Jan. 2006.
[21] A. Cont, “Realtime multiple pitch observation using sparse non-negative constraints,” in Proc. Int. Conf. Music Inform. Retrieval, Victoria, Canada, pp. 206–212, Oct. 2006.
[22] C. Fevotte, N. Bertin, and J.-L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis,” Neural Comput., vol. 21, no. 3, pp. 793–830, Mar. 2009.
[23] A. Dessein, A. Cont, and G. Lemaitre, “Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence,” in Proc. Int. Conf. Music Inform. Retrieval, Utrecht, Netherlands, pp. 489–494, Aug. 2010.
[24] E. Vincent, N. Bertin, and R. Badeau, “Adaptive harmonic spectral decomposition for multiple pitch estimation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 3, pp. 528–537, Mar. 2010.
[25] C. Yeh, N. Bogaards, and A. Roebel, “Synthesized polyphonic music database with verifiable ground truth for multiple-F0 estimation,” in Proc. Int. Conf. Music Inform. Retrieval, Vienna, Austria, pp. 393–398, 2007.
[26] MIDI Manufacturers Association, “Complete MIDI 1.0 detailed specification,” Nov. 2001.
[27] C. Dodge and T. A. Jerse, Computer Music: Synthesis, Composition and Performance. New York: Schirmer Books, pp. 80–84, 1997.
[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[29] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification via sparse representations of auditory temporal modulations,” in Proc. 17th European Signal Process. Conf., Glasgow, Scotland, Aug. 2009.
[30] J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, “Exemplar-based sparse representations for noise robust automatic speech recognition,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2067–2080, Sep. 2011.
[31] C.-T. Lee, Y.-H. Yang, K.-S. Lin and H. H. Chen. (2010). Multiple fundamental frequency estimation of piano signals via sparse representation of Fourier Coefficients. Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/LYLC1.pdf
[32] C.-T. Lee, Y.-H. Yang, and H. H. Chen, “Automatic transcription of piano music by sparse representation of magnitude spectra,” in Proc. IEEE Int. Conf. Multimedia Expo., Barcelona, Spain, Jul. 2011.
[33] A. de Cheveigné, “Multiple F0 estimation,” in Computational Auditory Scene Analysis: Principles, Algorithms and Applications, D. Wang and G. J. Brown, Eds. Piscataway, NJ: Wiley-IEEE Press, 2006.
[34] Z. Duan, Y. Zhang, C. Zhang, and Z. Shi, “Unsupervised single-channel music source separation by average harmonic structure modeling,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 4, pp. 766–778, May 2008.
[35] A. Klapuri and M. Davy, Signal Processing Methods for Music Transcription. New York, NY: Springer, 2006.
[36] S. A. Abdallah and M. D. Plumbley, “An independent component analysis approach to automatic music transcription,” in Proc. Audio Eng. Soc. 114th Convention, Amsterdam, The Netherlands, Mar. 2003.
[37] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Advances Neural Inform. Process. Syst., vol. 13. Cambridge, MA: MIT Press, pp. 556–562, 2001.
[38] S. A. Abdallah, “Towards Music Perception by Redundancy Reduction and Unsupervised Learning in Probabilistic Models,” Ph.D. dissertation, Dept. Elect. Eng., King's College, London, UK, 2002.
[39] D. Donoho, “For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution,” Commun. Pure Appl. Math., vol. 59, no. 6, pp. 797–829, 2006.
[40] E. Candes, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, Aug. 2006.
[41] IBM Corporation and Microsoft Corporation, “Multimedia Programming Interface and Data Specifications 1.0,” Aug. 1991.
[42] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” IEEE J. Sel. Topics Signal Process., vol. 1, pp. 606–617, 2007.
[43] M. Brand, “Coupled Hidden Markov Models for Modeling Interacting Processes,” Technical Report 405, MIT Media Lab Vision and Modeling, Nov. 1996.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48330
dc.description.abstract [zh_TW]: Pitch information, like other mid-level features, is regarded as key to bridging the semantic gap because it is closely related to human perception. This thesis proposes a highly effective method for estimating the pitch information in piano music and converting that information into a musical score. Experimental results confirm that the transcription accuracy of our system is currently state of the art. We formulate the transcription problem as a sparse representation of the magnitude spectrum: the proposed system first identifies candidate pitches, then solves an l1-minimization problem and applies hidden Markov models to extract the pitch and duration of every note in the piano music, completing the transcription.
dc.description.abstract [en]: Like rhythm and timbre, pitch as a mid-level music feature holds the promise of bridging the well-known semantic gap between low-level features and high-level semantics of music. Pitch estimation is an important first step towards this ultimate goal. In this thesis, we target the extraction of multiple pitch contours from piano music signals. Specifically, the pitch estimation is formulated as a sparse representation problem, in which the feature vector of a piano music segment (or frame) is represented as a linear combination of the feature vectors of individual piano notes. The note candidates of the input piano music segment are determined according to the harmonic structure of piano sounds. Then, the sparse representation problem is solved by l1-regularized minimization. A post-processing method based on hidden Markov models (HMMs) is applied to the resulting sparse vector for accuracy refinement. The system performance is evaluated using 11 classical music recordings of a real piano. The results show that the proposed system outperforms three state-of-the-art systems.
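The sparse-representation formulation described in the abstract, where a spectral frame is expressed as a sparse nonnegative combination of per-note spectral templates found by l1-regularized minimization, can be sketched as follows. This is an illustrative toy, not the thesis implementation: the synthetic harmonic-comb dictionary, the simple ISTA solver (standing in for the interior-point l1 solver the bibliography cites), and all parameter values are assumptions for demonstration only.

```python
# Toy sketch of l1-regularized sparse pitch estimation: a magnitude-spectrum
# frame y is approximated as D @ x, where each column of D is the spectral
# template of one piano note and x is made sparse by minimizing
#   0.5 * ||y - D x||^2 + lam * ||x||_1   subject to x >= 0.
# The dictionary here is a synthetic harmonic comb, NOT data from the thesis.
import numpy as np

def note_template(f0, n_bins=512, sr=8000.0, n_harm=8):
    """Synthetic spectral template: decaying peaks at harmonics of f0 (Hz)."""
    spec = np.zeros(n_bins)
    for h in range(1, n_harm + 1):
        b = int(round(h * f0 / (sr / 2) * (n_bins - 1)))  # frequency -> bin
        if b < n_bins:
            spec[b] = 1.0 / h                             # 1/h amplitude decay
    return spec / np.linalg.norm(spec)

def ista_l1(D, y, lam=0.1, n_iter=500):
    """Nonnegative ISTA (proximal gradient) for l1-regularized least squares."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)            # gradient of the quadratic term
        x = np.maximum(x - (g + lam) / L, 0.0)  # prox step + nonnegativity
    return x

# Dictionary of 12 hypothetical semitone-spaced notes; the observed frame is
# a mix of notes 3 and 7 with different loudness.
f0s = 220.0 * 2.0 ** (np.arange(12) / 12.0)
D = np.column_stack([note_template(f) for f in f0s])
y = 1.0 * D[:, 3] + 0.6 * D[:, 7]
x = ista_l1(D, y)
active = np.where(x > 0.1 * x.max())[0]  # notes with significant activation
print(active)
```

A real system along the thesis's lines would run this frame by frame and then, as the abstract says, smooth the resulting activations over time with HMM post-processing to obtain note onsets and durations.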
dc.description.provenance [en]: Made available in DSpace on 2021-06-15T06:52:47Z (GMT). No. of bitstreams: 1. ntu-100-R98922162-1.pdf: 1337154 bytes, checksum: b7ec363f9cb091b5b1fa1860e11d7997 (MD5). Previous issue date: 2011.
dc.description.tableofcontents:
口試委員會審定書 (Thesis Committee Certification) i
誌謝 (Acknowledgements) ii
中文摘要 (Chinese Abstract) iii
Abstract iv
Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Automatic music transcription 1
1.2 Contributions 5
1.3 Thesis organization 7
Chapter 2 Related Work 8
2.1 Learning-based approaches 8
2.2 Dictionary-based approaches 10
2.3 Sparse representation method 12
Chapter 3 Automatic Transcription of Piano Music 14
3.1 Data preprocessing 14
3.2 Note candidate selection 16
3.3 Computation of sparse representation 19
3.4 Hidden Markov model post-processing 21
Chapter 4 Performance Evaluations 24
4.1 Experiment set-up 24
4.2 Frame-based evaluation 25
4.3 Note-based evaluation 26
4.4 Analysis of system components 28
4.5 Number of atoms 29
Chapter 5 Conclusion 31
5.1 Summary 31
5.2 Future Direction 32
dc.language.iso: en
dc.subject [zh_TW]: 稀疏表示法 (sparse representation)
dc.subject [zh_TW]: 自動轉譜 (automatic transcription)
dc.subject [zh_TW]: 基頻偵測 (fundamental frequency estimation)
dc.subject [zh_TW]: 音高偵測 (pitch estimation)
dc.subject [en]: multiple pitch estimation
dc.subject [en]: F0 estimation
dc.subject [en]: sparse representation
dc.subject [en]: l1-regularized minimization
dc.subject [en]: automatic music transcription
dc.title [zh_TW]: 利用稀疏表示法之自動鋼琴轉譜
dc.title [en]: Automatic Transcription of Piano Music by Sparse Representation
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 陳宏銘 (Homer H. Chen)
dc.contributor.oralexamcommittee: 陳志宏 (Jyh-Horng Chen), 張智星 (Jyh-Shing Jang), 蘇文鈺 (Wen-Yu Su)
dc.subject.keyword [zh_TW]: 基頻偵測, 音高偵測, 稀疏表示法, 自動轉譜
dc.subject.keyword [en]: automatic music transcription, F0 estimation, multiple pitch estimation, sparse representation, l1-regularized minimization
dc.relation.page: 37
dc.rights.note: 有償授權 (authorized with compensation)
dc.date.accepted: 2011-08-30
dc.contributor.author-college [zh_TW]: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept [zh_TW]: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
ntu-100-1.pdf — 1.31 MB, Adobe PDF (not authorized for public access)