Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48330

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李明穗(Ming-Sui Lee) | |
| dc.contributor.author | Cheng-Te Lee | en |
| dc.contributor.author | 李政德 | zh_TW |
| dc.date.accessioned | 2021-06-15T06:52:47Z | - |
| dc.date.available | 2012-09-18 | |
| dc.date.copyright | 2011-09-18 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-08-30 | |
| dc.identifier.citation | [1] M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-based music information retrieval: current directions and future challenges,” Proc. IEEE, vol. 96, no. 4, pp. 668–696, Apr. 2008.
[2] MIREX: Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/
[3] M. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: State-of-the-art and challenges,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, 2006.
[4] A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press, 1990.
[5] H. F. Olson, Music, Physics and Engineering. Dover Publications, 1967.
[6] A. de Cheveigne and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Amer., vol. 111, pp. 1917–1930, 2002.
[7] G. Peeters, “Music pitch representation by periodicity measures based on combined temporal and spectral representations,” in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, pp. 53–56, May 2006.
[8] J. A. Moorer, “On the transcription of musical sound by computer,” Comput. Music J., vol. 1, no. 4, pp. 32–38, 1977.
[9] M. Marolt, “A connectionist approach to automatic transcription of polyphonic piano music,” IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439–449, 2004.
[10] J. Bello, L. Daudet, and M. Sandler, “Automatic piano transcription using frequency and time-domain information,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, pp. 2242–2251, Nov. 2006.
[11] G. Poliner and D. Ellis, “A discriminative model for polyphonic piano transcription,” EURASIP J. Advances Signal Process., vol. 8, pp. 1–9, 2007.
[12] V. Emiya, R. Badeau, and B. David, “Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1643–1654, Aug. 2010.
[13] A. Klapuri, “Multiple fundamental frequency estimation based on harmonicity and spectral smoothness,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 804–816, Nov. 2003.
[14] A. Klapuri, “Multiple fundamental frequency estimation by summing harmonic amplitudes,” in Proc. Int. Conf. Music Inform. Retrieval, Victoria, Canada, pp. 216–221, Oct. 2006.
[15] C. Yeh and A. Roebel. (2010). Multiple-F0 estimation for MIREX 2010. Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf
[16] C. Yeh, A. Roebel, and X. Rodet, “Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1116–1126, Aug. 2010.
[17] Z. Duan, B. Pardo, and C. Zhang, “Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2121–2133, Nov. 2010.
[18] P. Smaragdis and J. C. Brown, “Non-negative matrix factorization for polyphonic music transcription,” in Proc. Workshop Applicat. Signal Process. Audio Acoust., New Paltz, NY, USA, pp. 177–180, Oct. 2003.
[19] S. A. Abdallah and M. D. Plumbley, “Polyphonic transcription by non-negative sparse coding of power spectra,” in Proc. Int. Conf. Music Inform. Retrieval, Barcelona, Spain, pp. 318–325, Oct. 2004.
[20] T. Blumensath and M. Davies, “Sparse and shift-invariant representations of music,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 50–57, Jan. 2006.
[21] A. Cont, “Realtime multiple pitch observation using sparse non-negative constraints,” in Proc. Int. Conf. Music Inform. Retrieval, Victoria, Canada, pp. 206–212, Oct. 2006.
[22] C. Fevotte, N. Bertin, and J.-L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis,” Neural Comput., vol. 21, no. 3, pp. 793–830, Mar. 2009.
[23] A. Dessein, A. Cont, and G. Lemaitre, “Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence,” in Proc. Int. Conf. Music Inform. Retrieval, Utrecht, Netherlands, pp. 489–494, Aug. 2010.
[24] E. Vincent, N. Bertin, and R. Badeau, “Adaptive harmonic spectral decomposition for multiple pitch estimation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 3, pp. 528–537, Mar. 2010.
[25] C. Yeh, N. Bogaards, and A. Roebel, “Synthesized polyphonic music database with verifiable ground truth for multiple-F0 estimation,” in Proc. Int. Conf. Music Inform. Retrieval, Vienna, Austria, pp. 393–398, 2007.
[26] MIDI Manufacturers Association, “Complete MIDI 1.0 detailed specification,” Nov. 2001.
[27] C. Dodge and T. A. Jerse, Computer Music: Synthesis, Composition and Performance. New York: Schirmer Books, pp. 80–84, 1997.
[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[29] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification via sparse representations of auditory temporal modulations,” in Proc. 17th European Signal Process. Conf., Glasgow, Scotland, Aug. 2009.
[30] J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, “Exemplar-based sparse representations for noise robust automatic speech recognition,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2067–2080, Sep. 2011.
[31] C.-T. Lee, Y.-H. Yang, K.-S. Lin, and H. H. Chen. (2010). Multiple fundamental frequency estimation of piano signals via sparse representation of Fourier coefficients. Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/LYLC1.pdf
[32] C.-T. Lee, Y.-H. Yang, and H. H. Chen, “Automatic transcription of piano music by sparse representation of magnitude spectra,” in Proc. IEEE Int. Conf. Multimedia Expo., Barcelona, Spain, Jul. 2011.
[33] A. de Cheveigne, “Multiple F0 estimation,” in Computational Auditory Scene Analysis: Principles, Algorithms and Applications, D. Wang and G. J. Brown, Eds. Piscataway, NJ: Wiley-IEEE Press, 2006.
[34] Z. Duan, Y. Zhang, C. Zhang, and Z. Shi, “Unsupervised single-channel music source separation by average harmonic structure modeling,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 4, pp. 766–778, May 2008.
[35] A. Klapuri and M. Davy, Signal Processing Methods for Music Transcription. New York, NY: Springer, 2006.
[36] S. A. Abdallah and M. D. Plumbley, “An independent component analysis approach to automatic music transcription,” in Proc. Audio Eng. Soc. 114th Convention, Amsterdam, The Netherlands, Mar. 2003.
[37] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Advances Neural Inform. Process. Syst., vol. 13. Cambridge, MA: MIT Press, pp. 556–562, 2001.
[38] S. A. Abdallah, “Towards music perception by redundancy reduction and unsupervised learning in probabilistic models,” Ph.D. dissertation, Dept. Elect. Eng., King's College, London, UK, 2002.
[39] D. Donoho, “For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution,” Commun. Pure Appl. Math., vol. 59, no. 6, pp. 797–829, 2006.
[40] E. Candes, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, Aug. 2006.
[41] IBM Corporation and Microsoft Corporation, “Multimedia Programming Interface and Data Specifications 1.0,” Aug. 1991.
[42] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” IEEE J. Sel. Topics Signal Process., vol. 1, pp. 606–617, 2007.
[43] M. Brand, “Coupled hidden Markov models for modeling interacting processes,” Technical Report 405, MIT Media Lab Vision and Modeling, Nov. 1996. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48330 | - |
| dc.description.abstract | 音高資訊和其他中階特徵被視為跨越語意隔閡的關鍵資訊,因其與人類感知十分密切相關。本篇論文提出了一個極為有效的解決方法來估計鋼琴音樂中的音高資訊,並將此資訊轉為樂譜。實驗結果證實,我們的系統其轉譜準確率目前居世界領先地位。我們將轉譜問題以頻譜的稀疏表示法來解決。我們所提出的系統首先找到有可能出現的音高,再透過解l1最小化問題及隱馬可夫模型,抽取出鋼琴音樂中每個音符的音高及持續時間,完成轉譜。 | zh_TW |
| dc.description.abstract | Like rhythm and timbre, pitch as a mid-level music feature holds the promise of bridging the well-known semantic gap between low-level features and high-level semantics of music. Pitch estimation is an important first step towards this ultimate goal. In this thesis, we target the extraction of multiple pitch contours from piano music signals. Specifically, pitch estimation is formulated as a sparse representation problem, in which the feature vector of a piano music segment (or frame) is represented as a linear combination of the feature vectors of individual piano notes. The note candidates of the input piano music segment are determined according to the harmonic structure of piano sounds. Then, the sparse representation problem is solved by l1-regularized minimization. A post-processing method based on hidden Markov models (HMMs) is applied to the resulting sparse vector for accuracy refinement. The system performance is evaluated on real piano recordings of classical music. The results show that the proposed system outperforms three state-of-the-art systems. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T06:52:47Z (GMT). No. of bitstreams: 1 ntu-100-R98922162-1.pdf: 1337154 bytes, checksum: b7ec363f9cb091b5b1fa1860e11d7997 (MD5) Previous issue date: 2011 | en |
| dc.description.tableofcontents | 口試委員會審定書 (Thesis Committee Certification) i
誌謝 (Acknowledgements) ii
中文摘要 (Chinese Abstract) iii
Abstract iv
Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Automatic music transcription 1
1.2 Contributions 5
1.3 Thesis organization 7
Chapter 2 Related Work 8
2.1 Learning-based approaches 8
2.2 Dictionary-based approaches 10
2.3 Sparse representation method 12
Chapter 3 Automatic Transcription of Piano Music 14
3.1 Data preprocessing 14
3.2 Note candidate selection 16
3.3 Computation of sparse representation 19
3.4 Hidden Markov model post-processing 21
Chapter 4 Performance Evaluations 24
4.1 Experiment set-up 24
4.2 Frame-based evaluation 25
4.3 Note-based evaluation 26
4.4 Analysis of system components 28
4.5 Number of atoms 29
Chapter 5 Conclusion 31
5.1 Summary 31
5.2 Future Direction 32 | |
| dc.language.iso | en | |
| dc.subject | 稀疏表示法 | zh_TW |
| dc.subject | 自動轉譜 | zh_TW |
| dc.subject | 基頻偵測 | zh_TW |
| dc.subject | 音高偵測 | zh_TW |
| dc.subject | multiple pitch estimation | en |
| dc.subject | F0 estimation | en |
| dc.subject | sparse representation | en |
| dc.subject | l1-regularized minimization | en |
| dc.subject | automatic music transcription | en |
| dc.title | 利用稀疏表示法之自動鋼琴轉譜 | zh_TW |
| dc.title | Automatic Transcription of Piano Music by Sparse Representation | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 99-2 | |
| dc.description.degree | 碩士 (Master) | |
| dc.contributor.coadvisor | 陳宏銘(Homer H. Chen) | |
| dc.contributor.oralexamcommittee | 陳志宏(Jyh-Horng Chen),張智星(Jyh-Shing Jang),蘇文鈺(Wen-Yu Su) | |
| dc.subject.keyword | 基頻偵測,音高偵測,稀疏表示法,自動轉譜, | zh_TW |
| dc.subject.keyword | automatic music transcription,F0 estimation,multiple pitch estimation,sparse representation,l1-regularized minimization, | en |
| dc.relation.page | 37 | |
| dc.rights.note | 有償授權 (paid authorization) | |
| dc.date.accepted | 2011-08-30 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 | |
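The formulation described in the abstract (a frame's magnitude spectrum represented as a sparse non-negative combination of per-note spectra, recovered by l1-regularized minimization) can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the thesis's actual implementation: the harmonic note templates are synthetic, the fundamental bins are arbitrary, and the solver is a plain ISTA (proximal-gradient) loop rather than the interior-point method of reference [42].

```python
import numpy as np

def note_template(f0, n_bins=256, n_harm=6):
    """Synthetic magnitude spectrum of one note: harmonics with decaying amplitude."""
    spec = np.zeros(n_bins)
    for h in range(1, n_harm + 1):
        b = int(round(f0 * h))
        if b < n_bins:
            spec[b] = 1.0 / h          # amplitude falls off with harmonic index
    return spec / np.linalg.norm(spec)  # unit-norm dictionary atom

# Dictionary D: one column (atom) per candidate note (toy fundamental bins).
f0s = [10.0, 14.5, 21.0]
D = np.stack([note_template(f) for f in f0s], axis=1)

# Observed frame: notes 0 and 2 sounding together.
y = D[:, 0] + 0.8 * D[:, 2]

# ISTA for min_x 0.5*||Dx - y||^2 + lam*||x||_1 subject to x >= 0.
lam = 0.01
L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
x = np.zeros(D.shape[1])
for _ in range(1000):
    grad = D.T @ (D @ x - y)
    x = np.maximum(x - (grad + lam) / L, 0.0)  # soft-threshold, clipped at 0

# Active atoms in the sparse vector = notes detected in this frame.
active = [i for i, v in enumerate(x) if v > 0.05]
print(active)   # → [0, 2]
```

In the thesis's pipeline, this per-frame sparse vector would then be smoothed over time by the HMM post-processing step to decide note onsets and durations; that temporal stage is not shown here.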
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-100-1.pdf (restricted access) | 1.31 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
