Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29583
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 鄭士康(Shyh-Kang Jeng) | |
dc.contributor.author | Chin-Han Chen | en |
dc.contributor.author | 陳錦翰 | zh_TW |
dc.date.accessioned | 2021-06-13T01:11:10Z | - |
dc.date.available | 2008-07-26 | |
dc.date.copyright | 2007-07-26 | |
dc.date.issued | 2007 | |
dc.date.submitted | 2007-07-20 | |
dc.identifier.citation | [1] Tzanetakis, G., Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 2002. 10(5): p. 293.
[2] Scaringella, N., Automatic genre classification of music content. IEEE ASSP Magazine, 2006. 23(2): p. 133.
[3] Huron, D., Perceptual and cognitive applications in music information retrieval. Perception. 10(1): p. 83.
[4] Wu, T.-L. and S.-K. Jeng, Automatic emotion classification of musical segments, in ICMPC. 2006.
[5] Lu, L., X.-S. Hua, and H.-J. Zhang, Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech and Language Processing, 2006. 14(1): p. 5.
[6] Foote, J., M. Cooper, and A. Girgensohn, Creating music videos using automatic media analysis, in ACM Multimedia. 2002.
[7] Mulhem, P., et al., Pivot vector space approach for audio-video mixing. IEEE Computer Society, 2003.
[8] Hevner, K., Experimental studies of the elements of expression in music. American Journal of Psychology, 1936. 48(2): p. 246.
[9] Tzanetakis, G., MARSYAS: a framework for audio analysis. Organised Sound, 2000. 4(3): p. 169.
[10] Cortes, C., Support-vector networks. Machine Learning, 1995. 20(3): p. 273.
[11] von Ahn, L., R. Liu, and M. Blum, Peekaboom: a game for locating objects in images, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2006, ACM Press.
[12] Ho, C.-J., T.-H. Chang, and J.Y.-j. Hsu, PhotoSlap: a multi-player online game for semantic annotation, in Twenty-Second Conference on Artificial Intelligence (AAAI-07), July 2007.
[13] von Ahn, L., M. Kedia, and M. Blum, Verbosity: a game for collecting common-sense facts, in CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2006, ACM.
[14] Ellis, D., et al., The quest for ground truth in musical artist similarity. Proc. ISMIR-02, 2002.
[15] Schubert, E., Measurement and Time Series Analysis of Emotion in Music. 1999, University of New South Wales.
[16] Cabrera, D., PsySound2: psychoacoustical software for Macintosh PPC. July 2000.
[17] Huang, J., S.R. Kumar, and R. Zabih, An automatic hierarchical image classification scheme, in Proceedings of the Sixth ACM International Conference on Multimedia. 1998, ACM Press: Bristol, United Kingdom.
[18] Vailaya, A., Image classification for content-based indexing. IEEE Transactions on Image Processing, 2001. 10(1): p. 117.
[19] Datta, R., et al., Toward bridging the annotation-retrieval gap in image search by a generative modeling approach, in Proceedings of the 14th Annual ACM International Conference on Multimedia. 2006, ACM Press: Santa Barbara, CA, USA.
[20] He, X., Learning a semantic space from user's relevance feedback for image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 2003. 13(1): p. 39.
[21] Tuceryan, M. and A.K. Jain, Texture analysis, in The Handbook of Pattern Recognition and Computer Vision, L.F. Pau and P.S.P. Wang (eds.). 1993, World Scientific Publishing Co.
[22] Bezdek, J.C., A geometric approach to edge detection. IEEE Transactions on Fuzzy Systems, 1998. 6(1): p. 52.
[23] Muyuan, W., Z. Naiyao, and Z. Hancheng, User-adaptive music emotion recognition, in Proc. of the 7th International Conference on Signal Processing (ICSP'04), 2004.
[24] Wu, T.-L. and S.-K. Jeng, Automatic classification of musical segments based on emotional expressions. Bulletin of the College of Engineering, National Taiwan University, no. 95, Oct. 2005: pp. 73-84.
[25] Lang, P.J., M.M. Bradley, and B.N. Cuthbert, International Affective Picture System (IAPS): Technical Manual and Affective Ratings. NIMH Center for the Study of Emotion and Attention, 1997.
[26] Foote, J., M. Cooper, and A. Girgensohn, Creating music videos using automatic media analysis, in Proceedings of the Tenth ACM International Conference on Multimedia. 2002, ACM Press: Juan-les-Pins, France.
[27] Lee, S.-H., S.-Z. Wang, and C.-C.J. Kuo, Tempo-based MTV-style home video authoring. Multimedia Signal Processing, 2005.
[28] Hua, X.-S., Optimization-based automated home video editing system. IEEE Transactions on Circuits and Systems for Video Technology, 2004. 14(5): p. 572.
[29] Chen, J.-C., et al., Tiling slideshow, in Proceedings of the 14th Annual ACM International Conference on Multimedia. 2006, ACM Press: Santa Barbara, CA, USA.
[30] Vronay, D., S. Farnham, and D. J., PhotoStory: preserving emotion in digital photo sharing. Virtual Worlds Group Internal Paper, Microsoft Research, 2001.
[31] Wu, T.-L. and S.-K. Jeng, Extraction of segments of significant emotional expressions in music, in International Workshop on Computer Music and Audio Technology. 2006.
[32] Sloboda, J.A. and P.A. Juslin, Psychological perspectives on music and emotion, chap. 4, in Music and Emotion: Theory and Research. 2001, New York: Oxford University Press.
[33] Wu, T.-L. and S.-K. Jeng, Regrouping of expressive terms for musical qualia, in WOCMAT, 2007.
[34] Li, T. and M. Ogihara, Content-based music similarity search and emotion detection, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04), 2004. 5.
[35] Chang, C.-C. and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[36] Smith, J., Integrated Spatial and Feature Image Systems: Retrieval, in Graduate School of Arts and Sciences. 1997, Columbia University: New York.
[37] Lei, Z., L. Fuzong, and Z. Bo, A CBIR method based on color-spatial feature, in TENCON 99: Proceedings of the IEEE Region 10 Conference, 1999. 1.
[38] Cohen-Or, D., Color harmonization. ACM Transactions on Graphics, 2006. 25(3): p. 624.
[39] Rother, C., et al., AutoCollage, in ACM SIGGRAPH 2006 Papers. 2006, ACM Press: Boston, Massachusetts.
[40] Chang, T. and C.-C.J. Kuo, Texture analysis and classification with tree-structured wavelet transform. IEEE Transactions on Image Processing, 1993. 2(4): pp. 429-441.
[41] Grubinger, M., et al., The IAPR TC-12 benchmark: a new evaluation resource for visual information systems, in Proceedings of the International Workshop OntoImage'2006, Language Resources for Content-Based Image Retrieval, 2006.
[42] Frey, B.J. and D. Dueck, Clustering by passing messages between data points. Science, 2007. 315(5814): p. 972.
[43] Dixon, S., MIREX 2006 audio beat tracking evaluation: BeatRoot. MIREX, 2006.
[44] Ox, J., Two performances in the 21st Century Virtual Color Organ: GridJam and Im Januar am Nil, in Proceedings of the Seventh International Conference on Virtual Systems and Multimedia (VSMM'01). 2001, IEEE Computer Society.
[45] Grey, J.M., An Exploration of Musical Timbre, Ph.D. Dissertation. 1975, Stanford University. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29583 | - |
dc.description.abstract | 一般多媒體播放軟體播放音樂時,常會內建視覺效果來搭配音樂,但是這些效果常是一些與音樂無關的圖樣。本論文提出一個結合視覺與聽覺的多媒體播放秀。主要分成三部分工作;首先,建立一資料庫含有四百張關於旅遊與自然景觀的照片和兩百首音樂片段,並透過網路由約五百人建立這些資料的情緒標籤;第二部份,則在探討如何針對音樂與照片這兩種多媒體形式做情緒的分類,利用其低階的特徵及SVM作為我們分類的工具,建立一個情緒分類的架構;最後一部份描述結合此兩種多媒體所使用的策略,提出一個階層式的方法。在第一階段,一首音樂被分析並根據節拍追蹤演算法拆解成多個基本的單元,接著系統分析每個音樂片段的情緒,相對應情緒的照片則被選為搭配的候選資料庫;第二階段中,我們將音樂與照片結合的問題轉換成一尋找最佳解的問題,並利用Viterbi演算法來處理,音樂中的頻譜重心和頻譜通量和照片中的亮度和對比成為組合的條件,播放秀的結果顯示兩種多媒體的結合引發了使用者的更多共鳴。 | zh_TW |
dc.description.abstract | Nowadays, media player software often displays built-in visual effects while playing music, but these effects are usually patterns unrelated to the musical content. This thesis presents a media player show that integrates auditory and visual perception. The work is divided into three parts. First, a database of 400 travel and nature photos and 200 film-soundtrack music clips is constructed, with emotion labels collected from nearly five hundred users over the web; these labels serve as the ground truth for emotion classification. The second part focuses on automatically detecting the emotion of the two kinds of media: low-level features are extracted from the digital photos and music, and an SVM (Support Vector Machine) is used to classify the emotion of each medium. The final part describes a hierarchical strategy for combining the two media. In the first phase, a complete piece of music is analyzed and segmented by a beat tracking algorithm; music emotion detection then labels each segment, and photos with the same emotion become the candidate data source. In the second phase, we formulate music and photo alignment as an optimization problem and solve it with a greedy algorithm, using the spectral centroid and spectral flux of the music and the brightness and contrast of the photos as the coordinating features. Subjective evaluations show that combining the two media elicits stronger responses from users. | en |
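The second-phase alignment described in the abstract can be illustrated with a minimal sketch: each beat-tracked music segment, described by its normalized spectral centroid and spectral flux, is greedily matched to the unused candidate photo whose normalized brightness and contrast lie closest. The feature pairing, the squared-Euclidean cost, and all numeric values here are illustrative assumptions, not the thesis's actual parameters or implementation.

```python
def greedy_align(segments, photos):
    """Greedily assign one photo to each music segment without reuse.

    segments: list of (spectral_centroid, spectral_flux) pairs in [0, 1].
    photos:   list of (brightness, contrast) pairs in [0, 1].
    Assumes at least as many photos as segments.
    Returns a list of photo indices, one per segment.
    """
    used = set()
    alignment = []
    for centroid, flux in segments:
        best, best_cost = None, float("inf")
        for i, (brightness, contrast) in enumerate(photos):
            if i in used:
                continue
            # squared Euclidean distance, pairing centroid with brightness
            # and flux with contrast (an assumed correspondence)
            cost = (centroid - brightness) ** 2 + (flux - contrast) ** 2
            if cost < best_cost:
                best, best_cost = i, cost
        used.add(best)
        alignment.append(best)
    return alignment

# toy example: two segments, three candidate photos
segments = [(0.2, 0.8), (0.9, 0.1)]
photos = [(0.85, 0.15), (0.25, 0.75), (0.5, 0.5)]
print(greedy_align(segments, photos))  # → [1, 0]
```

A greedy pass like this is locally optimal per segment; because earlier segments consume photos, it does not guarantee a globally minimal total cost, which is the trade-off of avoiding a full combinatorial search.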
dc.description.provenance | Made available in DSpace on 2021-06-13T01:11:10Z (GMT). No. of bitstreams: 1 ntu-96-R94921040-1.pdf: 1070746 bytes, checksum: 2c4a38d875616581991fb52bec537aa5 (MD5) Previous issue date: 2007 | en |
dc.description.tableofcontents | Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Survey 2
1.3 Approach 3
1.4 Organization of Thesis 6
Chapter 2 Background 7
2.1 Ground Truth Collection 7
2.2 Music Features Extraction 9
2.3 Image Features Extraction 10
2.4 Emotion Learning 12
2.5 Media Composition 13
Chapter 3 Dataset Collection 15
3.1 Emotion Database Creation 15
3.1.1 Emotion Checklist 15
3.1.2 Labeling Process 20
3.2 Gathering Statistics 22
Chapter 4 Music and Photo Emotion Recognition 25
4.1 Music Emotion Detection 25
4.1.1 Music Features 25
4.1.2 Emotion Detection 28
4.1.3 Results 30
4.2 Photo Emotion Detection 30
4.2.1 Image Features 31
4.2.2 Emotion Detection 32
4.2.3 Results 35
Chapter 5 Media Player Show Composition 37
5.1 Analysis Level 39
5.1.1 Music Preprocessing 39
5.1.2 Photo Classification 44
5.2 Considered Criteria for Composition 44
5.2.1 Problem Formulation 45
5.2.2 The Consistency between Music and Photos 46
5.2.3 The Consistency between Photo Sequences 49
5.3 Optimization Method 50
5.3.1 Use of the Greedy Algorithm 50
Chapter 6 Results and Evaluation 55
6.1 The Show Sequences 55
6.2 User Evaluation 57
6.3 Future Work 58
Chapter 7 Conclusion 61 | |
dc.language.iso | en | |
dc.title | 以情緒為基礎的多媒體播放展示 | zh_TW |
dc.title | Emotion-based Media Player Show | en |
dc.type | Thesis | |
dc.date.schoolyear | 95-2 | |
dc.description.degree | 碩士 (Master's) | |
dc.contributor.oralexamcommittee | 張智星(Jyh-Shing Jang), 蘇文鈺(Wen Yu Su) | |
dc.subject.keyword | 情緒, 多媒體 | zh_TW |
dc.subject.keyword | emotion, media | en |
dc.relation.page | 67 | |
dc.rights.note | 有償授權 (paid authorization) | |
dc.date.accepted | 2007-07-20 | |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
dc.contributor.author-dept | 電機工程學研究所 (Graduate Institute of Electrical Engineering) | zh_TW |
Appears in Collections: | 電機工程學系 (Department of Electrical Engineering)
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-96-1.pdf (currently not authorized for public access) | 1.05 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.