Exploiting hierarchical topics of images to recommend songs with lyrics

Yen-Chen Fu; 傅彥禎

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71198

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	鄭卜壬
dc.contributor.author	Yen-Chen Fu	en
dc.contributor.author	傅彥禎	zh_TW
dc.date.accessioned	2021-06-17T04:58:10Z	-
dc.date.available	2020-08-01
dc.date.copyright	2018-08-01
dc.date.issued	2018
dc.date.submitted	2018-07-26
dc.identifier.citation	[1] Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, and Cédric Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22, 2004. [2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893, June 2005. doi: 10.1109/CVPR.2005.177. [3] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009. [4] Michael Fell and Caroline Sporleder. Lyrics-based analysis and classification of music. In COLING, 2014. [5] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. Devise: A deep visual-semantic embedding model. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2121–2129. Curran Associates, Inc., 2013. [6] Yunchao Gong, Liwei Wang, Micah Hodosh, Julia Hockenmaier, and Svetlana Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, 2014. [7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. doi: 10.1109/CVPR.2016.90. [8] Xiao Hu, J. Stephen Downie, and Andreas F. Ehmann. Lyric text mining in music mood classification. In Proceedings of the International Society for Music Information Retrieval Conference, 2009. [9] Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. Fasttext.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016. [10] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):664–676, April 2017. ISSN 0162-8828. doi: 10.1109/TPAMI.2016.2598339. [11] Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014. [12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012. [13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306–351. IEEE Press, 2001. [14] Xuelong Li, Di Hu, and Xiaoqiang Lu. Image2song: Song retrieval via bridging image content and lyric words. In IEEE International Conference on Computer Vision (ICCV), 2017. [15] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004. [16] L. Ma, Z. Lu, L. Shang, and H. Li. Multimodal convolutional neural networks for matching image and sentence. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2623–2631, Dec 2015. doi: 10.1109/ICCV.2015.301. [17] Lin Ma, Zhengdong Lu, and Hang Li. Learning to answer questions from image using convolutional neural network. In AAAI, 2016. [18] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc., 2013. [19] Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. Deep sentence embedding using the long short term memory network: Analysis and application to information retrieval. 24:694–707, 02 2015. [20] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. [21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014. [22] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In IEEE International Conference on Computer Vision, volume 2, pages 1470–1477, 2003. [23] Richard Socher, Milind Ganjoo, Christopher D. Manning, and Andrew Y. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013. [24] Barry E. Stein and M. Alex Meredith. The merging of the senses. The MIT Press, 1993. [25] Menno van Zaanen and Pieter Kanters. Automatic mood classification using tf*idf based on lyrics. In ISMIR, 2010. [26] L. Wang, Y. Li, and S. Lazebnik. Learning deep structure-preserving image-text embeddings. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5005–5013, June 2016.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71198	-
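dc.description.abstract	Photos are an important means of preserving memories and recording moments of life. Photos often carry all kinds of emotions, and pairing the visual sense with the auditory sense can heighten the emotional experience of a photo. People today often share photos on social networks, and automatically pairing an image with a song would make the photo richer and more fun. The first paper to pose the image-song matching problem appeared in 2017. However, the dataset and method proposed in that paper have several shortcomings. First, the dataset contains many unreasonable image-song pairs. Second, the paper processes images with object detection and passes the detection results on to the subsequent training network; because object detection has a certain error rate, the errors accumulate. In addition, we believe that every image has hierarchical topics: besides a broad topic, an image can be assigned a subtopic under that topic. An image should therefore be matched not to a single song but to an ordered list of songs: songs of the subtopic first, then songs of the broad topic, and finally songs of other topics. To address these problems, we build a hierarchical image-song matching dataset. We use the top 6,000 popular hashtags on Instagram as the topics for collecting images, and use the tags attached to the photos returned by a Flickr search on each topic as the basis for dividing it into subtopics. After establishing the topics and subtopics, we collect images from two platforms, Flickr and Google image search, and collect the corresponding songs. In this thesis, we use lyrics as the information about a song and focus on the image-text matching method. The proposed method consists of three main steps: first, extracting image features; second, extracting lyric features; and third, matching the image and lyric features. Previous image-text matching models cannot handle hierarchical matching, whereas our proposed model can match songs to hierarchically organized images. Experimental results show that our model achieves good accuracy and handles hierarchical topic matching effectively.	zh_TW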
dc.description.abstract	Photos are an important medium for keeping memories and recording life, and they usually express feelings. Combining vision with hearing enhances and strengthens the feelings a photo conveys. People often post their photos on social networks; if we can match a photo posted on social media with songs, the photo becomes more expressive and more interesting. The first work on the image-song matching problem was proposed in 2017. However, that work has several drawbacks. First, its dataset contains many unreasonable matching pairs. Second, it uses object detection to extract image representations, so detection errors propagate to the subsequent networks. Additionally, we argue that images have hierarchical topics: each image has a topic and a more specific tag. Hence, an image should not be matched to a single song only; it should first match songs with the same tag, then songs with the same topic. To solve these problems, we create a hierarchical image-song matching dataset. We crawl images from Flickr and Google image search by topic and tag, and collect the corresponding songs. In this work, we use lyrics as the information about a song and concentrate on matching rather than feature extraction. Our method has three main steps: first, obtain the image representation; second, obtain the lyric representation; finally, match the image with the lyrics. We propose two methods for this task and obtain good results.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T04:58:10Z (GMT). No. of bitstreams: 1 ntu-107-R05922006-1.pdf: 8698535 bytes, checksum: b48b1c59cf2769e8d4b49ed9fb1d32a5 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 1.1 Image-based song recommendation system; 1.2 Motivation; 1.3 Idea; 2 Related works; 2.1 Image representation; 2.2 Lyric representation; 2.2.1 Word representation; 2.2.2 Sentence/Document representation; 2.3 Visual-Text Alignment; 2.3.1 Keyword matching; 2.3.2 Sentence matching; 2.3.3 Lyric matching; 3 Problem Definition; 3.1 Dataset; 3.2 Problem Definition; 4 Methodology; 4.1 Overview; 4.2 Extraction of image representation; 4.3 Extraction of lyric representation; 4.4 Image-Lyric matching; 4.4.1 Possible Solution; 4.4.2 Hierarchical retrieval model; 5 Experimental Setups; 5.1 Datasets; 5.2 Baseline Methods; 5.3 Implementation Details; 5.4 Evaluation Method; 6 Experimental Results; 6.1 Baseline Comparison; 6.2 Parameter Analysis; 6.2.1 Analysis of λ1 and λ2; 6.2.2 Analysis of Inter-constraint; 6.2.3 Analysis of Intra-constraint; 6.3 Case Study; 7 Conclusions and Future Work; 7.1 Conclusions; 7.2 Future Work; Bibliography
dc.language.iso	en
dc.subject	Music Retrieval	zh_TW
dc.subject	Music Recommendation System	zh_TW
dc.subject	Hierarchical Image Topics	zh_TW
dc.subject	Convolutional Neural Network	zh_TW
dc.subject	Natural Language Processing	zh_TW
dc.subject	Music Information Retrieval	en
dc.subject	Music Recommendation System	en
dc.subject	Hierarchical image topic	en
dc.subject	Convolutional Neural Network	en
dc.subject	Natural Language Processing	en
dc.title	A study on recommending songs with lyrics using hierarchical image topics	zh_TW
dc.title	Exploiting hierarchical topics of images to recommend songs with lyrics	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳信希,陳縕儂,楊得年,陳冠宇
dc.subject.keyword	Music Retrieval,Music Recommendation System,Hierarchical Image Topics,Convolutional Neural Network,Natural Language Processing	zh_TW
dc.subject.keyword	Music Information Retrieval,Music Recommendation System,Hierarchical image topic,Convolutional Neural Network,Natural Language Processing	en
dc.relation.page	38
dc.identifier.doi	10.6342/NTU201801896
dc.rights.note	Paid license
dc.date.accepted	2018-07-27
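dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering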

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	8.49 MB	Adobe PDF

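Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.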