Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66929
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 徐宏民 | |
dc.contributor.author | Ya-Ting Lin | en |
dc.contributor.author | 林雅婷 | zh_TW |
dc.date.accessioned | 2021-06-17T01:15:02Z | - |
dc.date.available | 2022-08-24 | |
dc.date.copyright | 2017-08-24 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-14 | |
dc.identifier.citation | [1] L. Zhuang, F. Jing, and X.-Y. Zhu, “Movie review mining and summarization,” in Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 43–50, ACM, 2006.
[2] B. Pang, L. Lee, et al., “Opinion mining and sentiment analysis,” Foundations and Trends in Information Retrieval, vol. 2, no. 1–2, pp. 1–135, 2008.
[3] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, C. Potts, et al., “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), vol. 1631, p. 1642, 2013.
[4] D. Tang, B. Qin, and T. Liu, “Document modeling with gated recurrent neural network for sentiment classification,” in EMNLP, pp. 1422–1432, 2015.
[5] H. Wang, Y. Lu, and C. Zhai, “Latent aspect rating analysis on review text data: a rating regression approach,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 783–792, ACM, 2010.
[6] K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734, Association for Computational Linguistics, Oct. 2014.
[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[8] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164, 2015.
[9] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” in International Conference on Machine Learning, pp. 2048–2057, 2015.
[10] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo, “Image captioning with semantic attention,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4651–4659, 2016.
[11] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, “Sequence to sequence – video to text,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4534–4542, 2015.
[12] T.-H. K. Huang, F. Ferraro, N. Mostafazadeh, I. Misra, A. Agrawal, J. Devlin, R. Girshick, X. He, P. Kohli, D. Batra, C. L. Zitnick, D. Parikh, L. Vanderwende, M. Galley, and M. Mitchell, “Visual storytelling,” June 2016.
[13] Y. Liu, J. Fu, T. Mei, and C. W. Chen, “Let your photos talk: Generating narrative paragraph for photo stream via bidirectional attention recurrent neural networks,” in AAAI, pp. 1445–1452, 2017.
[14] C. C. Park and G. Kim, “Expressing an image stream with a sequence of natural sentences,” in Advances in Neural Information Processing Systems, pp. 73–81, 2015.
[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[16] L. Bossard, M. Guillaumin, and L. Van Gool, “Food-101 – Mining discriminative components with random forests,” in European Conference on Computer Vision, 2014.
[17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems (NIPS), 2015.
[18] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[19] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66929 | - |
dc.description.abstract | Given the growing importance of review information in recent years, how to provide important and informative reviews is a significant research problem. To this end, we propose the idea of structural reviews, which pairs a review with its corresponding images to provide more informative reviews, and we collect a large dataset for the task. However, both the text reviews and the image data are quite noisy, so filtering out relatively unimportant information is an indispensable step. In addition, we use cluster analysis rather than a classification method to group different foods, because no currently available dataset is suitable for training a classifier for our problem. After filtering out unimportant information, we propose a two-stage training method that uses pseudo-paired data to avoid the cross-domain problem, and we use different fusion methods to handle inputs consisting of multiple images. With our method, the generated reviews are more stable and reasonable, with a relative improvement of about 36% in food accuracy. Our method also shows a corresponding improvement under BLEU, which evaluates text quality. | zh_TW |
dc.description.abstract | Reviews have become increasingly important in recent years, so providing users with important and informative reviews is an essential problem. We therefore propose the idea of a structural review, which matches a review with corresponding images to convey more information, and we collect a large dataset for the task. However, both the images and the text reviews are noisy, so filtering out relatively useless information is an indispensable step. In addition, because no existing food dataset is suitable for training a classifier for our task, we cluster images of the same food type rather than classify them. After filtering out the noise, we propose a two-stage training method with pseudo pairs to avoid the cross-domain issue, and we apply different fusion methods to handle inputs with multiple images. With our method, the quality of the generated reviews is more stable, and food accuracy improves by about 36% in relative terms. Our method also performs better on the BLEU metric, which measures text quality. (A brief illustrative sketch of the feature-fusion and BLEU-evaluation steps follows this metadata table.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T01:15:02Z (GMT). No. of bitstreams: 1 ntu-106-R04944009-1.pdf: 42237744 bytes, checksum: ecb5d7567a33f22f322ebb178f76b0aa (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Abstract (in Chinese) iii
Abstract iv
1 Introduction 1
1.1 Importance of reviews 1
1.2 Challenges of structural reviews 2
1.3 Dataset and proposed method 3
1.4 Contributions 4
2 Related Work 5
2.1 Review related research 5
2.2 Single image captioning 5
2.3 Multiple images captioning 6
3 Cross-Media Dataset 7
3.1 Review data 7
3.2 Image data 8
3.3 Combine cross-media data 9
4 Proposed Method 10
4.1 Input images 10
4.1.1 Filtering images with people 10
4.1.2 Filtering non-food images 11
4.2 Target reviews 12
4.3 First training stage 13
4.3.1 Testing on multiple images 13
4.4 Second training stage 14
5 Experiments 16
5.1 Standard measurements 16
5.2 Supporting evaluation 18
5.3 Retrieval 19
6 Conclusions and Future Work 22
Bibliography 23 | |
dc.language.iso | en | |
dc.title | 利用跨多媒體擬似配對資料於影像群生成食物評論 | zh_TW |
dc.title | Food Review Generation for a Set of Images by Leveraging Cross-Media Pseudo Pairs | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 陳文進,李國徵 | |
dc.subject.keyword | Review, Two-stage Training, Pseudo Pair | zh_TW |
dc.subject.keyword | Review, Two-stage Training, Pseudo Pair | en |
dc.relation.page | 25 | |
dc.identifier.doi | 10.6342/NTU201703180 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2017-08-15 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
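
As a rough, non-authoritative illustration of two steps the abstract describes, the sketch below shows (1) fusing per-image CNN features into a single vector for a caption decoder, and (2) scoring a generated review against reference reviews with BLEU. This is not the thesis implementation: the feature dimension, the fusion options, and the toy sentences are all assumptions made for the example.

```python
# Minimal sketch, not the thesis code. Assumes each image is already
# encoded as a D-dim CNN feature vector (D = 1024 is an arbitrary choice).
import numpy as np
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction


def fuse_features(image_features, method="mean"):
    """Fuse an (N, D) array of per-image features into one D-dim vector."""
    feats = np.asarray(image_features)
    if method == "mean":   # average pooling over the image set
        return feats.mean(axis=0)
    if method == "max":    # element-wise max pooling
        return feats.max(axis=0)
    raise ValueError(f"unknown fusion method: {method}")


# Three input photos of one restaurant, fused into one decoder input.
features = np.random.rand(3, 1024)
fused = fuse_features(features, method="mean")

# BLEU: a tokenized hypothesis scored against tokenized reference reviews.
references = [[["the", "ramen", "broth", "is", "rich", "and", "savory"]]]
hypotheses = [["the", "ramen", "broth", "is", "rich"]]
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```

Mean fusion weights every photo equally, while max fusion lets the most salient features dominate; the "different fusion methods" mentioned in the abstract are choices of this kind.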
Appears in Collections: | Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 41.25 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.