Transforming Visual Attention into Video Summarization (藉由視覺注意力來處理視頻摘要)

Yen-Ting Liu; 劉彥廷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74032

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	王鈺強(Yu-Chiang Wang)
dc.contributor.author	Yen-Ting Liu	en
dc.contributor.author	劉彥廷	zh_TW
dc.date.accessioned	2021-06-17T08:17:26Z	-
dc.date.available	2019-08-20
dc.date.copyright	2019-08-20
dc.date.issued	2019
dc.date.submitted	2019-08-14
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74032	-
dc.description.abstract	視頻摘要主要是從一部影片藉由挑選出真正重要的片段來縮短影片長度，到目前為止，視頻摘要仍然是一項在電腦視覺領域中值得研究的題目。在本篇論文中，我們提出了一個新的架構試圖去解決包含各式內容的影片。我們提出的多樣化專注層面的視頻摘要模型。	zh_TW
dc.description.abstract	Video summarization is among the challenging tasks in computer vision; it aims at identifying highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that can handle complex video data. A novel deep learning framework, multi-head multi-layer video self-attention (M2VSA), is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization can be preserved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, our quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T08:17:26Z (GMT). No. of bitstreams: 1 ntu-108-R06942114-1.pdf: 7705935 bytes, checksum: 6ad12a123c4a31dda04f5952cafca982 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Abstract; List of Figures; List of Tables; 1 Introduction; 2 Related Work; 3 Method; 3.1 M2VSA for Video Spatial-Temporal Attention; 3.2 Unlabeled Video Summarization with Visual Concept Consistency; 3.3 From Supervised Towards Unsupervised Learning for Video Summarization; 4 Experiment; 4.1 Datasets; 4.2 Protocols and Implementation Details; 4.3 Quantitative Comparisons; 4.3.1 Comparison with supervised approaches; 4.3.2 Comparisons with unsupervised approaches; 4.3.3 Analysis for semi-supervised settings; 4.3.4 Performance comparisons and ablation studies; 4.4 Qualitative Results; 5 Conclusion; Reference
dc.language.iso	en
dc.subject	電腦視覺	zh_TW
dc.subject	視頻摘要	zh_TW
dc.subject	深度學習	zh_TW
dc.subject	Computer Vision	en
dc.subject	Deep Learning	en
dc.subject	Video Summarization	en
dc.title	藉由視覺注意力來處理視頻摘要	zh_TW
dc.title	Transforming Visual Attention into Video Summarization	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's (碩士)
dc.contributor.oralexamcommittee	邱維辰(Wei-Chen Chiu),林彥宇(Yen-Yu Lin)
dc.subject.keyword	視頻摘要, 深度學習, 電腦視覺	zh_TW
dc.subject.keyword	Computer Vision, Deep Learning, Video Summarization	en
dc.relation.page	32
dc.identifier.doi	10.6342/NTU201901661
dc.rights.note	Licensed for a fee (有償授權)
dc.date.accepted	2019-08-14
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電信工程學研究所	zh_TW
Appears in Collections:	Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	7.53 MB	Adobe PDF

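Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.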