Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74032
Full metadata record
DC field: value [language]
dc.contributor.advisor: 王鈺強 (Yu-Chiang Wang)
dc.contributor.author: Yen-Ting Liu [en]
dc.contributor.author: 劉彥廷 [zh_TW]
dc.date.accessioned: 2021-06-17T08:17:26Z
dc.date.available: 2019-08-20
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74032
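dc.description.abstract: Video summarization shortens a video by selecting its truly important segments, and it remains a topic worth studying in computer vision. In this thesis, we propose a new architecture that aims to handle videos with diverse content: a video summarization model that attends to multiple, diverse aspects of the video. [zh_TW, translated]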
dc.description.abstract: Video summarization, which aims to identify highlight frames or shots in lengthy video inputs, is among the challenging tasks in computer vision. In this paper, we propose an attention-based model for video summarization that can handle complex video data. A novel deep learning framework, multi-head multi-layer video self-attention (M2VSA), is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization can be preserved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches. [en]
dc.description.provenance: Made available in DSpace on 2021-06-17T08:17:26Z (GMT). No. of bitstreams: 1
ntu-108-R06942114-1.pdf: 7705935 bytes, checksum: 6ad12a123c4a31dda04f5952cafca982 (MD5)
Previous issue date: 2019 [en]
dc.description.tableofcontents:
Abstract
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Method
3.1 M2VSA for Video Spatial-Temporal Attention
3.2 Unlabeled Video Summarization with Visual Concept Consistency
3.3 From Supervised Towards Unsupervised Learning for Video Summarization
4 Experiment
4.1 Datasets
4.2 Protocols and Implementation Details
4.3 Quantitative Comparisons
4.3.1 Comparison with supervised approaches
4.3.2 Comparisons with unsupervised approaches
4.3.3 Analysis for semi-supervised settings
4.3.4 Performance comparisons and ablation studies
4.4 Qualitative Results
5 Conclusion
Reference
dc.language.iso: en
dc.subject: 電腦視覺 (computer vision) [zh_TW]
dc.subject: 視頻摘要 (video summarization) [zh_TW]
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: Computer Vision [en]
dc.subject: Deep Learning [en]
dc.subject: Video Summarization [en]
dc.title: 藉由視覺注意力來處理視頻摘要 [zh_TW]
dc.title: Transforming Visual Attention into Video Summarization [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
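dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 邱維辰 (Wei-Chen Chiu), 林彥宇 (Yen-Yu Lin)
dc.subject.keyword: 視頻摘要 (video summarization), 深度學習 (deep learning), 電腦視覺 (computer vision) [zh_TW]
dc.subject.keyword: Computer Vision, Deep Learning, Video Summarization [en]
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU201901661
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2019-08-14
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Communication Engineering (電信工程學研究所) [zh_TW]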
Appears in collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in this item:
File: ntu-108-1.pdf (not authorized for public access)
Size: 7.53 MB
Format: Adobe PDF

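Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.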
