NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70661
Full metadata record (DC field: value [language])
dc.contributor.advisor: 施吉昇 (Chi-Sheng Shih)
dc.contributor.author: HSIN-I HUANG [en]
dc.contributor.author: 黃馨誼 [zh_TW]
dc.date.accessioned: 2021-06-17T04:34:06Z
dc.date.available: 2020-09-29
dc.date.copyright: 2020-09-29
dc.date.issued: 2020
dc.date.submitted: 2020-09-02
dc.identifier.citation:
[1] J. R. Smith, D. Joshi, B. Huet, W. H. Hsu, and J. Cota, “Harnessing a.i. for augmenting creativity: Application to movie trailer creation,” Proceedings of the 25th ACM International Conference on Multimedia, 2017.
[2] J. Choi, T. Oh, and I. S. Kweon, “Contextually customized video summaries via natural language,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 1718–1726.
[3] “Self-learning ai for process automation and data extraction.” [Online]. Available: https://www.vilynx.com/
[4] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” arXiv:1406.2661, 2014. [Online]. Available: http://arxiv.org/abs/1406.2661
[5] B. Mahasseni, M. Lam, and S. Todorovic, “Unsupervised video summarization with adversarial lstm networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[6] T. Fu, S. Tai, and H. Chen, “Attentive and adversarial learning for video summarization,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1579–1587.
[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[8] V. Gandhi, R. Ronfard, and M. Gleicher, “Multi-clip video editing from a single viewpoint,” Proceedings of the 11th European Conference on Visual Media Production (CVMP ’14), 2014.
[9] E. Jain, Y. Sheikh, A. Shamir, and J. Hodgins, “Gaze-driven video re-editing,” ACM Transactions on Graphics, vol. 34, no. 2, pp. 1–12, 2015.
[10] K. K. Rachavarapu, M. Kumar, V. Gandhi, and R. Subramanian, “Watch to edit: Video retargeting using gaze,” Computer Graphics Forum, vol. 37, no. 2, pp. 205–215, 2018.
[11] M. Kumar, V. Gandhi, R. Ronfard, and M. Gleicher, “Zooming on all actors: Automatic focus+context split screen video generation,” Computer Graphics Forum, vol. 36, no. 2, pp. 455–465, 2017.
[12] M. Saini, R. Gadde, S. Yan, and W. T. Ooi, “MoViMash: Online mobile video mashup,” Proceedings of the 20th ACM International Conference on Multimedia (MM ’12), 2012.
[13] D. T. D. Nguyen, A. Carlier, W. T. Ooi, and V. Charvillat, “Jiku director 2.0,” Proceedings of the ACM International Conference on Multimedia (MM ’14), 2014.
[14] Z. Cao, G. H. Martinez, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: Realtime multi-person 2D pose estimation using part affinity fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[15] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 2234–2242. [Online]. Available: http://papers.nips.cc/paper/6125-improved-techniques-for-training-gans.pdf
[16] T. Hazan, G. Papandreou, and D. Tarlow, Perturbations, optimization, and statistics. The MIT Press, 2016.
[17] L. van der Maaten and G. E. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[18] S. Guan and M. H. Loew, “Measures to evaluate generative adversarial networks based on direct analysis of generated images,” ArXiv, vol. abs/2002.12345, 2020.
[19] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[20] “Dynamic time warping,” Discrete-Time Processing of Speech Signals, 2009.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70661
dc.description.abstract: Experienced video editors use different editing techniques, including camera movements, types of shots, and shot compositions, to create different video semantics that deliver different messages to the viewers. In video production, the content of the video is important, but so is the way the shots are put together. Our goal is to train a model that learns how to edit video in accordance with videography rules. We propose a deep generative model, in which both the generator and the discriminator are unidirectional LSTM networks, to generate sequences of shot transitions for video editing. Different kinds of productions use different types of editing transitions, and our model learns two such types from two different productions: the performance stages of Korean music programs and those of Chinese music programs. By combining different types of shots and camera movements, our AI video editor brings a variety of viewing experiences to the viewers. We measure the quality of the generated shot sequences for video editing from three aspects: creativity, inheritance, and diversity. Ensuring all three at the same time, the synthetic sequences generated by the LSTM-GAN are better than those generated by the baseline models (a Markov chain and an LSTM).
On average, in terms of creativity (values in [0, 1]), the quality of the sequences generated by the LSTM-GAN is 0.35 better than that of the Markov chain but 0.0204 worse than that of the LSTM. In terms of inheritance (values in [-1, 1]), it is 0.0007 and 0.0223 better than the Markov chain and the LSTM, respectively. In terms of diversity (values in [0, 1]), it is 0.2957 and 0.37305 better than the Markov chain and the LSTM, respectively. Weighing all three aspects, the sequences generated by the LSTM-GAN are better than those generated by the Markov chain across the board; compared with the LSTM, they are slightly worse in creativity but better in inheritance and diversity. In summary, when creativity, inheritance, and diversity are ensured at the same time, the quality of the sequences generated by the LSTM-GAN is better than that of the sequences generated by the Markov chain or the LSTM. [en]
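To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of an LSTM-GAN for shot-transition sequences: a unidirectional-LSTM generator that maps noise to per-step shot-type distributions, and a unidirectional-LSTM discriminator that scores a sequence as real or synthetic. All names, dimensions, the number of shot-type classes, and the training loop are illustrative assumptions, not the implementation used in the thesis.

    import torch
    import torch.nn as nn

    # Assumed hyperparameters; the thesis does not specify these here.
    NUM_SHOT_TYPES = 8   # number of shot/transition classes (assumption)
    SEQ_LEN = 32         # length of an edited shot sequence (assumption)
    NOISE_DIM = 16
    HIDDEN = 64

    class Generator(nn.Module):
        """Unidirectional LSTM mapping a noise sequence to per-step shot-type distributions."""
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(NOISE_DIM, HIDDEN, batch_first=True)
            self.proj = nn.Linear(HIDDEN, NUM_SHOT_TYPES)

        def forward(self, z):                        # z: (batch, SEQ_LEN, NOISE_DIM)
            h, _ = self.lstm(z)
            return torch.softmax(self.proj(h), dim=-1)

    class Discriminator(nn.Module):
        """Unidirectional LSTM scoring a shot-type sequence as real (1) or synthetic (0)."""
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(NUM_SHOT_TYPES, HIDDEN, batch_first=True)
            self.clf = nn.Linear(HIDDEN, 1)

        def forward(self, x):                        # x: (batch, SEQ_LEN, NUM_SHOT_TYPES)
            h, _ = self.lstm(x)
            return torch.sigmoid(self.clf(h[:, -1]))  # judge from the last time step

    G, D = Generator(), Discriminator()
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    def train_step(real):    # real: one-hot sequences, (batch, SEQ_LEN, NUM_SHOT_TYPES)
        batch = real.size(0)
        z = torch.randn(batch, SEQ_LEN, NOISE_DIM)
        fake = G(z)
        # Discriminator step: push real sequences toward 1 and synthetic ones toward 0.
        opt_d.zero_grad()
        d_loss = bce(D(real), torch.ones(batch, 1)) + \
                 bce(D(fake.detach()), torch.zeros(batch, 1))
        d_loss.backward()
        opt_d.step()
        # Generator step: try to make the discriminator output 1 on synthetic sequences.
        opt_g.zero_grad()
        g_loss = bce(D(fake), torch.ones(batch, 1))
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()

Feeding the generator's soft distributions to the discriminator sidesteps the non-differentiability of sampling discrete shot tokens; the thesis may resolve this differently (e.g. with Gumbel-softmax or policy gradients).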
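The creativity, inheritance, and diversity scores come from direct analysis of the generated sequences, after Guan and Loew [18]. The exact definitions used in the thesis are not reproduced on this page; purely as an illustration, assumed stand-ins for the three scores might look like the following (note the stand-in inheritance score lands in [0, 1], whereas the thesis's measure spans [-1, 1]):

    from collections import Counter

    def creativity(generated, training):
        """Assumed stand-in: fraction of generated sequences not copied
        verbatim from the training set (in [0, 1])."""
        train_set = {tuple(s) for s in training}
        return sum(tuple(s) not in train_set for s in generated) / len(generated)

    def inheritance(generated, training):
        """Assumed stand-in: cosine agreement between the shot-transition
        (bigram) statistics of generated and training sequences (here in [0, 1])."""
        def bigram_freq(seqs):
            counts = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
            total = sum(counts.values())
            return {k: v / total for k, v in counts.items()}
        f, g = bigram_freq(generated), bigram_freq(training)
        dot = sum(f.get(k, 0.0) * g.get(k, 0.0) for k in set(f) | set(g))
        norm = sum(v * v for v in f.values()) ** 0.5 * \
               sum(v * v for v in g.values()) ** 0.5
        return dot / norm if norm else 0.0

    def diversity(generated):
        """Assumed stand-in: ratio of distinct sequences in a batch (in [0, 1])."""
        return len({tuple(s) for s in generated}) / len(generated)

    # Tiny usage example with made-up shot-type IDs:
    train = [[0, 1, 2, 1], [0, 2, 1, 0]]
    gen = [[0, 1, 2, 0], [0, 1, 2, 1], [0, 1, 2, 0]]
    print(creativity(gen, train), inheritance(gen, train), diversity(gen))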
dc.description.provenance: Made available in DSpace on 2021-06-17T04:34:06Z (GMT). No. of bitstreams: 1. U0001-2608202016164400.pdf: 5628326 bytes, checksum: ca8f77bb42a1e8925d29ddc91e84e634 (MD5). Previous issue date: 2020. [en]
dc.description.tableofcontents口 試 委 員 會 審 定 書 i
致 謝 ii
摘 要 iii
Abstract iv
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Background and Related Work 3
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Artificial Intelligence Video Editing . . . . . . . . . . . . . . . . 3
2.1.2 Intelligent Customized Video Production . . . . . . . . . . . . . 3
2.1.3 Generative Adversarial Networks . . . . . . . . . . . . . . . . . 4
2.1.4 Adversarial Long Short-Term Memory Networks . . . . . . . . . 4
2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Multi-clip Video Editing . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Intelligent Video Mashup . . . . . . . . . . . . . . . . . . . . . . 6
3 System Architecture and Problem Definition 8
3.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4 Design and Implementation 10
4.1 Training Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1.1 Analysis of the type of shot of each frame . . . . . . . . . . . . . 10
4.1.2 Two cinematography rules to be learned . . . . . . . . . . . . . . 13
4.2 LSTM-GAN model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2.1 Architecture of the LSTM-GAN model . . . . . . . . . . . . . . 13
4.2.2 Generator network . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2.3 Discriminator network . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Training Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5 Performance Evaluation 19
5.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.1.1 Baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.1.2 Evaluation based on the direct analysis of generated sequences . . 20
6 Conclusion 23
Bibliography 24
dc.language.iso: zh-TW
dc.title: 使用基於長短期記憶的生成對抗網路實現自動智慧影片編輯 [zh_TW]
dc.title: Automated Intelligent Video Editing Using LSTM-GAN [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 洪士灝 (Shih-Hao Hung), 葉彌妍 (Mi-Yen Yeh)
dc.subject.keyword: 生成對抗網路, 長短期記憶, 智慧影片編輯, 影片編輯語言 [zh_TW]
dc.subject.keyword: Generative adversarial network, long short-term memory, intelligent video editing, language of video editing [en]
dc.relation.page: 25
dc.identifier.doi: 10.6342/NTU202004171
dc.rights.note: Paid authorization (有償授權)
dc.date.accepted: 2020-09-03
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
U0001-2608202016164400.pdf (5.5 MB, Adobe PDF; currently not authorized for public access)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
