Two-Stream Attention Model for Highlight Extraction (以雙流注意力機制模型擷取直播影片精華)

Liang-Wei Lo; 羅良瑋

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82363

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳建錦(Chien Chin Chen)
dc.contributor.author	Liang-Wei Lo	en
dc.contributor.author	羅良瑋	zh_TW
dc.date.accessioned	2022-11-25T07:29:45Z	-
dc.date.available	2023-08-01
dc.date.copyright	2021-11-17
dc.date.issued	2021
dc.date.submitted	2021-07-08
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82363	-
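dc.description.abstract	As talk-oriented streaming videos have grown more popular in recent years, live streaming platforms have gradually become another channel through which people take in new information. However, talk-oriented live streams are usually lengthy, so most viewers cannot watch an entire broadcast. To attract viewers to join a stream, and even to become subscribers, providing highlight clips has become especially important for streamers and streaming platforms. Much recent research addresses video highlight extraction, and most of it uses visual information as features for extracting highlight segments. This approach does not suit talk-oriented live streams, because their highlights are not directly related to the visual content but rather to the streamer's discourse and the viewers' reactions. In this thesis, we use the streamer's discourse and the viewers' messages as model input and propose a highlight extraction model for talk-oriented live streaming videos, further strengthening the feature vectors with position enrichment and an attention mechanism. In addition, we use a self-adaptive weighting network to assign different weights to the prediction scores of the two text streams, which improves the model's performance. Experiments show that, on a real-world dataset, our method outperforms several well-known highlight extraction models proposed in recent years.	zh_TW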
dc.description.provenance	Made available in DSpace on 2022-11-25T07:29:45Z (GMT). No. of bitstreams: 1 U0001-0707202111504700.pdf: 2322749 bytes, checksum: 7d1d81c7488b7266a174971b8dc62d8d (MD5) Previous issue date: 2021	en
dc.description.tableofcontents	Oral Examination Committee Certification (p. i); Acknowledgements (p. ii); Chinese Abstract (p. iii); Abstract (p. iv); 1 Introduction (p. 1); 2 Related work (p. 5); 2.1 Supervised Highlight Extraction (p. 6); 2.2 Unsupervised Highlight Extraction (p. 7); 2.3 Video Highlight Extraction using Textual Information (p. 9); 3 Methodology (p. 12); 3.1 Video Preprocessing and Discourse Segmentation (p. 13); 3.2 Streamer Discourse Embedding and Position Enrichment (p. 15); 3.3 Viewer Message Embedding and Attention (p. 16); 3.4 Highlight Extraction and Self-Adaptive Weighting Scheme (p. 19); 3.5 Model Training and Highlight Extraction Loss (p. 20); 4 Experiment (p. 22); 4.1 Evaluation Dataset and Metrics (p. 22); 4.2 Effect of System Components (p. 25); 4.3 Comparison with Other Highlight Extraction Methods (p. 29); 5 Conclusion (p. 33); Reference (p. 35)
dc.language.iso	en
dc.title	以雙流注意力機制模型擷取直播影片精華	zh_TW
dc.title	Two-Stream Attention Model for Highlight Extraction	en
dc.date.schoolyear	109-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳孟彰(Hsin-Tsai Liu),張詠淳(Chih-Yang Tseng)
dc.subject.keyword	深度學習,自然語言處理,影片精華擷取,直播串流影片,文字訊息處理	zh_TW
dc.subject.keyword	deep learning,natural language processing,highlight extraction,streaming video,textual information processing	en
dc.relation.page	39
dc.identifier.doi	10.6342/NTU202101320
dc.rights.note	Authorization granted (open access worldwide)
dc.date.accepted	2021-07-08
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
dc.date.embargo-lift	2023-08-01	-
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
U0001-0707202111504700.pdf	2.27 MB	Adobe PDF

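Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.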