Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82363

Full metadata record

DC Field: Value [Language]
dc.contributor.advisor: 陳建錦 (Chien Chin Chen)
dc.contributor.author: Liang-Wei Lo [en]
dc.contributor.author: 羅良瑋 [zh_TW]
dc.date.accessioned: 2022-11-25T07:29:45Z
dc.date.available: 2023-08-01
dc.date.copyright: 2021-11-17
dc.date.issued: 2021
dc.date.submitted: 2021-07-08
dc.identifier.citation: References
Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. Paper presented at the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, United States.
Brink, A. D., & Pendock, N. E. (1996). Minimum cross-entropy threshold selection. Pattern Recognition, 29(1), 179–188.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186).
Duprez, C., Christophe, V., Rimé, B., Congard, A., & Antoine, P. (2015). Motives for the social sharing of an emotional experience. Journal of Social and Personal Relationships, 32, 757–787.
Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457–479.
Fu, C.-Y., Lee, J., Bansal, M., & Berg, A. (2017). Video highlight prediction using audience chat reactions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 972–978).
Han, H.-K., Huang, Y.-C., & Chen, C. C. (2019). A deep learning model for extracting live streaming video highlights using audience messages. In Proceedings of the 2019 2nd Artificial Intelligence and Cloud Computing Conference (pp. 75–81).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
Hearst, M. A. (1998). Support vector machines. IEEE Intelligent Systems, 13, 18–28.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
Jiao, Y., Li, Z., Huang, S., Yang, X., Liu, B., & Zhang, T. (2018). Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Transactions on Multimedia, 20(10), 2693–2705.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Lee, Y. J., Ghosh, J., & Grauman, K. (2012). Discovering important people and objects for egocentric video summarization. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1346–1353).
Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. Paper presented at the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Nepal, S., Srinivasan, U., & Reynolds, G. (2001). Automatic detection of 'goal' segments in basketball videos. In Proceedings of the Ninth ACM International Conference on Multimedia (pp. 261–269).
Otsuka, I., Nakane, K., Divakaran, A., Hatanaka, K., & Ogawa, M. (2005). A highlight scene detection and video summarization system using audio feature for a personal video recorder. IEEE Transactions on Consumer Electronics, 51, 112–116.
Ozsoy, M. G., Alpaslan, F. N., & Cicekli, I. (2011). Text summarization using latent semantic analysis. Journal of Information Science, 37, 405–417.
Rani, S., & Kumar, M. (2020). Social media video summarization using multi-visual features and Kohnen's self organizing map. Information Processing & Management, 57(3), 102190.
Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137–1149.
Ringer, C., & Nicolaou, M. A. (2018). Deep unsupervised multi-view detection of video game stream highlights. In Proceedings of the 13th International Conference on the Foundations of Digital Games (pp. 1–6).
Rochan, M., Reddy, M. K. K., Ye, L., & Wang, Y. (2020). Adaptive video highlight detection by learning from user history. In European Conference on Computer Vision (pp. 261–278). Springer, Cham.
Rui, Y., Gupta, A., & Acero, A. (2000). Automatically extracting highlights for TV baseball programs. In Proceedings of the Eighth ACM International Conference on Multimedia (pp. 105–115).
Shen, D., Sun, J.-T., Li, H., Yang, Q., & Chen, Z. (2007). Document summarization using conditional random fields. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI) (Vol. 7, pp. 2862–2867).
Solomon, M. R., Marshall, G. W., & Stuart, E. W. (2008). Marketing: Real people, real choices. Upper Saddle River, NJ: Pearson/Prentice Hall.
Sun, M., Farhadi, A., & Seitz, S. (2014). Ranking domain-specific highlights by analyzing edited videos. In European Conference on Computer Vision (pp. 787–802).
Tas, O., & Kiyani, F. (2007). A survey automatic text summarization. PressAcademia Procedia, 5, 205–213.
Tjondronegoro, D., Chen, Y.-P. P., & Pham, B. (2004). Highlights for more complete sports video summarization. IEEE MultiMedia, 11, 22–37.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4489–4497).
Wang, Z., Zhou, J., Ma, J., Li, J., Ai, J., & Yang, Y. (2020). Discovering attractive segments in the user-generated video streams. Information Processing & Management, 57(1), 102130.
Wei, Z., Wang, B., Hoai, M., Zhang, J., Lin, Z., Shen, X., Měch, R., & Samaras, D. (2018). Sequence-to-segments networks for segment detection. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 3511–3520).
Wong, T. T. (2015). Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognition, 48(9), 2839–2846.
Xiong, B., Kalantidis, Y., Ghadiyaram, D., & Grauman, K. (2019). Less is more: Learning highlight detection from video duration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1258–1267).
Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., & Guo, B. (2015). Unsupervised extraction of video highlights via robust recurrent auto-encoders. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4633–4641).
Yao, T., Mei, T., & Rui, Y. (2016). Highlight detection with pairwise deep ranking for first-person video summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 982–990).
Yeh, J.-Y., Ke, H.-R., Yang, W.-P., & Meng, I.-H. (2005). Text summarization using a trainable summarizer and latent semantic analysis. Information Processing & Management, 41, 75–95.
Zhang, B., Dou, W., & Chen, L. (2006). Combining short and long term audio features for TV sports highlight detection. In European Conference on Information Retrieval (pp. 472–475).
Zhang, Y., Gao, J., Yang, X., Liu, C., Li, Y., & Xu, C. (2020). Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, pp. 12902–12909).
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82363
dc.description.abstract: In recent years, as talk-based streaming videos have become increasingly popular, live streaming platforms have gradually become another channel through which people absorb new information. However, talk-based live streams are usually lengthy, so most viewers cannot follow an entire broadcast. Providing highlight clips is therefore especially important for streamers and streaming platforms, both to attract viewers to join a live stream and, further, to convert them into subscribers. Many recent studies address video highlight extraction, and most of them extract highlight segments from visual features. This approach is not suitable for talk-based live streams, however, because their highlights are not directly related to the video frames but rather to the streamer's discourse and the audience's reactions. In this thesis, we take the streamer's discourse and viewer messages as model input and propose a highlight extraction model for talk-based live streams, further strengthening the feature vectors with position enrichment and an attention mechanism. In addition, a self-adaptive weighting network assigns different weights to the prediction scores of the two text streams to improve model performance. Experiments show that on a real-world dataset our method outperforms several well-known highlight extraction models proposed in recent years. [zh_TW]
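The abstract describes fusing two per-segment scores, one from the streamer's discourse stream and one from the viewer-message stream, through a self-adaptive weighting network. As a rough illustration only (this is not the thesis code; the gate parameters, function names, and the choice of a scalar sigmoid gate are all assumptions), such a scheme can be sketched as a learned convex combination of the two stream scores:

```python
# Hypothetical sketch of self-adaptive two-stream score weighting.
# Not the thesis implementation: the gate form (a sigmoid over the two
# stream scores) and the parameter values below are illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_scores(discourse_score, message_score, gate_w, gate_b):
    """Compute a gate alpha in (0, 1) from both stream scores, then
    return the convex combination alpha*discourse + (1-alpha)*message."""
    feats = np.array([discourse_score, message_score])
    alpha = sigmoid(np.dot(gate_w, feats) + gate_b)
    return alpha * discourse_score + (1.0 - alpha) * message_score

# Example with made-up (not learned) gate parameters: the fused score
# always lies between the two per-stream scores.
final = fuse_scores(0.9, 0.2, gate_w=np.array([1.0, 1.0]), gate_b=0.0)
```

In practice the gate would be a small trained network conditioned on the stream feature vectors rather than on the scalar scores, but the fusion step itself has this shape.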
dc.description.provenance: Made available in DSpace on 2022-11-25T07:29:45Z (GMT). No. of bitstreams: 1. U0001-0707202111504700.pdf: 2322749 bytes, checksum: 7d1d81c7488b7266a174971b8dc62d8d (MD5). Previous issue date: 2021. [en]
dc.description.tableofcontents:
Oral examination committee approval … i
Acknowledgements … ii
Chinese abstract … iii
Abstract … iv
1 Introduction … 1
2 Related Work … 5
2.1 Supervised Highlight Extraction … 6
2.2 Unsupervised Highlight Extraction … 7
2.3 Video Highlight Extraction using Textual Information … 9
3 Methodology … 12
3.1 Video Preprocessing and Discourse Segmentation … 13
3.2 Streamer Discourse Embedding and Position Enrichment … 15
3.3 Viewer Message Embedding and Attention … 16
3.4 Highlight Extraction and Self-Adaptive Weighting Scheme … 19
3.5 Model Training and Highlight Extraction Loss … 20
4 Experiment … 22
4.1 Evaluation Dataset and Metrics … 22
4.2 Effect of System Components … 25
4.3 Comparison with Other Highlight Extraction Methods … 29
5 Conclusion … 33
Reference … 35
dc.language.iso: en
dc.title: 以雙流注意力機制模型擷取直播影片精華 [zh_TW]
dc.title: Two-Stream Attention Model for Highlight Extraction [en]
dc.date.schoolyear: 109-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 陳孟彰 (Hsin-Tsai Liu), 張詠淳 (Chih-Yang Tseng)
dc.subject.keyword: 深度學習, 自然語言處理, 影片精華擷取, 直播串流影片, 文字訊息處理 [zh_TW]
dc.subject.keyword: deep learning, natural language processing, highlight extraction, streaming video, textual information processing [en]
dc.relation.page: 39
dc.identifier.doi: 10.6342/NTU202101320
dc.rights.note: Authorized (open access worldwide)
dc.date.accepted: 2021-07-08
dc.contributor.author-college: College of Management (管理學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Information Management (資訊管理學研究所) [zh_TW]
dc.date.embargo-lift: 2023-08-01
Appears in Collections: Department of Information Management (資訊管理學系)

Files in this item:
U0001-0707202111504700.pdf (2.27 MB, Adobe PDF)


Except where otherwise noted in their copyright terms, items in this repository are protected by copyright, with all rights reserved.
