NTU Theses and Dissertations Repository › 電機資訊學院 › 資訊工程學系
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38888
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 陳信希
dc.contributor.author: Yu-Jen Cheng [en]
dc.contributor.author: 鄭友仁 [zh_TW]
dc.date.accessioned: 2021-06-13T16:50:54Z
dc.date.available: 2005-07-04
dc.date.copyright: 2005-07-04
dc.date.issued: 2005
dc.date.submitted: 2005-06-22
dc.identifier.citation:
[1] G. Ahanger. Techniques for automatic digital video composition. PhD thesis, College of Engineering, Boston University.
[2] C. Becchetti and L. R. Ricotti. Speech recognition, John Wiley & Sons, 1999.
[3] D. Beeferman, A. Berger, and J. Lafferty. “Statistical models of text segmentation.” Machine Learning, vol. 34, no. 1-3, pp. 177-210, 1999.
[4] A. D. Bimbo. Visual information retrieval, Morgan Kaufmann, 1999.
[5] Y. L. Chang, W. Zeng, I. Kamel, and R. Alonso. “Integrated image and speech analysis for content-based video indexing.” Proceedings of IEEE International Conference on Multimedia Computing and Systems, pp. 306-313, Hiroshima, Japan, June, 1996.
[6] H. H. Chen and S. J. Yan. “Dealing with very long Chinese sentences in a robust parsing system.” Proceedings of National Science Council, Part A: Physical Science and Engineering, vol. 19, no. 5, pp. 398-407, 1995.
[7] M. G. Christel, M. A. Smith, C. R. Taylor, and D. B. Winkler. “Evolving video skims into useful multimedia abstractions.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 171-178, Los Angeles, United States, 1998.
[8] J. M. Corridoni, A. Del Bimbo, D. Lucarella, and H. Wenxue. “Multi-perspective navigation of movies.” Journal of Visual Languages and Computing, vol. 7, no. 4, pp. 445-466, 1996.
[9] D. DeMenthon, V. Kobla, and D. Doermann. “Video summarization by curve simplification.” Proceedings of the 6th ACM international conference on Multimedia, pp. 211-218, Bristol, Great Britain, 1998.
[10] Y. Gong and X. Liu. “Video summarization using singular value decomposition.” Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 174-180, Hilton Head Island, United States, 2000.
[11] Y. Gong and X. Liu. “Video summarization with minimal visual content redundancies.” Proceedings of 2001 IEEE International Conference on Image Processing, vol. 3, pp. 362-365, Thessaloniki, Greece, October, 2001.
[12] A. Hanjalic, R. L. Lagendijk, and J. Biemond. “Automated high-level movie segmentation for advanced video-retrieval systems.” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 580-588, June, 1999.
[13] A. Hanjalic and H. J. Zhang. “An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis.” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1280-1289, 1999.
[14] C. N. Li and S. A. Thompson. Mandarin Chinese: A Functional Reference Grammar, Berkeley: University of California Press, 1981.
[15] Y. Li, T. Zhang and D. Tretter. “An overview of video abstracting techniques.” HP Laboratory Technical Report HPL-2001-191, 2001.
[16] R. Lienhart, S. Pfeiffer, and W. Effelsberg. “Video abstracting.” Communications of the ACM, vol. 40, no. 11, pp. 55-62, 1997.
[17] S. Lu, I. King, and M. R. Lyu. “Video summarization by video structure analysis and graph optimization.” Proceedings of 2004 IEEE International Conference on Multimedia and Expo, vol. 3, pp. 1959-1962, Taipei, Taiwan, June, 2004.
[18] S. Lu, I. King, and M. R. Lyu. “A novel video summarization framework for document preparation and archival applications.” Proceedings of 2005 IEEE Aerospace Conference, pp. 1-10, Montana, March, 2005.
[19] Y. F. Ma, L. Lu, H. J. Zhang, and M. Li. “A user attention model for video summarization.” Proceedings of the 10th ACM International Conference on Multimedia, pp. 533-542, Juan-les-Pins, France, December, 2002.
[20] K. Nagao, S. Ohira, and M. Yoneoka. “Annotation-based multimedia summarization and translation.” Proceedings of the 19th International Conference on Computational Linguistics, vol. 2, pp. 702-708, Taipei, Taiwan, August 2002.
[21] J. Nam and A. H. Tewfik. “Dynamic video summarization and visualization.” Proceedings of the 7th ACM International Conference on Multimedia, pp. 53-56, Orlando, United States, 1999.
[22] C. W. Ngo, Y. F. Ma, and H. J. Zhang. “Automatic video summarization by graph modeling.” Proceedings of the 9th IEEE International Conference on Computer Vision, vol. 1, pp. 104-109, Nice, France, October, 2003.
[23] J. Y. Pan, H. Yang and C. Faloutsos, “MMSS: multi-modal story-oriented video summarization.” Proceedings of the 4th IEEE International Conference on Data Mining, pp. 491-494, Brighton, United Kingdom, November, 2004.
[24] L. Pevzner and M. Hearst. “A critique and improvement of an evaluation metric for text segmentation.” Computational Linguistics, vol. 28, no. 1, pp. 19-36, 2002.
[25] S. Pfeiffer, R. Lienhart, S. Fischer, and W. Effelsberg. “Abstracting digital movies automatically.” Journal of Visual Communication and Image Representation, vol. 7, no. 4, pp. 345-353, 1996.
[26] M. A. Smith and T. Kanade. “Video skimming and characterization through the combination of image and language understanding.” IEEE International Workshop on Content-based Access of Image and Video Databases, pp. 61-70, Bombay, India, January, 1998.
[27] H. Sundaram, L. Xie, and S. F. Chang. “A utility framework for the automatic generation of audio-visual skims.” Proceedings of the 10th ACM International Conference on Multimedia, pp. 189-198, Juan-les-Pins, France, December, 2002.
[28] D. Swanberg, C. F. Shu, and R. Jain. “Knowledge guided parsing in video databases.” Proceedings of SPIE Storage and Retrieval for Image and Video Databases, vol. 1908, pp. 13-24, 1993.
[29] R. Tibshirani, G. Walther, and T. Hastie. “Estimating the number of clusters in a data set via the gap statistic.” Journal of the Royal Statistical Society, vol. B, no. 63, pp. 411-423, 2001.
[30] S. Uchihashi, J. Foote, A. Girgensohn, and J. Boreczky. “Video manga: generating semantically meaningful video summaries.” Proceedings of the 7th ACM International Conference on Multimedia, pp. 383-392, Orlando, United States, October, 1999.
[31] I. Yahiaoui, B. Merialdo, and B. Huet. “Automatic video summarization.” 2002 International Tyrrhenian Workshop on Digital Communications, Palazzo dei Congressi, Italy, September, 2002.
[32] M. Yeung, B. L. Yeo, and B. Liu. “Extracting story units from long programs for video browsing and navigation.” Proceedings of IEEE International Conference on Multimedia Computing and Systems, pp. 296-305, Hiroshima, Japan, June, 1996.
[33] X. Zhu and X. Wu. “Sequential association mining for video summarization.” Proceedings of 2003 International Conference on Multimedia and Expo, vol. 3, pp. 333-336, July, 2003.
[34] X. Zhu, X. Wu, A. K. Elmagarmid, Z. Feng and L. Wu. “Video data mining: semantic indexing and event detection from the association perspective.” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 665-677, 2005.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38888
dc.description.abstract [zh_TW]: Automatic video summarization has drawn wide attention in recent years. Related research falls into two categories: keyframe-based (static) video summarization and dynamic video summarization. With the rapid growth of computing power and media storage capacity, dynamic video summaries can now be generated quickly. However, there has been no prior work on script-based video summarization, nor experiments on summarizing multiple videos. This thesis explores multi-video summarization driven by user information needs.
We use linguistic information and scene-change information to segment the videos. An information retrieval system then finds the video segments relevant to the user's needs. Sub-shot clustering results are used to measure how visually novel a segment is, and combining these two scores lets us select segments that are both informative and visually vivid.
To achieve visual smoothness, we also consider re-ordering the segments. We analyze the visual content of each segment and, based on the director's editing rhythm and several heuristics, propose an algorithm that achieves visual smoothness. Experiments show that, of the four algorithms we propose, this one best achieves visual smoothness without losing relevant information.
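The segmentation stage described in the abstract relies in part on shot boundary detection. As an illustration only, here is a minimal sketch of a common histogram-difference approach to detecting hard cuts; the function names, the 4-bin histograms, and the threshold value are assumptions for this sketch and are not taken from the thesis.

```python
def histogram_diff(h1, h2):
    """L1 distance between two normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(histograms, threshold=0.5):
    """Return indices of frames whose histogram differs from the
    previous frame's by more than the threshold (a hard cut)."""
    boundaries = []
    for i in range(1, len(histograms)):
        if histogram_diff(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Three synthetic 4-bin histograms: frames 0-1 belong to one shot,
# frame 2 is an abrupt change (a cut).
frames = [
    [0.70, 0.10, 0.10, 0.10],
    [0.65, 0.15, 0.10, 0.10],  # small change within the same shot
    [0.10, 0.10, 0.10, 0.70],  # large change -> boundary
]
print(detect_shot_boundaries(frames))  # [2]
```

Real systems would compute the histograms from decoded frames and often use adaptive thresholds to handle gradual transitions, which this sketch does not cover.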
dc.description.abstract [en]: Automatic video summarization methods have attracted research attention for a long time. Previous work can be classified into two categories: keyframe-based video summarization and dynamic video summarization. Recently, the rapid growth of computing power and storage capacity has made it possible to generate dynamic video summaries much faster. However, there has been no previous work on generating video summaries according to specific user information needs, nor experiments in a multi-video environment. In this thesis we explore the problem of script-based video summarization, in which the information needs are contained in a user script.
We first use linguistic information and shot boundary detection results to divide the videos into segments, which are the building blocks of the summary. An information retrieval system then retrieves relevant segments, using the user script as queries and the segments' captions as documents. After sub-shot clustering, a visual importance score is computed for each segment based on the clustering results of its constituent sub-shots. The relevance score and the visual importance score are combined to select segments that are both informative and vivid.
To achieve better coherence, segment re-ordering is applied. We analyze the audio and video content to find the editing rhythm and editing heuristics, and then develop an algorithm for visual coherence. Experiments show that this algorithm achieves better coherence than the other text-based algorithms, without loss of informativeness.
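The combination of the relevance score and the visual importance score described above can be sketched as follows. The linear weight `alpha`, the greedy selection policy, and the duration budget are assumptions made for illustration; the thesis's actual combination rule and selection procedure may differ.

```python
def select_segments(segments, alpha=0.6, budget=60.0):
    """Rank segments by a weighted sum of relevance ('rel') and
    visual importance ('vis'), then greedily add the highest-scoring
    ones until the summary duration budget ('dur' seconds) is spent."""
    scored = sorted(
        segments,
        key=lambda s: alpha * s["rel"] + (1 - alpha) * s["vis"],
        reverse=True,
    )
    summary, used = [], 0.0
    for seg in scored:
        if used + seg["dur"] <= budget:
            summary.append(seg["id"])
            used += seg["dur"]
    return summary

# Hypothetical segments with relevance/visual scores and durations.
segs = [
    {"id": "A", "rel": 0.9, "vis": 0.2, "dur": 30.0},
    {"id": "B", "rel": 0.4, "vis": 0.9, "dur": 25.0},
    {"id": "C", "rel": 0.8, "vis": 0.8, "dur": 40.0},
]
print(select_segments(segs, budget=70.0))  # ['C', 'A']
```

Greedy selection under a time budget is only one reasonable policy; an exact knapsack formulation would trade optimality for more computation.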
dc.description.provenance [en]: Made available in DSpace on 2021-06-13T16:50:54Z (GMT). No. of bitstreams: 1
ntu-94-R92922101-1.pdf: 642722 bytes, checksum: 7ce63525ae10d4ea572c3a2f91ab7a82 (MD5)
Previous issue date: 2005
dc.description.tableofcontents:
Table of Contents
Chapter 1 Introduction 1
1.1. Motivation 1
1.2. Related works 1
1.3. System architecture 4
1.4. Organization of the thesis 6
Chapter 2 Video Segmentation 7
2.1. Introduction 7
2.2. Video preprocessing 8
2.3. Shot boundary detection 8
2.4. Linking elements in Mandarin Chinese 9
2.5. The combining algorithm 12
2.6. Evaluation 14
2.6.1. Golden standard 14
2.6.2. Evaluation metric 14
2.6.3. Experiment results 16
2.6.4. Discussion 19
2.7. Conclusion 21
Chapter 3 Segment Selection 23
3.1. Text-relevant summaries 23
3.2. Visual-rich summaries 24
3.2.1. Introduction 24
3.2.2. Frame sampling 24
3.2.3. Sub-shot definition 24
3.2.4. Keyframe selection 26
3.2.5. Sub-shot clustering 26
3.2.6. Clustering result analysis 27
3.2.7. Visual importance score 30
3.3. Conclusion 31
Chapter 4 Video Summary Coherence 33
4.1. Introduction 33
4.2. Text-coherent summarization 34
4.3. Visual-coherent summarization 35
4.3.1. Visual dynamics modeling 35
4.3.2. Audio dynamics modeling 35
4.3.3. Editing rhythm mining 35
4.3.4. Algorithm 37
4.4. Experiment setup 38
4.5. Evaluation 40
4.6. Discussion 43
Chapter 5 Conclusion and Future Works 45
5.1. Conclusion 45
5.2. Future works 46
dc.language.iso: en
dc.subject: 影像摘要 [zh_TW]
dc.subject: Video [en]
dc.subject: Summarization [en]
dc.title: 劇本導向影像摘要方法之研究 [zh_TW]
dc.title: Script-based Multi-video Summarization [en]
dc.type: Thesis
dc.date.schoolyear: 93-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 簡立峰, 柯淑津, 陳光華
dc.subject.keyword: 影像摘要 [zh_TW]
dc.subject.keyword: Video, Summarization [en]
dc.relation.page: 50
dc.rights.note: 有償授權 (authorized with fee)
dc.date.accepted: 2005-06-23
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 [zh_TW]
Appears in collections: 資訊工程學系

Files in this item:
File: ntu-94-1.pdf (restricted; not authorized for public access)
Size: 627.66 kB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
