Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46565

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊永裕(Yung-Yu Chuang) | |
| dc.contributor.author | Ming-Fang Weng | en |
| dc.contributor.author | 翁明昉 | zh_TW |
| dc.date.accessioned | 2021-06-15T05:15:55Z | - |
| dc.date.available | 2010-07-30 | |
| dc.date.copyright | 2010-07-30 | |
| dc.date.issued | 2010 | |
| dc.date.submitted | 2010-07-21 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46565 | - |
| dc.description.abstract | 由於數位相機、數位攝影機日益普及,大幅提昇了影片取得的便利性。另一方面,影片分享平台像 YouTube 也蓬勃發展,更加速了影片的複製、傳播和交換。在這種情況下,迅速增加的大規模影片衍生出查詢與搜尋的迫切需求。近年來,影片查詢系統普遍採用以語意概念為主的搜尋架構;然而,此項技術卻極度依賴影片之語意標記的準確度。為了提升影片標記的正確性,本論文針對三個關鍵議題進行研究。首先,我們探討如何挖掘語意概念在影片中的語境結構及時間關聯。其次,我們探討如何整合語境結構、時間關聯及語意概念偵測器等多重線索。最後,我們探討如何降低影片範疇轉移所帶來的衝擊和影響。具體來說,本論文提出了四種不同的框架,探討從使用者提供之標記資料、偵測器產生之預測資料以及同時使用這兩種資料,進行語境結構及時間關聯的資料探索,進而用以提升影片語意標記的準確性。
我們提出的第一種語境結構及時間關聯的整合框架稱為「規則導向的後置過濾框架」。在此架構中,我們一方面以關聯性規則探勘演算法建立不同語意概念同時出現於同一分鏡的語境結構關係;另一方面,我們以統計評量的方法建立同一語意概念同時出現於不同分鏡的時間關聯關係。我們的實驗結果顯示,語境結構關係可提升原始標記準確度約百分之三,而時間關聯關係約可增加百分之十五。除此之外,我們發現語境結構與時間關聯兩者具有互補特性,若把分別經由兩者各自處理後的結果加以結合,可再提高影片語意標記的準確度。 其次,為了充分利用語境結構及時間關聯的資訊,我們提出另一種整合框架稱作「多重線索融合框架」。有別於前者,此架構創新之處有二:第一,我們採取機率模型與遞迴方式設計一套資料驅動的演算法,從標記的訓練資料中,分別建立語意概念的語境結構及時間關聯之多階關係。第二,我們採用圖形模型,將取得的多階關係與偵測器的預測結果整合於同一網路,透過能量函式最佳化的方式,找出最接近偵測器預測結果與最符合語境結構及時間關聯的解。藉由多重線索融合框架的技術,多數偵測器預測錯誤的標記結果,皆能獲得適當的修正。 另外,為了減少訓練影片與測試影片因資料來源範疇差異所導致多階關係可能不一致的潛在問題,我們提出「跨範疇多重線索融合框架」,此架構同時整合了從使用者提供之標記資料(訓練影片)與從偵測器產生之預測資料(測試影片)所取得的語境結構和時間關聯。藉由整合從測試資料取得的多階關係,從訓練資料取得的多階關係可獲得妥善的調整,進而適用於和原始影片不同範疇的目標影片上。 最後,我們提出一種非監督式的「協同過濾框架」用以改進偵測器預測影片標記不夠準確的缺點,這種架構的主要優勢是沒有跨影片範疇的問題。在此架構中,我們利用分鏡與分鏡相似以及概念與概念相關的兩個重要性質,針對語意概念偵測器產生之標記預測結果,進行資料相依性的探索。我們把所有語意概念出現於全部分鏡的可能性建構成一個矩陣,透過矩陣分解,取得一個近似於原始矩陣的低維度矩陣。由於低維度矩陣的相依關係具有較原始矩陣更高的一致性,因此可大幅降低原始矩陣中不正確的資料數量。換句話說,偵測器預測不準確的標記結果在協同過濾的運作下,可獲得適度的修正,進而改善整體的準確度。 為了驗證以上四種整合語境結構及時間關聯的框架,我們以TRECVID 數據集進行實驗,結果顯示我們所提出的方法不但效率好,效益也高;其中,對於大部分語意概念來說,標記準確度都有顯著的提升。除此之外,使用我們提出的方法所建立之語境結構及時間關聯的關係,亦能普遍適用於各種不同偵測器的預測結果上。 總結來說,本論文的具體貢獻可扼要說明如下:第一,針對影片中語意概念的語境結構及時間關聯,是深入且完整的研究。第二,針對各種不同的資料進行語意概念的語境結構及時間關聯之探索,並用以提升影片標記準確度。第三,提出一個整合多影片範疇多重線索的框架,其準確度提升的幅度是目前所有公開技術中表現最好的。第四,本論文提出第一個以非監督式方法同時利用語境結構及時間關聯來改進影片標記準確度的框架。 | zh_TW |
| dc.description.abstract | The huge amount of videos currently available poses a difficult problem in semantic video retrieval. The success of query-by-concept, recently proposed to handle this problem, depends greatly on the accuracy of concept-based video indexing. This thesis studies three key issues toward improving concept detection: (1) how to explore cues beyond low-level features in an efficient and effective way, (2) how to integrate these learned high-level relations with independent concept detectors into a common framework, and (3) how to exploit the information embedded within the initial detection results to alleviate cross-domain problems. Specifically, we propose several frameworks to take advantage of both contextual correlation and temporal dependency from user-provided annotations and/or detector-generated predictions for various application scenarios.
We first present a rule-based post-filtering framework that combines contextual correlation and temporal dependency to enhance the robustness and accuracy of semantic concept detection. Given manually annotated ground truth, we use association rule mining techniques to discover inter-concept contextual relationships and adopt a strategy to combine correlated detectors. In addition, we investigate statistical measurements to discover inter-shot temporal relationships and propose a filter design to fuse dependent detectors. Experiments on the TRECVID 2005 data set show our framework is not only effective but also efficient. Furthermore, it can be easily integrated with existing detectors to boost their performance.
To exploit refined scores for inference instead of using the detection scores directly, we introduce a multi-cue fusion framework that explores and unifies both contextual correlation among concepts and temporal dependency among shots. This framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from manual annotations of tens of thousands of shots with hundreds of concepts. Second, labels for all concepts and all shots in a video are solved simultaneously by optimizing a graphical model. Experiments on the TRECVID 2006 data set show that our framework is promising for semantic video indexing, achieving around a 30% performance boost in inferred average precision on two popular baselines, VIREO-374 and Columbia374.
Toward solving the problem of domain change between training and test videos, we propose a cross-domain multi-cue fusion framework that explores multiple cues across various video domains and then fuses them all together. In this framework, test shots are assigned pseudo-labels so that contextual and temporal relations can be modeled in an unsupervised manner. Integrating the relationships learned from user-provided annotations (training videos) and detector-generated predictions (test videos) accommodates the domain change, leading to higher labeling quality. Extensive experiments on the TRECVID 2006-2008 data sets show that our framework outperforms state-of-the-art approaches, achieving significant performance gains (ranging from 27% to 61% for different settings) on a widely used benchmark.
Finally, this thesis describes a collaborative filtering framework that refines the initial detection scores in a fully unsupervised fashion by exploring shot-to-shot (clip-to-clip) similarity and concept-to-concept correlation in a large collection of test videos. We treat the noisy (inaccurate) scores for all concepts and all shots as a matrix. These scores are then de-noised via matrix factorization, which discovers data dependence within the matrix. We further improve this method by dividing the score matrix into patches; better models are learned from groups of similar patches to further enhance detection accuracy. In addition to being easy to implement, this approach achieves salient improvements, ranging from 20% to 50%, on the TRECVID 2006-2008 evaluation benchmarks without using any labeled training data or external resources.
The major contributions of this thesis can be summarized as follows. (1) An in-depth investigation of jointly exploiting both inter-concept correlation and inter-shot dependency to enhance detection of the presence of generic concepts. (2) The first study covering the exploration of various sources for discovering relational knowledge to benefit semantic video indexing. (3) A state-of-the-art system that fuses multiple cues from multiple domains, yielding the highest performance improvement among approaches that exploit high-level relations for concept detection. (4) The first unsupervised approach that simultaneously utilizes both contextual and temporal information to improve concept-based video indexing. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T05:15:55Z (GMT). No. of bitstreams: 1 ntu-99-D94922008-1.pdf: 2729142 bytes, checksum: 970591f73f5e8865558130b5cd53c720 (MD5) Previous issue date: 2010 | en |
| dc.description.tableofcontents | Thesis Oral Examination Committee Members Approval Sheet i
Chinese Abstract ii
Abstract v
Table of Contents viii
List of Figures x
List of Tables xi
1 Introduction 1
1.1 Semantic Video Indexing 1
1.2 Modeling Relations: Contextual Correlation and Temporal Dependency 6
1.3 Thesis Overview 11
2 Literature Review 18
2.1 A Typical Framework 18
2.2 Relation Modeling Approaches 19
2.3 TRECVID Benchmark 26
3 Rule-based Post-filtering Framework 28
3.1 Motivation and Background 29
3.2 Framework Overview 31
3.3 Mining Temporal Rules 33
3.4 Experiments and Results 39
3.5 Summary 45
4 Multi-Cue Fusion Framework 47
4.1 Motivation 48
4.2 Framework Overview 50
4.3 Multi-Cue Fusion Method 52
4.4 Experiments and Results 63
4.5 Summary 69
5 Cross-Domain Multi-Cue Fusion Framework 71
5.1 Overview of Framework 72
5.2 Online Learning Process 74
5.3 Experiments and Results 80
5.4 Discussion and Summary 85
6 Collaborative Filtering Framework 89
6.1 Introduction 90
6.2 Problem Statement 93
6.3 Latent Factor Models 95
6.4 Localized Collaborative Video Re-indexing 98
6.5 Experiments and Results 104
6.6 Summary and Future Work 109
7 Conclusions 112
7.1 Summary 112
7.2 Unique Contributions 114
7.3 Future Work 116
References 118 | |
| dc.language.iso | en | |
| dc.subject | 時間關聯 | zh_TW |
| dc.subject | 影片查詢系統 | zh_TW |
| dc.subject | 影片語意標記 | zh_TW |
| dc.subject | 數位影音內容分析 | zh_TW |
| dc.subject | 語境結構 | zh_TW |
| dc.subject | Concept-based video retrieval | en |
| dc.subject | TRECVID | en |
| dc.subject | Temporal dependency | en |
| dc.subject | Contextual correlation | en |
| dc.subject | Cross-domain learning | en |
| dc.subject | Multimedia content analysis | en |
| dc.subject | Semantic video indexing | en |
| dc.title | 利用語境結構與時間關聯提升影片語意標記準確性之研究 | zh_TW |
| dc.title | Exploring Contextual and Temporal Relationships for Semantic Video Indexing | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 98-2 | |
| dc.description.degree | 博士 | |
| dc.contributor.oralexamcommittee | 吳家麟(Ja-Ling Wu),林智仁(Chih-Jen Lin),徐宏民(Winston H. Hsu),廖弘源(Hong-Yuan Mark Liao),黃仲陵(Chung-Lin Huang),林嘉文(Chia-Wen Lin),陳祝嵩(Chu-song Chen) | |
| dc.subject.keyword | 影片查詢系統, 影片語意標記, 數位影音內容分析, 語境結構, 時間關聯 | zh_TW |
| dc.subject.keyword | Concept-based video retrieval, Semantic video indexing, Multimedia content analysis, Cross-domain learning, Contextual correlation, Temporal dependency, TRECVID | en |
| dc.relation.page | 130 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2010-07-22 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 | |

Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-99-1.pdf (Restricted Access) | 2.67 MB | Adobe PDF | |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
