Semantic Query Expansion for Content-based Image Retrieval and Its Application on Video Advertising

Wei-Shing Liao; 廖瑋星

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45262

Title:	Semantic Query Expansion for Content-based Image Retrieval and Its Application on Video Advertising (original title: 基於圖片內容擷取之圖像語意擴充與其影片廣告安插之應用)
Author:	Wei-Shing Liao (廖瑋星)
Advisor:	Winston H. Hsu (徐宏民)
Keywords:	Content-based Image Retrieval, Semantic Query Expansion, Visual Words, Tag, Video Advertising
Year of publication:	2010
Degree:	Master's
Abstract:	Content-based image retrieval (CBIR) is an essential technique for managing exponentially growing photo collections and an enabling technology for applications such as annotation by search, computational photography, and photo-based question answering. Despite decades of research, current solutions remain limited by the “semantic gap.” In this work, we propose to improve CBIR by automatically exploiting auxiliary knowledge (e.g., tags, photos, blogs, and metadata) from booming media-sharing services (e.g., Flickr) and search engines (e.g., Google), that is, by finding more semantically related images to enhance CBIR results.
To demonstrate the benefits of semantic expansion, we further apply the proposed framework to a promising application: video advertising by target image matching, which automatically associates relevant ads with videos by content-based matching over related image objects (e.g., logos and scenes). Given the prevalence of shared videos, it is a potential route to Internet monetization. Since traditional CBIR methods are hindered by the semantic gap, our approach leverages the content and context information widely available in media-sharing services. However, such community-contributed cues (e.g., tags, descriptions, image appearance, and metadata) are generally noisy. We measure the semantic similarities between tags by exploiting Google knowledge. We take a graph-based approach, building one graph model for each of three cues. After aggregating the cue graphs linearly into a unified graph, we perform a random walk on that graph, starting from the initial CBIR search results, to further improve precision and recall. Since each graph is built independently, their weights in the linear combination can be adjusted for different applications. We also consider efficiency issues that arise when deploying on large-scale media-sharing sites. The framework is generic and can be extended to other applications such as keyword-based image retrieval and image annotation. For the advertising application, we propose a framework called AdVis, which automatically inserts relevant ads into videos by visual matching. Here, bidders bid on the images they are interested in (adImages), which are analogous to keywords in the AdWords model. AdVis aims to maximize system revenue while taking viewer perception into account. We formulate the problem as a nonlinear 0-1 integer programming problem.
Experiments on Flickr photo benchmarks show that the proposed semantic expansion framework performs well in three respects: (1) example-based image retrieval: it outperforms traditional CBIR systems by up to 200%; (2) text-based image retrieval: it shows clear gains over conventional keyword-based search, which suffers from noisy and missing tags; (3) image auto-annotation: it shows significant gains over other search-based image annotation approaches.
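
The graph fusion and random walk described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The restart formulation, the `restart` value, the function names, and the toy 4-image graphs are all assumptions made for illustration, and only two cue graphs are fused here although the functions accept any number.

```python
import numpy as np

def row_normalize(weights):
    """Turn a non-negative affinity matrix into a row-stochastic matrix."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    row_sums = weights.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows without outgoing edges fall back to a uniform jump (assumption).
    return np.where(row_sums > 0, weights / safe_sums, 1.0 / n)

def fuse_graphs(graphs, cue_weights):
    """Linearly combine per-cue transition matrices into one unified graph."""
    cue_weights = np.asarray(cue_weights, dtype=float)
    cue_weights = cue_weights / cue_weights.sum()
    return sum(w * row_normalize(g) for w, g in zip(cue_weights, graphs))

def random_walk_rerank(transition, initial_scores, restart=0.15,
                       tol=1e-9, max_iter=1000):
    """Random walk with restart, seeded by the initial CBIR scores."""
    prior = np.asarray(initial_scores, dtype=float)
    prior = prior / prior.sum()
    scores = prior.copy()
    for _ in range(max_iter):
        updated = (1 - restart) * transition.T @ scores + restart * prior
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores

# Hypothetical toy data: 4 images, one visual-similarity graph, one tag graph.
visual = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
tags = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
fused = fuse_graphs([visual, tags], cue_weights=[0.5, 0.5])

# Initial CBIR results only retrieved images 0 and 1.
scores = random_walk_rerank(fused, initial_scores=[0.7, 0.3, 0.0, 0.0])
ranking = np.argsort(-scores)
```

In this toy example, image 2 is missed by the initial search but shares tags with images 0 and 1, so the walk assigns it a non-zero score. Changing `cue_weights` shifts the balance between the cues without rebuilding any graph, which is the property the abstract highlights.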
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45262
Full-text license:	Paid license
Appears in department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-99-1.pdf (not currently authorized for public access)	1.24 MB	Adobe PDF

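Unless their copyright terms are specifically indicated otherwise, items in this system are protected by copyright, with all rights reserved.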
