Audio Information Based Wedding Video Indexing

Shao-Yen Fang; 方劭彥

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27254

Title:	Audio Information Based Wedding Video Indexing (基於聲音資訊的婚禮影片索引)
Author:	Shao-Yen Fang (方劭彥)
Advisor:	Ja-Ling Wu (吳家麟)
Keywords:	wedding event matching, speech/music discrimination, vocal/non-vocal discrimination, moving average, silence detection, speaker change detection, clap detection
Year of publication:	2008
Degree:	Master's
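Abstract:	More and more people use digital video cameras to record the special events in their lives. A wedding, for example, is an important ceremony where distant relatives and long-unseen friends take the opportunity to gather, so people often film it as a keepsake; this is the so-called wedding video. Such videos, however, are often left on storage media and never watched again. Raw, unorganized video is not easy to watch, so video summarization is needed.
Traditional video summarization commonly relies on visual information such as dominant color, camera motion, and scene change, but these techniques do not work well on wedding videos. The audio signal of a wedding video, in contrast, carries rich information. Environmental noise is unavoidable when a wedding is filmed, yet most audio signal processing techniques are developed for clean, noise-free conditions and are not very accurate under noise. We therefore propose methods that can resist the effects of noise.
First, features are extracted from the audio signal, a suitable feature set is selected, and speech/music discrimination is performed. For the music portions, a recommended feature set is then selected and vocal/non-vocal discrimination is performed. After the results of these two discriminations are smoothed with a moving average, the input audio signal is cut into segments. In the speech portions, relative silence detection locates the sentence boundaries, and speaker change detection based on the differences between sentences is then performed. Clap detection is also run on the input audio signal. Combining these results, we match the segments obtained from the two discriminations to wedding events one by one.
These experiments yield a table made up of the feature fields of each segment. Each segment's entry describes whether the segment is speech or music, whether it is vocal or non-vocal, whether a speaker change occurs, and whether applause is present. Each segment is matched to a wedding event according to its feature fields and the nature of the event. For example, a segment that is music and non-vocal is matched to an instrumental performance, and a segment that is speech with no speaker change and no detected applause is matched to the pastor's prayer.
Because our understanding of the nature of wedding events is still limited, and because the segmentation and the other acoustic event detections may introduce errors, the matching results still differ to some degree from the ground truth. To improve the accuracy of wedding event matching, we propose a simple error-correction mechanism. Just as a typical story-structured event sequence has an introduction, development, turn, and conclusion, wedding events follow such a structure. We therefore cluster the sequence of matched wedding events and treat the segments that cannot be clustered as errors. Our goal is to correct these errors by re-matching the affected segments with appropriate matching rules. Figure 5 shows a wedding video after refinement, in which one wrong event was updated to the correct event, improving the accuracy.
In this thesis, wedding videos are divided into segments by the proposed noise-resistant speech/music and vocal/non-vocal discrimination; the segments are then labeled with the corresponding wedding events using the proposed speaker change detection and clap detection; finally, a refinement mechanism corrects the errors that do not fit the wedding structure.
People tend to use digital video recorders to capture their lives. A wedding, for example, is one of the important ceremonies in life, and people usually film a video to commemorate it. These videos, however, are usually put into storage and never watched again, because raw video is hard to turn into a compelling video story. Video summarization is therefore needed. Visual information such as dominant color, motion, and scene change is commonly used in traditional video summarization, but it is not well suited to wedding video. The audio information, on the other hand, is meaningful.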
Noise is hard to avoid in wedding videos. However, most audio processing techniques in the literature, such as speech/music discrimination, are developed for clean environments, and their performance under noise is not good enough. We therefore develop noise-resistant speech/music discrimination and vocal/non-vocal discrimination. In addition, in contrast to other papers that apply low-level acoustic features, we combine the results of speaker change detection and clap detection in our wedding event matching procedure. Unlike other papers, which focus on signal processing, we apply a refinement algorithm that re-corrects mismatched events to improve the performance of the proposed work.
In this thesis, the given wedding videos are divided into segments by the proposed speech/music discrimination and vocal/non-vocal discrimination, which can resist noisy environments. The resulting segments are then labeled with the associated wedding events, assisted by the proposed speaker change detection and clap detection. Finally, the labeled events are revised by our refinement algorithm, which tries to re-match the mismatched events that do not fit the wedding structure.
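The segmentation and matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the window size, the 0.5 threshold, the frame-label encoding (1 = music, 0 = speech), and the names `moving_average`, `segment`, `SegmentInfo`, and `match_event` are all assumptions made for illustration. Only the two matching rules (music without vocals maps to an instrumental performance; speech without speaker change or applause maps to the pastor's prayer) come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def segment(labels: List[int], window: int = 5,
            threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Smooth 0/1 frame labels, re-threshold them, and return runs
    as (start, end, label) with an exclusive end index.

    The window and threshold defaults are illustrative assumptions.
    """
    smoothed = [1 if v >= threshold else 0
                for v in moving_average(labels, window)]
    segments = []
    start = 0
    for i in range(1, len(smoothed) + 1):
        # Close the current run at the end of input or on a label change.
        if i == len(smoothed) or smoothed[i] != smoothed[start]:
            segments.append((start, i, smoothed[start]))
            start = i
    return segments


@dataclass
class SegmentInfo:
    """One row of the per-segment feature table (hypothetical layout)."""
    is_music: bool
    has_vocal: bool
    speaker_change: bool
    has_clap: bool


def match_event(seg: SegmentInfo) -> str:
    """Apply the two example rules given in the abstract."""
    if seg.is_music and not seg.has_vocal:
        return "instrumental performance"
    if not seg.is_music and not seg.speaker_change and not seg.has_clap:
        return "pastor's prayer"
    # Any other combination is left open here; the abstract gives no rule.
    return "unmatched"


if __name__ == "__main__":
    frame_labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
    print(segment(frame_labels))
    print(match_event(SegmentInfo(True, False, False, False)))
```

In this sketch the isolated music frame at index 3 and the isolated speech frame at index 10 are absorbed by their neighbors after smoothing, which is the effect the moving average is meant to have on noisy frame-level decisions.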
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27254
Full-text license:	Paid license
Appears in department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-97-1.pdf (not currently authorized for public access)	1.28 MB	Adobe PDF

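Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.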
