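Implementation and Evaluation of DSP-Assisted Synchronization Algorithms for Heterogeneous Teleconferencing (基於數位語音處理技術之同步演算法實作與實測)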

Jhu-Ze Ke; 柯竹澤

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46933

Title:	Implementation and Evaluation of DSP-Assisted Synchronization Algorithms for Heterogeneous Teleconferencing (基於數位語音處理技術之同步演算法實作與實測)
Author:	Jhu-Ze Ke (柯竹澤)
Advisor:	Hung-Yun Hsieh (謝宏昀)
Keywords:	digital speech processing (DSP), formant, audio/video synchronization, heterogeneous networks, heterogeneous teleconferencing
Year of publication:	2010
Degree:	Master's
Abstract:	Video conferencing over heterogeneous networks can obtain stable voice quality through the traditional telephone (PSTN) network while sending video over the Internet (IP) network to give users a better experience. However, transmitting audio and video over different networks can leave the two streams out of sync. Earlier work from our laboratory proposed algorithms based on digital speech processing techniques to address this problem, but they were never tested on a real system. In this thesis, we first implement the previously proposed algorithms and evaluate them on existing hardware and software. On the hardware side, we use a dual-mode wireless card and an audio adapter cable to give a notebook computer the capabilities of a dual-mode mobile phone, so that it can connect to the PSTN (GSM) and IP (WLAN) networks at the same time. On the software side, we add a synchronization module to an existing softphone; the module uses the algorithms to measure the delay difference between the two networks and, when necessary, delays the audio signal to bring audio and video into sync. After evaluating the earlier algorithms on this prototype, we found that their accuracy and execution time both need improvement, and in particular that their complexity must be reduced, before they can run effectively in real time. We therefore propose mechanisms that accelerate synchronization: by analyzing the speech production model, we derive two accelerated algorithms that exploit the properties of speech formants. After implementing them on the same hardware and measuring the resulting speed and accuracy, we found that the proposed algorithms need only one-tenth of the original execution time without sacrificing any accuracy. This result demonstrates the feasibility of using digital speech processing to solve the audio/video synchronization problem in teleconferencing over heterogeneous networks.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46933
Full-text license:	Paid license
Appears in department:	Graduate Institute of Communication Engineering (電信工程學研究所)
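
The abstract describes measuring the delay difference between two networks and then delaying the audio to match. The record does not give the thesis's formant-based algorithms, so the following is only a minimal sketch of the general idea, using plain cross-correlation as a stand-in technique. The function name `estimate_delay`, the 8000 Hz sample rate, and the synthetic noise signal are all assumptions made for illustration, not details from the thesis.

```python
import random

SAMPLE_RATE_HZ = 8000  # assumed telephone-style sample rate, not from the thesis


def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples (0..max_lag) where `delayed` best matches `reference`.

    Brute-force cross-correlation: for each candidate lag, sum the products
    of the overlapping samples and keep the lag with the largest sum.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for speech: seeded Gaussian noise, so the run is repeatable.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Simulate the slower path by prepending 37 samples of silence.
true_delay = 37
delayed = [0.0] * true_delay + signal

lag = estimate_delay(signal, delayed, max_lag=100)
delay_ms = 1000.0 * lag / SAMPLE_RATE_HZ
print(f"estimated delay: {lag} samples ({delay_ms:.3f} ms)")
```

The exhaustive search costs on the order of `max_lag` times the signal length, which hints at why the abstract stresses reducing complexity for real-time use. How the thesis actually achieves its speed-up is not stated in this record.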

Files in this item:

File	Size	Format
ntu-99-1.pdf (not authorized for public access)	3.22 MB	Adobe PDF

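Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.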
