Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46933

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 謝宏昀(Hung-Yun Hsieh) | |
| dc.contributor.author | Jhu-Ze Ke | en |
| dc.contributor.author | 柯竹澤 | zh_TW |
| dc.date.accessioned | 2021-06-15T05:43:34Z | - |
| dc.date.available | 2011-08-01 | |
| dc.date.copyright | 2010-08-20 | |
| dc.date.issued | 2010 | |
| dc.date.submitted | 2010-08-19 | |
| dc.identifier.citation | [1] The extended oSIP library. http://savannah.nongnu.org/projects/exosip/.
[2] Fastest Fourier Transform in the West (FFTW). http://www.fftw.org/.
[3] The GNU oSIP library. http://www.gnu.org/software/osip/.
[4] Linphone website. http://www.linphone.org/index.php/.
[5] MSDN: DirectShow (Windows). http://msdn.microsoft.com/en-us/library/dd375454%28VS.85%29.aspx.
[6] PJSIP website. http://www.pjsip.org/pjsip/.
[7] N. Aoki. A VoIP packet loss concealment technique taking account of pitch variation in pitch waveform replication. Electronics and Communications in Japan (Part I: Communications), 2005.
[8] N. Cvejic and T. Seppänen. Spread spectrum audio watermarking using frequency hopping and attack characterization. Signal Processing, 84(1).
[9] N. Fatima, S. Aftab, R. Sultan, S. Shah, B. Hashmi, A. Majid, and S. Zafar. Speaker recognition using lower formants. In Multitopic Conference, 2004. Proceedings of INMIC 2004. 8th International, pages 125.
[10] K. Fushikida. A formant extraction method using autocorrelation domain inverse filtering and focusing method. In Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on, pages 2260.
[11] J. Hillenbrand, L. A. Getty, M. J. Clark, and K. Wheeler. Acoustic characteristics of American English vowels. The Journal of the Acoustical Society of America.
[12] H.-Y. Hsieh, C.-W. Li, S.-W. Liao, Y.-W. Chen, T.-L. Tsai, and H.-P. Lin. Moving toward end-to-end support for handoffs across heterogeneous telephony systems on dual-mode mobile devices. Comput. Commun.
[13] S. G. Johnson and M. Frigo. A modified split-radix FFT with fewer arithmetic operations. Signal Processing, IEEE Transactions on, 55(1).
[14] C. Kim, K.-d. Seo, W. Sung, and S.-h. Jung. Efficient audio/video synchronization method for video telephony system in consumer cellular phones. In Consumer Electronics, 2006. ICCE '06. 2006 Digest of Technical Papers. International Conference on, pages 137.
[15] C. Kim, K.-d. Seo, and W. Sung. A robust formant extraction algorithm combining spectral peak picking and root polishing. EURASIP J. Appl. Signal Process., 2006.
[16] C.-C. Kuo, M.-S. Chen, and J.-C. Chen. An adaptive transmission scheme for audio and video synchronization based on Real-time Transport Protocol. Multimedia and Expo, IEEE International Conference on, 0:104, 2001.
[17] W.-N. Lie and H.-C. Hsieh. Lips detection by morphological image processing. In Signal Processing Proceedings, 1998. ICSP '98. 1998 Fourth International Conference on, volume 2, pages 1084.
[18] H.-P. Lin. On using digital speech processing techniques for synchronization among heterogeneous teleconferencing devices. Master's thesis, National Taiwan University, July 2008.
[19] H. Liu and M. El Zarki. An adaptive delay and synchronization control scheme for Wi-Fi based audio/video conferencing.
[20] D. F. McAllister, R. D. Rodman, D. L. Bitzer, and A. S. Freeman. Speaker independence in automated lip-sync for audio-video communication. Comput. Netw. ISDN Syst., 30(20-21):1975.
[21] S. McCandless. An algorithm for automatic formant extraction using linear prediction spectra. Acoustics, Speech and Signal Processing, IEEE Transactions on, 22(2):135.
[22] A. Mezghani and D. O'Shaughnessy. Speaker verification using a new representation based on a combination of MFCC and formants. In Electrical and Computer Engineering, 2005. Canadian Conference on, May 2005.
[23] K. Molla, K. Hirose, N. Minematsu, and K. Hasan. Voiced/unvoiced detection of speech signals using empirical mode decomposition model. In Information and Communication Technology, 2007. ICICT '07. International Conference on, pages 311-314, March 2007.
[24] C. Perkins, O. Hodson, and V. Hardman. A survey of packet loss recovery techniques for streaming audio. pages 607.
[25] S. E. Poltrock. Videoconferencing: Recent experiments and reassessment. In Proc. 38th Hawaii Intl Conf. on System Sciences (HICSS-38), 2005.
[26] H. Reitboeck, T. Brody, and D. Thomas. Speaker-identification with real time formant extraction. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '77, volume 2, pages 761.
[27] J. Rouat, Y. C. Liu, and D. Morissette. A pitch determination and voiced/unvoiced decision algorithm for noisy speech.
[28] M. Steinebach and J. Dittmann. Watermarking-based digital audio data authentication. EURASIP J. Appl. Signal Process., 2003.
[29] Telecommunication Standardization Sector of the International Telecommunication Union (ITU). Coding of speech at 8 kbit/s using conjugate-structure algebraic code-excited linear prediction (CS-ACELP).
[30] Q. Cheng. Spread spectrum signaling for speech watermarking. In Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 1337.
[31] R. Vích and M. Vondra. Speech spectrum envelope modeling. Lecture Notes in Computer Science, 4775, 2007.
[32] F. Villavicencio, A. Röbel, and X. Rodet. Improving LPC spectral envelope extraction of voiced speech by true-envelope estimation. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, volume 1, pages I.
[33] M. Wasserblat, M. Gainza, D. Dorran, and Y. Domb. Pitch tracking and voiced/unvoiced detection in noisy environment using optimal sequence estimation. In Signals and Systems Conference, 2008 (ISSC 2008). IET Irish, pages 43.
[34] L. Welling and H. Ney. Formant estimation for speech recognition. Speech and Audio Processing, IEEE Transactions on, Jan 1998.
[35] M. Yang, N. Bourbakis, Z. Chen, and M. Trifas. An efficient audio-video synchronization methodology. In Multimedia and Expo, 2007 IEEE International Conference on, pages 767, 2007.
[36] O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. Signal Processing, IEEE Transactions on, July 2004. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46933 | - |
| dc.description.abstract | 在異質網路下的視訊會議不僅可以經由傳統電話(PSTN)網路得到穩定的語音通話品質,並可藉由網際(IP)網路傳送影像資訊讓使用者擁有較佳的使用者經驗。然而,經由不同網路分別傳送語音及影像資料會有影音不同步的問題。在本實驗室先前的論文研究中,曾提出基於數位語音處理技術的演算法以解決影音不同步之問題,但並未在實際的系統上測試其可行性。在本論文中,我們首先實作前人論文研究所提出的演算法,並於現有的軟硬體環境上實機測試。在硬體方面,我們利用雙模無線網卡及音源轉接線,使筆記型電腦具有雙模手機之功能,能同時與PSTN (GSM)及IP (WLAN)網路連通。在軟體方面,我們在既有的網路電話軟體中加入同步處理模組,使得我們能利用演算法取得兩邊網路的時間延遲差異,並在必要情況下讓語音訊號延遲來達到影音同步之效。在實機上測試前人研究所提出的演算法後,我們發現這些演算法需要進一步減少複雜度以能有效即時運作。本論文因此提出了加速同步演算法的機制:經由分析語音訊號的發聲模型,我們提出了兩個利用語音共振峰特性之加速演算法。在實機上實作本論文所提出之加速機制,並測量其速度與正確率之改進程度後,我們發現本論文提出的演算法可以在未犧牲任何正確率之下,只使用原本之十分之一的執行時間,有效驗證以數位語音處理技術解決異質網路視訊會議下影音同步問題之可行性。 | zh_TW |
| dc.description.abstract | A video conference over heterogeneous networks can not only provide a stable voice connection through the PSTN but also offer a better user experience by transmitting video frames over the IP network. However, the audio and video streams may become asynchronous when they are carried over different networks. In our previous work, algorithms based on digital speech processing techniques were proposed to deal with this problem, but they were not evaluated on a real system. In this thesis, the synchronization algorithms from that research are implemented and evaluated on existing hardware. On the hardware side, we use a dual-mode wireless card and an audio cable so that a notebook gains the capability of a dual-mode mobile phone and can communicate with both the PSTN (GSM) and IP (WLAN) networks at the same time. On the software side, we add synchronization modules to an existing softphone to measure the delay difference between the two networks and to postpone the audio waveform when needed. After implementing and evaluating the algorithms of our previous work on this prototype, we found that both their accuracy and their execution time must be improved for real-time operation. This thesis therefore proposes accelerating methods: by analyzing the speech production model, we derive two accelerating algorithms that exploit the characteristics of speech formants. After implementing the proposed algorithms on the existing hardware and measuring the improvement in speed and accuracy, we found that they require only one-tenth of the execution time of the original algorithm without sacrificing any accuracy. This demonstrates the feasibility of resolving audio/video asynchrony in heterogeneous-network video conferencing with digital speech processing techniques. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T05:43:34Z (GMT). No. of bitstreams: 1 ntu-99-R96942092-1.pdf: 3297268 bytes, checksum: 9d2ce0a9d78d83fbe27a4b09e872cb9a (MD5) Previous issue date: 2010 | en |
| dc.description.tableofcontents | ABSTRACT ii
LIST OF TABLES vi
LIST OF FIGURES vii
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 BACKGROUND 5
2.1 Heterogeneous Teleconference Scenario 5
2.1.1 Audio Conference Architecture 5
2.1.2 Video Conference Architecture 6
2.2 Related Work 7
2.2.1 Conventional IP Audio-Video Synchronization 7
2.2.2 Watermarking Mechanism 8
2.2.3 Lip Synchronization 8
2.3 Dual Transmission Conference 9
2.3.1 Synchronization Problem Translation 9
2.3.2 Synchronization based on Cross-correlation Coefficient 10
2.3.3 Synchronization based on Spectrogram 11
2.3.4 Window-Disjoint Orthogonality Measurement 11
CHAPTER 3 IMPLEMENTATION 13
3.1 Overall Implementation 13
3.2 Hardware Implementation 14
3.3 Software Architecture 15
3.3.1 Algorithm Trigger 15
3.3.2 Synchronization Algorithm 16
3.3.3 Delay Controller 19
3.4 Linphone Architecture 20
3.4.1 Concepts of Filters in Linphone 21
3.4.2 Block Diagram of Linphone Architecture 21
3.4.3 Prime Data Structure and Functions 23
3.5 Modification on Linphone 24
3.5.1 Non-discard Mechanism 25
3.5.2 Packet Loss Compensation 26
3.5.3 Compatible for Relay Server 27
3.6 Implementation of Cross-correlation Algorithm 28
3.7 Implementation of Spectrogram Algorithm 32
3.8 Performance Evaluation 34
3.9 Further Discussion of Cross-correlation based Algorithm 39
CHAPTER 4 ACCELERATING WITH FORMANT RELATED COEFFICIENTS IN FREQUENCY DOMAIN 48
4.1 Feature of People Separation and Synchronization Algorithm 48
4.2 Features of Speech Sound 50
4.2.1 Simple Model for Speech Production 50
4.2.2 Relationship between Voiced Frames and SIR 52
4.3 Peak-Group-Picking Method 55
4.4 Implementation of Peak-Group-Picking Algorithm 56
4.5 Spectrogram with Peak-Group-Picking Algorithm 57
4.5.1 Windowed-disjoint Orthogonality 57
4.5.2 Synchronization Accuracy Evaluation 61
CHAPTER 5 ACCELERATING SYNCHRONIZATION ALGORITHM IN FREQUENCY DOMAIN BY TOP-N COEFFICIENTS 70
5.1 Another View of Speech Sound 70
5.2 Features of Top-N Coefficients in Frequency Domain 72
5.3 TNCP Algorithm 76
5.4 Spectrogram with Top-N Coefficients Picking 77
5.4.1 W-DO Score 77
5.4.2 Synchronization Accuracy Evaluation 79
CHAPTER 6 PROTOTYPE EXPERIMENT 85
6.1 Experiment Environment Setup 85
6.1.1 Hardware Setup 85
6.1.2 Software Setup 86
6.2 Codec Distortion 88
6.3 Packet Loss and Different Jitters 91
6.4 3-parties Connection Recovery 93
CHAPTER 7 CONCLUSION AND FUTURE WORK 96
REFERENCES 98 | |
| dc.language.iso | en | |
| dc.subject | 異質網路 | zh_TW |
| dc.subject | 數位語音處理 | zh_TW |
| dc.subject | 共振峰 | zh_TW |
| dc.subject | 影音同步 | zh_TW |
| dc.subject | Heterogeneous Teleconferencing | en |
| dc.subject | DSP | en |
| dc.subject | formant | en |
| dc.title | 基於數位語音處理技術之同步演算法實作與實測 | zh_TW |
| dc.title | Implementation and Evaluation of DSP-Assisted Synchronization Algorithms for Heterogeneous Teleconferencing | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 98-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 葉丙成(Ping-Cheng Yeh),高榮鴻(Rung-Hung Gau),鄭振牟(Chen-Mou Cheng) | |
| dc.subject.keyword | 數位語音處理,共振峰,影音同步,異質網路, | zh_TW |
| dc.subject.keyword | DSP,formant,Heterogeneous Teleconferencing, | en |
| dc.relation.page | 100 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2010-08-20 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
| Appears in Collections: | 電信工程學研究所 | |
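The abstract above describes measuring the delay difference between the PSTN-path and IP-path copies of the same speech by cross-correlation and then postponing the stream that arrives earlier. The snippet below is a minimal illustrative sketch of that cross-correlation idea only; it is not the thesis code, and the function name, argument names, and the 8 kHz sampling rate are assumptions.

```python
# Minimal sketch (not the thesis implementation) of cross-correlation delay
# estimation between two copies of the same speech received over different paths.
import numpy as np

def estimate_delay_seconds(pcm_pstn: np.ndarray, pcm_ip: np.ndarray, fs: int = 8000) -> float:
    """Estimate the time offset between the PSTN-path and IP-path audio frames."""
    a = pcm_pstn.astype(np.float64) - np.mean(pcm_pstn)   # remove DC offset
    b = pcm_ip.astype(np.float64) - np.mean(pcm_ip)
    corr = np.correlate(a, b, mode="full")                 # correlation over all lags
    lag = int(np.argmax(corr)) - (len(b) - 1)              # sample offset of the correlation peak
    return lag / fs                                        # in seconds; the sign indicates which path is ahead

# The stream found to be ahead would then be buffered (postponed) by this amount
# before playback so that the audio and video remain aligned.
```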
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-99-1.pdf (restricted, not publicly accessible) | 3.22 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
