A Preliminary Study on Automatic Detection of Filled Pause in Spontaneous Mandarin Speech (自發性國語語音中自動偵測填充式停頓之初步研究)

Yi Lee; 李易

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40927

Full metadata record

DC field	Value	Language
dc.contributor.advisor	李琳山
dc.contributor.author	Yi Lee	en
dc.contributor.author	李易	zh_TW
dc.date.accessioned	2021-06-14T17:07:10Z	-
dc.date.available	2008-08-05
dc.date.copyright	2008-08-05
dc.date.issued	2008
dc.date.submitted	2008-07-28
dc.identifier.citation	【1】 Beyerlein, P., Aubert, X., Haeb-Umbach, R., Harris, M., Klakow, D., Wendemuth, A., Molau, S., Pitz, M., Sixtus, A., 1999. “The Philips/RWTH system for transcription of broadcast news.” In: Proc. European Conference on Speech Communication and Technology, Vol. II, Budapest, Hungary, pp. 647–650.
【2】 Yu, H., Tomokiyo, T., Wang, Z., Waibel, A., 2000. “New developments in automatic meeting transcription.” In: Proc. International Conference on Spoken Language Processing, Vol. IV, Beijing, China, pp. 310–313.
【3】 Shriberg, E., 1996. “Disfluencies in Switchboard.” In: Proc. International Conference on Spoken Language Processing, Vol. Addendum, Philadelphia, USA, pp. 11–14.
【4】 Shriberg, E., Stolcke, A., 1996. “Word predictability after hesitations: a corpus-based study.” In: Proc. International Conference on Spoken Language Processing, Vol. III, Philadelphia, USA, pp. 1868–1871.
【5】 Pakhomov, S.-V., 2001. “Hesitations and cognitive status of noun phrase referents in spontaneous discourse.” Ph.D. dissertation, University of Minnesota.
【6】 Shriberg, E., 2005. “Spontaneous speech: how people really talk and why engineers should care.” In: Proc. Interspeech 2005, pp. 1781–1784.
【7】 黃佳瑩, 重松淳, 2005. “日籍國語學習者之填空詞使用：以遠距形式談話為中心的考察” [Filler use by Japanese learners of Mandarin: a study focused on remote-format conversation]. 全球華文網路教育國際研討會 (ICICE).
【8】 Gabrea, M., O’Shaughnessy, D., 2000. “Detection of filled pauses in spontaneous conversational speech.” In: Proc. International Conference on Spoken Language Processing, Vol. III, Beijing, China, pp. 678–681.
【9】 Ohta, K., Tsuchiya, M., Nakagawa, S., 2007. “Construction of spoken language model including fillers using filler prediction model.” In: Proc. Interspeech 2007, pp. 1489–1492.
【10】 Pakhomov, S.-V., Savova, G., 1999. “Filled pause distribution and modeling in quasi-spontaneous speech.” Presented at the Disfluency Workshop, International Congress of Phonetic Sciences, Berkeley, CA, USA.
【11】 Pakhomov, S.-V., 1999. “Modeling filled pauses in medical dictations.” In: Proc. Association for Computational Linguistics (ACL), College Park, Maryland, USA, pp. 619–624.
【12】 Siu, M., Ostendorf, M., 1996. “Modeling disfluencies in conversational speech.” In: Proc. ICSLP-96, Vol. 1, pp. 386–389.
【13】 Stolcke, A., Shriberg, E., 1996. “Statistical language modeling for speech disfluencies.” In: Proc. International Conference on Acoustics, Speech and Signal Processing, Vol. I, Atlanta, USA, pp. 405–408.
【14】 Siu, M., Ostendorf, M., 2000. “Variable N-gram and extensions for conversational speech language modeling.” IEEE Transactions on Speech and Audio Processing, Vol. 8, pp. 63–75.
【15】 Stouten, F., Duchateau, J., Martens, J.-P., Wambacq, P., 2006. “Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation.” Speech Communication 48, pp. 1590–1606.
【16】 Stouten, F., Martens, J.-P., 2003. “A feature-based filled pause detection system for Dutch.” In: Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Virgin Islands, USA, pp. 309–314.
【17】 Wu, C.-H., Yan, G.-L., 2004. “Acoustic feature analysis and discriminative modeling of filled pauses for spontaneous speech recognition.” Journal of VLSI Signal Processing 36, pp. 91–104.
【18】 Wu, C.-H., Yan, G.-L., 2004. “A study on speech act modeling and verification of spontaneous speech with disfluency in a spoken dialogue system.” Ph.D. dissertation, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
【19】 Quimbo, F.C., Kawahara, T., Doshita, S., 1998. “Prosodic analysis of fillers and self-repair in Japanese speech.” In: Proc. International Conference on Spoken Language Processing, Sydney, Australia, pp. 3313–3316.
【20】 Gabrea, M., O’Shaughnessy, D., 2000. “Detection of filled pauses in spontaneous conversational speech.” In: Proc. International Conference on Spoken Language Processing, Vol. III, Beijing, China, pp. 678–681.
【21】 Goto, M., Itou, K., Hayamizu, S., 1999. “A real-time filled pause detection system for spontaneous speech.” In: Proc. European Conference on Speech Communication and Technology, Vol. I, Budapest, Hungary, pp. 227–230.
【22】 Department of Linguistics, The Ohio State University, 2004. “Language Files: Materials for an Introduction to Language and Linguistics,” 9th edition.
【23】 Zhao, Y., Jurafsky, D., 2005. “A preliminary study of Mandarin filled pauses.” In: Proc. DiSS ’05, Aix-en-Provence, France, pp. 179–182.
【24】 Wasow, T., 1997. “Remarks on grammatical weight.” Language Variation and Change 9, pp. 81–105.
【25】 Vorstermans, A., Martens, J.-P., Van Coile, B., 1996. “Automatic segmentation and labeling of multi-lingual speech data.” Speech Communication 19, pp. 271–293.
【26】 HTK (Hidden Markov Model Toolkit), http://htk.eng.cam.ac.uk/
【27】 王惟正, 2008. “國語語音訊號中發音偏誤類型之自動偵測” [Automatic detection of mispronunciation types in Mandarin speech signals]. Master’s thesis, Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Taiwan University.
【28】 Jang, R., “Data Clustering and Pattern Recognition,” http://140.114.76.148/jang/books/dcpr/
【29】 Chang, C.-C., Lin, C.-J., “LIBSVM: a library for support vector machines,” http://www.csie.ntu.edu.tw/~cjlin/libsvm/
【30】 Akita, Y., Kawahara, T., 2006. “Efficient estimation of language model statistics of spontaneous speech via statistical transformation model.” In: Proc. ICASSP 2006.
【31】 Batliner, A., Kiessling, A., Burger, S., Nöth, E., 1995. “Filled pauses in spontaneous speech.” In: Proc. International Congress of Phonetic Sciences, Stockholm, Sweden.
【32】 Ishihara, K., Tsubota, Y., Okuno, H.-G., 2003. “Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure.” In: Proc. Interspeech 2003, pp. 3185–3188.
【33】 Lin, C.-K., Lee, L.-S., 2005. “Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features.” In: Proc. Interspeech 2005, pp. 1621–1624.
【34】 Moniz, H., Mata, A.-I., Viana, M.-C., 2007. “On filled-pause and prolongations in European Portuguese.” In: Proc. Interspeech 2007, pp. 2645–2648.
【35】 Peters, J., May 2003. “LM studies on filled pauses in spontaneous medical dictation.” In: Proc. Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting, Edmonton, Canada, pp. 82–84.
【36】 Takahashi, S., Morimoto, T., Maeda, S., Tsuruta, N., 2005. “Detection of coughs from user utterances using imitated phoneme model.” In: Proc. Interspeech 2005, pp. 1357–1360.
【37】 Takahashi, S., Morimoto, T., Maeda, S., Tsuruta, N., 2004. “Cough detection in spoken dialogue system for home health care.” In: Proc. Interspeech 2004, pp. 1865–1868.
【38】 Truong, K.P., van Leeuwen, D.A., 2005. “Automatic detection of laughter.” In: Proc. Interspeech 2005, pp. 485–488.
【39】 Schramm, H., Aubert, X.L., Meyer, C., Peters, J., 2003. “Filled pause modeling for medical transcriptions.” In: Proc. ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, Japan.
【40】 Swerts, M., Wichmann, A., Beun, R., 1996. “Filled pauses as markers of discourse structure.” In: Proc. ICSLP.
【41】 Shriberg, E., Stolcke, A., 2002. “Prosody modeling for automatic speech recognition and understanding.” In: Proc. Workshop on Mathematical Foundations of Natural Language Modeling.
【42】 Shriberg, E., Stolcke, A., Hakkani-Tür, D., Tür, G., 2000. “Prosody-based automatic segmentation of speech into sentences and topics.” Speech Communication 32(1–2), pp. 127–154.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40927	-
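dc.description.abstract	Recognition of read speech has reached fairly good performance, but recognition of spontaneous speech still faces many difficulties. A major one is the many disfluencies that occur in spontaneous speech, such as filled pauses, repetitions, restarts, prolongations, and repairs; these disfluencies cause a range of problems that degrade the performance of automatic speech recognition systems. The most frequent disfluency is the filled pause, for example 嗯, 啊, and 呃 in Mandarin. Filled pauses differ from fluent speech in several ways: acoustically they are often centrally articulated vowels; prosodically they are prolonged, with waveforms and spectrograms that change slowly and smoothly; and linguistically they are more likely to be immediately adjacent to silent pauses. Based on these properties, this thesis designs a filled-pause detection technique that operates independently of the speech recognizer. First, potential segment boundaries are extracted according to how sharply the Mel-frequency cepstral coefficient (MFCC) vectors change; next, features reflecting the properties of filled pauses are extracted from each segment to form a feature vector; finally, each segment is classified using a multilayer perceptron as the main classifier combined with three different strategies. On the CALLHOME corpus, recall and precision are both about 70% or higher on class-balanced data, but both reach only about 20% on data with the real class distribution. Whether filled-pause detection at this level of performance can be integrated into a speech recognizer to improve word recognition accuracy remains to be examined.	zh_TW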
dc.description.provenance	Made available in DSpace on 2021-06-14T17:07:10Z (GMT). No. of bitstreams: 1 ntu-97-R94942122-1.pdf: 7415509 bytes, checksum: 73a3af02ae82982518d45f092c621b63 (MD5) Previous issue date: 2008	en
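dc.description.tableofcontents	Chapter 1 Introduction 1
1.1 Research motivation 1
1.2 State of current research 1
1.3 Main results 3
1.4 Chapter summaries 4
Chapter 2 Background 5
2.1 Filled pauses in Mandarin 5
2.1.1 Definitions and functions of Mandarin fillers and filled pauses 5
2.1.2 Properties of Mandarin filled pauses 9
2.1.2.1 Acoustic properties of Mandarin filled pauses 9
2.1.2.2 Linguistic properties of Mandarin filled pauses 13
2.2 Corpus used 15
2.3 Speech features, probabilistic models, and performance evaluation methods for filled-pause detection 15
2.3.1 Gaussian mixture models 16
2.3.2 Multilayer perceptrons 16
2.3.3 System performance evaluation 17
2.4 Chapter conclusion 18
Chapter 3 Segmentation and feature extraction 19
3.1 System architecture overview 19
3.2 Segmentation 20
3.3 Segment feature extraction 26
3.3.1 Segment duration (1 feature) 26
3.3.2 Relative segment duration ratio (1 feature) 26
3.3.3 Spectral stability (1 feature) 28
3.3.4 Stable-interval duration (8 features) 28
3.3.5 Presence of a silent pause before and after the segment (2 features) 30
3.3.6 Spectral centroid (1 feature) 31
3.3.7 Relative spectral centroid ratio (1 feature) 31
3.3.8 Variance of first-order MFCC deltas (1 feature) 33
3.4 Chapter summary 34
Chapter 4 Segment classification and performance evaluation 42
4.1 Segment classification 42
4.2 Equal-partition method 44
4.3 Gaussian mixture model pre-screening method 47
4.4 Chapter conclusion 53
Chapter 5 Conclusions and future work 55
References 57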
dc.language.iso	zh-TW
dc.subject	多層感知器 (multilayer perceptron)	zh_TW
dc.subject	自發性語音 (spontaneous speech)	zh_TW
dc.subject	不流暢語音 (disfluent speech)	zh_TW
dc.subject	填充式停頓 (filled pause)	zh_TW
dc.subject	spontaneous speech	en
dc.subject	multilayer perceptron	en
dc.subject	filled pause	en
dc.subject	disfluent speech	en
dc.title	自發性國語語音中自動偵測填充式停頓之初步研究	zh_TW
dc.title	A Preliminary Study on Automatic Detection of Filled Pause in Spontaneous Mandarin Speech	en
dc.type	Thesis
dc.date.schoolyear	96-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	廖婉君,謝宏昀
dc.subject.keyword	自發性語音, 不流暢語音, 填充式停頓, 多層感知器 (spontaneous speech, disfluent speech, filled pause, multilayer perceptron)	zh_TW
dc.subject.keyword	spontaneous speech, disfluent speech, filled pause, multilayer perceptron	en
dc.relation.page	62
dc.rights.note	Licensed for a fee
dc.date.accepted	2008-07-29
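dc.contributor.author-college	電機資訊學院 (College of Electrical Engineering and Computer Science)	zh_TW
dc.contributor.author-dept	電信工程學研究所 (Graduate Institute of Communication Engineering)	zh_TW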
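The abstract and the Chapter 3 outline above describe a recognizer-independent front end: segment boundaries are placed where successive MFCC vectors change sharply, and each segment is then described by features such as duration, spectral stability, MFCC-delta variance, and adjacent silent pauses. The Python sketch below illustrates that idea under stated assumptions and is not the thesis's implementation. MFCC frames and a per-frame silence mask are assumed to come from elsewhere; the 10 ms frame shift, the Euclidean change measure, the threshold-plus-local-maximum boundary rule, and the exact feature definitions are all hypothetical choices made for illustration, and only a subset of the 16 listed features is shown.

```python
import math
import statistics
from dataclasses import dataclass

FRAME_SHIFT_S = 0.01  # assumption: 10 ms shift between MFCC frames


def frame_distance(a, b):
    """Euclidean distance between two MFCC vectors (assumed change measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_boundaries(mfcc, threshold):
    """Return the frame indices at which segments start.

    Hypothetical rule: frame t starts a new segment when the change
    d[t] = ||c_t - c_{t-1}|| exceeds `threshold` and is a local maximum.
    """
    n = len(mfcc)
    d = [0.0] + [frame_distance(mfcc[t], mfcc[t - 1]) for t in range(1, n)]
    starts = [0]
    for t in range(1, n):
        right = d[t + 1] if t + 1 < n else 0.0
        if d[t] > threshold and d[t] >= d[t - 1] and d[t] >= right:
            starts.append(t)
    return starts


def segments_from_starts(starts, n_frames):
    """Turn start indices into half-open (start, end) frame ranges."""
    return list(zip(starts, starts[1:] + [n_frames]))


@dataclass
class SegmentFeatures:
    duration_s: float          # segment duration in seconds
    spectral_stability: float  # mean frame-to-frame MFCC distance (lower = steadier)
    delta_variance: float      # variance of first-order MFCC differences, averaged over dims
    silence_before: bool       # silent frame immediately before the segment
    silence_after: bool        # silent frame immediately after the segment


def segment_features(mfcc, is_silent, start, end):
    """Compute a few illustrative features for frames [start, end)."""
    frames = mfcc[start:end]
    n = len(frames)
    dists = [frame_distance(frames[i], frames[i - 1]) for i in range(1, n)]
    stability = sum(dists) / len(dists) if dists else 0.0

    # Simple first differences stand in for regression-window deltas.
    deltas = [[b - a for a, b in zip(frames[i - 1], frames[i])] for i in range(1, n)]
    if deltas:
        dims = len(deltas[0])
        delta_var = sum(
            statistics.pvariance([row[k] for row in deltas]) for k in range(dims)
        ) / dims
    else:
        delta_var = 0.0

    return SegmentFeatures(
        duration_s=n * FRAME_SHIFT_S,
        spectral_stability=stability,
        delta_variance=delta_var,
        silence_before=bool(start > 0 and is_silent[start - 1]),
        silence_after=bool(end < len(mfcc) and is_silent[end]),
    )


if __name__ == "__main__":
    # Toy input: three steady regions of 2-dimensional "MFCC" frames.
    mfcc = [[0.0, 0.0]] * 5 + [[3.0, 0.0]] * 5 + [[3.0, 4.0]] * 5
    silent = [True] * 5 + [False] * 10
    for start, end in segments_from_starts(segment_boundaries(mfcc, 1.0), len(mfcc)):
        print(start, end, segment_features(mfcc, silent, start, end))
```

A steady, prolonged vowel-like segment yields low spectral stability and delta-variance values, which matches the "slowly and smoothly changing" property the abstract attributes to filled pauses. In practice the resulting feature vectors would be passed to a classifier such as the multilayer perceptron named in the abstract.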
Appears in Collections:	Graduate Institute of Communication Engineering
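The abstract reports recall and precision of about 70% on class-balanced CALLHOME data but only about 20% on the real class distribution. The short Python calculation below, using purely hypothetical rates and class counts (not figures from the thesis), illustrates one reason precision is so sensitive to the class ratio: if a detector's per-class behavior stays fixed, false alarms scale with the number of non-filled-pause segments while true detections do not.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the filled-pause (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def expected_counts(n_pos, n_neg, recall, false_alarm_rate):
    """Expected TP, FP, FN for a detector whose per-class rates are fixed."""
    tp = recall * n_pos
    fn = n_pos - tp
    fp = false_alarm_rate * n_neg
    return tp, fp, fn


# Hypothetical detector: finds 70% of filled pauses and
# wrongly accepts 30% of the other segments.
RECALL, FALSE_ALARM = 0.7, 0.3

balanced = precision_recall(*expected_counts(1000, 1000, RECALL, FALSE_ALARM))
# Hypothetical 1:10 ratio of filled-pause to other segments.
skewed = precision_recall(*expected_counts(1000, 10000, RECALL, FALSE_ALARM))

if __name__ == "__main__":
    print(f"balanced: precision={balanced[0]:.2f}, recall={balanced[1]:.2f}")
    print(f"skewed:   precision={skewed[0]:.2f}, recall={skewed[1]:.2f}")
```

With these made-up rates, precision falls from 0.70 to about 0.19 while recall stays at 0.70. The thesis's report of about 20% for both measures on real-distribution data reflects its own operating point and data, which this sketch does not model.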

Files in This Item:

File	Size	Format
ntu-97-1.pdf (not authorized for public access)	7.24 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.