Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45131
Full metadata record (DC field: value [language]):
dc.contributor.advisor: 羅仁權 (Ren C. Luo)
dc.contributor.author: Chien-Chieh Huang [en]
dc.contributor.author: 黃健桀 [zh_TW]
dc.date.accessioned: 2021-06-15T04:05:42Z
dc.date.available: 2016-08-22
dc.date.copyright: 2011-08-22
dc.date.issued: 2011
dc.date.submitted: 2011-08-17
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45131
dc.description.abstract在21世紀,目前世界上智慧型機器人是一個相當重要的產業,有愈來愈多的機構在開發先進的、多功能的智慧型機器人,例如輪型機器人、雙足機器人。隨著老年人口的增加與目前社會的經濟壓力,大多數的父母都需要外出工作,基於這個原因及現象,我們做了一個應用給家中的小孩與老人家們。
Human-robot interaction has long been a key technology in intelligent robotics. Here we use speech and the human voice as the control channel for communicating with the robot. This thesis covers two main parts: construction of the head model and speech processing.
Synchronizing speech with lip shapes involves techniques from computer vision, speech synthesis, speech recognition, and related fields. We propose a method for speech-lip synchronization that uses Microsoft's Speech Application Programming Interface (SAPI) as the speech synthesis and recognition tool. The speech animation comprises two parts: the speech and the lip-shape frames. The synthesized speech comes from a Text-to-Speech (TTS) program, and the lip-shape frames are synthesized with the software FaceGen Modeller.
By importing three key photographs (left profile, right profile, and frontal view) and adjusting the calibration points, we obtain a 3D face model that closely resembles the person in the photographs. C# code links each lip-shape frame to its corresponding viseme and loads the frames in viseme order.
Today speech synthesis is used mainly in assistive tools, for example as a screen reader that helps visually impaired people read, or as a voice through which a person who cannot speak can communicate with others. In recent years it has also been widely applied in service robots and entertainment products, covering language learning, education, video games, animation, and music.
Finally, we establish a fast method for generating a 3D head model and synchronizing it with speech. The application can be used to teach children English reading and listening, and for certain users, such as deaf and mute people, it can serve as a communication tool.
dc.description.abstract [en]: In the 21st century, intelligent robotics has become one of the most essential industries worldwide. Many robotics institutions are developing modern, multi-functional robots of many types, for example wheeled robots and biped robots. With the growing elderly population and the economic pressure of present-day society, most parents both have to work to support their families. Because of this phenomenon, we made an application for children and the elderly.
Human-robot interaction (HRI) is an important technology in the intelligent robotics field. In this thesis, we use sound and voice as commands to communicate with the robot. The work consists of two major parts: head modeling and speech processing.
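As a minimal sketch of this command channel (assuming SAPI is driven through .NET's System.Speech wrapper; the thesis names SAPI and C# but not these exact calls, and the command words here are hypothetical), a small fixed grammar can turn recognized phrases into robot commands:

    using System;
    using System.Speech.Recognition;

    class VoiceCommandDemo
    {
        static void Main()
        {
            using (var recognizer = new SpeechRecognitionEngine())
            {
                // Hypothetical command vocabulary; the thesis's actual word list is not given here.
                var commands = new Choices("hello", "read", "stop");
                recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

                // Print each recognized command; a robot controller would dispatch on it instead.
                recognizer.SpeechRecognized += (sender, e) =>
                    Console.WriteLine("Command: " + e.Result.Text);

                recognizer.SetInputToDefaultAudioDevice();
                recognizer.RecognizeAsync(RecognizeMode.Multiple);

                Console.WriteLine("Listening... press Enter to quit.");
                Console.ReadLine();
            }
        }
    }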
Synchronization between speech and mouth shape involves technologies such as computer vision, speech synthesis, and speech recognition. We present a method to synchronize lip movement with speech, using Microsoft's Speech Application Programming Interface (SAPI) as the speech synthesis and recognition tool. The speech animation comprises two components: the speech and the images. The speech output is obtained from a Text-to-Speech (TTS) engine, and the viseme images are generated with the software FaceGen Modeller.
Three key pictures are imported into this software to calibrate and generate the face model. A viseme event handler in C# connects each mouth-shape image with its viseme; the images are loaded sequentially so that the visemes match the images one by one.
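A minimal sketch of such a viseme event handler, again assuming the System.Speech wrapper over SAPI (the VisemeReached event is the wrapper's; the image-swapping step is left as a comment because the thesis's rendering code is not shown):

    using System;
    using System.Speech.Synthesis;

    class VisemeLipSync
    {
        static void Main()
        {
            using (var synth = new SpeechSynthesizer())
            {
                // SAPI reports viseme numbers (0-21) as the audio is produced; each number
                // would index one mouth-shape image rendered from FaceGen Modeller.
                synth.VisemeReached += (sender, e) =>
                {
                    // Here the animation window would swap in the image for e.Viseme.
                    Console.WriteLine("t=" + e.AudioPosition + " viseme=" + e.Viseme);
                };

                synth.Speak("Hello, how are you today?");
            }
        }
    }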
The main applications of speech synthesis are assistive devices, e.g., screen readers for people with visual impairment; a mute person can likewise use this technology to talk to others. In recent years, speech synthesis has been applied extensively in service robotics and entertainment products such as language learning, education, video games, animation, and music videos.
Finally, we present a quick method for building a 3D head model and synchronizing it with speech. This application can be used to teach children English reading and listening, and for some specific users, such as mute and deaf people, it can serve as a communication tool.
dc.description.provenance: Made available in DSpace on 2021-06-15T04:05:42Z (GMT). No. of bitstreams: 1. ntu-100-R97921047-1.pdf: 3122323 bytes, checksum: f8bef24bcb40a554952fe6d20059494e (MD5). Previous issue date: 2011 [en]
dc.description.tableofcontents: ACKNOWLEDGEMENTS I
CHINESE ABSTRACT III
ABSTRACT IV
TABLE OF CONTENTS VI
LIST OF FIGURES VIII
LIST OF TABLES IX
CHAPTER 1 INTRODUCTION 1
1.1 ROBOT GENERATION 1
1.2 HUMAN-ROBOT INTERACTION 1
1.3 SPEECH ANIMATION 2
1.4 APPLICATIONS 4
1.5 ORGANIZATION 4
CHAPTER 2 PREVIOUS AND RELATED WORK 6
2.1 LIP SYNCHRONIZATION APPROACHES 6
2.2 FACIAL ACTION CODING SYSTEM (FACS) 6
2.3 MPEG-4 FACIAL ANIMATION 10
2.4 VISUAL SPEECH ANIMATION 13
CHAPTER 3 HUMAN HEAD MODELING 15
3.1 HEAD MODELING TECHNIQUES 15
3.1.1 Laser Scan Method 15
3.1.2 Photographic Method 16
3.2 PHYSICS-BASED MUSCLE MODELING 17
3.2.1 Vector Muscle 17
3.2.2 Spring Mesh Muscle 18
3.2.3 Layered Spring Mesh Muscle 18
3.3 3D FACE MODELING 19
3.3.1 Anthropometry 19
3.3.2 Person-Specific Model Creation 20
CHAPTER 4 SPEECH SIGNAL PROCESSING 22
4.1 SPEECH SYNTHESIS (SS) 22
4.1.1 Text-to-Speech (TTS) 23
4.1.2 Synthesizer Technology 24
4.1.3 Concatenative Synthesis 25
4.1.4 Formant Synthesis 27
4.1.5 Articulatory Synthesis 27
4.1.6 HMM-based Synthesis 28
4.2 SPEECH RECOGNITION (SR) 28
4.2.1 Algorithms 29
4.2.2 Hidden Markov Model 30
4.2.3 Performance Criterion 33
4.2.4 Applications 34
4.3 MICROSOFT SPEECH APPLICATION PROGRAMMING INTERFACE (SAPI) 35
4.3.1 Basic architecture 37
4.3.2 SAPI version 5 38
4.3.3 SAPI 5.1 and SAPI 5.3 39
CHAPTER 5 LIP SYNCHRONIZATION 40
5.1 PHONEME 40
5.2 COARTICULATION 41
5.3 VISEME 42
5.4 MCGURK EFFECT 43
5.5 PHONEMES AND VISEMES ASSIGNMENT 44
CHAPTER 6 SCENARIO APPLICATIONS 45
6.1 USUAL TALKS 45
6.2 A PRESCRIPTION 47
6.3 FEELING QUEASY 49
CHAPTER 7 RESULTS OF LIP-SYNC SPEECH ANIMATION 51
7.1 SPEECH SYNTHESIS AND SPEECH RECOGNITION 51
7.2 3D HEAD MODEL 54
7.3 LIP SYNCHRONIZATION ANIMATION 55
7.3.1 Facial Expressions 57
CHAPTER 8 CONCLUSIONS AND CONTRIBUTIONS 60
8.1 CONCLUSIONS 60
8.2 CONTRIBUTIONS 62
CHAPTER 9 FUTURE WORK 64
REFERENCES 65
VITA 68
dc.language.iso: en
dc.subject: 臉部動畫 (facial animation) [zh_TW]
dc.subject: 唇形同步 (lip synchronization) [zh_TW]
dc.subject: 語音辨識 (speech recognition) [zh_TW]
dc.subject: 語音合成 (speech synthesis) [zh_TW]
dc.subject: 3D頭部模型 (3D head model) [zh_TW]
dc.subject: facial animation [en]
dc.subject: lip synchronization [en]
dc.subject: speech recognition [en]
dc.subject: speech synthesis [en]
dc.subject: 3D head model [en]
dc.title: 3D人臉動畫模型建立及唇形語音同步在人機互動系統之應用 (3D facial animation modeling and lip-speech synchronization for human-robot interaction systems) [zh_TW]
dc.title: 3D Facial Modeling and Animation with Speech / Lip Synchronization for Human-Robot Interactions [en]
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 馮蟻剛 (I-Kong Fong), 鄒杰烔 (Jie-Tong Zou)
dc.subject.keyword: 唇形同步 (lip synchronization), 語音辨識 (speech recognition), 語音合成 (speech synthesis), 3D頭部模型 (3D head model), 臉部動畫 (facial animation) [zh_TW]
dc.subject.keyword: lip synchronization, speech recognition, speech synthesis, 3D head model, facial animation [en]
dc.relation.page: 69
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2011-08-17
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) [zh_TW]
Appears in collections: Department of Electrical Engineering

Files in this item:
File: ntu-100-1.pdf, 3.05 MB, Adobe PDF (not authorized for public access)