Title: | 3D人臉動畫模型建立及唇形語音同步在人機互動系統之應用 3D Facial Modeling and Animation with Speech / Lip Synchronization for Human-Robot Interactions |
Authors: | Chien-Chieh Huang 黃健桀 |
Advisor: | 羅仁權(Ren C. Luo) |
Keyword: | lip synchronization, speech recognition, speech synthesis, 3D head model, facial animation |
Publication Year: | 2011 |
Degree: | Master's |
Abstract: | In the 21st century, intelligent robotics has become one of the most important industries in the world, and a growing number of institutions are developing advanced, multi-functional intelligent robots of many types, such as wheeled robots and biped robots. With the aging of the population and the economic pressures of modern society, most parents both have to work outside the home. Motivated by this phenomenon, we built an application for the children and the elderly in the family.

Human-robot interaction (HRI) has long been an essential technology in the field of intelligent robotics, and here we use sound and voice as commands to communicate with the robot. This thesis consists of two major parts: head modeling and speech processing. Synchronizing speech with mouth shapes involves technologies such as computer vision, speech synthesis, and speech recognition. We present a method to synchronize lip movement with speech, using Microsoft's Speech Application Programming Interface (SAPI) as the speech synthesis and recognition tool. Speech animation consists of two components: the speech and the mouth-shape images. The speech output is obtained from a Text-to-Speech (TTS) engine, while the viseme images are generated with the software FaceGen Modeller. By importing three key photographs (left side, right side, and front view) and adjusting the calibration points, we obtain a 3D face model that closely resembles the person in the photographs. A viseme event handler written in C# links each mouth-shape image to its corresponding viseme, so the images are loaded in viseme order and matched correctly one by one.

Today the main applications of speech synthesis are assistive devices, for example screen readers for people with visual impairments, and a person who cannot speak can use the technology to communicate with others. In recent years, speech synthesis has also been widely applied in service robotics and entertainment products such as language learning, education, video games, animation, and music. Finally, we establish a fast method to build a 3D head model and synchronize it with speech. The application can be used to teach children English reading and listening, and for specific groups such as deaf and mute people it can serve as a communication tool. |
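The viseme-driven lip synchronization described in the abstract can be illustrated with a minimal sketch, assuming the .NET System.Speech wrapper around Microsoft SAPI. The mouth-frame file-naming scheme and the console output are hypothetical placeholders for the FaceGen-generated mouth shapes; the sketch only shows how a C# viseme event handler can pair each spoken viseme with an image while the TTS engine speaks.

```csharp
// Minimal sketch: viseme-to-image lip sync with the .NET wrapper around SAPI.
// Requires a reference to the System.Speech assembly (Windows).
using System;
using System.Speech.Synthesis;

class LipSyncSketch
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // SAPI raises VisemeReached as each viseme is about to be spoken;
            // e.Viseme is an index into the 22 SAPI visemes (0 = silence).
            synth.VisemeReached += (sender, e) =>
            {
                // Hypothetical naming for the FaceGen-exported mouth frames.
                string frame = $"viseme_{e.Viseme:D2}.png";
                Console.WriteLine($"Show mouth frame: {frame}");
            };

            synth.SpeakAsync("Hello, this is a lip synchronization test.");
            Console.ReadLine(); // keep the process alive while speech plays
        }
    }
}
```

In the thesis system each viseme index would select the corresponding FaceGen mouth frame of the 3D head model rather than printing a file name, but the event-driven pairing of speech and mouth shape is the same idea.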
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45131 |
Fulltext Rights: | Fee-based authorization |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-100-1.pdf (Restricted Access) | 3.05 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.