Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32717

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳炳宇(Bing-Yu Chen) | |
| dc.contributor.author | Jun-Ze Huang | en |
| dc.contributor.author | 黃鈞澤 | zh_TW |
| dc.date.accessioned | 2021-06-13T04:14:04Z | - |
| dc.date.available | 2006-07-26 | |
| dc.date.copyright | 2006-07-26 | |
| dc.date.issued | 2006 | |
| dc.date.submitted | 2006-07-24 | |
| dc.identifier.citation | [Alexa 02] Alexa, M. 2002. Linear combination of transformations. ACM Transactions on Graphics 21, 3 (July), 380-387.
[Black 92] Black, M. J. 1992. Robust Incremental Optical Flow. PhD thesis, Yale University.
[Bishop 95] Bishop, C. M. 1995. Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
[Bregler 97] Bregler, C., Slaney, M., and Covell, M. 1997. Video Rewrite: driving visual speech with audio. SIGGRAPH 1997.
[Chai 03] Chai, J.-X., Xiao, J., and Hodgins, J. 2003. Vision-based control of 3D facial animation. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
[Cormen 89] Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1989. Introduction to Algorithms. The MIT Press and McGraw-Hill Book Company.
[Ezzat 96] Ezzat, T., and Poggio, T. 1996. Facial analysis and synthesis using image-based models. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, Killington, Vermont, October 1996.
[Ezzat 97] Ezzat, T., and Poggio, T. 1997. Videorealistic talking faces: a morphing approach. In Proceedings of the Audiovisual Speech Processing Workshop, Rhodes, Greece, September 1997.
[Ezzat 99] Ezzat, T., and Poggio, T. 2000. Visual speech synthesis by morphing visemes. International Journal of Computer Vision 38, 45-57.
[Ezzat 02] Ezzat, T., Geiger, G., and Poggio, T. 2002. Trainable videorealistic speech animation. ACM Transactions on Graphics 21, 3, 388-398 (also in Proc. SIGGRAPH 2002).
[Hamlaoui 05] Hamlaoui, S., and Davoine, F. 2005. Facial action tracking using an AAM-based condensation approach. ICASSP 2005.
[Madsen 04] Madsen, K., Nielsen, H., and Tingleff, O. 2004. Methods for nonlinear least squares problems. Tech. rep., Informatics and Mathematical Modelling, Technical University of Denmark.
[Roweis 98] Roweis, S. 1998. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, vol. 10, M. I. Jordan, M. J. Kearns, and S. A. Solla, Eds. The MIT Press.
[Shoemake 92] Shoemake, K., and Duff, T. 1992. Matrix animation and polar decomposition. In Proceedings of Graphics Interface '92, 259-264.
[Sphinx] http://cmusphinx.sourceforge.net/sphinx2/
[sphinx2] http://www.speech.cs.cmu.edu/tools/lmtool.html
[Sumner 04] Sumner, R. W., and Popović, J. 2004. Deformation transfer for triangle meshes. SIGGRAPH 2004.
[Sumner 05] Sumner, R. W., Zwicker, M., Gotsman, C., and Popović, J. 2005. Mesh-based inverse kinematics. ACM Transactions on Graphics 24, 3 (Aug.), 488-495.
[Vlasic 05] Vlasic, D., Brand, M., Pfister, H., and Popović, J. 2005. Face transfer with multilinear models. ACM Transactions on Graphics 24, 3.
[Wolberg 90] Wolberg, G. 1990. Digital Image Warping. IEEE Computer Society Press, Los Alamitos, CA.
[Zhang 04] Zhang, L., Snavely, N., Curless, B., and Seitz, S. M. 2004. Spacetime faces: high-resolution capture for modeling and animation. ACM Annual Conference on Computer Graphics, 2004 (August), 548-558. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32717 | - |
| dc.description.abstract | Creating a 3D facial animation that speaks a specific utterance is difficult; even a professional animator must spend a great deal of time on it. Our work provides a speech-driven 3D facial animation system that lets users easily generate facial animations. Given a segment of speech as input, the system outputs a 3D facial animation speaking that input speech.
The system consists of three parts. The first is the MMM (multidimensional morphable model), a model built from training video using machine learning; we use the MMM to generate a realistic speech video corresponding to the input speech. The second part is facial tracking, which locates the positions of the feature points on the face in the synthesized speech video. The third part is Mesh-IK (mesh-based inverse kinematics); Mesh-IK uses the motion of the feature points as a guideline to deform the 3D face model so that the resulting model resembles the corresponding frame of the speech video, allowing us to output a 3D facial animation. Facial tracking and Mesh-IK can also take a real speech or expression video as input and produce the corresponding speech or expression facial animation. | zh_TW |
| dc.description.abstract | It is often difficult to animate a face model speaking a specific speech. Even for professional animators, it takes a lot of time. Our work provides a speech-driven 3D facial animation system that allows the user to easily generate facial animations. The user only needs to provide a speech recording as the input; the output is a 3D facial animation matching the input speech.
Our work can be divided into three sub-systems. The first is the MMM (multidimensional morphable model). The MMM is built from pre-recorded training video using machine learning techniques; we use it to generate a realistic speech video with respect to the input speech. The second part is facial tracking, which extracts the feature points of a human subject in the synthetic speech video. The third part is Mesh-IK (mesh-based inverse kinematics). Mesh-IK takes the motion of the feature points as a guideline to deform 3D face models, making the resulting model match the appearance of the corresponding frame of the speech video. Thus we obtain a 3D facial animation as the output. Facial tracking and Mesh-IK can also take a real speech video, or even a real expression video, as the input and produce the corresponding facial animations. | en |
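The three-stage pipeline described in the abstract (an MMM built with PCA over training frames, feature-point tracking on the synthesized video, and Mesh-IK deformation driven by those points) could be sketched roughly as below. All function names and the simplified logic here are hypothetical illustrations under stated assumptions, not the thesis's actual implementation: the PCA step stands in for the full MMM, tracking is reduced to sampling fixed indices, and the Mesh-IK stand-in simply pins constrained vertices rather than minimizing a deformation energy.

```python
import numpy as np

def build_mmm(training_frames, n_components=4):
    """Hypothetical MMM sketch: a PCA basis over flattened training
    video frames (one row per frame), computed via SVD."""
    X = np.asarray(training_frames, dtype=float)
    mean = X.mean(axis=0)
    # Principal directions are the right singular vectors of the
    # centered data matrix.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def synthesize_frame(mean, basis, coeffs):
    """Map MMM coefficients back to a synthetic video frame."""
    return mean + np.asarray(coeffs) @ basis

def track_features(frame, feature_idx):
    """Stand-in for facial tracking: read off values at fixed
    feature indices of the synthesized frame."""
    return frame[feature_idx]

def mesh_ik_deform(rest_verts, feature_targets, feature_map):
    """Stand-in for Mesh-IK: move the constrained vertices to the
    tracked targets (a real solver would propagate the deformation
    to unconstrained vertices by minimizing an energy)."""
    verts = rest_verts.copy()
    verts[feature_map] = feature_targets
    return verts
```

A driver would loop these stages per frame: synthesize a frame from trajectory coefficients, track its feature points, then deform the face mesh toward them.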
| dc.description.provenance | Made available in DSpace on 2021-06-13T04:14:04Z (GMT). No. of bitstreams: 1 ntu-95-R93725012-1.pdf: 2544573 bytes, checksum: 197e308880e65569cfdb59e5f67dada1 (MD5) Previous issue date: 2006 | en |
| dc.description.tableofcontents | 1. Introduction 9
2. Related Work 11
3. System Overview 17
4. MMM 19
4.1 Corpus Recording 19
4.2 Pre-Processing 19
4.3 Building a MMM 21
4.3.1 PCA 22
4.3.2 K-means Clustering 22
4.3.3 Dijkstra 24
4.4 MMM Synthesis 25
4.5 Analysis 26
4.6 Trajectory Synthesis 29
4.7 Post-Processing 31
5. Facial Tracking 35
6. MeshIK 41
6.1 Feature Vectors 41
6.2 Linear Feature Space 45
6.3 Nonlinear Feature Space 45
7. Result 49
7.2 Synthetic speech video driven facial animation 51
7.3 Real speech video driven facial animation 52
7.4 Real expression video driven facial animation 54
8. Conclusion & Future Work 57
8.1 Conclusion 57
8.2 Future Work 57
9. Reference 58 | |
| dc.language.iso | en | |
| dc.subject | 追蹤 | zh_TW |
| dc.subject | 臉部動畫 | zh_TW |
| dc.subject | 語音 | zh_TW |
| dc.subject | tracking | en |
| dc.subject | facial animation | en |
| dc.subject | speech | en |
| dc.title | 語音驅動之3維人臉動畫 | zh_TW |
| dc.title | Speech-Driven 3D Facial Animation | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 94-2 | |
| dc.description.degree | 碩士 (Master) | |
| dc.contributor.coadvisor | 莊永裕(Yung-Yu Chuang) | |
| dc.contributor.oralexamcommittee | 林文杰(Wen-Chieh Lin),林奕成(I-Chen Lin) | |
| dc.subject.keyword | 語音, 臉部動畫, 追蹤 | zh_TW |
| dc.subject.keyword | speech, facial animation, tracking | en |
| dc.relation.page | 60 | |
| dc.rights.note | 有償授權 (paid-use authorization) | |
| dc.date.accepted | 2006-07-25 | |
| dc.contributor.author-college | 管理學院 | zh_TW |
| dc.contributor.author-dept | 資訊管理學研究所 | zh_TW |
| Appears in Collections: | 資訊管理學系 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-95-1.pdf (Restricted Access) | 2.48 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
