Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99278

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 黃漢邦 | zh_TW |
| dc.contributor.advisor | Han-Pang Huang | en |
| dc.contributor.author | 江佳鴻 | zh_TW |
| dc.contributor.author | Chia-Hung Chiang | en |
| dc.date.accessioned | 2025-08-21T17:05:53Z | - |
| dc.date.available | 2025-08-22 | - |
| dc.date.copyright | 2025-08-21 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-04 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99278 | - |
| dc.description.abstract | 聽障與語言障礙者在日常溝通中持續面臨顯著挑戰,同時現代社會對個體情感需求的滿足亦日益受到重視。為應對此雙重需求,本研究旨在設計並實現一套創新的智慧互動框架。此框架以行動應用程式為核心交互介面,致力於改善聽障與語言障礙人士的溝通可及性與有效性,並為一般使用者提供富有同理心的情感支持及便捷的個性化生活輔助,從而全面提升用戶生活品質,積極促進社會包容性與個體福祉。
系統關鍵技術與功能實現如下:整合人臉辨識與情緒辨識技術,以增強人機互動感知與個性化體驗;開發多語言對話系統,透過融合GPT-4o模型及應用提示工程,實現中、英、台語的自然語言理解與生成。此外,本研究開發了一款界面簡潔直觀的跨平台行動應用程式,並針對聽障或無法言語者的特殊需求深度優化,用戶可藉此與具備個人助理及情感陪伴功能的聊天機器人互動。為進一步強化對聽障用戶的溝通輔助,系統集成了文字轉手語影片與語句潤飾功能,並初步具備處理多模態視覺與語言輸入的潛力。經由多樣化情境(包括一般用戶語音互動及聽障用戶文字互動)的驗證,初步結果顯示本系統在溝通輔助方面表現良好,且用戶接受度高。 | zh_TW |
| dc.description.abstract | Individuals with hearing and speech impairments face persistent and significant challenges in daily communication, while the fulfillment of individual emotional needs in modern society is also receiving increasing attention. To address these dual needs, this research designs and implements an innovative intelligent interaction framework. With a mobile application as its core interactive interface, the framework aims to improve communication accessibility and effectiveness for individuals with hearing and speech impairments, while also providing general users with empathetic emotional support and convenient personalized life assistance, thereby comprehensively enhancing users' quality of life and actively promoting social inclusivity and individual well-being.
Key technological and functional implementations are as follows: integration of face recognition and emotion recognition technologies to enhance perception and personalization in human-computer interaction, and development of a multilingual dialogue system that achieves natural language understanding and generation in Mandarin, English, and Taiwanese by combining the GPT-4o model with prompt engineering. Furthermore, this study developed a cross-platform mobile application with a clean, intuitive interface, deeply optimized for the specific needs of users who are deaf or non-speaking. Through this application, users can interact with a chatbot that provides personal assistant and emotional companionship functions. To further strengthen communication support for users with hearing impairments, the system integrates text-to-sign-language video generation and text refinement features, and offers preliminary support for processing multimodal visual and language input. Validation across diverse scenarios (including voice interactions for general users and text-based interactions for users with hearing impairments) indicates that the system performs well in communication assistance and achieves high user acceptance. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T17:05:53Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-21T17:05:53Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification (Chinese) ii
Oral Examination Committee Certification (English) iv
Acknowledgements vi
Abstract (Chinese) viii
Abstract x
Contents xii
List of Tables xvi
List of Figures xvii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contributions 3
1.3 Organization of Thesis 4
Chapter 2 Literature Review 7
2.1 Dialogue System 7
2.1.1 Task-Oriented Dialogue 7
2.1.2 Non-Task-Oriented Dialogue 10
2.2 Deep Learning for Computer Vision 12
2.2.1 Object Classification and Detection 13
2.2.2 Face Recognition 14
2.2.3 Emotion Recognition 16
2.2.4 Vision-Language Models 19
2.3 Application Development on Mobile 22
2.3.1 Evolution progress of application 23
2.3.2 The development process of the mobile application 25
2.4 Human-Robot Interaction 26
2.4.1 Human Behaviors 27
2.4.2 The Role of HRI Theory in Companion Robots 29
Chapter 3 Human-Robot Interaction 31
3.1 Overview of the Framework 31
3.2 Multilingual System 33
3.2.1 Speech Recognition 33
3.2.2 Speech Synthesis 40
3.3 Dialogue System 43
3.3.1 Natural Language Understanding 43
3.3.2 Natural Language Generation 54
3.3.3 Multi-modal Integration with VLM 60
3.3.4 Retrieval Augmented Generation 62
3.4 Face Recognition System 69
3.4.1 Face Detection 70
3.4.2 Face Alignment 71
3.4.3 Facial Feature Extraction and Facial Feature Matching 72
3.5 Emotion Recognition System 74
3.5.1 Facial Emotion Recognition 74
3.5.2 Robot Emotional Expression Module 79
3.6 Summary 82
Chapter 4 Interaction with Deaf and Non-Speaking Individuals 83
4.1 System Overview 83
4.2 Design and Development of the Mobile Application 84
4.2.1 User Interface for Human-to-Human Communication 84
4.2.2 User Interface for Human-Chatbot Interaction 87
4.3 Communication Assistance Services 91
4.3.1 Translation Agent 91
4.3.2 Text-to-Sign Language Video Generation 94
4.4 Communication Flow of Interaction System 98
4.4.1 Communication Flow for Human-to-Human Interaction 98
4.4.2 Communication Flow for Human-Chatbot Interaction 100
4.5 Summary 101
Chapter 5 Experiments 103
5.1 Hardware and Software System 103
5.1.1 Andy Chat Robot 103
5.2 Experiment Scenarios and Results 109
5.2.1 Face and Emotion Recognition System 109
5.2.2 Multilingual Dialogue System 111
5.2.3 Interaction with Deaf and Non-Speaking Individuals 117
5.3 Discussion 132
Chapter 6 Conclusions and Future Work 135
6.1 Conclusions 135
6.2 Future Work 137
References 139
Appendix A Questionnaire (English version) 151
Appendix B Questionnaire (Chinese version) 155
Biography 159 | - |
| dc.language.iso | en | - |
| dc.subject | 多語言對話系統 | zh_TW |
| dc.subject | 人機互動 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 聽障溝通輔助 | zh_TW |
| dc.subject | 手語生成 | zh_TW |
| dc.subject | 情感陪伴 | zh_TW |
| dc.subject | 行動應用程式 | zh_TW |
| dc.subject | Mobile Application | en |
| dc.subject | Hearing Impairment Communication Aid | en |
| dc.subject | Multilingual Dialogue System | en |
| dc.subject | Sign Language Generation | en |
| dc.subject | Emotional Companionship | en |
| dc.subject | Large Language Model | en |
| dc.subject | Human-Robot Interaction | en |
| dc.title | 應用人工智慧與聊天機器人於聾啞人的互動系統開發 | zh_TW |
| dc.title | Development of An Interaction System for Speech and Language Impairment People using AI and Chat Robots | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 郭重顯;林峻永;蔡清元 | zh_TW |
| dc.contributor.oralexamcommittee | Chung-Hsien Kuo;Chun-Yeon Lin;Tsing-Iuan Tsay | en |
| dc.subject.keyword | 人機互動,行動應用程式,多語言對話系統,聽障溝通輔助,手語生成,情感陪伴,大型語言模型, | zh_TW |
| dc.subject.keyword | Human-Robot Interaction,Large Language Model,Mobile Application,Sign Language Generation,Multilingual Dialogue System,Hearing Impairment Communication Aid,Emotional Companionship, | en |
| dc.relation.page | 159 | - |
| dc.identifier.doi | 10.6342/NTU202502992 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2025-08-07 | - |
| dc.contributor.author-college | 工學院 | - |
| dc.contributor.author-dept | 機械工程學系 | - |
| dc.date.embargo-lift | N/A | - |
| Appears in Collections: | 機械工程學系 | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (Restricted Access) | 37.88 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.