Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90883

Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 黃漢邦 | zh_TW |
| dc.contributor.advisor | Han-Pang Huang | en |
| dc.contributor.author | 吳政彥 | zh_TW |
| dc.contributor.author | Zheng-Yan Wu | en |
| dc.date.accessioned | 2023-10-24T16:09:38Z | - |
| dc.date.available | 2025-08-10 | - |
| dc.date.copyright | 2023-10-24 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-10 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90883 | - |
| dc.description.abstract | 隨著科技不斷進步,機器人已從工業應用工具逐漸普及到大眾生活,提供各種日常生活服務。例如,商場中的機器人可提供室內導覽服務,而在戶外則能提供自動駕駛服務,甚至為高齡長輩提供照顧服務。然而,這些服務都仰賴機器人身上的感測器,包括二維或三維光學雷達、RGB-D相機、麥克風和喇叭等。透過這些感測器,機器人能感知周圍環境的變化,偵測靜態與動態物體的位置,提供安全的導航,引導人們順利到達目的地,同時還能感知人們的情緒和姿態動作,並透過完善的對話系統提供良好的互動體驗。
本論文提出兩個核心系統,以達成以下目標:在人機互動任務中,藉由整合本實驗室先前開發的情緒辨識和運動輔助系統,結合人臉辨識系統和本論文提出的多語言對話系統及流行的 ChatGPT,完成任務導向和非任務導向的對話任務;而在導航任務中,我們提出了一個基於語意分割和路徑規劃的移動機器人導航系統,以應對動態環境的挑戰。首先,利用本論文所開發的多語言對話系統獲取使用者的目標點,透過 YOLOv8 對周圍環境進行物體辨識和語意分割,將辨識出的物體進行動態和靜態分類,結合深度相機或三維光學雷達獲取物體的深度值,構建點雲並進行分群,以獲得物體的三維邊界框,並追蹤這些動態和靜態點雲,使機器人能避免與周圍環境中的移動物體碰撞。接下來,我們利用本論文提出的 DJPS 路徑規劃算法,能快速且高效地獲取全局最佳路徑。最後,結合改良過的時間彈性帶 (Modified Timed Elastic Band, Modified TEB) 進行局部軌跡修正,使輪型移動機器人能順利完成具有社會認知的導航任務。 | zh_TW |
| dc.description.abstract | With the continuous advancement of technology, robots have gradually moved from industrial tools into people's daily lives, providing a wide range of everyday services. For instance, robots in shopping malls can offer indoor guide services, robots outdoors can provide autonomous driving, and robots can even provide care for older adults. These services, however, depend on the robots' onboard sensors, such as 2D or 3D LiDAR, RGB-D cameras, microphones, and speakers. Through these sensors, robots can perceive changes in their surroundings, detect the positions of static and dynamic objects, navigate safely, and guide people to their destinations; they can also sense people's emotions and postures and offer a good interactive experience through a well-designed dialogue system.
This thesis proposes two core systems. For human-robot interaction, the laboratory's previously developed emotion recognition and exercise assistance systems are integrated with the face recognition system and the multilingual dialogue system proposed in this thesis, together with ChatGPT, to carry out both task-oriented and non-task-oriented dialogue. For navigation, a mobile robot navigation system based on semantic segmentation and path planning is proposed to cope with dynamic environments. First, the multilingual dialogue system obtains the user's desired destination. YOLOv8 then performs object recognition and semantic segmentation of the surroundings, and the recognized objects are classified as dynamic or static. Depth values from a depth camera or 3D LiDAR are used to build point clouds, which are clustered to obtain a 3D bounding box for each object; these dynamic and static point clouds are then tracked so that the robot can avoid colliding with moving objects. Next, the proposed DJPS path planning algorithm finds the globally optimal path quickly and efficiently. Finally, a Modified Timed Elastic Band (Modified TEB) local planner corrects the local trajectory, enabling the wheeled mobile robot to complete socially aware navigation tasks. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-24T16:09:38Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-10-24T16:09:38Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements
Chinese Abstract
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization of Thesis
Chapter 2 Literature Review
2.1 Dialogue System
2.1.1 Task-Oriented Dialogue
2.1.2 Non-Task-Oriented Dialogue
2.2 Deep Learning for Computer Vision
2.2.1 Object Classification and Detection
2.2.2 Semantic Segmentation
2.3 SLAM
2.3.1 Visual SLAM
2.3.2 LiDAR SLAM
2.3.3 Visual-LiDAR fusion SLAM
2.4 Wheeled Mobile System
2.4.1 Differential Wheeled Robot Kinematics
2.4.2 Map Representation
2.5 Human-Robot Interaction
2.5.1 Behavioral Cues and Social Signals
2.5.2 Human-Robot Proxemics
Chapter 3 Human-Robot Interaction
3.1 Introduction
3.2 Face Recognition System
3.2.1 Face Detection
3.2.2 Face Alignment
3.2.3 Facial Feature Encoding and Model Training
3.2.4 Face Matching
3.3 Multilingual Dialogue System
3.3.1 Speech Recognition
3.3.2 Natural Language Understanding
3.3.3 Natural Language Generation
3.3.4 Speech Synthesis
3.4 Emotion Recognition System
3.4.1 Spatial Feature Extraction
3.4.2 Spatial-Temporal Feature Extraction
3.4.3 Hidden Markov Model for Engagement
3.4.4 Robot Emotional Expression Module
3.5 Exercise Assistance System
3.5.1 Feature Extraction
3.5.2 Characteristic Angles
3.5.3 Method for Grading Dynamic Posture
3.6 Summary
Chapter 4 Robot Navigation System
4.1 Introduction
4.2 Moving Object Tracking
4.2.1 Object Detection and Segmentation
4.2.2 Object Tracking Algorithm
4.3 3D Bounding Box Estimation
4.3.1 Point Cloud Segmentation
4.3.2 Point Cloud Clustering
4.3.3 Classification of Dynamic and Static Objects
4.4 Mapping
4.5 Global Planner
4.5.1 Graph-Based Search Algorithms
4.5.2 DDAO* Algorithm
4.5.3 DJPS Algorithm
4.6 Local Planner
4.6.1 Dynamic Window Approach
4.6.2 Time Elastic Band Approach
4.6.3 Modified Time Elastic Band Approach
4.7 Summary
Chapter 5 Simulations and Experiments
5.1 Hardware and Software System
5.1.1 Mobi Robot
5.1.2 Disinfection Robot
5.2 Simulation and Experiment Results
5.2.1 Face Recognition System
5.2.2 Multilingual Dialogue System
5.2.3 Exercise Assistance System
5.2.4 Global Planner
5.2.5 Voice Navigation System
5.2.6 Moving Object Tracking
5.2.7 Human-aware Navigation System
5.3 Discussion
Chapter 6 Conclusions and Future Works
6.1 Conclusions
6.2 Future Works
References | - |
| dc.language.iso | en | - |
| dc.subject | 情緒辨識 | zh_TW |
| dc.subject | 姿態辨識 | zh_TW |
| dc.subject | 多語言對話系統 | zh_TW |
| dc.subject | 語意分割 | zh_TW |
| dc.subject | 路徑規劃 | zh_TW |
| dc.subject | 社會認知導航 | zh_TW |
| dc.subject | 人臉辨識 | zh_TW |
| dc.subject | Socially-aware Navigation | en |
| dc.subject | Face Recognition | en |
| dc.subject | Emotion Recognition | en |
| dc.subject | Posture Recognition | en |
| dc.subject | Multilingual Dialogue System | en |
| dc.subject | Semantic Segmentation | en |
| dc.subject | Path Planning | en |
| dc.title | 具備多語言對話、運動輔助、情緒辨識與人類感知導航系統的社交智慧移動機器人之開發 | zh_TW |
| dc.title | Development of a Socially Intelligent Mobile Robot with Multilingual Dialogue, Exercise Assistance, Emotion Recognition and Human-Aware Navigation Systems | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 林峻永;廖元甫;林其禹 | zh_TW |
| dc.contributor.oralexamcommittee | Chun-Yeon Lin;Yuan-Fu Liao;Chyi-Yeu Lin | en |
| dc.subject.keyword | 人臉辨識,情緒辨識,姿態辨識,多語言對話系統,語意分割,路徑規劃,社會認知導航 | zh_TW |
| dc.subject.keyword | Face Recognition,Emotion Recognition,Posture Recognition,Multilingual Dialogue System,Semantic Segmentation,Path Planning,Socially-aware Navigation | en |
| dc.relation.page | 236 | - |
| dc.identifier.doi | 10.6342/NTU202302737 | - |
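| dc.rights.note | Not authorized for public access | - |
| dc.date.accepted | 2023-08-11 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Department of Mechanical Engineering | - |
| Appears in Collections: | Department of Mechanical Engineering | |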
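
The abstract outlines a perception step in which depth data are turned into point clouds, clustered, and summarized as one 3D bounding box per object. The record does not state which clustering method the thesis uses, so what follows is only a minimal sketch under stated assumptions: a DBSCAN-style density clustering (Ester et al., 1996) with a brute-force neighbor search, and axis-aligned boxes. The names `dbscan`, `bounding_boxes`, `region_query`, `eps`, and `min_pts` are hypothetical and do not come from the thesis code.

```python
"""Hypothetical sketch: DBSCAN-style clustering of a 3D point cloud and one
axis-aligned bounding box per cluster. Not the thesis implementation."""
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]          # (x, y, z)
Box = Tuple[Point, Point]          # (min corner, max corner)
NOISE = -1                         # label for points that belong to no cluster


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], itself included.
    Brute force, O(n) per query (assumption made for brevity)."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Assign each point a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: grow cluster
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


def bounding_boxes(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Box]:
    """Axis-aligned min/max box for every cluster; NOISE points are skipped."""
    boxes: Dict[int, Box] = {}
    for p, lab in zip(points, labels):
        if lab == NOISE:
            continue
        if lab not in boxes:
            boxes[lab] = (tuple(p), tuple(p))
        else:
            lo, hi = boxes[lab]
            boxes[lab] = (tuple(min(a, b) for a, b in zip(lo, p)),
                          tuple(max(a, b) for a, b in zip(hi, p)))
    return boxes


if __name__ == "__main__":
    cloud = [
        (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.2),  # object A
        (5.0, 5.0, 0.0), (5.1, 5.0, 0.1), (5.0, 5.2, 0.0),  # object B
        (10.0, 0.0, 0.0),                                    # isolated point
    ]
    labels = dbscan(cloud, eps=0.3, min_pts=3)
    for cid, (lo, hi) in sorted(bounding_boxes(cloud, labels).items()):
        print(cid, lo, hi)
```

Points labeled `NOISE` receive no box. The brute-force neighbor query makes the whole pass O(n²); for real LiDAR or RGB-D clouds, a spatial index such as a k-d tree, or voxel downsampling, would usually replace it. The sketch also leaves out the dynamic/static classification and tracking steps described in the abstract.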
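Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf (not authorized for public access) | 31.4 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.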
