Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101156
Full metadata record
dc.contributor.advisor (zh_TW): 黃明蕙
dc.contributor.advisor (en): Ming-Hui Huang
dc.contributor.author (zh_TW): 王彥碩
dc.contributor.author (en): Yan-Shuo Wang
dc.date.accessioned: 2025-12-31T16:08:50Z
dc.date.available: 2026-01-01
dc.date.copyright: 2025-12-31
dc.date.issued: 2025
dc.date.submitted: 2025-12-18
dc.identifier.citation:
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., & Hesse, C. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877-1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Chen, Y., & Prentice, C. (2024). Integrating Artificial Intelligence and Customer Experience. Australasian Marketing Journal. https://doi.org/10.1177/14413582241252904
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 4299–4307.
Fu, D., He, K., Wang, Y., Hong, W., Gongque, Z., Zeng, W., Wang, W., Wang, J., Cai, X., & Xu, W. (2025). AgentRefine: Enhancing Agent Generalization through Refinement Tuning. arXiv preprint arXiv:2501.01702.
Furniturewala, S., Jandial, S., Java, A., Banerjee, P., Shahid, S., Bhatia, S., & Jaidka, K. (2024). “Thinking” Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 213-227. https://doi.org/10.18653/v1/2024.emnlp-main.13
Ho, D. H., & Fan, C. (2025). Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning. arXiv preprint arXiv:2506.16064.
Huang, A., Block, A., Foster, D. J., Rohatgi, D., Zhang, C., Simchowitz, M., Ash, J. T., & Krishnamurthy, A. (2025). Self-Improvement in Language Models: The Sharpening Mechanism. The Thirteenth International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=WJaUkwci9o
Huang, M.-H., & Rust, R. T. (2018). Artificial intelligence in service. Journal of Service Research, 21(2), 155-172.
Huang, M.-H., & Rust, R. T. (2024a). Automating Creativity. arXiv preprint arXiv:2405.06915. https://arxiv.org/abs/2405.06915
Huang, M.-H., & Rust, R. T. (2024b). The Caring Machine: Feeling AI for Customer Care. Journal of Marketing, 88(5), 1-23. https://doi.org/10.1177/00222429231224748
Jiang, D., Zhang, J., Weller, O., Weir, N., Van Durme, B., & Khashabi, D. (2025). Self-[In]Correct: LLMs Struggle with Discriminating Self-Generated Responses. Proceedings of the AAAI Conference on Artificial Intelligence.
Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., Zhou, X., Wang, E., & Dong, X. (2024). Better Zero-Shot Reasoning with Role-Play Prompting. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), 4099-4113. https://doi.org/10.18653/v1/2024.naacl-long.228
Mendonça, J., Lavie, A., & Trancoso, I. (2024). On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation. The 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024), Bangkok, Thailand. https://aclanthology.org/2024.nlp4convai-1.1/
Mizrahi, M., Kaplan, G., Malkin, D., Dror, R., Shahaf, D., & Stanovsky, G. (2024). State of What Art? A Call for Multi-Prompt LLM Evaluation. Transactions of the Association for Computational Linguistics, 12, 933-949. https://doi.org/10.1162/tacl_a_00681
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Pan, L., Saxon, M., Xu, W., Nathani, D., Wang, X., & Wang, W. Y. (2024). Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies. Transactions of the Association for Computational Linguistics, 12, 484-506. https://doi.org/10.1162/tacl_a_00660
Shawar, B. A., & Atwell, E. (2007). Chatbots: are they really useful? Journal for Language Technology and Computational Linguistics, 22(1), 29-49.
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. F. (2020). Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33, 3008-3021.
Tueanrat, Y., Papagiannidis, S., & Alamanos, E. (2021). Going on a Journey: A Review of the Customer Journey Literature. Journal of Business Research, 125, 336-353. https://doi.org/10.1016/j.jbusres.2020.12.028
Vinyals, O., & Le, Q. (2015). A neural conversational model. arXiv preprint arXiv:1506.05869.
Wei, J., Kim, S., Jung, H., & Kim, Y.-H. (2024). Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data. Proc. ACM Hum.-Comput. Interact., 8(CSCW1), Article 87. https://doi.org/10.1145/3637364
Wulf, J., & Meierhofer, J. (2024a). Exploring the potential of large language models for automation in technical customer service. arXiv preprint arXiv:2405.09161.
Wulf, J., & Meierhofer, J. (2024b). Utilizing Large Language Models for Automating Technical Customer Support. arXiv preprint arXiv:2406.01407. https://doi.org/10.48550/arXiv.2406.01407
Xu, Y., Shieh, C.-H., & van Esch, P. (2020). AI Customer Service: Task Complexity, Problem-Solving Ability, and Usage Intention. Australasian Marketing Journal (AMJ). https://doi.org/10.1016/j.ausmj.2020.03.005
Zhang, Z., Peng, L., Pang, T., Han, J., Zhao, H., & Schuller, B. W. (2023). Refashioning Emotion Recognition Modeling: The Advent of Generalized Large Models. arXiv preprint arXiv:2308.11578. https://arxiv.org/abs/2308.11578
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2023/hash/91f18a1287b398d378ef22505bf41832-Abstract-Datasets_and_Benchmarks.html
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101156
dc.description.abstract (zh_TW): 近年來,大型語言模型 (LLMs) 已經被大量整合至客服領域。然而,LLM 在應用於現實客服場景時仍會面臨許多問題,包括回應品質不一致、情感表達平淡,以及難以理解多階段互動中的複雜語境。本研究提出一個應用於四階段客戶服務旅程的推論時、無需微調的獎勵工程框架,以模擬現實世界的實務問題。我們透過不同的實驗條件(基線、閾值、關鍵評估者與外部評估者)處理了 7 則富含情感的客戶抱怨推文,研究發現:(1) 實務的客戶服務情境需要建立穩健的品質底線 (quality floor);(2) 關鍵評估者的效率最低,消耗最多的 Token 且平均迭代次數最高(3.07 次,相較於其他條件約 1.5 次);(3) 雖然外部評估者表現最佳(成功率 98.95%),但在預算有限的情況下,閾值條件 (Threshold condition) 是最推薦的選擇。本研究提供了一個探索性研究的範例,旨在測試獎勵工程方法是否能提升回應效率,並發現不同的基礎模型在給定相同的評估標準 (rubric) 下,實際上能彼此達成共識。結果顯示,當客戶服務回應必須達到高標準時,迭代優化機制是不可或缺的。此外,在應用此框架時,「以 LLM 為評審 (LLM-as-a-Judge)」扮演了穩健的評估角色。
dc.description.abstract (en): Recently, Large Language Models (LLMs) have been widely integrated into the customer service field. However, LLMs still face obstacles when applied to real-world scenarios, including inconsistent response quality, emotional flatness, and failure to understand the complex context of multi-stage interactions. This study proposes an inference-time, fine-tuning-free reward engineering framework applied to the four-stage customer care journey to simulate real-world practical problems. We processed 7 emotionally rich customer complaint tweets under four experimental conditions (Baseline, Threshold, Critical Evaluator, and External Evaluator) and found that: (1) practical customer service scenarios require a robust quality floor; (2) the Critical Evaluator was the least efficient, consuming the most tokens and averaging the most iterations (3.07, versus around 1.5 for the other conditions); (3) while the External Evaluator achieved the best performance (98.95% success rate), the Threshold condition is the most recommended under a tight budget. This study offers an exploratory test of whether the reward engineering approach improves response efficiency, and finds that different base models can reach consensus with one another when given the same evaluation rubric. The results show that an iterative refinement mechanism is essential when customer care responses must meet a high standard, and that LLM-as-a-Judge serves as a robust evaluator within this framework.
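As a rough sketch of the mechanism the abstract describes (illustrative only, not the thesis's actual code: the generate and judge callables, the 0.9 quality floor, and the 5-iteration cap are all assumed names and values), a Threshold-style refinement loop gated by an LLM-as-a-Judge score could look like the following Python:

from typing import Callable, Tuple

def refine_response(
    tweet: str,                                      # the customer complaint tweet
    stage: str,                                      # one of the four journey stages
    generate: Callable[[str, str, str], str],        # (tweet, stage, feedback) -> draft reply
    judge: Callable[[str, str], Tuple[float, str]],  # (draft, stage) -> (rubric score, critique)
    threshold: float = 0.9,                          # assumed quality floor
    max_iters: int = 5,                              # assumed refinement budget
) -> Tuple[str, int]:
    # Inference-time loop: no fine-tuning, only repeated generation
    # gated by an LLM-as-a-Judge rubric score.
    feedback = ""                                    # no critique before the first draft
    draft = ""
    for iteration in range(1, max_iters + 1):
        draft = generate(tweet, stage, feedback)     # generator LLM (re)drafts the reply
        score, feedback = judge(draft, stage)        # judge scores the draft against the rubric
        if score >= threshold:                       # quality floor met: accept early
            return draft, iteration
    return draft, max_iters                          # budget exhausted: best-effort reply

Under this reading, the Baseline condition would be a single pass with no gating, and the Critical and External Evaluator conditions would swap in a harsher judge prompt or a judge backed by a different base model.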
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-12-31T16:08:50Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2025-12-31T16:08:50Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
誌謝 (Acknowledgements)
摘要 (Chinese Abstract)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Questions
1.3 Research Objectives
1.4 Research Hypothesis
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Introduction to AI in Customer Services
2.1.1 The Multi-Stage Customer Care Journey
2.2 Challenges in Real-World Customer Service
2.3 Structured Approaches to In-Context Control
2.3.1 Structured Prompt Design and Role-Play Prompting
2.3.2 Iterative Prompt Design for Refinement
2.4 LLM Optimization: From Self-Refinement to Collaborative AI Agent Framework
2.4.1 Self-Refinement: Rise and Limitations
2.4.2 Collaborative Correction in AI Agent Workflows
2.4.3 Evaluation via LLM-as-a-Judge
2.5 Summary and Conceptual Framework
Chapter 3 Methodology
3.1 Overview of Research Framework
3.2 Dataset Design and Message Control
3.3 Models and Tool Selection
3.4 Experimental Design and Agent Framework
3.5 Evaluation Protocol and Metrics
3.6 Experimental Setup
Chapter 4 Experiment and Results
4.1 Quantitative Analysis
4.1.1 Overall Performance and Efficiency
4.1.2 Stage-Wise Performance and Challenge Identification
4.1.3 Stage-Wise Average Iterations
4.2 Qualitative Analysis: A Case Study in Refinement
4.3 Summary of Key Findings
Chapter 5 Discussion and Conclusion
5.1 Interpretation of Findings
5.2 Theoretical Implications
5.3 Practical Implications
5.4 Research Limitations
5.5 Future Research and Conclusion
REFERENCES
Appendix A Prompt Templates & Evaluation Rubrics
A.1 Selected Customer Tweets for Experiments
A.2 Core Agent Prompts
A.2.1 System Prompt
A.2.2 Stage Generation Prompts
A.3 Evaluation Prompts
A.3.1 Standard Evaluation Prompts
A.3.2 Critical Evaluation Prompts
A.3.3 External Critique Prompts
A.4 Evaluation Rubrics
dc.language.iso: en
dc.subject: 客戶服務旅程
dc.subject: 大型語言模型
dc.subject: 上下文學習
dc.subject: 獎勵工程
dc.subject: LLM-as-a-Judge
dc.subject: Customer Care Journey
dc.subject: Large Language Model (LLM)
dc.subject: In-Context Learning
dc.subject: Reward Engineering
dc.subject: LLM-as-a-Judge
dc.title (zh_TW): 運用獎勵工程提升顧客服務旅程體驗
dc.title (en): Using Reward Engineering to Enhance Customer Care Journey
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee (zh_TW): 何承遠;張瑋倫
dc.contributor.oralexamcommittee (en): Cheng-Yuan Ho;Wei-Lun Chang
dc.subject.keyword (zh_TW): 客戶服務旅程, 大型語言模型, 上下文學習, 獎勵工程, LLM-as-a-Judge
dc.subject.keyword (en): Customer Care Journey, Large Language Model (LLM), In-Context Learning, Reward Engineering, LLM-as-a-Judge
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU202504806
dc.rights.note: 未授權 (Not authorized)
dc.date.accepted: 2025-12-18
dc.contributor.author-college: 管理學院 (College of Management)
dc.contributor.author-dept: 資訊管理學系 (Department of Information Management)
dc.date.embargo-lift: N/A
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File | Size | Format
ntu-114-1.pdf (Restricted Access) | 1.1 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
