Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101156

| Title: | 運用獎勵工程提升顧客服務旅程體驗 Using Reward Engineering to Enhance Customer Care Journey |
| Author: | 王彥碩 Yan-Shuo Wang |
| Advisor: | 黃明蕙 Ming-Hui Huang |
| Keywords: | Customer Care Journey, Large Language Model (LLM), In-Context Learning, Reward Engineering, LLM-as-a-Judge |
| Publication Year: | 2025 |
| Degree: | Master |
| Abstract: | Recently, Large Language Models (LLMs) have been widely integrated into the customer service field. However, LLMs still face obstacles when applied to real-world scenarios, including inconsistent response quality, emotional flatness, and a failure to understand the complex context of multi-stage interactions. This study proposes an inference-time, fine-tuning-free reward engineering framework applied to a four-stage customer care journey to simulate real-world practical problems. We processed 7 emotionally rich customer complaint tweets through different experimental conditions (Baseline, Threshold, Critical Evaluator, and External Evaluator) and found that: (1) practical customer service scenarios require a robust quality floor; (2) the Critical Evaluator was the least efficient, consuming the most tokens and requiring the highest average iteration count (3.07 iterations, versus roughly 1.5 for the other conditions); and (3) while the External Evaluator achieved the best performance (98.95% success rate), the Threshold condition is the most recommended choice under a tight budget. This study provides an example of exploratory research that tests whether a reward engineering approach improves response efficiency, and finds that different base models can in fact reach consensus with one another when given the same evaluation rubric. The results show that an iterative refinement mechanism is essential when customer care responses must meet a high standard. Furthermore, LLM-as-a-Judge serves as a robust evaluator when applying this framework. (An illustrative sketch of this refinement loop appears after the metadata table below.) |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101156 |
| DOI: | 10.6342/NTU202504806 |
| Full-Text License: | Not authorized |
| Electronic Full-Text Release Date: | N/A |
| Appears in Collections: | Department of Information Management |
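The abstract describes an inference-time refinement loop: a generator LLM drafts a customer-care reply, an LLM-as-a-Judge scores it against a shared rubric, and drafting repeats until the score clears a quality floor (the Threshold condition) or the iteration budget runs out. The full text is not publicly available, so what follows is only a minimal sketch of how such a loop could be wired up; `call_generator`, `call_judge`, the rubric string, the 0.9 threshold, and the 5-iteration budget are all hypothetical placeholders, not the thesis's actual interfaces or settings.

```python
# Hypothetical sketch of the Threshold condition: refine a reply until an
# LLM-as-a-Judge's score clears a quality floor, with no fine-tuning involved.
# `call_generator` and `call_judge` stand in for whatever LLM API was used.
from typing import Callable


def refine_until_threshold(
    complaint: str,
    rubric: str,
    call_generator: Callable[[str], str],          # prompt -> candidate reply
    call_judge: Callable[[str, str, str], float],  # (complaint, reply, rubric) -> score in [0, 1]
    threshold: float = 0.9,                        # assumed quality floor
    max_iterations: int = 5,                       # assumed inference-time budget
) -> tuple[str, float, int]:
    """Iteratively refine a customer-care reply until the judge's score
    meets the threshold. Returns (best reply, its score, iterations used)."""
    prompt = f"Write an empathetic customer-care reply to:\n{complaint}"
    best_reply, best_score = "", 0.0
    for iteration in range(1, max_iterations + 1):
        reply = call_generator(prompt)
        score = call_judge(complaint, reply, rubric)
        if score > best_score:
            best_reply, best_score = reply, score
        if score >= threshold:                     # quality floor reached: stop early
            return reply, score, iteration
        # Feed the judge's verdict back so the next draft can improve on it.
        prompt = (
            f"Your previous reply scored {score:.2f} against this rubric:\n{rubric}\n"
            f"Rewrite the reply to the complaint below and address the weaknesses.\n"
            f"Complaint: {complaint}\nPrevious reply: {reply}"
        )
    return best_reply, best_score, max_iterations
```

In this sketch the judge's numeric verdict is folded into the next prompt, which is one common way to close such a loop; the thesis may implement the feedback, the rubric, and the stopping rule differently.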
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (not authorized for public access) | 1.1 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
