Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100604

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真博士 | zh_TW |
| dc.contributor.advisor | Jane Yung-jen Hsu | en |
| dc.contributor.author | 鳳凰 | zh_TW |
| dc.contributor.author | Avijit Balabantaray | en |
| dc.date.accessioned | 2025-10-08T16:05:19Z | - |
| dc.date.available | 2025-10-09 | - |
| dc.date.copyright | 2025-10-08 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-08 | - |
| dc.identifier.citation | [1] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[2] M. A. Arbib. Schema theory. The Encyclopedia of Artificial Intelligence, 2:1427–1443, 1992.
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[4] I. Casanueva, I. Temnikova, M. Henderson, D. Gerz, and I. Vulic. Efficient intent detection with dual sentence encoders. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pages 38–45. Association for Computational Linguistics, 2020.
[5] J. J. Y. Chung, E. Kamar, and S. Amershi. Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions. arXiv preprint arXiv:2306.04140, 2023.
[6] S. Fakhoury, A. Naik, G. Sakkas, S. Chakraborty, and S. K. Lahiri. LLM-based test-driven interactive code generation: User study and empirical evaluation. IEEE Transactions on Software Engineering, 2024.
[7] Y. Gong, C. Liu, J. Yuan, F. Yang, X. Cai, G. Wan, J. Chen, R. Niu, and H. Wang. Density-based dynamic curriculum learning for intent detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3034–3037, 2021.
[8] T. Hartvigsen, S. Gabriel, H. Palangi, M. Sap, D. Ray, and E. Kamar. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. arXiv preprint arXiv:2203.09509, 2022.
[9] Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, and G. Neubig. Active retrieval augmented generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7969–7992, 2023.
[10] E. Kamalloo, N. Dziri, C. L. Clarke, and D. Rafiei. Evaluating open-domain question answering in the era of large language models. arXiv preprint arXiv:2305.06984, 2023.
[11] S. Larson, A. Mahendran, J. Peper, C. Clarke, A. Lee, P. Hill, J. K. Kummerfeld, K. Leach, M. A. Laurenzano, and L. Tang. An evaluation dataset for intent classification and out-of-scope prediction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1311–1316. Association for Computational Linguistics, 2019.
[12] M. Li, R. Wu, H. Liu, J. Yu, X. Yang, B. Han, and T. Liu. InstanT: Semi-supervised learning with instance-dependent thresholds. Advances in Neural Information Processing Systems, 36:2922–2938, 2023.
[13] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023.
[14] X. Liu, A. Eshghi, P. Swietojanski, and V. Rieser. Benchmarking natural language understanding services for building conversational agents. In Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, pages 165–183. Springer, 2021.
[15] Y. Liu, K. Shi, K. S. He, L. Ye, A. R. Fabbri, P. Liu, D. Radev, and A. Cohan. On learning to summarize with large language models as references. arXiv preprint arXiv:2305.14239, 2023.
[16] Z. Lu, J. Tian, W. Wei, X. Qu, Y. Cheng, D. Chen, et al. Mitigating boundary ambiguity and inherent bias for text classification in the era of large language models. arXiv preprint arXiv:2406.07001, 2024.
[17] L. Qin, W. Che, Y. Li, H. Wen, and T. Liu. A stack-propagation framework with token-level intent detection for spoken language understanding. arXiv preprint arXiv:1909.02188, 2019.
[18] P. Rebmann et al. Evaluating semantic awareness in large language models for process mining tasks. arXiv preprint arXiv:2407.02310, 2024.
[19] J. Robinson, C. M. Rytting, and D. Wingate. Leveraging large language models for multiple choice question answering. arXiv preprint arXiv:2210.12353, 2022.
[20] G. Sahu, P. Rodriguez, I. H. Laradji, P. Atighehchian, D. Vazquez, and D. Bahdanau. Data augmentation for intent classification with off-the-shelf large language models. arXiv preprint arXiv:2204.01959, 2022.
[21] B. Schwartz. The paradox of choice. In Positive Psychology in Practice: Promoting Human Flourishing in Work, Health, Education, and Everyday Life, pages 121–138, 2015.
[22] T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace, and S. Singh. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980, 2020.
[23] M. Siino and I. Tinnirello. Prompt engineering for identifying sexism using GPT Mistral 7B. In Proceedings of the 25th Working Notes of the Conference and Labs of the Evaluation Forum, volume 3740, pages 1228–1236, 2024.
[24] A. Suresh, R. Gupta, and M. Yu. Conformal intent classification and clarification for fast and accurate intent recognition. In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
[25] G. Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
[26] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017.
[28] Y. Wang, Z. Zhang, and R. Wang. Element-aware summarization with large language models: Expert-aligned evaluation and chain-of-thought method. arXiv preprint arXiv:2305.13412, 2023.
[29] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[30] V. Yadav, Z. Tang, and V. Srinivasan. Paraphrase and aggregate with large language models for minimizing intent classification errors. arXiv preprint arXiv:2406.17163, 2024.
[31] B. Zhang, B. Haddow, and A. Birch. Prompting large language model for machine translation: A case study. In International Conference on Machine Learning, pages 41092–41110. PMLR, 2023.
[32] C. Zhang, Y. Li, N. Du, W. Fan, and P. S. Yu. Joint slot filling and intent detection via capsule neural networks. arXiv preprint arXiv:1812.09471, 2018.
[33] Y. Zhang and Y. Choi. Intent-Sim: When should LLMs ask for clarification? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
[34] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, D. Li, H. Zhang, J. E. Gonzalez, and I. Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685, 2023.
[35] Y. Zhou, A. I. Muresanu, Z. Han, K. Paster, S. Pitis, H. Chan, and J. Ba. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations, 2022.
[36] W. Zhu, H. Liu, Q. Dong, J. Xu, S. Huang, L. Kong, J. Chen, and L. Li. Multilingual machine translation with large language models: Empirical results and analysis. arXiv preprint arXiv:2304.04675, 2023.
[37] Y. Zhu, H. Yuan, S. Wang, J. Liu, W. Liu, C. Deng, H. Chen, Z. Liu, Z. Dou, and J.-R. Wen. Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100604 | - |
| dc.description.abstract | 大型語言模型 (LLM) 憑藉其泛化能力,在意圖分類任務中展現出強大的表現。然而,在語義模糊的環境中,它們的有效性會下降,因為許多意圖標籤的含義緊密相關或重疊。這些挑戰在細粒度分類任務中尤其明顯,因為意圖之間的細微差別會導致邊界模糊和頻繁的誤分類。我們還觀察到,增加意圖類別的數量會導致準確率下降,這是由緊密相關的意圖標籤之間的語義重疊和邊界模糊造成的。在這種模糊場景下,基於提示的通用方法往往無法發揮作用,因為它們往往依賴淺層的詞彙線索,難以消除緊密相關的意圖的歧義。
為了克服這些限制,我們提出了 Entity2Intent,這是一種新穎的實體引導推理框架,它引入了對使用者查詢的結構化解釋,從而能夠在語義重疊的標籤空間中實現更準確的意圖分類。我們在三個廣泛使用的意圖分類基準資料集(Banking77、Clinc150 和 LIU54)上進行了實驗,以評估我們提出的 Entity2Intent 框架的有效性。與 Zero-Shot、Few-Shot 和 PC-CoT 等強基線相比,我們的方法始終保持更高的準確率,在每種設定下,與最佳基線相比,平均提升分別高達 +0.51%(GPT-3.5-Turbo)與 +3.56%(LLaMA2-70B-Chat)。 | zh_TW |
| dc.description.abstract | Large Language Models (LLMs) have demonstrated strong performance in intent classification tasks due to their generalization capabilities. However, their effectiveness declines in semantically ambiguous settings, where many intent labels are closely related or overlapping in meaning. These challenges are especially pronounced in fine-grained classification tasks, where subtle distinctions between intents give rise to boundary ambiguity and frequent misclassification. We also observed that increasing the number of intent classes leads to lower accuracy, driven by semantic overlap and boundary ambiguity among closely related intent labels. In such ambiguous scenarios, general prompt-based methods often fall short, as they tend to rely on shallow lexical cues and struggle to disambiguate closely related intents.
To overcome these limitations, we propose Entity2Intent, a novel entity-guided reasoning framework that introduces structured interpretation of user queries, enabling more accurate intent classification in semantically overlapping label spaces. We conducted experiments on three widely used intent classification benchmark datasets, Banking77, Clinc150, and LIU54, to evaluate the effectiveness of our proposed Entity2Intent framework. Compared to strong baselines such as Zero-Shot, Few-Shot, and PC-CoT, our method consistently achieves higher accuracy, with average improvements of up to +0.51% (GPT-3.5-Turbo) and +3.56% (LLaMA2-70B-Chat) over the best baseline in each setting. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-10-08T16:05:19Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-10-08T16:05:19Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements – ii
摘要 – iii
Abstract – iv
Contents – v
List of Figures – ix
List of Tables – xi
Chapter 1 Introduction – 1
1.1 Research Objective – 3
1.2 Thesis Organization – 4
Chapter 2 Related Work – 5
2.1 Large Language Models – 5
2.1.1 Prompting – 6
2.1.2 In-context Learning – 6
2.1.3 Chain-of-Thought Prompting – 7
2.2 LLMs on Intent Classification – 7
2.2.1 LLMs on Intent Disambiguation – 8
2.2.2 Human-in-the-loop – 8
2.2.3 Semantic Reasoning – 9
Chapter 3 Problem Definition – 10
3.1 Boundary Ambiguity in Intent Classification – 10
3.2 Notation – 12
Chapter 4 Methodology – 13
4.1 Empirical Motivation and Cognitive Foundations – 13
4.1.1 The Paradox of Choice – 13
4.1.2 Schema Theory – 13
4.2 Entity2Intent – 14
4.2.1 Self-Reduction – 14
4.2.2 Iterative Top Reduction (ITR) – 15
4.2.3 Cluster-Based Window Reduction (CBWR) – 16
4.2.4 Entity-Guided Reasoning – 18
Chapter 5 Experiments and Analysis – 21
5.1 Datasets – 21
5.2 Metric of Evaluation – 23
5.3 Models – 24
5.4 Implementation Details – 24
5.4.1 Challenge Set Construction – 25
5.4.2 Baseline Method – 26
5.5 Main Result – 26
5.5.1 Confusion Matrix Analysis: PC-CoT vs. Entity2Intent – 28
5.5.2 LLM Call Efficiency: Entity2Intent vs. PC-CoT – 29
5.6 Ablation Study of Entity2Intent – 30
5.6.1 Entity2Intent Evaluation on Full Label Set (No Reduction) – 30
5.6.2 Entity2Intent Without Structured Roles – 31
Chapter 6 Conclusions – 33
6.1 Contributions – 33
6.2 Limitation and Future Work – 34
References – 35
Appendix A — Prompting Details – 41
A.1 Entity2Intent Prompt – 41
A.2 Zero-Shot Prompt – 42
A.3 Few-Shot Prompt – 42
A.4 Standard Reduction Prompt – 42
Appendix B — Entity2Intent Success and Failure Case Study for Each Dataset – 43
Appendix C — Reductions of Intent Space Using CBWR and ITR Methods – 47
Appendix D — Impact of Label Set Size in Few-Shot Setting – 48
Appendix E — Full Intent Label Lists for – 50 | - |
| dc.language.iso | en | - |
| dc.subject | 大型語言模型,意圖分類,語意推理,實體引導推理,零樣本學習,少樣本學習 | zh_TW |
| dc.subject | Large Language Models,Intent Classification,Semantic Reasoning,Entity-Guided Reasoning,Zero-Shot Learning,Few-Shot Learning | en |
| dc.title | Entity2Intent:一個用於歧義感知意圖分類的實體引導推理框架 | zh_TW |
| dc.title | Entity2Intent: An Entity-Guided Reasoning Framework for Ambiguity-Aware Intent Classification | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.coadvisor | 陳彥仰 | zh_TW |
| dc.contributor.coadvisor | Mike Y. Chen | en |
| dc.contributor.oralexamcommittee | 古倫維;黃喬敬 | zh_TW |
| dc.contributor.oralexamcommittee | Lun-Wei Ku;Chiao-Ching Huang | en |
| dc.relation.page | 57 | - |
| dc.identifier.doi | 10.6342/NTU202502626 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2025-08-12 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | 2025-10-09 | - |
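The abstracts above describe a two-stage pipeline: first reduce the intent label space (the table of contents names Self-Reduction, ITR, and CBWR variants), then classify with entity-guided reasoning over the shortlist. As a rough, hypothetical illustration of how such prompts might be assembled — the function names, prompt wording, and entity-role format below are assumptions, not the thesis's actual prompts — a generic `call_llm(prompt) -> str` function is presumed to be supplied by the reader:

```python
# Illustrative sketch of an entity-guided, two-stage intent classification
# prompt flow. All names and prompt templates here are hypothetical;
# they only mirror the high-level pipeline described in the abstract.

def build_reduction_prompt(query: str, labels: list[str], k: int = 5) -> str:
    """Stage 1: ask the model to shortlist the k most plausible intents
    out of the full (possibly very large) label set."""
    label_list = "\n".join(f"- {label}" for label in labels)
    return (
        f"Query: {query}\n"
        f"Candidate intents:\n{label_list}\n"
        f"Return the {k} most plausible intent labels, comma-separated."
    )

def build_entity_guided_prompt(query: str,
                               entities: dict[str, str],
                               shortlist: list[str]) -> str:
    """Stage 2: classify among the shortlisted intents, conditioning the
    model on structured entity roles extracted from the query."""
    entity_desc = "; ".join(f"{role}: {value}" for role, value in entities.items())
    return (
        f"Query: {query}\n"
        f"Extracted entities: {entity_desc}\n"
        f"Intents: {', '.join(shortlist)}\n"
        "Using the entities to disambiguate closely related intents, "
        "answer with exactly one intent label."
    )

if __name__ == "__main__":
    # Example with Banking77-style intents (label names are illustrative).
    query = "Why was I charged twice for my card payment?"
    entities = {"action": "charged twice", "object": "card payment"}
    shortlist = ["transaction_charged_twice", "card_payment_fee_charged"]
    print(build_entity_guided_prompt(query, entities, shortlist))
```

The point of the second prompt is that the structured entity roles give the model lexical anchors beyond the raw query, which is where generic zero-shot prompts reportedly fail on semantically overlapping labels.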
| Appears in Collections: | 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 3.14 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated where specific license terms apply.
