NTU Theses and Dissertations Repository, 工學院 (College of Engineering), 工業工程學研究所 (Institute of Industrial Engineering)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97826
Title: 主題模型強化之短文本檢索與生成品質提升
Enhancing Retrieval and Generation Quality for Short Texts with Topic Models
Authors: 黎家愷
Chia-Kai Li
Advisor: 郭瑞祥
Ruey-Shan Guo
Keyword: Information Retrieval, Natural Language Processing, Retrieval-Augmented Generation, Topic Modeling, Large Language Models
Publication Year : 2025
Degree: Master's
Abstract: The effectiveness of Retrieval-Augmented Generation (RAG) depends largely on the quality of the retrieval stage. Traditional RAG systems rely primarily on vector similarity matching, which does not fully capture the latent topical structure shared by queries and documents and therefore yields suboptimal retrieval results. To address this, the study introduces topic modeling (via the BERTopic pipeline) into the retrieval stage of RAG: Amazon product reviews are clustered into topics, and query rewriting is used to determine the topic of each query. Restricting the retrieval scope to the relevant topics filters out irrelevant documents, raises retrieval precision, and in turn improves the quality of downstream text generation. Experiments on multiple retrieval benchmark datasets indicate that, compared with the traditional vector-based retrieval baseline (Naive RAG), the BERTopic retrieval strategy significantly improves both retrieval relevance and overall RAG performance.
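The topic-restricted retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the full text is restricted, so every detail below is an assumption. Bag-of-words counts stand in for dense embeddings, hand-assigned labels stand in for BERTopic clusters, and a nearest-centroid lookup stands in for the query-rewriting step. The corpus and all function names (`embed`, `assign_topic`, `naive_retrieve`, `topic_retrieve`) are hypothetical.

```python
import math
from collections import Counter

# Toy corpus of (review text, topic label). The labels are hand-assigned here;
# in the thesis they would come from BERTopic clustering (assumption).
DOCS = [
    ("battery life is short and charging is slow", "battery"),
    ("the battery lasts two days on one charge", "battery"),
    ("screen is bright and the display colors are sharp", "display"),
    ("display cracked after a short drop", "display"),
    ("shipping was very slow", "shipping"),
]


def embed(text):
    """Bag-of-words term counts, a stand-in for a dense sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def topic_centroids(docs):
    """Sum the vectors of all documents that share a topic label."""
    centroids = {}
    for text, topic in docs:
        centroids.setdefault(topic, Counter()).update(embed(text))
    return centroids


def assign_topic(query, centroids):
    """Pick the topic whose centroid is closest to the query.

    Stand-in for the query-rewriting step named in the abstract.
    """
    q = embed(query)
    return max(centroids, key=lambda t: cosine(q, centroids[t]))


def naive_retrieve(query, docs, k=2):
    """Baseline: rank every document by similarity to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d[0])), reverse=True)
    return [text for text, _ in ranked[:k]]


def topic_retrieve(query, docs, k=2):
    """Rank only the documents that belong to the query's topic."""
    topic = assign_topic(query, topic_centroids(docs))
    in_topic = [d for d in docs if d[1] == topic]
    return naive_retrieve(query, in_topic, k)


query = "why is it so slow to charge"
print(naive_retrieve(query, DOCS))
print(topic_retrieve(query, DOCS))
```

In this toy setup the baseline ranking admits the shipping review because it shares the word "slow" with the query, while the topic-restricted ranking returns only the two battery reviews. Real embeddings and real clusters would behave differently, so the example shows the filtering mechanism only, not the reported gains.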
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97826
DOI: 10.6342/NTU202501611
Fulltext Rights: Not authorized
Embargo lift date: N/A
Appears in Collections: 工業工程學研究所 (Institute of Industrial Engineering)

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 9.03 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
