Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94284
Title: Multi-DSI: Non-deterministic Identifier and Concept Alignment for Differentiable Search Index
Authors: Yu-Ze Liu (柳宇澤)
Advisor: Pu-Jen Cheng (鄭卜壬)
Keywords: Generative Information Retrieval, Differentiable Search Index, Query Generation, Concept Alignment, Multiple Indexing Point Retrieval
Publication Year: 2024
Degree: Master's
Abstract: Information retrieval (IR) has been studied for a long time, and the many methods proposed to tackle it fall roughly into two directions: statistical methods and deep learning methods. Statistical methods usually use the distribution of words to compute the similarity between a query and a document, whereas deep learning models tend to learn encoders that project queries and documents into a vector space for retrieval. With the advent of generative deep learning models, generative IR has gained increasing attention. Generative IR offers a new perspective on the retrieval problem: the generative model directly produces the identifier of a relevant document, which reduces the cost of computing query-document similarities at inference time, a cost that depends heavily on the size of the corpus. However, existing methods face two issues: (1) when a document is represented by a single semantic identifier (ID), the retrieval model may fail to capture the multifaceted and complex content of the document; and (2) when the generated training data is semantically ambiguous, the retrieval model may struggle to distinguish between documents with similar content. To address these issues, we propose Multi-DSI, which (1) offers multiple non-deterministic semantic identifiers and (2) aligns the concepts of queries and documents to avoid ambiguity. Extensive experiments on two benchmark datasets demonstrate that the proposed model significantly outperforms the baseline methods, improving performance by 7.4%.
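The abstract does not describe how identifiers are decoded, so the following is only a toy sketch, under our own assumptions, of the general idea of generative retrieval with several identifiers per document. It assumes identifiers are fixed-length tuples of integer tokens, decoding is a beam search constrained to a prefix trie of valid identifiers, and a document is ranked by its best-scoring identifier. `score_next`, `PREFS`, `retrieve`, and the `doc_*` names are hypothetical; `score_next` stands in for a trained model's next-token log-probabilities. Non-determinism and concept alignment are not modeled, and none of this is the thesis's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical representation: a semantic identifier is a fixed-length
# tuple of integer tokens, and one document may own several identifiers.
Identifier = Tuple[int, ...]
ScoreFn = Callable[[Identifier, int], float]


def build_trie(identifiers: Sequence[Identifier]) -> Dict[Identifier, Set[int]]:
    """Map every valid identifier prefix to the tokens allowed next."""
    trie: Dict[Identifier, Set[int]] = defaultdict(set)
    for ident in identifiers:
        for i in range(len(ident)):
            trie[ident[:i]].add(ident[i])
    return dict(trie)


def constrained_beam_search(score_next: ScoreFn,
                            trie: Dict[Identifier, Set[int]],
                            complete: Set[Identifier],
                            beam_size: int) -> List[Tuple[Identifier, float]]:
    """Beam search that only extends prefixes present in the trie.

    Returns finished identifiers with their summed log-scores, best first.
    """
    beams: List[Tuple[Identifier, float]] = [((), 0.0)]
    finished: List[Tuple[Identifier, float]] = []
    while beams:
        candidates = [
            (prefix + (tok,), logp + score_next(prefix, tok))
            for prefix, logp in beams
            for tok in trie.get(prefix, set())
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for ident, logp in candidates[:beam_size]:
            # Fixed-length IDs: a complete identifier has no children.
            if ident in complete:
                finished.append((ident, logp))
            else:
                beams.append((ident, logp))
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def retrieve(score_next: ScoreFn,
             doc_ids: Dict[str, List[Identifier]],
             beam_size: int = 4,
             k: int = 2) -> List[str]:
    """Decode identifiers, map them back to documents, return top-k docs."""
    id_to_doc = {ident: doc for doc, idents in doc_ids.items() for ident in idents}
    trie = build_trie(list(id_to_doc))
    finished = constrained_beam_search(score_next, trie, set(id_to_doc), beam_size)
    ranked: List[str] = []
    # `finished` is sorted best first, so the first hit for a document is
    # its best-scoring identifier (max aggregation, an assumption here).
    for ident, _ in finished:
        doc = id_to_doc[ident]
        if doc not in ranked:
            ranked.append(doc)
            if len(ranked) == k:
                break
    return ranked


# Hypothetical per-step log-scores for one query, standing in for a model.
PREFS = {
    (): {0: -1.0, 1: -2.0, 2: -0.5},
    (0,): {1: -1.0, 2: -0.2},
    (1,): {1: -0.3},
    (2,): {0: -0.1},
}


def score_next(prefix: Identifier, tok: int) -> float:
    """Look up the toy score of emitting `tok` after `prefix`."""
    return PREFS.get(prefix, {}).get(tok, -10.0)


multi = {"doc_a": [(0, 1), (2, 0)], "doc_b": [(0, 2)], "doc_c": [(1, 1)]}
single = {"doc_a": [(0, 1)], "doc_b": [(0, 2)], "doc_c": [(1, 1)]}
print(retrieve(score_next, multi))   # ['doc_a', 'doc_b']
print(retrieve(score_next, single))  # ['doc_b', 'doc_a']
```

In this toy setup, `doc_a` is reached through its second identifier `(2, 0)` and ranks first; with only one identifier it falls behind `doc_b`. Decoding work here grows with identifier length and beam size rather than with scoring every document, which is the inference-cost argument the abstract makes for generative IR.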
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94284
DOI: 10.6342/NTU202402960
Fulltext Rights: Authorization granted (open access worldwide)
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File            Size       Format
ntu-112-2.pdf   846.78 kB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
