Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80951
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 許永真 (Jane Yung-jen Hsu)
dc.contributor.author | Yi-Chen Lin | en
dc.contributor.author | 林依蓁 | zh_TW
dc.date.accessioned | 2022-11-24T03:23:20Z | -
dc.date.available | 2021-09-17
dc.date.available | 2022-11-24T03:23:20Z | -
dc.date.copyright | 2021-09-17
dc.date.issued | 2021
dc.date.submitted | 2021-09-11
dc.identifier.citation |
[1] A. Bustos and A. Pertusa. Learning eligibility in cancer clinical trials using deep neural networks. 2018.
[2] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan. A simple algorithm for identifying negated findings and diseases in discharge summaries. 2001.
[3] Q. Chen, Y. Peng, and Z. Lu. BioSentVec: creating sentence embeddings for biomedical texts. 2019.
[4] J. Gao, C. Xiao, L. M. Glass, and J. Sun. COMPOSE: Cross-modal pseudo-siamese network for patient trial matching. 2020.
[5] T. Kang, S. Zhang, Y. Tang, G. W. Hruby, A. Rusanov, N. Elhadad, and C. Weng. EliIE: An open-source information extraction system for clinical trial eligibility criteria. 2017.
[6] M. Pagliardini, P. Gupta, and M. Jaggi. Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. In NAACL 2018 - Conference of the North American Chapter of the Association for Computational Linguistics, 2018.
[7] C. Patel, J. Cimino, J. Dolby, A. Fokoue, A. Kalyanpur, A. Kershenbaum, L. Ma, E. Schonberg, and K. Srinivas. Matching patient records to clinical trials using ontologies. 2007.
[8] Y. Peng, S. Yan, and Z. Lu. Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. In Proceedings of the 2019 Workshop on Biomedical Natural Language Processing (BioNLP 2019), pages 58–65, 2019.
[9] Y. Tseo, M. I. Salkola, A. Mohamed, A. Kumar, and F. Abnousi. Information extraction of clinical trial eligibility criteria. 2020.
[10] C. Weng, X. Wu, Z. Luo, M. R. Boland, D. Theodoratos, and S. B. Johnson. EliXR: an approach to eligibility criteria extraction and representation. 2011.
[11] C. Yuan, P. B. Ryan, C. Ta, Y. Guo, Z. Li, J. Hardin, R. Makadia, P. Jin, N. Shang, T. Kang, and C. Weng. Criteria2Query: a natural language interface to clinical databases for cohort definition. 2019.
[12] X. Zhang, C. Xiao, L. M. Glass, and J. Sun. DeepEnroll: Patient-trial matching with deep embedding and entailment prediction. 2020.
[13] Y. Zhang, Q. Chen, Z. Yang, H. Lin, and Z. Lu. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data, 2019.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80951 | -
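dc.description.abstract | Building an automated patient-trial matching system improves matching efficiency, saving medical staff valuable time and letting them find suitable clinical trial information sooner as a reference option for treatment planning. Patient-trial matching searches for clinical trials whose eligibility criteria fit a patient's condition (the patient note). Because eligibility criteria are written in unstructured natural language and carry the complex semantics of clinical medical knowledge, most earlier work treated the task as an automatic information extraction problem and solved it with rule-based methods; more recent work uses machine learning or neural network models. All of these approaches, however, require either extensive manual effort or large amounts of manually annotated data. We propose an iterative approach to patient-trial matching. In the early stage, when no labeled data is available, we match at the sentence level using semantic similarity and, with this as the core module, build a complete matching workflow comprising text preprocessing, text vector representation, sentence-pair matching, negation detection, a numeric filter, and aggregation and ranking. All matching results are evaluated and annotated by medical experts. Analysis of the annotations shows that the assumption that semantic similarity implies semantic relevance, let alone suitability, does not fully hold for the texts in this study. In the later stage we therefore try to address this problem with self-supervised learning. We link the annotations, the patient notes, and the trial criteria to generate a pseudo-labeled dataset for specific clinical feature terms, use it to train a classifier that decides whether a sentence pair is suitable or not, and feed the classifier back into the overall workflow. The matching results of each round go through three steps (annotate, generate, train) to produce a classifier, which then produces the next round of matching results; repeating this loop gradually improves overall matching performance. We ran two rounds of this iterative workflow on three batches of patient notes in sequence. In each case, introducing the classifier increased precision by 15%, improving the matching performance of the overall workflow and confirming that the proposed iterative approach and matching workflow are feasible and effective. | zh_TW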
dc.description.provenance | Made available in DSpace on 2022-11-24T03:23:20Z (GMT). No. of bitstreams: 1. U0001-1009202113321700.pdf: 3883373 bytes, checksum: 0ab306a98c27f35f23f22882f0ac5364 (MD5). Previous issue date: 2021 | en
dc.description.tableofcontents |
Acknowledgments i
Abstract iii
List of Figures xi
List of Tables xii
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation 2
Chapter 2 Related Work 7
  2.1 Automated Criteria Extraction 7
  2.2 Patient-Trial Matching 11
  2.3 Sentence Embedding 11
Chapter 3 Solving Cold Start Problem in Patient-Trial Matching with Semantic-based Method 13
  3.1 Patient-Trial Matching Using Iterative Approach 14
  3.2 Solving Cold Start Problem in Patient-Trial Matching with Content-based Method 14
    3.2.1 Cold Start Problem 15
    3.2.2 Content-based Method 15
    3.2.3 Matching Not in Document Level, But in Sentence Level 15
    3.2.4 Feature Extraction with Semantic Method 17
  3.3 Workflow 17
    3.3.1 Text Preprocessing 17
    3.3.2 Text Representation Using Pre-trained Sentence Embedding 18
    3.3.3 Sentence Pair Matching 19
    3.3.4 Negation Processing 20
    3.3.5 Numeric Criteria Filter 22
    3.3.6 Summation, Aggregation and Ranking 25
  3.4 Evaluation 26
  3.5 Problem Discovering: Similar, Relevant, and Suitable 26
    3.5.1 Similar Does Not Mean Relevant 26
    3.5.2 Relevant Does Not Mean Suitable 29
    3.5.3 Incorrect Negation Detection Results 30
    3.5.4 Missing Information in Patient Notes 31
    3.5.5 Summary 32
Chapter 4 Solving Sentence Matching Problem with Self-supervised Learning 33
  4.1 Enhanced Workflow 34
    4.1.1 Evaluate by Medical Expert 35
    4.1.2 Analysis Unsuitable Pairs 35
    4.1.3 Generate Pseudo Labeled Dataset 36
    4.1.4 Train Classifier with BERT-based Pre-trained Model 38
    4.1.5 Adapt Classifiers into Matching Workflow 39
  4.2 Evaluation 39
Chapter 5 Experiments 42
  5.1 Text Preprocessing 42
    5.1.1 Clinical Trial Documents 42
    5.1.2 Patient Notes 43
  5.2 Case Study: Patient Case 1 44
    5.2.1 Supplement Missing Information in Patient Note 44
    5.2.2 Problem Analysis and Revision 45
    5.2.3 Negation Processing 46
    5.2.4 Conclusion 47
  5.3 Case Study: Patient Case 2 47
    5.3.1 Age Filter 47
  5.4 Case Study: Patient Case 5 48
    5.4.1 Numeric Value Filter 48
  5.5 Experiment Results on 6 Patient Notes 49
  5.6 Binary Classifier for Sentence Matching 49
    5.6.1 Manual Annotation Data 50
    5.6.2 First Round 50
    5.6.3 Second Round 52
Chapter 6 Conclusion 53
Bibliography 55
dc.language.iso | zh-TW
dc.subject | 語義匹配 | zh_TW
dc.subject | 病例與臨床試驗配對 | zh_TW
dc.subject | 自然語言處理 | zh_TW
dc.subject | 自監督式學習 | zh_TW
dc.subject | Semantic Matching | en
dc.subject | Patient-trial Matching | en
dc.subject | Self-supervised Learning | en
dc.subject | Natural Language Processing | en
dc.title | 基於語義方法和自監督式學習之病例與試驗配對 | zh_TW
dc.title | Towards Semantic Matching and Self-supervised Learning for Patient-trial Matching | en
dc.date.schoolyear | 109-2
dc.description.degree | Master's
dc.contributor.oralexamcommittee | 古倫維 (Hsin-Tsai Liu), 蔡芸琤 (Chih-Yang Tseng)
dc.subject.keyword | 病例與臨床試驗配對, 自然語言處理, 語義匹配, 自監督式學習 | zh_TW
dc.subject.keyword | Patient-trial Matching, Natural Language Processing, Semantic Matching, Self-supervised Learning | en
dc.relation.page | 56
dc.identifier.doi | 10.6342/NTU202103105
dc.rights.note | Authorization granted (open to campus only)
dc.date.accepted | 2021-09-13
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.author-dept | 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) | zh_TW
Appears in Collections: Graduate Institute of Networking and Multimedia
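
The sentence-level matching workflow summarized in the abstract (text representation, sentence-pair matching, negation detection, numeric filter, aggregation and ranking) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not describe the code, so every name here (`embed`, `age_ok`, `score_trial`, `rank_trials`, `NEGATION_CUES`, the `threshold` value) is hypothetical, a toy bag-of-words vector stands in for the pre-trained sentence embeddings named in the table of contents, and the numeric filter is assumed to handle only simple `age >= N` / `age <= N` criteria.

```python
import math
import re
from collections import Counter

# Hypothetical cue list; the thesis's negation module is not described in this record.
NEGATION_CUES = {"no", "not", "without", "denies"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(sentence):
    """Toy bag-of-words vector, a stand-in for a pre-trained sentence embedding."""
    return Counter(t for t in tokenize(sentence) if t not in NEGATION_CUES)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def is_negated(sentence):
    """Flag a sentence as negated if it contains any cue word."""
    return any(t in NEGATION_CUES for t in tokenize(sentence))


def age_ok(patient_age, criterion):
    """Numeric filter (assumption: only 'age >= N' and 'age <= N' are handled)."""
    m = re.search(r"age\s*(>=|<=)\s*(\d+)", criterion.lower())
    if not m:
        return True  # not an age criterion, so nothing to filter
    bound = int(m.group(2))
    return patient_age >= bound if m.group(1) == ">=" else patient_age <= bound


def score_trial(patient_sentences, patient_age, criteria, threshold=0.5):
    """Sum the best sentence-pair similarities; None if the numeric filter fails."""
    total = 0.0
    for criterion in criteria:
        if not age_ok(patient_age, criterion):
            return None
        best_sim, best_sentence = max(
            ((cosine(embed(s), embed(criterion)), s) for s in patient_sentences),
            key=lambda pair: pair[0],
            default=(0.0, ""),
        )
        if best_sim < threshold:
            continue  # no patient sentence is similar enough to this criterion
        # Same polarity counts for the trial, opposite polarity counts against it.
        sign = 1 if is_negated(best_sentence) == is_negated(criterion) else -1
        total += sign * best_sim
    return total


def rank_trials(patient_sentences, patient_age, trials):
    """Score every trial and return (name, score) pairs, best first."""
    scored = []
    for name, criteria in trials.items():
        score = score_trial(patient_sentences, patient_age, criteria)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    patient = ["patient has type 2 diabetes", "no history of stroke"]
    trials = {
        "T1": ["type 2 diabetes", "age >= 18"],
        "T2": ["history of stroke", "age >= 18"],
        "T3": ["type 2 diabetes", "age <= 40"],
    }
    for name, score in rank_trials(patient, 60, trials):
        print(name, round(score, 3))
```

With the made-up data in the `__main__` block, trial T3 is dropped by the age filter, and T2 ranks below T1 because its only matching patient sentence is negated. The abstract's finding that similar sentences are not necessarily relevant or suitable is exactly the weakness of a similarity-only score like this one, which the thesis addresses in its later stage with a trained classifier.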

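Files in This Item:
File | Size | Format
U0001-1009202113321700.pdf (access restricted to NTU campus IP addresses; from off campus, please use the off-campus VPN service) | 3.79 MB | Adobe PDF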
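Unless their copyright terms are specifically indicated otherwise, all items in the system are protected by copyright, with all rights reserved.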
