Patient-LITE: Empowering Patients with a Generalizable LLM-Based Questionnaire Generator for Accelerated Trial Eligibility Screening

游子慧; Tzu-Hui Yu

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96601

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	曾宇鳳	zh_TW
dc.contributor.advisor	Yufeng Jane Tseng	en
dc.contributor.author	游子慧	zh_TW
dc.contributor.author	Tzu-Hui Yu	en
dc.date.accessioned	2025-02-20T16:08:50Z	-
dc.date.available	2025-02-21	-
dc.date.copyright	2025-02-20	-
dc.date.issued	2025	-
dc.date.submitted	2025-01-13	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96601	-
dc.description.abstract	傳統的臨床試驗招募方式通常以站點為中心，依賴患者資格篩查。這種方式效率較低，且往往難以招募足夠的參與者。患者一般處於被動等待試驗匹配的狀態，這使他們無法有效比較並選擇最適合的試驗。現有的以患者為中心的系統雖然簡化了資格標準並提供問卷，但仍需大量人工來處理標準重疊或重複的問題。為了改善這一情況，我們推出了 Patient-LITE，一個基於大型語言模型 (LLMs) 技術的系統，能夠生成對於病患而言更易懂的問卷。它能將來自多個試驗的資格標準整合成一份簡潔的表單，達到 74-78% 的準確率。其技術創新在於混合專家方法 (mixture-of-experts)，將多個基於 GPT 的模型與優化策略相結合。主要特徵包括：(1) 獨創的「分解後組合」資料管道流程 (data pipeline)，(2) 經過微調 (fine-tuning) 的小型模型，用於高質量且具成本效益的問題生成，以及 (3) 提示工程以協助文字擷取與邏輯推理。Patient-LITE 是首個基於 LLM 的跨試驗問卷設計框架，並可根據不同疾病進行靈活調整。我們提供了評估指標和微調數據集，對試驗資格的表示、分類和驗證文獻做出了貢獻。這項研究期待為後續醫學研究者提供啟發，協助其開發技術，促進醫學知識的普及。	zh_TW
dc.description.abstract	The traditional site-centric approach to clinical trial recruitment, which relies on patient eligibility screening, is often inefficient and struggles to enroll sufficient participants. Patients typically wait passively for trial matches, limiting their ability to compare and select the most suitable trials. Existing patient-centric systems simplify complex criteria and provide questionnaires, but still require significant manual effort to navigate overlapping or duplicated criteria. We introduce Patient-LITE, a tool that uses large language models (LLMs) to generate patient-friendly questionnaires consolidating eligibility criteria from multiple trials into one concise form, achieving a correctness rate of 74-78%. The technical innovation lies in a mixture-of-experts approach, combining multiple GPT-based models and optimization strategies. Key features include: (1) a break-then-assemble pipeline for cross-trial criterion extraction, (2) fine-tuned smaller models for high-quality, cost-efficient question generation, and (3) prompt engineering for context-aware logic and attribute extraction. Patient-LITE is the first LLM-based framework for generating inter-trial questionnaires, with adaptable prompts for various diseases. We provide evaluation metrics and a fine-tuning dataset, contributing to the literature on trial eligibility representation, classification, and verification. Our work can inspire researchers in the broader medical field to adopt similar methods, enhancing patient comprehension and democratizing medical knowledge.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-20T16:08:50Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2025-02-20T16:08:50Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Abstract in Chinese (摘要); Abstract; Contents; List of Figures; List of Tables; Chapter 1: Introduction; 1.1 Backgrounds; 1.1.1 The Prevailing Recruitment Paradigm: Site-Centric Matching; 1.1.2 Transitioning to Patient-Centric Recruitment Solutions; 1.1.3 LLMs for Enhanced Patient-Centric Eligibility Matching; 1.2 Patient-LITE and Its Contribution; 1.2.1 Aims and Scope; 1.2.2 Research Questions; 1.2.3 Technical Breakthrough and Clinical Relevance; 1.2.4 Contribution and Limitations; Chapter 2: Literature Review; 2.1 Trial Eligibility Extraction and Classification; 2.2 Trial Eligibility Query and Questionnaire Generation; Chapter 3: Methods; 3.1 System Overview; 3.2 The "Breaker" Component; 3.2.1 Pre-classification Stage; 3.2.2 Classification Stage; 3.3 The "Assembler" Component; 3.3.1 Question Generation; 3.3.2 Option Curation; 3.4 Data and Model; 3.4.1 Dataset; 3.4.2 Model; Chapter 4: Results; 4.1 Descriptive Statistics; 4.2 Sample Output; 4.2.1 Overview of Output; 4.2.2 Highlight I: Fine-grained Categorization Enables Cross-Trial Criterion Integration; 4.2.3 Highlight II: Fine-tuning Enables Significant Improvement Using Low-cost Models; 4.2.4 Highlight III: Prompt Engineering Enables Context-Aware Logic and Attribute Extraction; 4.3 Quantitative and Qualitative Evaluation; 4.3.1 Overall Quantitative Performance; 4.3.2 Error Analysis on the Breaker; 4.3.3 Error Analysis on the Assembler; Chapter 5: Discussion; Chapter 6: Conclusion; References	-
dc.language.iso	en	-
dc.subject	臨床試驗媒合	zh_TW
dc.subject	病患導向	zh_TW
dc.subject	大型語言模型	zh_TW
dc.subject	Patient-Centric	en
dc.subject	Large Language Models (LLMs)	en
dc.subject	Clinical Trial Matching	en
dc.title	Patient-LITE：利用大型語言模型，打造以患者為中心的通用型臨床試驗資格篩選問卷	zh_TW
dc.title	Patient-LITE: Empowering Patients with a Generalizable LLM-Based Questionnaire Generator for Accelerated Trial Eligibility Screening	en
dc.type	Thesis	-
dc.date.schoolyear	113-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	蘇柏翰;林盈安	zh_TW
dc.contributor.oralexamcommittee	Bo-Han Su;Ying-An Lin	en
dc.subject.keyword	大型語言模型,臨床試驗媒合,病患導向	zh_TW
dc.subject.keyword	Large Language Models (LLMs),Clinical Trial Matching,Patient-Centric	en
dc.relation.page	58	-
dc.identifier.doi	10.6342/NTU202500069	-
dc.rights.note	Not authorized	-
dc.date.accepted	2025-01-13	-
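dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	-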
dc.date.embargo-lift	N/A	-
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:

File	Size	Format
ntu-113-1.pdf Restricted Access	1.1 MB	Adobe PDF
