Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86728
Title: | Using Lightweight Self-attention Based Models on High Missing Rate Time Series Electronic Health Records |
Author: | Pei-Ying Liu (劉沛穎) |
Advisor: | Che Lin (林澤) |
Keywords: | machine learning, electronic health record, liver cancer, missing value, self-attention mechanism |
Year of Publication: | 2022 |
Degree: | Master |
Abstract: | In recent years, machine learning has been widely applied across many fields, including medicine. In the medical field, patient information is stored as electronic health records (EHRs). Because the frequency of each patient's visits and the lab tests performed at each visit vary, EHRs often exhibit high missing rates. In addition, the vast majority of patients are relatively healthy, with only a minority developing severe diseases, so the data labels are severely imbalanced. These issues are crucial for EHR modeling and need to be resolved. This thesis considers a cohort study in which the model takes as input all observations within one year after a given entry condition and predicts a patient's risk of developing liver cancer within the following five years. We employ a Transformer-based encoder with half-head attention to extract the correlations between features and their missingness masks, apply a focal rank loss to strengthen the ranking of patients' risks during training, and use pre-training to give the model better parameter initializations across different tasks, improving performance on AUPRC, AUROC, and the concordance index. According to the experimental results, both half-head attention and the focal rank loss effectively improve the performance and stability of the model, and pre-training also performs well in subgroup analysis. Our results suggest that applying self-attention-based models directly to EHRs may not always yield the best results; EHR data with high missing rates can be handled better by our specifically designed model. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86728 |
DOI: | 10.6342/NTU202201849 |
Full-text license: | Authorized (publicly available worldwide) |
Electronic full-text release date: | 2027-07-28 |
Appears in collections: | Graduate Institute of Communication Engineering |
Files in this item:

File | Size | Format
---|---|---
ntu-110-2.pdf (publicly available online after 2027-07-28) | 5.86 MB | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in the item's license.
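The abstract mentions a "focal rank loss" that combines focal loss with a ranking objective over patients' risks, but the record does not give its exact form. The sketch below is a minimal, hypothetical illustration of one plausible combination: the standard focal loss (for label imbalance) plus a logistic pairwise ranking term (so positives outscore negatives). The function names and the weighting parameter `lam` are assumptions, not the thesis's actual formulation.

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Focal loss: down-weights easy examples via the (1 - p_t)^gamma factor."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    p_t = np.where(labels == 1, probs, 1 - probs)
    a_t = np.where(labels == 1, alpha, 1 - alpha)
    return np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t))

def pairwise_rank_loss(scores, labels):
    """Logistic pairwise ranking loss: every positive should outscore every negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]  # all positive-negative score differences
    return np.mean(np.log1p(np.exp(-diffs)))

def focal_rank_loss(probs, labels, lam=0.5):
    """Hypothetical combination: classification focal loss + lam * ranking term."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    scores = np.log(probs / (1 - probs))  # use logits as risk scores
    return focal_loss(probs, labels) + lam * pairwise_rank_loss(scores, labels)

# Toy example: well-separated predictions should yield a smaller loss
# than predictions that rank a negative above a positive.
probs = np.array([0.9, 0.8, 0.2, 0.1])
labels = np.array([1, 1, 0, 0])
print(focal_rank_loss(probs, labels))
```

The ranking term directly optimizes the pairwise ordering that AUROC and the concordance index measure, which is one motivation for adding it on top of a pure classification loss.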