Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88407
Title: | Meta-EHR:針對具有高不平衡與缺失率電子醫療病歷之元學習方法 Meta-EHR: A meta-learning approach for electronic health records with a high imbalance ratio and missing rate |
Authors: | 張舒翔 Shu-Hsiang Chang |
Advisor: | 林澤 Che Lin |
Keyword: | Electronic health records, imbalanced learning, missing data, meta-learning, time-series |
Publication Year: | 2023 |
Degree: | Master's |
Abstract: | Data imbalance is a practical and crucial issue in deep learning. Moreover, real-world datasets such as electronic health records (EHRs) often suffer from high missing rates. Both issues can be viewed as noise in the data, and both can degrade the generalization of standard deep learning algorithms. In the medical domain, accurate prediction and classification are critical because they directly affect patients' diagnosis and treatment outcomes. This thesis introduces a novel meta-learning approach that handles this noise in an EHR dataset for a binary classification task. The approach uses information from a selected subset of balanced, low-missing-rate data to automatically assign an appropriate weight to each sample. These weights emphasize informative samples and suppress noisy ones during training. The approach is agnostic to the architecture of the deep learning model and addresses the high imbalance ratio and the high missing rate simultaneously. Through experiments, we demonstrate that the approach is most beneficial in extreme cases. In the most extreme case, with an imbalance ratio of 172 and a missing rate of 74.6%, our method outperforms the original model without meta-learning by as much as 10.3% in the area under the receiver operating characteristic curve (AUROC) and 3.2% in the area under the precision-recall curve (AUPRC). Our results mark a first step towards training a robust model on extremely noisy EHR datasets. We believe that, with further research, meta-learning will play a larger role in solving practical problems in deep learning. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88407 |
DOI: | 10.6342/NTU202301473 |
Fulltext Rights: | Authorization granted (access restricted to campus) |
Appears in Collections: | Graduate Institute of Communication Engineering |
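
The abstract describes the sample-weighting mechanism only at a high level. As a rough illustration of the general idea, the sketch below applies gradient-alignment reweighting to a plain logistic-regression model: each training sample is weighted by how well its loss gradient agrees with the average gradient on a small, clean meta set, and samples whose gradients disagree receive zero weight. This is an assumed stand-in for illustration, not the algorithm confirmed by the thesis. The function names (`meta_weights`, `fit`), the toy data, and the assumption that missing values have already been imputed are all hypothetical.

```python
import math

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_grad(theta, x, y):
    """Gradient of the logistic loss for a single sample (x, y)."""
    err = sigmoid(dot(theta, x)) - y
    return [err * xi for xi in x]

def meta_weights(theta, train, meta):
    """Weight each training sample by the alignment between its gradient and
    the mean gradient on the clean meta set; negative alignment maps to 0."""
    meta_grads = [sample_grad(theta, x, y) for x, y in meta]
    g_meta = [sum(col) / len(meta) for col in zip(*meta_grads)]
    raw = [max(0.0, dot(sample_grad(theta, x, y), g_meta)) for x, y in train]
    total = sum(raw)
    # Normalize so the weights sum to 1 (left as all zeros if nothing aligns).
    return [r / total for r in raw] if total > 0 else raw

def fit(train, meta, lr=0.5, steps=100):
    """Gradient descent in which every step uses freshly computed weights."""
    theta = [0.0] * len(train[0][0])
    for _ in range(steps):
        w = meta_weights(theta, train, meta)
        grads = [sample_grad(theta, x, y) for x, y in train]
        step = [sum(wi * g[j] for wi, g in zip(w, grads))
                for j in range(len(theta))]
        theta = [t - lr * s for t, s in zip(theta, step)]
    return theta

# Hypothetical toy data: features are [bias, value]; the label is 1 when
# value > 0. The third training sample is deliberately mislabeled.
train = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 2.0], 0)]
meta = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]  # small, balanced, clean subset

weights = meta_weights([0.0, 0.0], train, meta)
print(weights)  # the mislabeled third sample receives weight 0.0
theta = fit(train, meta)
print(theta)
```

In this toy setting the mislabeled sample is suppressed because its gradient points away from the direction that reduces the loss on the meta set. A deep model would compute the same alignment with automatic differentiation rather than a closed-form gradient.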
Files in This Item:
File | Size | Format |
---|---|---|
ntu-111-2.pdf (access limited to NTU IP range) | 2.71 MB | Adobe PDF |