Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86728
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林澤 | zh_TW |
dc.contributor.advisor | Che Lin | en |
dc.contributor.author | 劉沛穎 | zh_TW |
dc.contributor.author | Pei-Ying Liu | en |
dc.date.accessioned | 2023-03-20T00:13:54Z | - |
dc.date.available | 2023-12-27 | - |
dc.date.copyright | 2022-08-23 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.citation | [1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017. [2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018. [3] J. Liu, Z. Zhang, and N. Razavian, “Deep ehr: Chronic disease prediction using medical notes,” in Machine Learning for Healthcare Conference. PMLR, 2018, pp. 440–464. [4] Y. Xue, D. Klabjan, and Y. Luo, “Predicting icu readmission using grouped physiological and medication trends,” Artificial intelligence in medicine, vol. 95, pp. 27–37, 2019. [5] A. J. Steele, S. C. Denaxas, A. D. Shah, H. Hemingway, and N. M. Luscombe, “Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease,” PloS one, vol. 13, no. 8, p. e0202344, 2018. [6] Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, “Recurrent neural networks for multivariate time series with missing values,” Scientific reports, vol. 8, no. 1, pp. 1–12, 2018. [7] B. K. Beaulieu-Jones, D. R. Lavage, J. W. Snyder, J. H. Moore, S. A. Pendergrass, and C. R. Bauer, “Characterizing and managing missing structured data in electronic health records: data analysis,” JMIR medical informatics, vol. 6, no. 1, p. e8960, 2018. [8] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “Albert: A lite bert for self-supervised learning of language representations,” arXiv preprint arXiv:1909.11942, 2019. [9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019. [10] A. Krizhevsky, I. Sutskever, and G. E. 
Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012. [11] K. He, R. Girshick, and P. Dollár, “Rethinking imagenet pre-training,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4918–4927. [12] S. Kornblith, J. Shlens, and Q. V. Le, “Do better imagenet models transfer better?” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2661–2671. [13] E. Choi, C. Xiao, W. Stewart, and J. Sun, “Mime: Multilevel medical embedding of electronic health records for predictive healthcare,” Advances in neural information processing systems, vol. 31, 2018. [14] J. Shao and B. Zhong, “Last observation carry-forward and last observation analysis,” Statistics in medicine, vol. 22, no. 15, pp. 2429–2441, 2003. [15] M. W. Gardner and S. Dorling, “Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences,” Atmospheric environment, vol. 32, no. 14-15, pp. 2627–2636, 1998. [16] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Icml, 2010. [17] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv preprint arXiv:1606.08415, 2016. [18] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990. [19] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010. Springer, 2010, pp. 177–186. [20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. [21] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015. [22] Y. Li, S. Rao, J. R. A. Solares, A. Hassaine, R. Ramakrishnan, D. Canoy, Y. Zhu, K. Rahimi, and G. 
Salimi-Khorshidi, “Behrt: transformer for electronic health records,” Scientific reports, vol. 10, no. 1, pp. 1–12, 2020. [23] Y. Meng, W. Speier, M. K. Ong, and C. W. Arnold, “Bidirectional representation learning from transformers using multimodal electronic health record data to predict depression,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 8, pp. 3121–3129, 2021. [24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. [25] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016. [26] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988. [27] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, “Learning to rank: from pairwise approach to listwise approach,” in Proceedings of the 24th international conference on Machine learning, 2007, pp. 129–136. [28] F. E. Harrell, R. M. Califf, D. B. Pryor, K. L. Lee, and R. A. Rosati, “Evaluating the yield of medical tests,” Jama, vol. 247, no. 18, pp. 2543–2546, 1982. [29] T. Saito and M. Rehmsmeier, “The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets,” PloS one, vol. 10, no. 3, p. e0118432, 2015. [30] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 2623–2631. [31] G. Papatheodoridis, G. Dalekos, V. Sypsa, C. Yurdaydin, M. Buti, J. Goulis, J. L. Calleja, H. Chi, S. Manolakopoulos, G. 
Mangia et al., “Page-b predicts the risk of developing hepatocellular carcinoma in caucasians with chronic hepatitis b on 5-year antiviral therapy,” Journal of hepatology, vol. 64, no. 4, pp. 800–806, 2016. [32] H.-I. Yang, M.-F. Yuen, H. L.-Y. Chan, K.-H. Han, P.-J. Chen, D.-Y. Kim, S.-H. Ahn, C.-J. Chen, V. W.-S. Wong, W.-K. Seto et al., “Risk estimation for hepatocellular carcinoma in chronic hepatitis b (reach-b): development and validation of a predictive score,” The lancet oncology, vol. 12, no. 6, pp. 568–574, 2011. [33] C. Chen, A. Liaw, L. Breiman et al., “Using random forest to learn imbalanced data,” University of California, Berkeley, vol. 110, no. 1-12, p. 24, 2004. [34] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794. [35] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014. [36] Cheng, “Deep sti: Deep stochastic time-series imputation on electronic medical records,” 2021, unpublished. [37] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2114–2124. [38] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org. [39] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to data mining. Pearson Education India, 2016. [40] J. Honaker and G. King, “What to do about missing values in time-series cross-section data,” American journal of political science, vol. 54, no. 2, pp. 561–581, 2010. [41] T.-Y. Liu et al., “Learning to rank for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009. [42] S. 
Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 02, pp. 107–116, 1998. [43] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997. [44] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “Electra: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020. [45] M. Huh, P. Agrawal, and A. A. Efros, “What makes imagenet good for transfer learning?” arXiv preprint arXiv:1608.08614, 2016. [46] C. Song, Y. Huang, W. Ouyang, and L. Wang, “Mask-guided contrastive attention model for person re-identification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1179–1188. [47] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86728 | - |
dc.description.abstract | 近年來,機器學習的技術被各個領域廣泛應用,其中也包括醫學領域。在醫學領域中,病人的資料會以電子病歷的方式被儲存下來,這種電子病歷具有一些特性,例如:每個病患就診的頻率不一樣、每個病患就診時所做的量測項目也會有差異,這些因素都會導致電子病歷有嚴重的缺值問題。此外,絕大多數的病人是比較健康的,僅有少數的人會得到嚴重疾病,因此,在資料標籤方面會有很嚴重的不平衡。這些議題都是電子病歷需要深入探討解決的問題。本研究的方法為世代研究 (Cohort Study),主要探討模型輸入在某個觀察值之後的一年內的所有觀測值,能夠預測五年內病人得到肝癌的風險。本篇論文探討利用 Transformer模型中的encoder,利用 half-head attention 提取特徵和遮罩之間的關聯性,應用focal rank loss 將病患的風險強化排序訓練,以及透過 pre-training 的技術,讓模型 在不同的任務中能夠有更好的初始化參數,並在 AUPRC、AUROC 和 concordance index 上有更好的表現。根據實驗的結果,half-head attention 以及 focal rank loss 皆能有效提升模型的表現和穩定度,而 pre-training 在子群分析中也有很好的成效。我們的結果表明,將基於自我注意的模型直接應用於 EHR 可能並不總是最佳結果。使用我們特別設計的模型來處理具有高缺失率的 EHR 能有更好的成效。 | zh_TW |
dc.description.abstract | In recent years, machine learning has been widely applied in many fields, including medicine. In the medical field, patient information is stored in the form of electronic health records (EHRs). Because the frequency of each patient's visits and lab tests varies, EHRs often exhibit high missing rates. In addition, the vast majority of patients are relatively healthy, and only a minority develop severe diseases, so the data labels are severely imbalanced. These issues are crucial for EHR modeling and need to be resolved. In this thesis, we consider a cohort study: the model takes as input all observations within one year after a certain entry condition and outputs the risk of liver cancer within the next five years. We explore a Transformer-based model with half-head attention to extract the correlation between features and missingness masks, apply a focal rank loss to strengthen the ranking of patients' risks, and use pre-training across different tasks to obtain better parameter initialization, improving model performance on AUPRC, AUROC, and the concordance index. According to the experimental results, both half-head attention and the focal rank loss improve the performance and stability of the model, and pre-training also yields good results in subgroup analysis. Our results suggest that applying self-attention-based models directly to EHRs may not always provide the best results; our specially designed model handles EHR data with high missing rates more effectively. | en |
dc.description.provenance | Made available in DSpace on 2023-03-20T00:13:54Z (GMT). No. of bitstreams: 1 U0001-2807202218050900.pdf: 5995830 bytes, checksum: c65b4eab0e1156a13497702f92582ca3 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | 口試委員審定書 i 誌謝 ii 摘要 iii Abstract iv Contents vi List of Figures ix List of Tables x Chapter 1 Introduction 1 Chapter 2 Data 4 2.1 Data Description 4 2.2 Features 5 2.3 Preprocessing 11 2.3.1 Data Cleaning 11 2.3.2 Cohort Study Data Selection 13 2.3.3 Summarization 13 2.4 Dataset Construction 16 2.5 Subgroups of the Cohort Data 17 Chapter 3 Methods 18 3.1 Feedforward Neural Network 18 3.2 Self-Attention 19 3.2.1 Transformer 20 3.2.2 BERT: Pre-Training of Deep Bidirectional Transformers 21 3.3 Transformer-Based EHR Models (TransEHR) 22 3.3.1 Weight Constraint (WC-TransEHR) 25 3.3.2 Half-head Attention (HHA-TransEHR) 26 3.4 Focal Rank Loss 27 3.5 Imputation Transformer (TransEHR-impute) 29 3.6 Pretrain and Finetune 29 Chapter 4 Experiments 31 4.1 Prediction Task 31 4.2 Evaluation Metrics 31 4.2.1 AUPRC & AUROC 32 4.2.2 Concordance Index 32 4.3 Training Details 34 4.4 Baseline Models 35 4.5 Self-attention Benchmark (TransEHR) 35 4.6 Weight Constraint and Half-head Attention 36 4.7 Rank Loss with Focal Loss 37 4.8 Runtime Improvement 38 4.9 Survival Plot for First Diagnosis Patients 39 Chapter 5 Discussion 41 5.1 Half-head Attention and Focal Rank Loss Achieve Better Performance on Subgroups 41 5.2 Focal Rank Loss Can Improve Concordance Index 47 5.3 The Effect of Adding Mask to the Prediction 49 5.4 Comparison of TransEHR and Half-Head Attention with a Similar Number of Parameters 50 5.5 Pre-training in the First Diagnosis Group Can Improve the Performance of Subgroups 52 5.6 Using a Full Transformer as the Imputer 52 5.7 Summarization Improves Model Performance 53 5.8 Future Enhancement 54 Chapter 6 Conclusion 57 Bibliography 59 | - |
dc.language.iso | en | - |
dc.title | 使用輕量化自注意力機制在高缺值時序列電子醫療病歷上的應用 | zh_TW |
dc.title | Using Lightweight Self-attention Based Models on High Missing Rate Time Series Electronic Health Records | en |
dc.type | Thesis | - |
dc.date.schoolyear | 110-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 王偉仲;陳縕儂;蘇東弘 | zh_TW |
dc.contributor.oralexamcommittee | Weichung Wang;Yun-Nung Chen;Tung-Hung Su | en |
dc.subject.keyword | 機器學習,電子醫療病例,肝癌,缺值,自注意力機制, | zh_TW |
dc.subject.keyword | machine learning,electronic health record,liver cancer,missing value,self-attention mechanism | en |
dc.relation.page | 65 | - |
dc.identifier.doi | 10.6342/NTU202201849 | - |
dc.rights.note | Authorized for release (open access worldwide) | - |
dc.date.accepted | 2022-07-29 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
dc.date.embargo-lift | 2027-07-28 | - |
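The abstract above names two ingredients of the thesis's focal rank loss: a focal loss [26] that down-weights easy (mostly healthy) examples to counter label imbalance, and a pairwise ranking term that pushes high-risk patients to score above low-risk ones. The exact formulation is defined in the thesis itself, not in this record; the sketch below is only an illustrative Python combination of the two ingredients, where the function names, the weighting parameter `lam`, and the way the terms are summed are assumptions.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (Lin et al. [26]).
    p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - pt)^gamma factor shrinks the loss on well-classified examples."""
    pt = p if y == 1 else 1.0 - p          # probability assigned to the true class
    a = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -a * (1.0 - pt) ** gamma * math.log(pt)

def pairwise_rank_loss(score_pos, score_neg):
    """Logistic pairwise ranking loss: penalizes scoring a high-risk
    (positive) patient below a low-risk (negative) one."""
    return math.log(1.0 + math.exp(-(score_pos - score_neg)))

def focal_rank_loss(p_pos, p_neg, lam=0.5):
    """Hypothetical combination: focal terms on each example of a
    positive/negative pair, plus a weighted ranking term between them."""
    return (focal_loss(p_pos, 1) + focal_loss(p_neg, 0)
            + lam * pairwise_rank_loss(p_pos, p_neg))
```

Under this illustrative combination, a pair that is both confidently classified and correctly ranked (e.g. `p_pos = 0.9`, `p_neg = 0.1`) incurs a much smaller loss than an uninformative pair (`p_pos = p_neg = 0.5`), which matches the abstract's goal of sharpening the ranking of patient risks under heavy class imbalance.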
Appears in Collections: | Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-110-2.pdf (available online after 2027-07-28) | 5.86 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.