Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88355

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 周承復 | zh_TW |
| dc.contributor.advisor | Cheng-Fu Chou | en |
| dc.contributor.author | 吳添毅 | zh_TW |
| dc.contributor.author | Tien-Yi Wu | en |
| dc.date.accessioned | 2023-08-09T16:41:44Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-09 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-07-17 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88355 | - |
| dc.description.abstract | 隨著臺灣醫療水平的進步,人口結構產生了劇烈的變化。我國現階段已是高齡社會(aged society)的一員,老年人口的健康管理成為一個需要被關注的重要課題。慢性疾病(Chronic Disease)對患者的生活品質和長期健康產生極大的影響,因此,慢性疾病風險的預測具有重大意義。其中,糖尿病(Diabetes Mellitus,DM)、心臟病(Heart Disease)、腦中風(Stroke)和高血壓(Hypertension)是臺灣高齡族群中常見的慢性疾病。
罹患慢性疾病必然會造成民眾的經濟負擔,為此保險公司提供多樣保險種類供民眾選擇。然而,民眾需要耗費大量的心力來研究適合自身的保險種類;保險業者也需要耗費人力對投保民眾的健康狀態進行風險評估。有鑑於此,本論文旨在透過深度學習技術,利用病患的個人資訊(Personal Information)以及就醫紀錄(Medical Records)來預測病患罹患糖尿病、心臟病、腦中風和高血壓的風險程度。對於保險業者而言,可以利用模型對不同地區預測出的罹患疾病風險程度,對客戶進行簡單的分群,以加速並制定完善的核保流程,並且業務可以針對不同客戶群推薦適合的保險種類,以實現雙贏效果。 在本研究中,我們將運用衛生福利資料科學中心(Health and Welfare Data Science Center,HWDC)提供的隨機抽樣資料集,其中包含了200萬人的就醫資料,作為我們疾病預測模型的訓練資料。此資料集不僅涵蓋了個人資訊(例如:年齡、性別…..),亦包括病患的就醫紀錄,其中包含大量醫學文獻所提及與糖尿病、心臟病、腦中風和高血壓相關的風險因素(Risk Factor)。 本論文旨在透過多任務學習(Multi-Task Learning)的概念,將原本僅適用於單一疾病預測的點擊率(Click Through Rate,CTR)預測模型以及多模態網路(Multi-Modal Network)模型拓展成可以同時預測多種疾病的多任務學習模型。透過此方法,我們能在降低大量模型參數並節省訓練時間的情況下,讓模型保有一定的預測能力,甚至優於單任務學習(Single-Task Learning)訓練出來的模型性能表現。這樣的結果有助於印證糖尿病、心臟病、腦中風和高血壓之間的直接或間接關聯,並與醫學文獻的看法相一致。 除了降低模型參數和訓練時間的優點外,本研究亦探索了Self-Attention機制中注意力分數(Attention Score)對於就醫紀錄(Medical Records)中疾病之間的解釋性,以發現對於模型預測風險程度有較大影響的高風險疾病或是相關共病症;除此之外,我們還會進一步分析個人資訊(如:年齡、性別……)對模型性能的影響。最終實驗結果與醫學文獻中所陳述之危險因素(Risk Factor)相互印證。 | zh_TW |
| dc.description.abstract | Advances in medical care in Taiwan have brought drastic changes to the country's demographic structure. Taiwan is now an aged society, and health management for the elderly population has become an important issue. Chronic diseases have a significant impact on patients' quality of life and long-term health, so predicting chronic disease risk is of great value. Diabetes mellitus (DM), heart disease, stroke, and hypertension are among the most common chronic diseases in Taiwan's elderly population.
Chronic diseases inevitably place a financial burden on patients, and insurance companies offer a variety of insurance products in response. However, individuals must invest considerable effort in finding the products that suit their needs, while insurers must devote staff to assessing the health risks of insured individuals. In light of this, this thesis uses deep learning to predict patients' risk of developing diabetes mellitus, heart disease, stroke, and hypertension from their personal information and medical records. Insurers can use the model's predicted disease risk levels for different regions to perform simple customer segmentation, which accelerates and refines the underwriting process. Sales representatives can also recommend suitable insurance products to different customer groups, achieving a win-win outcome.
We use a randomly sampled dataset provided by the Health and Welfare Data Science Center (HWDC), containing the medical records of 2 million individuals, as training data for the disease prediction model. The dataset covers personal information (e.g., age and gender) as well as patients' medical records, which contain many of the risk factors that the medical literature associates with diabetes mellitus, heart disease, stroke, and hypertension.
Using the concept of multi-task learning, this thesis extends models originally suited to single-disease prediction, namely click-through rate (CTR) prediction models and multi-modal network models, into multi-task learning models that predict the risks of multiple diseases simultaneously. This approach substantially reduces the number of model parameters and saves training time while preserving predictive ability; the multi-task models can even outperform models trained with single-task learning. These results help confirm the direct or indirect associations among diabetes mellitus, heart disease, stroke, and hypertension, in line with the medical literature.
Beyond the reductions in model parameters and training time, this thesis also explores how attention scores from the self-attention mechanism can explain relationships among the diseases in medical records, in order to identify high-risk diseases or related comorbidities that strongly influence the model's predicted risk levels. We further analyze the influence of personal information such as age and gender on model performance. The final experimental results corroborate the risk factors reported in the medical literature. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:41:44Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-09T16:41:44Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
Abstract
《I》 Introduction
《II》 Related Work
  A. Word2Vec
  B. LSTM (Long Short-Term Memory)
  C. Self-Attention
    1. Self-Attention Mechanism
    2. Multi-Head Self-Attention
    3. Positional Encoding
  D. Click Through Rate (CTR)
  E. Multi-Task Learning
    1. Hard Parameter Sharing
    2. Soft Parameter Sharing
    3. Multi-Task Learning Summary
《III》 Problem Definition and Methodology
  A. Problem Definition
  B. Dataset
    1. Database Overview
      1) National Health Insurance Enrollment File (H_NHI_ENROL)
      2) National Health Insurance Prescription and Treatment Details File (H_NHI_OPDTE)
    2. Data Preprocessing
      1) Merge H_NHI_ENROL and H_NHI_OPDTE
      2) Dataset for Pre-trained ICD Word2Vec Model
      3) Dataset for Disease Prediction Model
  C. Model Architecture
    1. Pre-trained ICD Word2Vec Model
    2. Single-Task Learning Disease Prediction Model
      1) Multi-Modal Based
      2) CTR Based
    3. Multi-Task Learning Disease Prediction Model
      1) Multi-Modal Based
      2) CTR Based
  D. Model Training Configuration
《IV》 Experiments
  A. Evaluation Metrics
    1. Confusion Matrix
    2. ROC-AUC (Area Under ROC Curve)
    3. Log Loss
    4. Balanced Accuracy
    5. F1 Score
  B. Data Visualization
  C. Performance and Parameter Counts of Single-Task and Multi-Task Models
  D. Effect of the Order of Clustering and Training
    1. Chi-Square Test of Independence Between Disease and Region
    2. Performance Differences by Order of Training and Clustering
  E. Effect of Medical Records on Model Performance
    1. Random Masking of Medical Records
    2. Shifting Medical Records Along the Time Axis
    3. Random Reordering of Medical Records Along the Time Axis
    4. Summary of the Importance of Medical Records for Model Prediction
  F. Effect of Personal Information on Model Performance
  G. Performance When Training on Different Age Ranges
  H. Attention Scores of ICD Codes
    1. ICD Codes with High Attention Scores
    2. Performance with Masked ICD Codes
    3. Summary of Attention Score Interpretability
  I. Experimental Platform
《V》 Conclusion
《VI》 Future Work
  A. Directions for Improvement
    1. Adding Other Tasks
    2. Model Architecture
    3. Task Relatedness
  B. Related Topics
    1. Survival Analysis
    2. Ordinal Regression
References
Appendix
  A. ICD Codes with High Attention Scores
  B. Ranking of Performance Drops from Masking ICD Codes | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 多模態網路 | zh_TW |
| dc.subject | 多任務學習 | zh_TW |
| dc.subject | 慢性疾病 | zh_TW |
| dc.subject | ICD碼嵌入 | zh_TW |
| dc.subject | 疾病發生率預測 | zh_TW |
| dc.subject | Chronic Disease | en |
| dc.subject | ICD Code Embedding | en |
| dc.subject | Disease Incidence Prediction | en |
| dc.subject | Multi-Modal Network | en |
| dc.subject | Multi-Task Learning | en |
| dc.title | 用於慢性疾病預測之多任務學習多模態網路 | zh_TW |
| dc.title | Multi-Task Learning Multi-Modal Network for Chronic Diseases Prediction | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 吳曉光;呂政修;歐昱言;修丕承 | zh_TW |
| dc.contributor.oralexamcommittee | Hsiao-Kuang Wu;Jenq-Shiou Leu;Yu-Yen Ou;Pi-Cheng Hsiu | en |
| dc.subject.keyword | ICD碼嵌入,慢性疾病,疾病發生率預測,多任務學習,多模態網路 | zh_TW |
| dc.subject.keyword | ICD Code Embedding, Chronic Disease, Disease Incidence Prediction, Multi-Task Learning, Multi-Modal Network | en |
| dc.relation.page | 71 | - |
| dc.identifier.doi | 10.6342/NTU202301508 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2023-07-18 | - |
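| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
| Appears in Collections: | Graduate Institute of Networking and Multimedia | |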
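
The abstract's claim that a multi-task model needs far fewer parameters than separate single-task models can be illustrated with a small sketch. It assumes hard parameter sharing (one of the approaches listed in the table of contents): a single shared encoder feeds one sigmoid head per disease. The layer sizes, task names, and the `HardSharingModel` class are hypothetical and are not taken from the thesis, whose actual multi-modal and CTR-based architectures are not described in this record.

```python
import math
import random

# Hypothetical task names based on the four diseases named in the abstract.
TASKS = ["diabetes", "heart_disease", "stroke", "hypertension"]


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply a dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def count_params(layer):
    """Count the weights and biases of one dense layer."""
    weights, biases = layer
    return sum(len(row) for row in weights) + len(biases)


class HardSharingModel:
    """One shared encoder feeding one sigmoid head per task (illustrative only)."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)
        self.heads = {t: init_layer(n_hidden, 1, rng) for t in tasks}

    def predict(self, x):
        h = dense(x, self.shared, relu)  # representation shared by all tasks
        return {t: dense(h, head, sigmoid)[0] for t, head in self.heads.items()}

    def n_params(self):
        return count_params(self.shared) + sum(
            count_params(head) for head in self.heads.values())


n_features, n_hidden = 16, 8  # illustrative sizes, not the thesis's settings
multi = HardSharingModel(n_features, n_hidden, TASKS)
singles = [HardSharingModel(n_features, n_hidden, [t]) for t in TASKS]

risks = multi.predict([0.5] * n_features)  # one untrained risk score per disease
multi_params = multi.n_params()
single_params = sum(m.n_params() for m in singles)
print(risks)
print(multi_params, single_params)
```

With these toy sizes the shared encoder is counted once in the multi-task model but four times across the single-task models, which is where the parameter saving comes from. The weights here are untrained, so the printed risk scores carry no meaning.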
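
Files in this item:

| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf (not authorized for public access) | 2.52 MB | Adobe PDF | |

Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.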
