Automatic ICD-10 classification from free-text data

Yu-Hsuan Chang; 張宇軒

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73336

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆
dc.contributor.author	Yu-Hsuan Chang	en
dc.contributor.author	張宇軒	zh_TW
dc.date.accessioned	2021-06-17T07:29:09Z	-
dc.date.available	2019-07-03
dc.date.copyright	2019-07-03
dc.date.issued	2019
dc.date.submitted	2019-06-18
dc.identifier.citation	[1] 'International Classification of Diseases (ICD)'. World Health Organization. Retrieved 23 November 2010. [2] 'ICD-10 Version:2015'. apps.who.int. Retrieved 23 May 2017. [3] Farkas R, Szarvas G. Automatic construction of rule-based ICD-9-CM coding systems. BMC Bioinformatics. 2008;9(Suppl 3):S10. doi:10.1186/1471-2105-9-S3-S10. [4] Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao. 'Recurrent Convolutional Neural Networks for Text Classification', presented at the Twenty-Ninth AAAI Conference on Artificial Intelligence. [5] Mikolov, T., M. Karafiát, L. Burget, et al. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). 2010. [6] Koopman B, Zuccon G, Nguyen A, Bergheim A, Grayson N. Automatic ICD-10 classification of cancers from free-text death certificates. Int J Med Inform. 2015 Nov;84(11):956–65. doi:10.1016/j.ijmedinf.2015.08.004. [7] Yoon Kim. Convolutional Neural Networks for Sentence Classification. [8] Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pages 1422–1432. [9] Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. [10] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26, 3111–3119 (2013). [11] van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008. [12] T. G. Dietterich. Ensemble methods in machine learning. In J. Kittler and F. Roli, editors, Multiple Classifier Systems, pages 1–15. LNCS Vol. 1857, Springer, 2001. [13] Joyce J.M. Kullback-Leibler divergence. International Encyclopedia of Statistical Science, Springer (2011). [14] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), 2006, pp. 161–168. [15] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. 2016. MIT Press. [16] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs], December 2014. [17] Steven Bird. 2006. NLTK: the natural language toolkit. In Proceedings of the COLING/ACL on Interactive presentation sessions, COLING-ACL ’06, pages 69–72, Stroudsburg, PA, USA. Association for Computational Linguistics. [18] Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1–13. [19] Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–1139. [20] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. 2014.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73336	-
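dc.description.abstract	At present, disease classification relies mainly on people reading large volumes of text as the basis for coding. A professional disease coder needs lengthy specialized training before being able to carry out the complex task of ICD-10 classification, and even a professional coder needs a great deal of time to code a single patient correctly. If artificial intelligence could take over this work, hospitals would save substantial manpower and the healthcare system would operate more effectively. To that end, we aim to build an automatic ICD-10 code classification system that can read and process the text written by physicians. Given text records such as discharge diagnoses, operation records, or medical histories as input, the system uses natural language processing together with machine-learning classification to learn the coding rules and output the corresponding ICD-10 codes. In our study, the subject data cover all discharged patients at National Taiwan University Hospital whose records were coded under the International Classification of Diseases, 10th Revision (ICD-10): admission diagnosis, instructions on transfer out of the intensive care unit, physical examination, course and treatment, complications, laboratory records, examination progress, imaging reports, pathology reports, discharge diagnosis, chief complaint, medical history, and operation records, together with the final codes assigned by the disease coders. The data do not need to contain patients' personal information, so the study does not touch on patients' personal privacy.	zh_TW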
dc.description.abstract	Our study aimed to construct an ICD-10 coding system, built with supervised machine learning techniques, that automatically categorizes free-text medical data using solely their content. Among the many machine learning techniques available, we use supervised learning to learn how to assign ICD-10 codes from free-text data. At present, disease classification relies mainly on people reading large amounts of written material, such as discharge diagnoses, chief complaints, medical histories, and operation records, as the basis for classification. Coding is both laborious and time consuming: even a professional disease coder takes an average of 20 minutes. If we can provide an automatic code classification system whose accuracy is comparable to that of a professional coder, the model can significantly reduce the human labor and time spent on code classification.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T07:29:09Z (GMT). No. of bitstreams: 1 ntu-108-R05945039-1.pdf: 2994786 bytes, checksum: c559162e5cdd69ae1699e06d14c52abf (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Acknowledgements 1; Chinese Abstract 2; ABSTRACT 3; CONTENTS 4; LIST OF FIGURES 7; LIST OF TABLES 9; Chapter 1 Introduction 10; Chapter 2 Background 12; Chapter 3 Related work 17; 3.1 Text classification 17; 3.2 Neural network 19; 3.3 ICD classification 19; Chapter 4 Data description 21; 4.1 Data introduction 21; 4.2 7 classes of free-text data 21; 4.2.1 Chief complaint 21; 4.2.2 Course and treatment 23; 4.2.3 History 25; 4.2.4 Pathology report 26; 4.2.5 Physical examination 28; 4.2.6 Discharge diagnosis 30; 4.2.7 Transfer out of ICU diagnosis 32; Chapter 5 Methods 35; 5.1 Text preprocessing 35; 5.2 Word embedding 35; 5.3 Neural network 37; 5.4 Evaluation metrics 38; 5.5 Ensemble 39; Chapter 6 Results 40; 6.1 21 categories result 40; 6.2 The first two digits prediction 49; 6.3 The first three digits prediction 49; 6.4 All labels prediction 50; 6.5 Word embedding 51; Chapter 7 Conclusion 53; 7.1 Model limitation 53; 7.2 Future work 54; REFERENCE 55
dc.language.iso	zh-TW
dc.subject	自然語言處理	zh_TW
dc.subject	神經網路	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	文字分類	zh_TW
dc.subject	ICD-10	en
dc.subject	Machine learning	en
dc.subject	Neural Network	en
dc.subject	Natural language processing	en
dc.subject	text classification	en
dc.title	利用機器學習方式自動判讀病歷並產生ICD-10編碼	zh_TW
dc.title	Automatic ICD-10 classification from free-text data	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	汪大暉,曾意儒,陳俊良,郭律成
dc.subject.keyword	機器學習,神經網路,自然語言處理,文字分類	zh_TW
dc.subject.keyword	ICD-10,Machine learning,Neural Network,Natural language processing,text classification	en
dc.relation.page	56
dc.identifier.doi	10.6342/NTU201900940
dc.rights.note	Licensed for a fee
dc.date.accepted	2019-06-19
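dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics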

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	2.92 MB	Adobe PDF

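Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.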