Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73336
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 賴飛羆 | |
dc.contributor.author | Yu-Hsuan Chang | en |
dc.contributor.author | 張宇軒 | zh_TW |
dc.date.accessioned | 2021-06-17T07:29:09Z | - |
dc.date.available | 2019-07-03 | |
dc.date.copyright | 2019-07-03 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-06-18 | |
dc.identifier.citation | [1] 'International Classification of Diseases (ICD)'. World Health Organization. Retrieved 23 November 2010.
[2] 'ICD-10 Version:2015'. apps.who.int. Retrieved 23 May 2017.
[3] Farkas R, Szarvas G. Automatic construction of rule-based ICD-9-CM coding systems. BMC Bioinformatics. 2008;9(Suppl 3):S10. doi:10.1186/1471-2105-9-S3-S10.
[4] Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao. Recurrent Convolutional Neural Networks for Text Classification. Presented at the Twenty-Ninth AAAI Conference on Artificial Intelligence.
[5] Mikolov, T., M. Karafiát, L. Burget, et al. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). 2010.
[6] Koopman B, Zuccon G, Nguyen A, Bergheim A, Grayson N. Automatic ICD-10 classification of cancers from free-text death certificates. Int J Med Inform. 2015 Nov;84(11):956–65. doi:10.1016/j.ijmedinf.2015.08.004.
[7] Yoon Kim. Convolutional Neural Networks for Sentence Classification.
[8] Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pages 1422–1432.
[9] Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[10] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26, 3111–3119 (2013).
[11] van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008.
[12] T. G. Dietterich. Ensemble methods in machine learning. In J. Kittler and F. Roli, editors, Multiple Classifier Systems, pages 1–15. LNCS Vol. 1857, Springer, 2001.
[13] Joyce J.M. Kullback-Leibler divergence. International Encyclopedia of Statistical Science, Springer (2011).
[14] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), 2006, pp. 161–168.
[15] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. 2016. MIT Press.
[16] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs], December 2014.
[17] Steven Bird. 2006. NLTK: the natural language toolkit. In Proceedings of the COLING/ACL on Interactive presentation sessions, COLING-ACL ’06, pages 69–72, Stroudsburg, PA, USA. Association for Computational Linguistics.
[18] Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
[19] Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–1139.
[20] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. 2014. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73336 | - |
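dc.description.abstract | At present, disease classification relies mainly on people reading large volumes of text as the basis for coding. A professional disease coder needs lengthy specialized training before being able to carry out the complex task of ICD-10 classification, and even a professional coder needs a great deal of time to assign the correct codes to a single patient. If artificial intelligence could take over this work, hospitals would save substantial manpower and the health care system would function better.
To that end, we aim to build an automatic classification system for ICD-10 codes that can read and process the free text written by physicians. Taking textual records such as discharge diagnoses, operation records, or medical histories as input, the system uses natural language processing together with machine learning classification to learn the coding rules and finally produce the corresponding ICD-10 codes. In our study, the subject data comprise the admission diagnoses, transfer-out-of-ICU instructions, physical examinations, course of hospital treatment, complications, laboratory records, examination progress, imaging reports, pathology reports, discharge diagnoses, chief complaints, medical histories, and operation records of all discharged patients at National Taiwan University Hospital coded under the International Classification of Diseases, Tenth Revision (ICD-10), together with the final codes assigned by the disease coders. The data do not require patients' personal information, so this study does not touch on patient privacy. | zh_TW |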
dc.description.abstract | Our study aimed to construct an ICD-10 coding system, built with supervised machine learning techniques, that automatically categorizes free-text medical data using solely their content. Among the many machine learning techniques available, we use supervised learning to classify ICD-10 codes from free-text data. At present, disease classification relies mainly on people reading large amounts of written material, such as discharge diagnoses, chief complaints, medical histories, and operation records, as the basis for classification. Coding is both laborious and time-consuming: even a professional disease coder takes an average of 20 minutes. If an automatic code classification system can achieve accuracy comparable to that of a professional coder, it can significantly reduce the human labor and time spent on coding. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T07:29:09Z (GMT). No. of bitstreams: 1 ntu-108-R05945039-1.pdf: 2994786 bytes, checksum: c559162e5cdd69ae1699e06d14c52abf (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Acknowledgements 1
Chinese Abstract 2
ABSTRACT 3
CONTENTS 4
LIST OF FIGURES 7
LIST OF TABLES 9
Chapter 1 Introduction 10
Chapter 2 Background 12
Chapter 3 Related work 17
3.1 Text classification 17
3.2 Neural network 19
3.3 ICD classification 19
Chapter 4 Data description 21
4.1 Data introduction 21
4.2 7 classes of free-text data 21
4.2.1 Chief complaint 21
4.2.2 Course and treatment 23
4.2.3 History 25
4.2.4 Pathology report 26
4.2.5 Physical examination 28
4.2.6 Discharge diagnosis 30
4.2.7 Transfer out of ICU diagnosis 32
Chapter 5 Methods 35
5.1 Text preprocessing 35
5.2 Word embedding 35
5.3 Neural network 37
5.4 Evaluation metrics 38
5.5 Ensemble 39
Chapter 6 Results 40
6.1 21 categories result 40
6.2 The first two digits prediction 49
6.3 The first three digits prediction 49
6.4 All labels prediction 50
6.5 Word embedding 51
Chapter 7 Conclusion 53
7.1 Model limitation 53
7.2 Future work 54
REFERENCE 55 | |
dc.language.iso | zh-TW | |
dc.title | 利用機器學習方式自動判讀病歷並產生ICD-10編碼 | zh_TW |
dc.title | Automatic ICD-10 classification from free-text data | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 汪大暉,曾意儒,陳俊良,郭律成 | |
dc.subject.keyword | Machine learning, neural network, natural language processing, text classification | zh_TW |
dc.subject.keyword | ICD-10, machine learning, neural network, natural language processing, text classification | en |
dc.relation.page | 56 | |
dc.identifier.doi | 10.6342/NTU201900940 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2019-06-19 | |
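dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Biomedical Electronics and Bioinformatics | zh_TW |
Appears in collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |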
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (not currently authorized for public access) | 2.92 MB | Adobe PDF |
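Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.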