Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57572

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 賴飛羆(Fei-Pei Lai) | |
| dc.contributor.author | Ssu-Ming Wang | en |
| dc.contributor.author | 王思敏 | zh_TW |
| dc.date.accessioned | 2021-06-16T06:52:06Z | - |
| dc.date.available | 2021-08-12 | |
| dc.date.copyright | 2020-08-04 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-07-21 | |
| dc.identifier.citation | 1. Lazakidou, Athina A., Handbook of Research on Informatics in Healthcare and Biomedicine, Vol. 1, Hershey, PA: Idea Group Reference, 2006. 2. Farkas, Richárd, and György Szarvas, "Automatic construction of rule-based ICD-9-CM coding systems," BMC Bioinformatics, Vol. 9, No. S3, BioMed Central, 2008. 3. Zhang, Yutao, et al., "LEAP: Learning to prescribe effective and safe treatment combinations for multimorbidity," Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017. 4. Wang, Yanshan, et al., "MedSTS: a resource for clinical semantic textual similarity," Language Resources and Evaluation (2018): 1-16. 5. WHO, "ICD-10 Version: 2015," apps.who.int (2017). 6. Mills, Ronald E., et al., "Impact of the transition to ICD-10 on Medicare inpatient hospital payments," Journal of AHIMA website (2015). 7. Gunter, Tracy D., and Nicolas P. Terry, "The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions," Journal of Medical Internet Research 7.1 (2005): e3. 8. Chowdhury, Gobinda G., "Natural language processing," Annual Review of Information Science and Technology 37.1 (2003): 51-89. 9. Pivovarov, Rimma, and Noémie Elhadad, "Automated methods for the summarization of electronic health records," Journal of the American Medical Informatics Association 22.5 (2015): 938-947. 10. Feller, Daniel J., et al., "Using clinical notes and natural language processing for automated HIV risk assessment," Journal of Acquired Immune Deficiency Syndromes 77.2 (2018): 160. 11. Loper, Edward, and Steven Bird, "NLTK: the Natural Language Toolkit," arXiv preprint cs/0205028 (2002). 12. Pedregosa, Fabian, et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research 12 (2011): 2825-2830. 13. Pennington, Jeffrey, Richard Socher, and Christopher D. Manning, "GloVe: Global vectors for word representation," Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. 14. Mikolov, Tomas, et al., "Distributed representations of words and phrases and their compositionality," Advances in Neural Information Processing Systems, 2013. 15. Gardner, Matt, et al., "AllenNLP: A deep semantic natural language processing platform," arXiv preprint arXiv:1803.07640 (2018). 16. Devlin, Jacob, et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805 (2018). 17. Huang, Kexin, Jaan Altosaar, and Rajesh Ranganath, "ClinicalBERT: Modeling clinical notes and predicting hospital readmission," arXiv preprint arXiv:1904.05342 (2019). 18. Lee, Jinhyuk, et al., "BioBERT: a pre-trained biomedical language representation model for biomedical text mining," Bioinformatics 36.4 (2020): 1234-1240. 19. Merity, Stephen, "Single Headed Attention RNN: Stop Thinking With Your Head," arXiv preprint arXiv:1911.11423 (2019). 20. Chung, Junyoung, et al., "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555 (2014). 21. Gers, Felix A., and Jürgen Schmidhuber, "LSTM recurrent networks learn simple context-free and context-sensitive languages," IEEE Transactions on Neural Networks 12.6 (2001): 1333-1340. 22. Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473 (2014). 23. Klein, Guillaume, et al., "OpenNMT: Open-source toolkit for neural machine translation," arXiv preprint arXiv:1701.02810 (2017). 24. Rajkomar, Alvin, et al., "Scalable and accurate deep learning with electronic health records," NPJ Digital Medicine 1.1 (2018): 18. 25. WHO, "ICD-10-CM Official Guidelines for Coding and Reporting" (2014). | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57572 | - |
| dc.description.abstract | 背景: 目前,醫療系統及保險申報已廣泛使用ICD編碼作為給付依據,然而ICD的分類作業仍主要依靠人力閱讀大量的文字資料作為分類的依據,耗時且耗力。自2014年台灣醫療健保申報的依據由ICD-9改成ICD-10後,ICD的分類又變得更細緻、容易混淆,即使是專業的疾病分類師,平均也需要至少20分鐘才能完成一個案例的編碼。 目標: 本篇研究的目標即是建構一個能夠自動根據醫療診斷資料進行ICD-10編碼的深度學習模型,以降低耗費在疾病編碼的時間及人力成本。 方法: 在此篇研究中,我們使用台大醫院的診斷資料,並應用自然語言處理的技術(包含GloVe, Word2Vec, ELMo, BERT, SHA-RNN)於深度學習網路中,以實現ICD-10的自動編碼。此外,我們亦導入注意力機制於模型中,視覺化決定編碼的重點文字依據,以提供ICD-10新手使用者編碼訓練的服務。 結果: 在各個實驗結果中,使用BERT詞嵌入和Gated recurrent unit (GRU)的分類模型在ICD-10-CM與ICD-10-PCS的編碼上達到最好的結果(F1-Score 0.715與0.615)。訓練完的模型亦導入ICD-10的網頁服務中,以提供所有ICD-10使用者自動編碼及訓練的服務。 結論: 目前,這些模型及相關的網頁服務已提供所有ICD-10使用者使用,將編碼時間從最多40分鐘降低至約莫2分鐘,大幅降低ICD-10使用者的編碼時間。 | zh_TW |
| dc.description.abstract | Background: Nowadays, ICD codes are widely used as the reference for medical record systems and insurance billing. However, classifying diseases into ICD codes still relies mainly on humans reading large amounts of free text, which is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more fine-grained and confusing; even a professional disease coder takes about 20 minutes per case on average. Objective: This thesis aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnostic and procedure codes based solely on free-text medical notes, thereby reducing human effort. Methods: We used NTUH diagnosis records as the data source and applied NLP techniques, including GloVe, Word2Vec, ELMo, BERT, and SHA-RNN, within a DNN architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the model to extract the key phrases from each diagnosis and visualize the coding evidence, which supports the training of newcomers to ICD-10 (a schematic sketch of this architecture follows the metadata table below). Results: In experiments on the NTUH dataset, our model achieved F1-scores of 0.715 and 0.615 on ICD-10-CM and ICD-10-PCS codes, respectively, using the BERT embedding approach with a GRU classification model. The trained models are deployed in an ICD-10 web service that offers coding and training to all ICD-10 users. Conclusions: The proposed model and web service significantly reduce the manpower spent on coding: coding time drops from 20-40 minutes to less than 2 minutes. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T06:52:06Z (GMT). No. of bitstreams: 1 U0001-2007202015161200.pdf: 4086718 bytes, checksum: aeb31d3761d24f5b12b546ef9337d5bd (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Thesis Certification by the Oral Examination Committee; Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction (1.1 Motivation; 1.2 Related Works; 1.3 Objective); Chapter 2 Background (2.1 ICD-9 to ICD-10; 2.2 Unstructured EHRs; 2.3 Natural Language Processing; 2.4 ICD-10 Auto Coding Task); Chapter 3 Methods (3.1 Data Description; 3.2 System Framework; 3.3 Data Processing: 3.3.1 Pre-processing, 3.3.2 Post-processing; 3.4 Feature Extraction: 3.4.1 GloVe, 3.4.2 Word2Vec, 3.4.3 ELMo, 3.4.4 BERT, 3.4.5 SHA-RNN; 3.5 Classification Model: 3.5.1 GRU, 3.5.2 Fully Connected, 3.5.3 Deep Neural Network Model, 3.5.4 Attention Mechanism; 3.6 Seq2Seq Model; 3.7 Evaluation Metrics: 3.7.1 F1-Score, 3.7.2 Recall@K, 3.7.3 MSE; 3.8 ICD-10 Coding and Training System Framework: 3.8.1 ICD-10 Coder, 3.8.2 ICD-10 Trainer); Chapter 4 Results and Discussion (4.1 ICD-10 Classification Model: 4.1.1 ICD-10 21-Chapter Classification, 4.1.2 ICD-10-CM Whole Label Classification, 4.1.3 ICD-10-PCS Whole Label Classification, 4.1.4 Performance in Department, 4.1.5 ICD-10 Classification with Attention; 4.2 Seq2Seq Correction on Classification Model: 4.2.1 ICD-10 Seq2Seq Model, 4.2.2 Seq2Seq Correction; 4.3 Case Studies on ICD-10-CM; 4.4 ICD-10 Coding and Training System Framework); Chapter 5 Conclusions and Future Works; References | |
| dc.language.iso | en | |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 國際疾病分類標準 | zh_TW |
| dc.subject | 循環神經網路 | zh_TW |
| dc.subject | 文字分類 | zh_TW |
| dc.subject | Deep learning | en |
| dc.subject | Natural language processing | en |
| dc.subject | Text classification | en |
| dc.subject | RNN | en |
| dc.subject | ICD-10 | en |
| dc.title | 基於監督式深度學習之 ICD-10 自動編碼與訓練系統 | zh_TW |
| dc.title | Automatic ICD-10 Coding and Training System with DNN Based on Supervised Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 郭律成(Lu-Cheng Kuo),簡榮彥(Jung-Yien Chien),李妮鍾(Ni-Chung Lee),胡務亮(Wuh-Liang Hwu) | |
| dc.subject.keyword | 自然語言處理,深度學習,國際疾病分類標準,循環神經網路,文字分類 | zh_TW |
| dc.subject.keyword | Natural language processing, Deep learning, ICD-10, RNN, Text classification | en |
| dc.relation.page | 57 | |
| dc.identifier.doi | 10.6342/NTU202001651 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2020-07-22 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 生醫電子與資訊學研究所 | zh_TW |
| Appears in Collections: | 生醫電子與資訊學研究所 | |
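The abstracts above name the best-performing configuration (BERT word embeddings feeding a GRU classifier), but the record contains no implementation details. The following is a minimal, hypothetical PyTorch sketch of that kind of BERT-plus-GRU multi-label classifier, included purely for illustration: the class name `Icd10Classifier`, the `bert-base-uncased` checkpoint, and all hyperparameters are assumptions, not details taken from the thesis.

```python
# Minimal sketch of a BERT-embedding + GRU multi-label classifier,
# assuming PyTorch and Hugging Face `transformers`. Every name and
# hyperparameter here is illustrative, not taken from the thesis.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Icd10Classifier(nn.Module):
    def __init__(self, num_codes, bert_name="bert-base-uncased", hidden_size=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)  # contextual token embeddings
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden_size,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_codes)   # one logit per ICD-10 code

    def forward(self, input_ids, attention_mask):
        # BERT encodes the free-text note into per-token vectors.
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # The GRU summarizes the token sequence; h_n holds the final
        # hidden states of the forward and backward directions.
        _, h_n = self.gru(tokens)
        summary = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        # Independent sigmoids: one note can carry several ICD-10 codes.
        return torch.sigmoid(self.fc(summary))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = Icd10Classifier(num_codes=100)
batch = tokenizer(["status post cholecystectomy, acute cholecystitis"],
                  return_tensors="pt", padding=True, truncation=True)
probs = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 100)
```

A multi-label setup like this would typically be trained with binary cross-entropy (e.g. `nn.BCELoss`) against a multi-hot vector of gold codes and evaluated with the F1-scores the abstracts report; the record does not state the thesis's actual loss function, decision thresholds, or training schedule.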
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-2007202015161200.pdf (restricted; not publicly available) | 3.99 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
