Comparison of similarities between electronic medical records of hospitalized patients with lung cancer

Yu-Chien Chang; 張豫芊

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79096

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆
dc.contributor.author	Yu-Chien Chang	en
dc.contributor.author	張豫芊	zh_TW
dc.date.accessioned	2021-07-11T15:43:37Z	-
dc.date.available	2023-08-23
dc.date.copyright	2018-08-23
dc.date.issued	2018
dc.date.submitted	2018-08-10
dc.identifier.citation	[1] Collins, Francis S. and H. E. Varmus. “A new initiative on precision medicine.” The New England journal of medicine 372 9 (2015): 793-5. [2] Chen, Rui and Michael Snyder. “Promise of personalized omics to precision medicine.” Wiley interdisciplinary reviews. Systems biology and medicine 5 1 (2013): 73-82. [3] Jameson, J. Larry and Dan L. Longo. “Precision medicine--personalized, problematic, and promising.” The New England journal of medicine 372 23 (2015): 2229-34. [4] Weed, Lincoln. “Medical Records, Patient Care, and Medical Education.” Irish journal of medical science 462 (1964): 271-82. [5] Ferlay, Jacques, Isabelle Soerjomataram, Rajesh Dikshit, Sultan Eser, Colin Mathers, Marise Rebelo, Donald Maxwell Parkin, David Forman, and Freddie Bray. “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012.” International journal of cancer 136, no. 5 (2015). [6] Long-Term Trend of Top Ten Leading Cancer Statistics, 1979-2015, Taiwan Cancer Registry. [7] Cancer Survival Rates in Taiwan, 2011-2015, Taiwan Cancer Registry. [8] Vos, Theo, Christine Allen, Megha Arora, Ryan M. Barber, Zulfiqar A. Bhutta, Alexandria Brown, Austin Carter et al. “Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015.” The Lancet 388, no. 10053 (2016): 1545-1602. [9] Barkhordari, Mohammadhossein, and Mahdi Niamanesh. “ScaDiPaSi: an effective scalable and distributable MapReduce-Based method to find patient similarity on huge healthcare networks.” Big Data Research 2, no. 1 (2015): 19-27. [10] Srinivasan, Uma, and Bavani Arunasalam. “Leveraging big data analytics to reduce healthcare costs.” IT professional 15, no. 6 (2013): 21-28. [11] Subirats, Laia, Luigi Ceccaroni, and Felip Miralles. “Knowledge representation for prognosis of health status in rehabilitation.” Future Internet 4, no. 3 (2012): 762-775. [12] Gottlieb, Assaf, Gideon Y. Stein, Eytan Ruppin, Russ B. Altman, and Roded Sharan. “A method for inferring medical diagnoses from patient similarities.” BMC medicine 11, no. 1 (2013): 194. [13] Panahiazar, Maryam, Vahid Taslimitehrani, Naveen L. Pereira, and Jyotishman Pathak. “Using EHRs for heart failure therapy recommendation using multidimensional patient similarity analytics.” Studies in health technology and informatics 210 (2015): 369. [14] Wang, Fei, Jianying Hu, and Jimeng Sun. “Medical prognosis based on patient similarity and expert feedback.” In Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 1799-1802. IEEE, 2012. [15] Miotto, Riccardo, Li Li, Brian A. Kidd, and Joel T. Dudley. “Deep patient: an unsupervised representation to predict the future of patients from the electronic health records.” Scientific reports 6 (2016): 26094. [16] He, Ziping, Jijiang Yang, Qing Wang, and Jianqiang Li. “A Method of Electronic Medical Record Similarity Computation.” In International Conference on Smart Health, pp. 182-191. Springer, Cham, 2016. [17] Zhang, Yunxuan, Ziping He, Ji-Jiang Yang, Qing Wang, and Jianqiang Li. “Re-Structuring and Specific Similarity Computation of Electronic Medical Records.” In Computer Software and Applications Conference (COMPSAC), 2017 IEEE 41st Annual, vol. 2, pp. 230-235. IEEE, 2017. [18] Ruch, Patrick, Robert Baud, A. Geissbuhler, and Anne-Marie Rassinoux. “Comparing general and medical texts for information retrieval based on natural language processing: an inquiry into lexical disambiguation.” Studies in health technology and informatics 1 (2001): 261-265. [19] Chapman, Wendy W., Will Bridewell, Paul Hanbury, Gregory F. Cooper, and Bruce G. Buchanan. “A simple algorithm for identifying negated findings and diseases in discharge summaries.” Journal of biomedical informatics 34, no. 5 (2001): 301-310. [20] Peng, Yifan, Xiaosong Wang, Le Lu, Mohammadhadi Bagheri, Ronald Summers, and Zhiyong Lu. “NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.” AMIA Summits on Translational Science Proceedings 2017 (2018): 188. [21] Wang, Xiaosong, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. “ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases.” In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 3462-3471. IEEE, 2017. [22] Schütze, Hinrich, Christopher D. Manning, and Prabhakar Raghavan. Introduction to information retrieval. Vol. 39. Cambridge University Press, 2008. [23] Peter Norvig, “How to Write a Spelling Corrector,” 2016, https://norvig.com/spell-correct.html [24] Aronson, Alan R. “Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.” In Proceedings of the AMIA Symposium, p. 17. American Medical Informatics Association, 2001. [25] Charniak, Eugene, and Mark Johnson. “Coarse-to-fine n-best parsing and MaxEnt discriminative reranking.” In Proceedings of the 43rd annual meeting on association for computational linguistics, pp. 173-180. Association for Computational Linguistics, 2005. [26] Manning, Christopher, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. “The Stanford CoreNLP natural language processing toolkit.” In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp. 55-60. 2014. [27] Sioutos, Nicholas, Sherri de Coronado, Margaret W. Haber, Frank W. Hartel, Wen-Ling Shaiu, and Lawrence W. Wright. “NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information.” Journal of biomedical informatics 40, no. 1 (2007): 30-43. [28] Le, Quoc, and Tomas Mikolov. “Distributed representations of sentences and documents.” In International Conference on Machine Learning, pp. 1188-1196. 2014. [29] Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed representations of words and phrases and their compositionality.” In Advances in neural information processing systems, pp. 3111-3119. 2013. [30] Nguyen, Hoa A., and Hoa Al-Mubaid. “New ontology-based semantic similarity measure for the biomedical domain.” In Granular Computing, 2006 IEEE International Conference on, pp. 623-628. IEEE, 2006. [31] Althobaiti, Ahmad Fayez S. “Comparison of Ontology-Based Semantic-Similarity Measures in the Biomedical Text.” Journal of Computer and Communications 5, no. 02 (2017): 17. [32] Lamy, Jean-Baptiste, Alain Venot, and Catherine Duclos. “PyMedTermino: an open-source generic API for advanced terminology services.” In MIE, pp. 924-928. 2015. [33] Kusner, Matt, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. “From word embeddings to document distances.” In International Conference on Machine Learning, pp. 957-966. 2015.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79096	-
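dc.description.abstract	Incorporating individual variability into medical treatment, known as precision medicine or personalized medicine, has become increasingly important. Precision medicine aims to transform modern treatment by combining patients' behavioral, cellular, molecular, clinical, environmental, and genetic parameters. With the deployment of medical information systems and the growing adoption of electronic medical records, this concept can be realized more fully. At the same time, in the era of big data, medical information keeps accumulating and analyzing it has become indispensable; with appropriate, sound analysis methods, future precision medicine research can achieve better results. Applying text-mining techniques to the text of medical records in effect extends the number of patients a physician can see, while classifying patients' medical records and ranking them by similarity helps predict which patients are likely to respond similarly to a treatment. In this study, we analyze the unstructured portion of the brief history field in the discharge summaries of lung cancer inpatients, which records patients' behavioral and clinical factors. We apply spelling correction to the extracted text and remove negated expressions (negation). We then implement two models for similarity comparison: the TF-IDF algorithm and the Doc2Vec model. Because similarity has no ground truth, in addition to the commonly used Jaccard similarity coefficient we propose a second indirect evaluation: an ontology-based similarity comparison of the principal diagnosis of each discharge summary. Because disease coders convert the discharge diagnosis text into diagnosis codes following the ICD-10-CM coding rules, both indirect evaluations use these diagnosis codes as their basis. The results show that spelling correction and negation removal improve the results of both the TF-IDF algorithm and the Doc2Vec model. However, challenges remain in recognizing negated expressions and in further improving the spelling-correction detection rate in medical records to obtain a more accurate similarity computation.	zh_TW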
dc.description.abstract	The idea of incorporating individual variability into medical treatment, known as precision medicine or personalized medicine, has become of great importance. Precision medicine aims to revolutionize modern treatment by combining patients’ behavioral, cellular, molecular, clinical, environmental, and genetic parameters. In this research, we analyzed the brief history section of discharge summaries, which includes patients’ behaviors and clinical factors. The number of electronic medical records (EMRs) is increasing rapidly. By applying text-mining techniques to medical records, we on the one hand extend the number of patients a physician can see, and on the other hand categorize patients and predict which patients might react similarly to a treatment. Analyzing these EMRs can effectively assist physicians in clinical decision-making and provide data support for clinical research as well as personalized healthcare services for patients. We focused on inpatients with lung cancer. We applied several preprocessing steps, including spelling correction and negation removal, before applying two different methods to the dataset: the TF-IDF algorithm and the Doc2Vec model. Results showed that performance improved slightly after spelling correction and removal of negative expressions. However, challenges remain in recognizing negative expressions and in further improving the spelling-correction detection rate in medical records to obtain a more accurate similarity computation.	en
dc.description.provenance	Made available in DSpace on 2021-07-11T15:43:37Z (GMT). No. of bitstreams: 1 ntu-107-R05945029-1.pdf: 1934705 bytes, checksum: e523de490ed936dbe18994acab4ed7eb (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Oral Examination Committee Certification #; Acknowledgements i; Chinese Abstract ii; ABSTRACT iii; CONTENTS iv; LIST OF FIGURES vi; LIST OF TABLES vii; Chapter 1 Introduction 1; 1.1 Background 1; 1.2 Motivation 2; 1.3 Objective 3; Chapter 2 Related Works 4; Chapter 3 Methodology 6; 3.1 Workflow 6; 3.2 Materials 6; 3.2.1 Dataset 6; 3.2.2 Summary Statistics 8; 3.2.3 Cohort Statistics 9; 3.3 Data Preprocessing 10; 3.3.1 Data Cleaning 10; 3.3.2 Lemmatization 10; 3.3.3 Correction 11; 3.3.4 Negative Expressions Detection and Removal 13; 3.3.5 Stop Words Removal 15; 3.4 Building Model 15; 3.4.1 TF-IDF Model 15; 3.4.2 Doc2Vec 16; 3.5 Similarity Computation 18; 3.6 Evaluation Method 18; 3.6.1 ICD-10-CM 18; 3.6.2 Jaccard Similarity Coefficient 20; 3.6.3 Ontology-based Dissimilarity Measure (Distance) 21; Chapter 4 Result and Discussion 23; 4.1 TF-IDF Algorithm 23; 4.1.1 Jaccard Similarity Coefficient 23; 4.1.2 Ontology-based Dissimilarity Measure 24; 4.2 Doc2Vec 25; 4.2.1 Jaccard Similarity Coefficient 25; 4.2.2 Ontology-based Dissimilarity Measure 25; 4.3 Comparison 26; Chapter 5 Conclusion and Future Work 28; REFERENCE 29
dc.language.iso	zh-TW
dc.subject	文字探勘	zh_TW
dc.subject	電子病歷	zh_TW
dc.subject	自然語言處理	zh_TW
dc.subject	相似性計算	zh_TW
dc.subject	Similarity computation	en
dc.subject	Text-mining	en
dc.subject	Natural Language Processing	en
dc.subject	Electronic health record	en
dc.title	肺癌住院病患電子病歷相似性比對	zh_TW
dc.title	Comparison of similarities between electronic medical records of hospitalized patients with lung cancer	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.coadvisor	郭律成
dc.contributor.oralexamcommittee	張智星,施吉昇,歐陽彥正
dc.subject.keyword	電子病歷,文字探勘,自然語言處理,相似性計算	zh_TW
dc.subject.keyword	Electronic health record,Text-mining,Natural Language Processing,Similarity computation	en
dc.relation.page	33
dc.identifier.doi	10.6342/NTU201802927
dc.rights.note	Paid license
dc.date.accepted	2018-08-10
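dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW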
dc.date.embargo-lift	2023-08-23	-
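
The similarity pipeline outlined in the abstract can be illustrated with a minimal sketch: rank records by TF-IDF cosine similarity, then score the top match indirectly with the Jaccard coefficient over diagnosis codes. This is not the thesis implementation. The sample records, their ICD-10-CM codes, the function names, and the smoothed IDF variant are all assumptions made for illustration, and the preprocessing steps (spelling correction, negation removal) are assumed to have already been applied.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF; this variant is an assumption, not taken from the thesis.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard similarity coefficient |A & B| / |A | B| of two code sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical, already-preprocessed records: (tokens, ICD-10-CM codes).
records = [
    ("cough hemoptysis weight loss smoker".split(), {"C34.11", "J44.9"}),
    ("cough hemoptysis smoker chest pain".split(), {"C34.11", "I10"}),
    ("fever dysuria flank pain".split(), {"N39.0"}),
]

vectors = tfidf_vectors([tokens for tokens, _ in records])
query = 0
# Rank every other record by textual similarity to the query record.
ranked = sorted((i for i in range(len(records)) if i != query),
                key=lambda i: cosine(vectors[query], vectors[i]), reverse=True)
best = ranked[0]
# Indirect evaluation: overlap of diagnosis codes between query and top match.
score = jaccard(records[query][1], records[best][1])
print(best, round(score, 3))
```

A higher code overlap for the top-ranked record suggests that the text-based ranking agrees with the coded diagnoses. The Doc2Vec model and the ontology-based distance mentioned in the abstract would replace `tfidf_vectors` and `jaccard` respectively and are not sketched here.
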
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:

File	Size	Format
ntu-107-R05945029-1.pdf (not authorized for public access)	1.89 MB	Adobe PDF

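Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.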