Comparison of Similarities Between Electronic Medical Records of Hospitalized Patients with Lung Cancer

Yu-Chien Chang; 張豫芊

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79096

Title:	Comparison of similarities between electronic medical records of hospitalized patients with lung cancer (肺癌住院病患電子病歷相似性比對)
Author:	Yu-Chien Chang (張豫芊)
Advisor:	賴飛羆
Co-advisor:	郭律成
Keywords:	Electronic health record, Text mining, Natural language processing, Similarity computation
Publication year:	2018
Degree:	Master's
Abstract:	Incorporating individual variability into medical treatment, an idea known as precision medicine or personalized medicine, has become increasingly important. Precision medicine aims to revolutionize modern human treatment by combining patients' behavioral, cellular, molecular, clinical, environmental, and genetic parameters. As hospital information systems are built and electronic medical records (EMRs) become widespread, this concept can increasingly be put into practice. Meanwhile, in the era of big data, medical information keeps accumulating and the number of EMRs is growing rapidly, so analyzing these data has become indispensable: appropriate analysis methods can assist physicians in clinical decision-making, provide data support for clinical research, offer personalized healthcare services to patients, and lead to better precision-medicine research in the future. By applying text-mining techniques to medical records, we effectively extend the number of patients a physician can see; by classifying patients' medical records and ranking them by similarity, we aim to predict which patients may respond similarly to treatment.
In this study, we targeted inpatients with lung cancer and analyzed the unstructured portion of the brief history field of their discharge summaries, which records patients' behavioral and clinical factors. We preprocessed the extracted records with spelling correction and removal of negated expressions (negation), and then implemented two models for similarity comparison: the TF-IDF algorithm and the Doc2Vec model. Because there is no ground truth for similarity, we used two indirect evaluations, both based on the ICD-10-CM diagnosis codes that disease coders assign by converting the discharge-diagnosis text according to ICD-10-CM coding rules: the commonly used Jaccard similarity coefficient, and a proposed ontology-based similarity between the principal diagnoses of the discharge summaries. The results show that spelling correction and negation removal slightly improved the results of both the TF-IDF algorithm and the Doc2Vec model. However, challenges remain in recognizing negated expressions and in further improving the spelling-correction detection rate in medical records to obtain more accurate similarity computation.
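The processing chain summarized in the abstract (spelling correction, negation removal, TF-IDF similarity, and evaluation against ICD-10-CM codes) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the word list `VOCABULARY`, the negation trigger pattern, the difflib cutoff of 0.8, the smoothed idf, and the function `icd_prefix_similarity` (a crude, hypothetical stand-in for the ontology similarity that only counts shared leading characters of two codes) are all illustrative choices. Doc2Vec is omitted because it requires a trained embedding model.

```python
import difflib
import math
import re
from collections import Counter

# Hypothetical resources (assumptions): the thesis's actual dictionary and
# negation rules are not given in the record.
VOCABULARY = ["adenocarcinoma", "cough", "dyspnea", "hemoptysis", "pneumonia", "smoker"]
NEGATION = re.compile(r"^(?:no|denies|denied|without|negative for)\b")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def correct_spelling(word):
    """Map a word to its closest vocabulary entry (difflib ratio >= 0.8), else keep it."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word


def preprocess(text):
    """Drop clauses that start with a negation trigger, then spell-correct the remaining tokens."""
    tokens = []
    for clause in re.split(r"[.;,]", text.lower()):
        clause = clause.strip()
        if clause and not NEGATION.match(clause):
            tokens.extend(correct_spelling(t) for t in tokenize(clause))
    return tokens


def tfidf(docs):
    """Per-document TF-IDF weights, using the smoothed idf log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count / len(doc) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two sets of ICD-10-CM codes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def icd_prefix_similarity(code_a, code_b):
    """Hypothetical ontology proxy: shared leading characters / length of the longer code."""
    a, b = code_a.replace(".", ""), code_b.replace(".", "")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b), 1)


if __name__ == "__main__":
    histories = [
        "Smoker with cough and hemoptysis. No fever.",
        "Chronic cough, hemoptysis; denies fever.",
        "Pnuemonia with dyspnea. Negative for hemoptysis.",
    ]
    vectors = tfidf([preprocess(h) for h in histories])
    # Rank patients 1 and 2 by TF-IDF cosine similarity to patient 0.
    ranking = sorted([1, 2], key=lambda i: cosine(vectors[0], vectors[i]), reverse=True)
    print(ranking)  # prints [1, 2]
    print(jaccard({"C34.11", "J18.9"}, {"C34.11"}))  # prints 0.5
    print(icd_prefix_similarity("C34.11", "C34.12"))  # prints 0.8
```

The clause-level negation rule is deliberately coarse: it drops the whole clause "negative for hemoptysis", but it would miss a negation that appears in the middle of a clause, which is one of the open challenges the abstract mentions.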
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79096
DOI:	10.6342/NTU201802927
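Full-text license:	Paid license
Electronic full-text release date:	2023-08-23
Appears in department:	Graduate Institute of Biomedical Electronics and Bioinformatics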

Files in this item:

File	Size	Format
ntu-107-R05945029-1.pdf (not authorized for public access)	1.89 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
