Large Scale Data Mining for Healthcare Documents

Chi-Huang Chen; 陳啟煌

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22224

Title:	Large Scale Data Mining for Healthcare Documents (大型醫療文件資訊探勘)
Author:	Chi-Huang Chen (陳啟煌)
Advisor:	Fei-Pei Lai (賴飛羆)
Keywords:	Healthcare Documents Mining, Semantic Similarity Measure, Knowledge Discovery and Management, Discharge Summary System, Healthcare Data Mining
Year of publication:	2010
Degree:	Doctoral
Abstract:	With the widespread use of healthcare information systems, including computerized physician order systems, large numbers of electronic medical records accumulate in clinical databases. These records include radiology reports, pathology reports, operation notes, admission notes, discharge summaries, and other healthcare documents. Over a long period of operation, a hospital collects a large volume of such records. They embody physicians' experience and expertise in diagnosis and treatment, so it is worthwhile to apply data mining techniques to extract the medical knowledge they contain. To obtain complete information from medical documents, this dissertation presents three contributions: a semantic-driven keyword matching extractor; an implementation of a complete discharge summary system for National Taiwan University Hospital, with auto-complete, template, and user-defined phrase library functions; and a semantic similarity measure suited to the biomedical domain.
The semantic-driven keyword matching extractor helps clinicians extract relevant information from textual clinical documents and store it in predefined, case-oriented template fields. In the developed system, data in the documents is transferred to a structured database through semi-automatic matching, verification, and extraction. The design of the extractor and of the predefined template database also improves the scalability of the system for large scale healthcare document mining, and the template database can support the extraction of relevant data for different kinds of medical research.
The web-based discharge summary system removes a drawback of the earlier client-server system, which required software to be installed and updated on each client. It also introduces several technologies suited to a web-based architecture, including AJAX controls, SRILM language models, and an inverted-table lookup mechanism, all of which speed up the writing of discharge summaries. The new system reduces maintenance costs and improves the efficiency of writing discharge summaries.
Using page counts returned by the Google search engine, this dissertation designs a corpus-based semantic similarity measure. Given two terms P and Q, we define several similarity scores computed from the page counts returned for the queries P, Q, and "P AND Q", together with scores based on lexico-syntactic patterns. These scores are then integrated with a support vector machine to compute the semantic similarity of the two terms and to make the measure more robust. In experiments, the method achieves a correlation coefficient of 0.798 on the dataset provided by A. Hliaoutakis. On the dataset provided by T. Pedersen et al., the correlation coefficient is 0.705 against physicians' scores but only 0.496 against the scores of diagnostic coding experts, so the method is closer to the way physicians judge similarity. These results indicate that the mechanisms proposed in this dissertation can be applied in practice to large scale healthcare document mining.
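As an illustration of how similarity scores can be derived from the page counts for P, Q, and "P AND Q", the following is a minimal Python sketch. The abstract does not name the specific scores used in the thesis, so the four co-occurrence measures here (Jaccard, Dice, overlap, and pointwise mutual information) are assumptions chosen because they are common in page-count-based similarity work. The function names, the example counts, and the index size `n` are hypothetical, and no search engine is actually queried. The pattern-based scores and the support vector machine step are omitted; the resulting feature vector is the kind of input such a model could take.

```python
import math


def web_jaccard(p: int, q: int, pq: int) -> float:
    """Jaccard coefficient: pages with both terms / pages with either term."""
    union = p + q - pq
    return pq / union if pq > 0 and union > 0 else 0.0


def web_dice(p: int, q: int, pq: int) -> float:
    """Dice coefficient: 2 * co-occurrence / sum of individual counts."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0


def web_overlap(p: int, q: int, pq: int) -> float:
    """Overlap coefficient: co-occurrence / count of the rarer term."""
    smaller = min(p, q)
    return pq / smaller if smaller > 0 else 0.0


def web_pmi(p: int, q: int, pq: int, n: int) -> float:
    """Pointwise mutual information, with n an assumed total number of indexed pages."""
    if p == 0 or q == 0 or pq == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def similarity_features(p: int, q: int, pq: int, n: int) -> list[float]:
    """Collect the scores into one feature vector for a downstream learner."""
    return [
        web_jaccard(p, q, pq),
        web_dice(p, q, pq),
        web_overlap(p, q, pq),
        web_pmi(p, q, pq, n),
    ]


if __name__ == "__main__":
    # Hypothetical page counts for two terms P and Q and for the query "P AND Q".
    features = similarity_features(p=1000, q=500, pq=100, n=10**6)
    print(features)
```

Each score captures a different normalization of the same co-occurrence count, which is one reason to combine them with a learned model rather than rely on a single one.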
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22224
Full-text license:	Not authorized
Appears in collections:	Department of Electrical Engineering

Files in this item:

File	Size	Format
ntu-99-1.pdf (not currently authorized for public access)	3.15 MB	Adobe PDF

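Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.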
