Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89714
| Title: | The study of economical genetic testing with GeneVPNN in complex disease recommending latent feature genes and predicting effective target genes |
| Author: | 張心和 Hsin-Ho Chang |
| Advisor: | 盧子彬 Tzu-Pin Lu |
| Keywords: | Gene prediction, Deep learning, Autoencoder, Dimension reduction, Genomic selection |
| Publication year: | 2023 |
| Degree: | Master's |
| Abstract: | Background: Precision medicine is now a goal across both clinical and preventive medicine. Beyond advances in biomedical technology, an equally important question is how to achieve the expected research outcomes under limited research resources. Genetic testing plays a pivotal role in precision medicine research: the technology has been used in medical research for many years and is now mature, and research institutions in many countries have used it to publish characteristic gene sets for landmark diseases and cancers. As genetic research has progressed, many studies have confirmed that complex diseases are not caused by just a few key genes, so research must extend to genome-wide genetic testing.
However, the cost and time of genome-wide testing are the primary obstacles that medical research teams and patients face. To address them, this study proposes GeneVPNN, a gene virtual probe model that recommends a reduced range of genes for actual testing and infers the remaining gene probe values computationally, as a reference for future studies of disease-causing genes in biomedicine.
Method: GeneVPNN (Gene Virtual Probe Neural Network) is a hybrid method combining statistics and deep learning to address the problem of recommending which target genes to test. In the first stage, GeneVPNN analyzes the coefficient of variation (CV) of the RNA microarray raw data to select an effective target gene group (CV5000). It then applies an autoencoder (AE) architecture to select a smaller, more representative set of latent feature genes (LF-GENESET) of a specified size. Finally, a neural network (NN) predicts the full set of CV5000 gene probe values from LF-GENESET. The study uses the GEO dataset GSE102484, gene expression microarray data from Asian breast cancer patients generated by the Koo Foundation Sun Yat-Sen Cancer Center in Taiwan.
Result: The GSE102484 dataset was split into training and test sets. When the trained GeneVPNN model was evaluated on the test set, 96.71% of the predicted CV5000 gene probe values had a relative error below 30%, and 99.47% had a relative error below 50%.
Conclusion: This study has two main outputs. The first is a general-purpose model design: although GeneVPNN was trained and tested on GSE102484, which comes from the postoperative follow-up of women with breast cancer, its framework is designed for diseases in general rather than for one specific disease. It can therefore help medical researchers obtain a recommended genetic testing range for other complex diseases and make genetic testing more economical.
The second output is the recommended LF-GENESET itself: by testing only these genes, researchers can use GeneVPNN to infer the CV5000 gene probe values in the early stages of research on other diseases, avoiding the cost of genome-wide testing. Beyond saving research cost and time, GeneVPNN can also serve as an auxiliary reference for doctors and researchers in clinical diagnosis, such as assessing patients' postoperative conditions. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89714 |
| DOI: | 10.6342/NTU202301211 |
| Full-text license: | Authorized (open access worldwide) |
| Department: | Institute of Epidemiology and Preventive Medicine |
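The CV filtering step described in the Method can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes expression values are held as a dictionary from probe ID to per-sample values, and it assumes CV5000 means the 5,000 probes with the highest coefficient of variation (sample standard deviation divided by the mean), which the name suggests but the abstract does not state. The function name `top_cv_genes` is hypothetical.

```python
import statistics


def top_cv_genes(expr, k=5000):
    """Return the IDs of the k probes with the highest coefficient of variation.

    Hypothetical helper. expr maps a probe ID to its expression values across
    samples. CV = sample standard deviation / mean; probes with fewer than two
    values or a non-positive mean are skipped because their CV is undefined
    or misleading.
    """
    cvs = {}
    for probe, values in expr.items():
        if len(values) < 2:
            continue
        mean = statistics.fmean(values)
        if mean <= 0:
            continue
        cvs[probe] = statistics.stdev(values) / mean
    # Highest CV first; sorted() is stable, so ties keep their input order.
    return sorted(cvs, key=cvs.get, reverse=True)[:k]


# Toy data: four probes measured in three samples.
expr = {
    "p1": [1.0, 1.0, 1.0],
    "p2": [1.0, 2.0, 3.0],
    "p3": [10.0, 10.5, 9.5],
    "p4": [2.0, 6.0, 7.0],
}
print(top_cv_genes(expr, k=2))  # → ['p4', 'p2']
```

Whether the thesis computes CV on raw or log-scale intensities is not stated in the abstract; the ranking logic is the same either way.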
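The abstract does not describe how the autoencoder picks LF-GENESET from CV5000. One possible way to turn an AE into a gene selector is to train it to reconstruct the input genes through a small bottleneck and then rank genes by the magnitude of their encoder weights; the NumPy sketch below does exactly that and is only an assumed illustration. The function `rank_genes_by_autoencoder`, the tanh bottleneck, the weight-norm ranking rule, and all hyperparameters are hypothetical choices, not GeneVPNN's.

```python
import numpy as np


def rank_genes_by_autoencoder(X, n_latent=4, n_select=10, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder on X (samples x genes) and return
    the indices of the n_select genes with the largest encoder-weight norm,
    plus the per-epoch reconstruction loss.

    Hypothetical selection rule: the L2 norm of each gene's encoder weights.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    # Standardise each gene so weight magnitudes are comparable across genes.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd

    W_enc = rng.normal(0.0, 0.1, size=(n_genes, n_latent))
    b_enc = np.zeros(n_latent)
    W_dec = rng.normal(0.0, 0.1, size=(n_latent, n_genes))
    b_dec = np.zeros(n_genes)

    losses = []
    for _ in range(epochs):
        H = np.tanh(Z @ W_enc + b_enc)   # encoder (bottleneck) activations
        err = H @ W_dec + b_dec - Z      # linear decoder output minus target
        losses.append(float(np.mean(err ** 2)))
        # Full-batch gradients of the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W_dec.T) * (1.0 - H ** 2)  # back through tanh
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (Z.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)

    importance = np.linalg.norm(W_enc, axis=1)          # one score per gene
    selected = np.argsort(importance)[::-1][:n_select]  # highest scores first
    return selected, losses


# Synthetic stand-in for a CV5000 matrix: 40 samples x 30 genes driven by 3 factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(40, 30)) + 8.0
selected, losses = rank_genes_by_autoencoder(X, n_select=10)
```

Because the data are synthetic, the selected indices carry no biological meaning; the example only shows the flow from a samples-by-genes matrix to a ranked gene subset of a specified size.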
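The prediction stage, in which a neural network infers all CV5000 probe values from the LF-GENESET probes, could look like the following minimal NumPy sketch. It assumes a single tanh hidden layer trained by full-batch gradient descent on standardised values, with standardisation statistics taken from the training set only; the function `fit_virtual_probe_mlp`, the architecture, and the hyperparameters are hypothetical and are not taken from the thesis.

```python
import numpy as np


def fit_virtual_probe_mlp(X_train, selected, n_hidden=16, epochs=1000, lr=0.05, seed=0):
    """Fit a one-hidden-layer network mapping the selected genes' values to all
    genes' values (X_train is samples x genes). Returns (predict, losses).

    Hypothetical sketch of a "virtual probe" regressor.
    """
    rng = np.random.default_rng(seed)
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X_train - mu) / sd    # standardised targets (all genes)
    Z_in = Z[:, selected]      # standardised inputs (selected genes only)

    W1 = rng.normal(0.0, 0.1, size=(len(selected), n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, X_train.shape[1]))
    b2 = np.zeros(X_train.shape[1])

    losses = []
    for _ in range(epochs):
        H = np.tanh(Z_in @ W1 + b1)
        err = H @ W2 + b2 - Z
        losses.append(float(np.mean(err ** 2)))
        # Full-batch gradients of the mean squared prediction error.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)  # back through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Z_in.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)

    def predict(X_selected):
        """X_selected: raw values of the selected genes (samples x len(selected))."""
        Z_sel = (X_selected - mu[selected]) / sd[selected]
        return (np.tanh(Z_sel @ W1 + b1) @ W2 + b2) * sd + mu  # back to raw scale

    return predict, losses


# Synthetic data: 60 samples x 30 genes; train on 45 samples, predict the other 15.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(60, 30)) + 8.0
train, test = X[:45], X[45:]
selected = [0, 3, 7, 12, 21]  # hypothetical LF-GENESET-style indices
predict, losses = fit_virtual_probe_mlp(train, selected)
X_hat = predict(test[:, selected])  # predicted values for all 30 genes
```

Only the selected genes of the held-out samples are passed to `predict`, mirroring the idea of measuring a small gene set and inferring the rest.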
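The evaluation reported in the Result, the share of predicted probe values whose relative error falls below 30% or 50%, can be computed as below. This sketch assumes relative error is |predicted - measured| / |measured| for each predicted probe value; the abstract implies but does not define this formula, and `relative_error_coverage` is a hypothetical name.

```python
def relative_error_coverage(y_true, y_pred, thresholds=(0.30, 0.50)):
    """Return, for each threshold, the fraction of predictions whose relative
    error |pred - true| / |true| is strictly below that threshold.

    Hypothetical helper. Pairs with a measured value of 0 are skipped because
    relative error is undefined there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    if not errors:
        raise ValueError("no non-zero measured values")
    return {th: sum(e < th for e in errors) / len(errors) for th in thresholds}


# Relative errors here are 0.10, 0.35, 0.45 and 0.60.
print(relative_error_coverage([10, 10, 10, 10], [11, 13.5, 14.5, 16]))
# → {0.3: 0.25, 0.5: 0.75}
```

Multiplying each fraction by 100 gives percentages in the same form as the reported 96.71% and 99.47%.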
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf | 2.59 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
