Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89714
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 盧子彬 [zh_TW]
dc.contributor.advisor: Tzu-Pin Lu [en]
dc.contributor.author: 張心和 [zh_TW]
dc.contributor.author: Hsin-Ho Chang [en]
dc.date.accessioned: 2023-09-18T16:06:54Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-09-18
dc.date.issued: 2023
dc.date.submitted: 2023-06-28
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89714
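dc.description.abstract:
Background:
Precision medicine is now a goal across clinical and preventive medicine. Achieving it depends not only on technical advances in biomedicine and pharmaceuticals but, more importantly, on reaching the intended research outcomes with limited research resources. Genetic testing plays a pivotal role in precision medicine research. The technology has been used in medical research for many years and is mature, and research institutions in many countries have applied it to publish characteristic gene sets for benchmark diseases and cancers. As medical genetics has progressed, a number of studies have shown that disease is not caused by a few key genes alone, so research must extend to genome-wide analysis. The cost and time that whole-genome testing requires, however, are the first issues that medical research teams and patients must face.
Given the growing importance of genetic research and the time and cost concerns of whole-genome testing, this study proposes GeneVPNN, a gene virtual probe testing model, as a reference for simulating and inferring disease-causing genes in future biomedical and pharmaceutical research.
Method:
GeneVPNN (Gene Virtual Probe Neural Network) is a hybrid method that combines statistics and deep learning to recommend which key target genes should be tested. In its first stage, GeneVPNN uses coefficient of variation (CV) analysis to select the CV5000 effective target gene group. It then applies an Autoencoder (AE) architecture to pick out a more representative set of latent feature genes, LF-GENESET, and uses a neural network (NN) to infer the full set of CV5000 gene probe values. The experiments use the GSE102484 dataset from the GEO data platform, which contains gene probe values measured by microarray in Asian breast cancer patients at the Koo Foundation Sun Yat-Sen Cancer Center in Taiwan.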
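The coefficient-of-variation step can be illustrated with a short Python sketch. It assumes the expression data sit in a samples-by-probes NumPy array and that CV is computed per probe as the sample standard deviation divided by the absolute mean. The name CV5000 suggests that the 5000 most variable probes are kept, but the abstract does not state the cutoff rule, so the function name `select_top_cv`, the toy data, and the handling of zero-mean probes are all illustrative assumptions rather than details from the thesis.

```python
import numpy as np

def select_top_cv(expr, k):
    """Return indices of the k probes with the largest coefficient of variation.

    expr: float array of shape (n_samples, n_probes) holding expression values.
    """
    mean = expr.mean(axis=0)
    std = expr.std(axis=0, ddof=1)  # sample standard deviation per probe
    # CV = std / |mean|; probes with a zero mean are assigned CV 0 (assumption).
    cv = np.divide(std, np.abs(mean), out=np.zeros_like(std), where=mean != 0)
    order = np.argsort(cv)[::-1]    # probe indices, most variable first
    return order[:k], cv

# Toy data: 50 samples, 20 probes, two of which are made highly variable.
rng = np.random.default_rng(0)
expr = rng.normal(loc=8.0, scale=0.1, size=(50, 20))
expr[:, 3] += rng.normal(scale=2.0, size=50)
expr[:, 7] += rng.normal(scale=1.0, size=50)

top, cv = select_top_cv(expr, k=2)
print(sorted(top.tolist()))
```

On real data, `k` would be set to the size of the target gene group instead of 2.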
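Result:
The GSE102484 dataset was split into training and test sets. When the GeneVPNN model was validated on the test set, 96.71% of all predicted CV5000 gene probe values had an error rate below 30%, and 99.47% had an error rate below 50%.
Conclusion:
This study has two main outputs. The first is a general-purpose model design. Although the study used a dataset from the postoperative follow-up of women with breast cancer, the GeneVPNN prediction workflow was designed to apply to a wide range of diseases. It can help future medical research infer CV5000 effective target gene values for complex diseases across specialties, as reference data for the early analysis stage of biomedical and pharmaceutical genetic research. The second output is LF-GENESET, the recommended most economical range of genes to test, which helps in planning genetic testing when exploring unknown disease causes. With the economical testing range recommended by the AE step of GeneVPNN, medical research teams and patients can avoid the cost of whole-genome testing and use GeneVPNN to infer the CV5000 effective target gene group of the study population, for medical research and as a reference to support clinical diagnosis.
[translated from zh_TW]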
dc.description.abstract:
Background:
Precision medicine is now a goal in both clinical and preventive medicine. Beyond relying on technical advances in biomedicine, the more important problem is how to achieve the expected research results with limited research resources. Genetic testing plays a pivotal role in precision medicine and has matured over many years of use in medical research. As genetic research has progressed, many studies have confirmed that complex diseases are not caused simply by a few key genes, so research must extend to whole-genome testing. However, the cost and time of whole-genome testing are the primary issues that medical research teams and patients must face. This study therefore proposes GeneVPNN, a virtual probe model for genetic testing, which recommends the range of genes to test physically and infers the results of whole-genome testing from them.
Method:
GeneVPNN (Gene Virtual Probe Neural Network) is a composite method that combines statistics and deep learning to recommend a genetic testing range. In the first stage, GeneVPNN analyzes the coefficient of variation of raw RNA microarray data to obtain the effective gene group (CV5000). It then uses an Autoencoder (AE) architecture to select the latent feature genes (LF-GENESET), subject to a specified maximum number of genes. GeneVPNN also provides a prediction function that uses a deep neural network to infer the CV5000 gene probe values from LF-GENESET. This study uses the GEO GSE102484 dataset, a gene expression array dataset of Asian breast cancer patients produced by the Koo Foundation SYS Cancer Center in Taiwan.
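As a rough illustration of the autoencoder stage, the sketch below trains a very small one-hidden-layer autoencoder with plain gradient descent in NumPy and then picks a fixed number of input genes. The abstract does not say how LF-GENESET is read off the trained autoencoder, so ranking genes by the norm of their encoder weights is purely a hypothetical rule chosen for this example. The network size, learning rate, toy data, and the names `train_autoencoder`, `max_genes`, and `lf_geneset` are likewise assumptions and not taken from the thesis.

```python
import numpy as np

def train_autoencoder(x, n_latent, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    x: standardized array of shape (n_samples, n_genes).
    Returns encoder weights, decoder weights, and the loss at each epoch.
    """
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_genes, n_latent))
    w_dec = rng.normal(scale=0.1, size=(n_latent, n_genes))
    losses = []
    for _ in range(epochs):
        z = np.tanh(x @ w_enc)        # latent code
        err = z @ w_dec - x           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size  # gradient of the MSE w.r.t. the reconstruction
        g_dec = z.T @ g_out
        g_enc = x.T @ ((g_out @ w_dec.T) * (1.0 - z ** 2))
        w_dec -= lr * g_dec
        w_enc -= lr * g_enc
    return w_enc, w_dec, losses

# Toy data: 12 "genes" driven by 2 hidden factors plus noise, then standardized.
rng = np.random.default_rng(1)
factors = rng.normal(size=(100, 2))
x = factors @ rng.normal(size=(2, 12)) + 0.1 * rng.normal(size=(100, 12))
x = (x - x.mean(axis=0)) / x.std(axis=0)

w_enc, w_dec, losses = train_autoencoder(x, n_latent=2)

# Hypothetical selection rule: keep the genes with the largest encoder weights.
max_genes = 4
importance = np.linalg.norm(w_enc, axis=1)
lf_geneset = np.argsort(importance)[::-1][:max_genes]
print(losses[0], losses[-1], lf_geneset.tolist())
```

A second network would then be trained to map the selected genes back to the full CV5000 set; that step is omitted here.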
Result:
The GSE102484 dataset was split into training and test sets. When the trained GeneVPNN model was evaluated on the test set, 96.71% of predictions had a relative error below 30%, and 99.47% had a relative error below 50%.
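The coverage figures above can be read as the share of predictions whose relative error falls below a threshold. The sketch below shows one way to compute such a share, assuming relative error is defined as |predicted - actual| / |actual|. The abstract does not give the exact formula, so this definition, the function name `coverage_below`, and the four toy values are assumptions for illustration only.

```python
import numpy as np

def coverage_below(actual, predicted, threshold):
    """Fraction of predictions whose relative error is below the threshold."""
    rel_err = np.abs(predicted - actual) / np.abs(actual)
    return float(np.mean(rel_err < threshold))

# Toy probe values with relative errors of 10%, 5%, 40%, and 60%.
actual = np.array([10.0, 8.0, 5.0, 4.0])
predicted = np.array([11.0, 8.4, 7.0, 6.4])

cov_30 = coverage_below(actual, predicted, 0.30)
cov_50 = coverage_below(actual, predicted, 0.50)
print(cov_30, cov_50)  # 0.5 0.75
```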
Conclusion:
This study has two main outputs. The first is a general-purpose model design. Although GeneVPNN was trained and tested on the GSE102484 dataset, which concerns breast cancer in women, its framework is designed for analysis of unspecified diseases. It can therefore help medical researchers obtain a recommended genetic testing range for such diseases and make genetic testing more economical. The second output is the use of the recommended LF-GENESET to let GeneVPNN infer the CV5000 gene probe values for unspecified diseases in the early stages of research. In addition to saving research cost and time, GeneVPNN can therefore also assist doctors and researchers in assessing patients' postoperative conditions.
[en]
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-18T16:06:54Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-09-18T16:06:54Z (GMT). No. of bitstreams: 0 [en]
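dc.description.tableofcontents:
Thesis Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
Abstract
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
Chapter 2 Materials and Methods
2.1 Data Sources
2.2 Sample Selection
2.3 Overview of Research Methods
2.4 Coefficient of Variation Analysis Method
2.5 Autoencoder Experimental Method
2.6 DNN Linear Regression Experimental Method
2.7 Method for Evaluating Prediction Accuracy
Chapter 3 Results
3.1 Data Presentation
3.2 Results of the CV Experiment
3.3 Results of the AE Experiment
3.4 Accuracy Analysis of the DNN Experiment
3.5 Overall Reliability Test of GeneVPNN
3.6 Generalizability Test of GeneVPNN on Non-Breast-Cancer Diseases
Chapter 4 Conclusions and Discussion
4.1 Main Findings
4.2 Discussion
4.3 Conclusion
4.4 Limitations
References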
dc.language.iso: zh_TW
dc.subject: 基因預測 [zh_TW]
dc.subject: 深度學習 [zh_TW]
dc.subject: 自編碼器 [zh_TW]
dc.subject: 降維 [zh_TW]
dc.subject: 基因組選擇 [zh_TW]
dc.subject: Deep learning [en]
dc.subject: Autoencoder [en]
dc.subject: Dimension reduction [en]
dc.subject: Gene prediction [en]
dc.subject: Genomic selection [en]
dc.title: 基於深度學習方法架構之基因虛擬探針模型GeneVPNN於推薦潛在特徵基因組暨預測目標基因群研究 [zh_TW]
dc.title: The study of economical genetic testing with GeneVPNN in complex disease recommending latent feature genes and predicting effective target genes [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 蕭朱杏; 陳佩君 [zh_TW]
dc.contributor.oralexamcommittee: Chuhsing Kate Hsiao; Pei-Chun Chen [en]
dc.subject.keyword: 基因預測, 自編碼器, 深度學習, 降維, 基因組選擇 [zh_TW]
dc.subject.keyword: Gene prediction, Deep learning, Autoencoder, Dimension reduction, Genomic selection [en]
dc.relation.page: 49
dc.identifier.doi: 10.6342/NTU202301211
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2023-06-29
dc.contributor.author-college: College of Public Health
dc.contributor.author-dept: Institute of Epidemiology and Preventive Medicine
Appears in collections: Institute of Epidemiology and Preventive Medicine

Files in this item:
File: ntu-111-2.pdf, Size: 2.59 MB, Format: Adobe PDF