  1. NTU Theses and Dissertations Repository
  2. College of Public Health
  3. Institute of Epidemiology and Preventive Medicine
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95152
Title: Development of Analysis Algorithms for Clinical and Genomic Data (開發臨床資料及基因數據之分析演算法)
Author: Han-Ching Chan (詹涵晴)
Advisor: Tzu-Pin Lu (盧子彬)
Keywords: precision medicine, colorectal cancer, prediction model, gene expression, PrediXcan, population difference, R
Publication year: 2024
Degree: Doctoral
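Abstract (translated from the Chinese): The aim of precision medicine is to provide more personalized treatment by taking into account an individual's clinical characteristics, genes, environmental factors, and lifestyle. In the past, medical decisions were based mainly on what was typical of the population as a whole, with the result that some patients responded poorly to treatment or suffered severe side effects, and medical costs rose accordingly. With recent advances in gene sequencing technology, individual differences in patients' genomes have come to light, and identifying the relationships between genes and diseases makes it possible to develop effective, personalized treatment decisions. In the first part of this dissertation, I therefore analyzed gene expression data from the Gene Expression Omnibus (GEO) database to search for genes that affect the risk of postoperative recurrence in patients with stage I and II colorectal cancer, because there is still no clinical consensus on whether these patients need adjuvant therapy. I used the support vector machine (SVM) method to build a prognostic model from differentially expressed genes and applied the concept of ensemble learning for prediction; the final model showed good predictive performance on both internal and external validation data. Because the GEO gene expression data mentioned above come mostly from European and American populations, in the second part I analyzed colorectal cancer patients from the Taiwan Cancer Registry database, built a survival prediction model specific to the Taiwanese population, and externally validated it with data from the United States cancer registry database, exploring differences in colorectal cancer survival between Eastern and Western populations. The final part aims to compare gene expression between European and American populations and East Asian populations. PrediXcan is a widely used algorithm that predicts gene expression levels from DNA variant data, but the training data for its weight models come from European populations, and allele frequencies are known to differ among populations. This dissertation therefore first examined how strongly this difference affects predicted gene expression values and what proportion of the DNA variants used by PrediXcan show population differences. In addition, using minor allele frequency (MAF) information for European and East Asian populations from gnomAD, I simulated 500 sets of the data required for all PrediXcan variants and fed them into PrediXcan to produce gene expression reference databases for the two populations. Finally, I built an R package that takes the East Asian population as the main reference: users input their PrediXcan results and obtain the percentile rank (PR) of each gene's expression within the corresponding reference distribution, which shows whether the expression level differs from that of most of the population. The package also provides related information on the gene, including the number of DNA variants used, the mean and standard deviation in the East Asian reference database, whether the gene differs significantly between European and East Asian populations (using the Kolmogorov-Smirnov test), and the degree of difference. This tool can be regarded as an auxiliary tool: by using MAF information from large databases to develop a standard reference database, it lets users check the degree of difference in their gene expression and facilitates further analysis. In summary, the first two parts of this dissertation attempted to build prediction models that enable more personalized treatment, using genetic data and clinical-feature data respectively. Mainstream large-scale databases are dominated by European and American populations, yet the genetic and clinical differences between these populations and other sub-populations cannot be ignored when it comes to treatment response. This dissertation therefore takes the East Asian population as a representative case, examines the effect of population differences on the PrediXcan algorithm, and builds an East Asian reference database, in the hope of furthering precision medicine for East Asian populations.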
The goal of precision medicine is to provide more personalized treatments by considering individual clinical characteristics, genetics, environmental factors, and lifestyle. Historically, medical decisions have primarily been based on the predominant conditions within a population, often resulting in suboptimal treatment responses for some patients and severe side effects for others, thereby increasing healthcare costs. In recent years, advancements in gene sequencing technology have uncovered relationships between genes and diseases, leading to the development of personalized and effective treatment strategies.
The first part of the dissertation aims to identify genes that influence the risk of recurrence in stage I and II colorectal cancer patients by analyzing gene expression data from the Gene Expression Omnibus (GEO) database. Because the need for adjuvant therapy in these early-stage patients remains unclear, I applied a support vector machine (SVM) to build a prognostic model from differentially expressed genes and used ensemble learning for the final prediction. The model demonstrated good predictive performance in both internal and external validation datasets.
Since most prediction models for colon cancer were developed from European populations, in the second part I analyzed the Taiwan Cancer Registry (TCR) database to develop a survival prediction model for the Taiwanese population using demographic characteristics and tumor-associated features. To investigate the generalizability of the proposed model and population differences, I also validated the model on data from the Surveillance, Epidemiology, and End Results (SEER) cancer registry. The model showed robust prediction performance (Harrell's c-index > 0.8) in both populations.
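The c-index quoted above measures how often a model ranks a pair of patients in the right order. The following is a minimal sketch of Harrell's c-index for right-censored data, not the dissertation's code; the function name and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped as a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ (simplification)
        # and the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, one of them censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_risk = [0.8, 0.3, 0.2, 0.9]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a c-index above 0.8 indicates that most comparable patient pairs are ordered correctly.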
However, the differences in gene expression levels between European and underrepresented populations are still uncertain. Therefore, in the last part, I aimed to compare gene expression differences between European and East Asian populations. PrediXcan is a widely used algorithm that predicts gene expression levels from deoxyribonucleic acid (DNA) variant data, but the training data for its prediction models come from European populations. Given that allele frequency may differ among diverse populations, I first examined the impact of these differences on predicted gene expression values and the proportion of variants used by PrediXcan that exhibit population differences. In addition, I utilized the minor allele frequency (MAF) information of European and East Asian populations from gnomAD to develop gene expression reference panels for both populations.
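One way to picture how a reference panel could be derived from allele frequencies alone is sketched below. This is an illustration under stated assumptions, not the dissertation's procedure: genotypes are drawn independently per variant under Hardy-Weinberg equilibrium (ignoring linkage disequilibrium), the minor allele is assumed to be the effect allele, and expression is predicted as a weighted sum of allele dosages. The MAF and weight values are hypothetical; the default of 500 individuals borrows the number of simulated datasets mentioned in the Chinese-language abstract of this record.

```python
import random


def simulate_dosages(mafs, rng):
    """Draw one individual's minor-allele dosage (0, 1 or 2) for each variant.

    Assumes Hardy-Weinberg equilibrium and independent variants.
    """
    return [int(rng.random() < maf) + int(rng.random() < maf) for maf in mafs]


def predict_expression(dosages, weights):
    """Linear predictor: weighted sum of allele dosages for one gene."""
    return sum(w * d for w, d in zip(weights, dosages))


def build_reference(mafs, weights, n_individuals=500, seed=0):
    """Simulate predicted expression values for one gene in one population."""
    rng = random.Random(seed)
    return [
        predict_expression(simulate_dosages(mafs, rng), weights)
        for _ in range(n_individuals)
    ]


# Hypothetical gene with two variants whose MAFs differ between populations.
weights = [0.5, -0.2]
reference_eur = build_reference([0.10, 0.40], weights, seed=1)
reference_eas = build_reference([0.30, 0.05], weights, seed=2)
```

Because the weights are shared while the frequencies differ, any shift between the two simulated distributions comes purely from the allele-frequency differences, which is the effect the paragraph above sets out to quantify.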
Furthermore, I developed an R package that allows users to input their PrediXcan results and obtain the percentile rank (PR) of each gene's expression within the reference distribution, indicating whether the expression level differs markedly from that of the majority of the population. The package also provides gene-related information, including the number of variants used, the mean and standard deviation in the East Asian reference database, whether there is a significant difference between European and East Asian populations (using the Kolmogorov-Smirnov test), and the magnitude of that difference.
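The two quantities described above can be sketched in a few lines. The package itself is written in R and its interface is not shown in this record, so the Python function names below are hypothetical. The percentile rank is taken here as the share of reference values at or below the input, which is one common definition among several, and only the Kolmogorov-Smirnov statistic is computed, not its p-value.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percentage of reference values that are less than or equal to `value`."""
    ref = sorted(reference)
    return 100.0 * bisect_right(ref, value) / len(ref)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical
    cumulative distribution functions.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical reference values for one gene and one user's predicted value.
east_asian_reference = [0.1, 0.2, 0.3, 0.4]
pr = percentile_rank(0.25, east_asian_reference)
```

In practice a library routine such as `scipy.stats.ks_2samp` would normally be used, since it also returns a p-value for the significance decision.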
The package serves as an auxiliary tool: it uses MAF information from large-scale databases to build the reference panel and helps users check population differences before further analysis. In summary, the first two parts of this dissertation aimed to establish predictive models for more personalized treatment based on genetic and clinical data. Given that large-scale databases are predominantly based on European populations, it is necessary to consider how differences in genetic and clinical characteristics affect treatment responses. Here, I focused on the East Asian population, exploring the impact of population differences on the PrediXcan algorithm and developing an East Asian reference panel, with the aim of advancing precision medicine for the East Asian population.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95152
DOI: 10.6342/NTU202403777
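Full-text authorization: Authorized (on-campus access only)
Electronic full-text release date: 2025-08-31
Appears in collection: Institute of Epidemiology and Preventive Medicine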

Files in this item:
File: ntu-112-2.pdf
Access: restricted to NTU campus IP addresses (from off campus, use the off-campus VPN service)
Size: 6.23 MB
Format: Adobe PDF