Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85711
Title: | 以貝氏統計方法建構機率性基因網路 Bayesian Approaches to Probabilistic Genetic Networks |
Authors: | Yu-Jyun Huang 黃煜鈞 |
Advisor: | 蕭朱杏(Chuhsing Kate Hsiao) |
Keyword: | 基因調控網路, 差異網路分析, 機率性網路分析, 機率推論, 估計連線存在機率, 重要性排序, gene regulatory network, differential network analysis, probabilistic network analysis, probabilistic inference, existence probability, prioritize findings |
Publication Year: | 2022 |
Degree: | Doctoral (博士) |
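Abstract: | Past studies of gene-disease association have repeatedly indicated that complex diseases may arise when a group of genes participating in a specific biological pathway fails to properly regulate complex and essential biological functions. Understanding how a group of genes works together so that biological functions operate normally is therefore an important research question. Likewise, clarifying the main causes of abnormal biological function, and thereby identifying important biomarkers that may trigger complex diseases, is one of the key research goals in biostatistics and bioinformatics.
Gene regulatory networks constructed from gene expression data let researchers see more clearly how a group of genes operates at the molecular level, and make it easier to describe and explain the mechanisms behind complex biological functions. In recent years, methods based on the Gaussian graphical model for building and estimating genetic network structure have been widely applied in related fields and have been shown to estimate network structure very well. However, most of these methods focus on variable selection, such as whether a given edge exists in the network, and this treatment usually cannot estimate the likelihood or the strength of an edge. In practice, if probabilistic estimates are available, researchers can rank results by probability. This information helps them mark results of differing importance among all the candidate evidence, and the top-ranked findings may have a higher chance of being confirmed by follow-up studies or biological experiments.
Motivated by the above, this study uses Bayesian statistical methods to propose a framework for probabilistic network structure analysis. More precisely, the study has two aims: (1) to construct probabilistic genetic networks from gene expression data; (2) to use probabilistic differential network analysis to identify differences in network structure between groups. In the first study, we propose estimating genetic network structure with a Bayesian Markov random field. By combining the conditional autoregressive model with techniques commonly used in Bayesian methods for probabilistic variable selection, the method not only estimates the existence probability of each edge in the network but also supports probabilistic inference on the relative strength of an edge through the posterior distribution of the conditional correlation coefficient. Simulation studies show that the proposed method estimates network structure stably. In a glioblastoma study, the method also identifies biologically meaningful biomarkers by ranking the probability estimates.
Building on these results, the second aim recasts the estimation of differential network structure as the problem of detecting whether interaction terms are significant in an association study. We further propose a Bayesian model that combines data-driven screening of interaction terms with probabilistic variable selection to provide probabilistic estimation and inference for possible differential edges. In simulation studies, the proposed method performs strongly in estimating differential networks. In a differential network analysis of tumor subtypes from the glioblastoma study, the method yields biologically interpretable findings and uses probabilistic inference to mark important genes as candidate targets for follow-up biological experiments.
Complex diseases are known to be associated with the dysregulation of essential biological functions and processes. These irregular biological activities are often triggered by a group of functionally related genes rather than by a single gene. To elucidate how a group of genes regulates the underlying biological mechanism, it is vital to find a way to estimate and visualize the complex interactions within the cellular system.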
The Gaussian graphical model-based approach has been widely used to estimate the structure of genetic networks. However, most of these approaches provide only a binary decision, such as whether an edge exists or not. In addition, the strength of the interaction between two genes in a pathway is usually under-examined. To fill these gaps, this research proposes a framework that quantifies, in probabilistic terms, the uncertainty of any specific edge in a network. With the probabilistic estimates, we can prioritize the results and highlight the importance of the findings, which may give further biological experiments a better chance of reproducing the discoveries.
In this dissertation, we propose Bayesian approaches to conduct probabilistic network analysis. The research is divided into two parts, each answering a specific scientific question. In the first part, we propose a Bayesian Markov random field approach, combining ideas from the conditional autoregressive model and the Spike-and-Slab Lasso prior, to conduct probabilistic network edge analysis. The novelty of the proposed model is twofold. First, we can estimate the existence probability of each edge in the network. Second, with the Bayesian approach, we can conduct probabilistic inference about the relative strength of any specific interaction. Simulation studies and a glioblastoma study demonstrate the stable estimation performance of the method and its ability to target biologically meaningful biomarkers associated with glioblastoma progression.
It is also important to identify how network structure changes across different cellular conditions. Therefore, the second part focuses on differential network analysis. We begin by showing that the identification of differential edges can be translated into the detection of interaction terms in an association study.
We further propose a Bayesian approach with an efficient screening strategy to estimate the probability of each possible differential edge. To the best of our knowledge, this is the first research to measure the uncertainty of differential edges. The approach is demonstrated in simulation studies. Finally, we use the tumor subtype data from the TCGA glioblastoma study to demonstrate that the proposed methods can identify biologically meaningful findings, highlight the hub nodes, and prioritize results in a probabilistic sense. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85711 |
DOI: | 10.6342/NTU202200957 |
Fulltext Rights: | Authorization granted (open access worldwide) |
Embargo Lift Date: | 2025-06-30 |
Appears in Collections: | Institute of Epidemiology and Preventive Medicine (流行病學與預防醫學研究所) |
Files in This Item:
File | Size | Format |
---|---|---|
U0001-1506202212512400.pdf (restricted until 2025-06-30) | 4.43 MB | Adobe PDF |