Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92795
Title: | Detecting Association of Categorical Traits with Genomic Data by Entropy-Based Methods (利用熵偵測基因體資料中類別型資料的關聯性) |
Author: | Shang-Chieh Lin (林上傑) |
Advisor: | Yann-Rong Lin (林彥蓉) |
Co-advisor: | Li-Yu Daisy Liu (劉力瑜) |
Keywords: | Entropy, Genome-wide association study |
Publication year: | 2024 |
Degree: | Master's |
Abstract: | Most agronomic traits are recorded either numerically or categorically. Considering the non-linear and discrete nature of nominal data, we introduce rescaled conditional entropy (RCE) to measure the dependency between a nominal trait and genetic variants. Our findings demonstrate that both the number and the frequency of phenotypic and genotypic levels affect RCE, even when the phenotype and the genetic variants are independent. Leveraging this property of RCE, we designed an algorithm to validate the significance of genetic variants. Simulation results indicated that the accuracy and number of detected quantitative trait loci (QTLs) increased as population size and heritability increased. When applying the RCE algorithm to the 3K Rice Genome database, we found that the algorithm could detect interactions between single nucleotide polymorphisms (SNPs), an advantage over Pearson's chi-squared test. Because RCE is sensitive to genotypic level frequency, we designed another algorithm based on the entropy of each genotypic level and applied it to the 3K rice genome (RG) population. The results of the entropy algorithm are visualized with heatmaps and data mechanics. Our analysis reveals that genotypic levels detected by the entropy algorithm are typically associated with the subpopulation of each variety, and only partially with the phenotype. Moreover, within the same subpopulation, the pattern of genotypic levels varies. The entropy algorithm was also applied to the maximum root length (MRL) dynamics of 53 wheat varieties, a data set with a hierarchical structure among the MRL dynamics types. Pairwise comparisons between MRL dynamics types demonstrate that types separated by a greater distance can be clearly distinguished by the presence-absence pattern of genotypic levels. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92795 |
DOI: | 10.6342/NTU202401263 |
Full-text authorization: | Not authorized |
Appears in collections: | Department of Agronomy |
Files in this item:
File | Size | Format | |
---|---|---|---|
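ntu-112-2.pdf Not currently authorized for public access | 12.8 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.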