Training Set Optimization in Genomic Selection for the Highly Structured Populations

林寬諺; Kuan Yan-Lin

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94332

Title:	Training Set Optimization in Genomic Selection for the Highly Structured Populations (在高度結構化族群中進行全基因組選拔之訓練集最佳化)
Author:	林寬諺 Kuan Yan-Lin
Advisor:	廖振鐸 Chen-Tuo Liao
Keywords:	Genomic Selection, Correlation Coefficient, Generalized Coefficient of Determination, Training Set Optimization, Highly Structured Populations
Year of publication:	2024
Degree:	Master's
Abstract:	Genomic selection (GS) has been widely used in crop breeding. It relies on training sets that contain both phenotypic and genotypic data to build the required statistical models, so the choice of training set affects how well those models perform. This study evaluates two approaches to training set optimization in three highly structured crop populations (rice, soybean, and sorghum): the targeted method, which uses information from the testing set, and the untargeted method, which does not. Four different generalized coefficients of determination (CD) serve as the criteria for training set selection. The number of subpopulations represented in the training set is determined by three different sampling methods. The optimized training sets are used to fit the statistical models, and model performance is measured by the correlation coefficient between the observed phenotypic values and the values predicted by the models; a higher correlation coefficient indicates better performance. The results indicate that the training sets generated by the methods in this study achieve better model performance with fewer training individuals, and that the targeted method outperforms the untargeted method. However, the different sampling methods did not produce significant differences in model performance.
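The evaluation step described in the abstract, correlating observed phenotypic values with model predictions, can be sketched as below. This is a minimal illustration and not code from the thesis: the function name and the sample numbers are hypothetical, and the Pearson form of the correlation coefficient is an assumption, since the abstract does not name the variant used.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_obs * var_pred)


# Made-up values for illustration only (not data from the thesis).
observed_phenotypes = [5.1, 6.3, 4.8, 7.0, 5.9]
predicted_values = [5.0, 6.0, 5.2, 6.8, 6.1]
accuracy = pearson_correlation(observed_phenotypes, predicted_values)
print(f"prediction accuracy (r) = {accuracy:.3f}")
```

Under this reading, a value closer to 1 would correspond to the "better model performance" the abstract refers to.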
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94332
DOI:	10.6342/NTU202403154
Full-text authorization:	Authorized (campus-only access)
Full-text release date:	2029-08-02
Appears in collections:	Department of Agronomy

Files in this item:

File	Size	Format
ntu-112-2.pdf (not authorized for public access)	1.74 MB	Adobe PDF

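Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.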
