Training Set Optimization in Genomic Selection for the Highly Structured Populations

林寬諺; Kuan Yan-Lin

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94332

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	廖振鐸	zh_TW
dc.contributor.advisor	Chen-Tuo Liao	en
dc.contributor.author	林寬諺	zh_TW
dc.contributor.author	Kuan Yan-Lin	en
dc.date.accessioned	2024-08-15T16:51:47Z	-
dc.date.available	2024-08-16	-
dc.date.copyright	2024-08-15	-
dc.date.issued	2024	-
dc.date.submitted	2024-08-06	-
dc.identifier.citation	O. A. Montesinos-López J. Isidro y Sánchez J. Fernández-González W. Tadesse-et al. A. Alemu, J. Åstrand. Genomic selection in plant breeding: Key factors shaping two decades of progress. Molecular Plant, 17(4):552–578, 2024. Sanchez J. I. Jannink-J.-L. Akdemir, D. Optimization of genomic selection training populations with a genetic algorithm. Genetics Selection Evolution, 47:38, 2015. Sanchez J. I. Jannink-J.-L. Akdemir, D. Design of training populations for selective phenotyping in genomic prediction. Scientific Reports, 9:1446, 2019. Anthony Atkinson, Alexander Donev, and Randall Tobias. Optimum Experimental Designs, with SAS. 2007. Justin N Vaughn Zenglu Li Benjamin B Stewart-Brown, Qijian Song. Genomic selection for yield and seed composition traits within an applied soybean breeding program. G3 Genes|Genomes|Genetics, 9:2253–2265, 2019. Liao C. T. Chung, P. Y. Identification of superior parental lines for biparental crossing via genomic prediction. PLoS ONE, 15(12), 2020. Laloë D. Precision and information in linear models of genetic evaluation. Genetics Selection Evolution, 25(6):557–576, 1993. Dias K.O.G. Ferreira D.F. et al. Fernandes, S.B. Efficiency of multi-trait, indirect, and trait-assisted genomic selection for improvement of biomass sorghum. Theoretical and Applied Genetics, 131:747–755, 2018. Lorenz A.J. Jannink J.-L. Heffner, E.L. and M.E. Sorrells. Plant breeding with genomic selection: Gain per unit time and cost. Crop Science, 50(5):1681–1690, 2010. C.R. Henderson. Best linear unbiased estimation and prediction under a selection model. Biometrics, 31(2):423–447, 1975. C.R. Henderson. Best linear unbiased prediction of breeding values not in the model for records. Journal of Dairy Science, 60:783–787, 1977. J. H. Holland. Genetic algorithms. Scientific American, 267(1):66–73, 1992. Jannink J. L. Akdemir-D. Poland J. Heslot N. Sorrells M. E. Isidro, J. Training set optimization under population structure in genomic selection. Theoretical and Applied Genetics, 128(1):145–158, 2015. Goddard ME. Meuwissen TH, Hayes BJ. Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4):1819–29, 2001. de los Campos-G. Pérez, P. Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198:483–495, 2014. S Nicolas-T Altmann D Brunel P Revilla V M Rodríguez J Moreno-Gonzalez A Melchinger E Bauer C-C Schoen N Meyer C Giauffret C Bauland P Jamin J Laborde H Monod P Flament A Charcosset L Moreau R Rincent, D Laloë. Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: Comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics, 192(2):715–728, 2012. Shayle R. Searle. Matrix Algebra Useful For Statistics. John Wiley & Sons, 1982. Tung CW. Eizenga G. et al. Zhao, K. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications, 2:467, 2011.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94332	-
dc.description.abstract	全基因組選拔(genomic selection; GS)已被廣泛地利用於作物育種上，其需靠同時具有作物外表型及作物基因型之訓練集來建立其所需之統計模型，故訓練集之選擇將會影響統計模型之表現。本篇研究針對三個具有高度族群結構之作物族群，分別是水稻、大豆及高粱，進行兩種不同方法，分別是使用測試集資訊之方法（targeted method）及無使用測試集資訊之方法（untargeted method）。利用四種不同的廣義判定係數（Generalized Coefficient of Determination; CD）作為訓練集選拔之指標，來進行訓練集最佳化。訓練集中不同的次族群數目則由三種不同的抽樣方法(Sampling method)決定，統計模型則由最佳化訓練集建構，並計算作物實際外表型值與統計模型所得之估計值之相關係數（Correlation Coefficient）來作為統計模型表現好壞之指標，相關係數之值越高則模型表現越好。結果顯示，透過本篇研究方法所產生的訓練集，能以較少的訓練集個數達到較好的模型表現，同時使用測試集資訊之方法也優於無使用測試集資訊之方法。另外，不同的抽樣方法則對模型表現沒有顯著差異。	zh_TW
dc.description.abstract	Genomic selection (GS) is widely used in crop breeding. It relies on a training set containing both phenotypic and genotypic data to build the required statistical model, so the choice of training set affects model performance. This study evaluates two approaches to training set optimization in three highly structured crop populations: rice, soybean, and sorghum. The targeted method uses information from the testing set; the untargeted method does not. Four versions of the generalized coefficient of determination (CD) serve as the criteria for training set selection. The number of subpopulations represented in the training set is determined by three different sampling methods. Statistical models are built from the optimized training sets, and their performance is measured by the correlation coefficient between the observed phenotypic values and the values predicted by the model; a higher correlation coefficient indicates better performance. The results indicate that training sets generated by the methods described in this study achieve better model performance with fewer individuals, and that the targeted method outperforms the untargeted method. The different sampling methods, however, did not lead to significant differences in model performance.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:51:47Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-08-15T16:51:47Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements; Abstract (in Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction: 1.1 Introduction; Chapter 2 Materials: 2.1 Rice Dataset with Highly Structured Populations, 2.2 Soybean Dataset with Highly Structured Populations, 2.3 Sorghum Dataset with Highly Structured Populations; Chapter 3 Methods: 3.1 A GBLUP Model with Additive Effects, 3.2 Training Set Optimization Criteria, 3.3 Sampling Methods, 3.4 Genetic Algorithm, 3.5 Evaluation Metrics for Model Performance; Chapter 4 Results: 4.1 Real Data Analysis (4.1.1 Rice Dataset with Highly Structured Populations, 4.1.2 Soybean Dataset with Highly Structured Populations, 4.1.3 Sorghum Dataset with Highly Structured Populations); Chapter 5 Discussion: 5.1 Different Sampling Approaches to Decide the Number of Subpopulations, 5.2 The Relationship between CD2 and CDmean; References; Appendix A: The Relationship between CD2 and CDmean (A.1 CD2 and CDmean are the same); Appendix B: The derivation of three versions of CD and the relationship between Var(ĝ_n0) and Cov(ĝ_n0, g_n0) (B.1 The derivation of (3.2), B.2 The derivation of three versions of CD); Appendix C: The results of different sampling methods	-
dc.language.iso	en	-
dc.subject	高度結構化族群	zh_TW
dc.subject	廣義判定係數	zh_TW
dc.subject	訓練集最佳化	zh_TW
dc.subject	全基因組選拔	zh_TW
dc.subject	相關係數	zh_TW
dc.subject	Genomic Selection	en
dc.subject	Correlation Coefficient	en
dc.subject	Generalized Coefficient of Determination	en
dc.subject	Training Set Optimization	en
dc.subject	Highly Structured Populations	en
dc.title	在高度結構化族群中進行全基因組選拔之訓練集最佳化	zh_TW
dc.title	Training Set Optimization in Genomic Selection for the Highly Structured Populations	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	蔡欣甫;高振宏	zh_TW
dc.contributor.oralexamcommittee	Shin-Fu Tsai;Chen-Hung Kao	en
dc.subject.keyword	全基因組選拔,相關係數,廣義判定係數,訓練集最佳化,高度結構化族群	zh_TW
dc.subject.keyword	Genomic Selection,Correlation Coefficient,Generalized Coefficient of Determination,Training Set Optimization,Highly Structured Populations	en
dc.relation.page	53	-
dc.identifier.doi	10.6342/NTU202403154	-
dc.rights.note	Authorization granted (access restricted to campus)	-
dc.date.accepted	2024-08-09	-
dc.contributor.author-college	College of Bioresources and Agriculture	-
dc.contributor.author-dept	Department of Agronomy	-
dc.date.embargo-lift	2029-08-02	-
Appears in Collections:	Department of Agronomy

Files in This Item:

File	Size	Format
ntu-112-2.pdf (not authorized for public access)	1.74 MB	Adobe PDF


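Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.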