A-optimal and D-optimal training sets for identifying the best genotypes for a candidate population

宋文修; Wen-Hsiu Sung

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88673

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	廖振鐸	zh_TW
dc.contributor.advisor	Chen-Tuo Liao	en
dc.contributor.author	宋文修	zh_TW
dc.contributor.author	Wen-Hsiu Sung	en
dc.date.accessioned	2023-08-15T17:18:59Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-15	-
dc.date.issued	2023	-
dc.date.submitted	2023-08-07	-
dc.identifier.citation	Akdemir, D., Rio, S., & Isidro y Sánchez, J. (2021). TrainSel: an R package for selection of training populations. Frontiers in Genetics, 12, 655287. Akdemir, D., Sanchez, J. I., & Jannink, J.-L. (2015). Optimization of genomic selection training populations with a genetic algorithm. Genetics Selection Evolution, 47, 1-10. Blondel, M., Onogi, A., Iwata, H., & Ueda, N. (2015). A ranking approach to genomic selection. PLoS ONE, 10(6), e0128570. Chung, P.-Y., & Liao, C.-T. (2020). Identification of superior parental lines for biparental crossing via genomic prediction. PLoS ONE, 15(12), e0243159. Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE, 11(6), e0156744. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22. Fernandes, S. B., Dias, K. O., Ferreira, D. F., & Brown, P. J. (2018). Efficiency of multi-trait, indirect, and trait-assisted genomic selection for improvement of biomass sorghum. Theoretical and Applied Genetics, 131, 747-755. Fernández-González, J., Akdemir, D., & Isidro y Sánchez, J. (2023). A comparison of methods for training population optimization in genomic selection. Theoretical and Applied Genetics, 136(3), 30. Forni, S., Aguilar, I., & Misztal, I. (2011). Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetics Selection Evolution, 43, 1-7. González-Camacho, J., de Los Campos, G., Pérez, P., Gianola, D., Cairns, J., Mahuku, G., Babu, R., & Crossa, J. (2012). Genome-enabled prediction of genetic values using radial basis function neural networks. Theoretical and Applied Genetics, 125, 759-771. Hayes, B. J., Bowman, P. J., Chamberlain, A. J., & Goddard, M. E. (2009). Invited review: Genomic selection in dairy cattle: Progress and challenges. Journal of Dairy Science, 92(2), 433-443. Heslot, N., Yang, H. P., Sorrells, M. E., & Jannink, J. L. (2012). Genomic selection in plant breeding: a comparison of models. Crop Science, 52(1), 146-160. Jarvelin, K. (2000). IR evaluation methods for retrieving highly relevant documents. Proc. International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 2000. Kristensen, P. S., Jensen, J., Andersen, J. R., Guzmán, C., Orabi, J., & Jahoor, A. (2019). Genomic prediction and genome-wide association studies of flour yield and alveograph quality traits using advanced winter wheat breeding material. Genes, 10(9), 669. Laloë, D. (1993). Precision and information in linear models of genetic evaluation. Genetics Selection Evolution, 25(6), 557-576. Laloë, D., Phocas, F., & Menissier, F. (1996). Considerations on measures of precision and connectedness in mixed linear models of genetic evaluation. Genetics Selection Evolution, 28(4), 359-378. Meuwissen, T. H., Hayes, B. J., & Goddard, M. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819-1829. Ou, J.-H., & Liao, C.-T. (2019). Training set determination for genomic selection. Theoretical and Applied Genetics, 132, 2781-2792. Rincent, R., Laloë, D., Nicolas, S., Altmann, T., Brunel, D., Revilla, P., Rodriguez, V. M., Moreno-Gonzalez, J., Melchinger, A., & Bauer, E. (2012). Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics, 192(2), 715-728. Spindel, J., Begum, H., Akdemir, D., Virk, P., Collard, B., Redona, E., Atlin, G., Jannink, J.-L., & McCouch, S. R. (2015). Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines. PLoS Genetics, 11(2), e1004982. Tsai, S.-F., Shen, C.-C., & Liao, C.-T. (2021). Bayesian optimization approaches for identifying the best genotype from a candidate population. Journal of Agricultural, Biological and Environmental Statistics, 26, 519-537. VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414-4423. Wu, P.-Y., Ou, J.-H., & Liao, C.-T. (2023). Sample size determination for training set optimization in genomic prediction. Theoretical and Applied Genetics, 136(3), 57. Zhao, K., Tung, C.-W., Eizenga, G. C., Wright, M. H., Ali, M. L., Price, A. H., Norton, G. J., Islam, M. R., Reynolds, A., & Mezey, J. (2011). Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications, 2(1), 467.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88673	-
dc.description.abstract	隨著分子生物學的進步，基因體選拔 (genomic selection，GS)廣泛用於動物或作物育種計畫中，並成為一項重要的工具。儘管基因型分析 (genotyping)的成本降低，外表型分析 (phenotyping)仍然是要花相對較高的成本以及時間，因此希望透過基因型 (genotype)推測外表型(phenotype)，以此加速育種計畫。基因體選拔透過遍布整個基因體 (genome)的基因標誌 (gene markers)以及已知的連續型性狀外表型，建立統計模型，進而憑藉基因型推測出育種價估計值 (genomic estimated breeding values，GEBVs)，從中選拔出適合的自交系 (inbred lines)或育種計畫中的雜交組合 (hybrids)。統計模型的建構中，如何只透過基因型資料，選擇適當的個體當作訓練集 (training set)進行外表型分析，建構出表現好的預測模型，在基因體選拔是個重要的議題。在本文的研究中，分析兩種方法：A-最適準則 (A-optimality)與D-最適準則 (D-optimality)兩種判斷方法，原理是試圖挑出最大變異的個體作為適合的訓練集。我們使用四組不同的作物基因資料，分別使用模擬結果與實際資料，並與之前研究的其他方法相比較，兩者相較於隨機訓練集有比較好的表現。	zh_TW
dc.description.abstract	Advances in molecular genetic technology, together with its falling cost, have made genomic selection (GS) a powerful tool in plant and animal breeding. Although genotyping costs have dropped substantially, phenotyping remains time-consuming and expensive, so predicting phenotypes from genotypic information can accelerate the breeding cycle. In GS, a statistical model is built from genome-wide markers and the observed phenotypes of a continuous trait, and is then used to compute genomic estimated breeding values (GEBVs) from genotypes alone. These GEBVs guide the selection of desirable inbred lines or hybrids for further breeding programs. A key question in building such a model is how to choose, from genotype data alone, the individuals to phenotype as the training set so that the resulting prediction model performs well. In this study, we evaluated two criteria for making this choice, A-optimality and D-optimality, both of which aim to select the individuals with the greatest variation. We applied them to four crop genomic datasets, using both simulated and real phenotypic data, and compared them with methods from previous studies. Both A-optimal and D-optimal training sets outperformed random training sets.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:18:59Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-15T17:18:59Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Contents Oral Examination Committee Certification i Acknowledgements ii Chinese Abstract iii Abstract iv Chapter 1 Introduction 1 Chapter 2 Materials and Methods 3 2.1 Genome datasets 3 2.2 GBLUP model 6 2.3 A-optimality and D-optimality 6 2.4 Training set evaluation 8 2.5 Scenarios of this study 10 2.6 Simulation study for validating A-opt and D-opt methods 11 2.7 Analysis of phenotypic values 12 2.8 Comparison with other optimality criteria 12 Chapter 3 Results 13 3.1 The variance pattern of a candidate set in A-optimality 13 3.2 Simulation results 15 3.3 Results of phenotypic value analysis 20 3.4 Comparison results of different training set optimality criteria 24 Chapter 4 Discussion 26 4.1 Coding of marker score matrix 26 4.2 Normalization of the marker score matrix 26 4.3 The influence of subpopulation 27 4.4 The influence of heritability in phenotypic analysis 30 4.5 Robustness in different estimation methods 31 Chapter 5 Conclusion 36 Appendix 1 37 Appendix 2 Source code in R 39 Bibliography 46 List of Figures Figure 1. Bar chart representing the variance in A-opt. 14 Figure 2. The average NDCG values for the Tropical Rice dataset across three heritability levels and various values of k. 16 Figure 3. The average NDCG values for the wheat dataset across three heritability levels and various values of k. 17 Figure 4. The average NDCG values for the sorghum dataset across three heritability levels and various values of k. 18 Figure 5. The average NDCG values for the 44K rice dataset across three heritability levels and various values of k. 19 Figure 6. The average mean of NDCGk@10 for the phenotypic data in the tropical rice dataset. 20 Figure 7. The average mean of NDCGk@10 for the phenotypic data in the wheat dataset. 21 Figure 8. The average mean of NDCGk@10 for the phenotypic data in the sorghum dataset. 22 Figure 9. The average mean of NDCGk@10 for the phenotypic data in the 44K rice dataset. 23 Figure 10. The average mean of NDCGk@10 for the dataset with subpopulation structure, sorghum dataset and 44K rice dataset. 29 Figure 11. The comparison between different estimation methods for the phenotypic data of the tropical rice dataset. 32 Figure 12. The comparison between different estimation methods for the phenotypic data of the wheat dataset. 33 Figure 13. The comparison between different estimation methods for the phenotypic data of the sorghum dataset. 34 Figure 14. The comparison between different estimation methods for the phenotypic data of the 44K rice dataset. 35 List of Tables Table 1. The summary of the datasets 5 Table 2. The training set size for each dataset. 11 Table 3. The comparison between optimality criteria with different training set size across each dataset. 25 Table 4. The heritability of each phenotypic data 30	-
dc.language.iso	en	-
dc.subject	基因體選拔	zh_TW
dc.subject	植物育種	zh_TW
dc.subject	訓練集選擇	zh_TW
dc.subject	基因演算法	zh_TW
dc.subject	混合線性模型	zh_TW
dc.subject	training set selection	en
dc.subject	plant breeding	en
dc.subject	genetic algorithm	en
dc.subject	genomic selection	en
dc.subject	linear mixed effect model	en
dc.title	用於從候選族群中選拔最佳基因型A-最適與 D-最適訓練集之研究	zh_TW
dc.title	A-optimal and D-optimal training sets for identifying the best genotypes for a candidate population	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	蔡欣甫;高振宏	zh_TW
dc.contributor.oralexamcommittee	Shin-Fu Tsai;Chen-Hung Kao	en
dc.subject.keyword	基因體選拔,訓練集選擇,植物育種,基因演算法,混合線性模型	zh_TW
dc.subject.keyword	genomic selection,training set selection,plant breeding,genetic algorithm,linear mixed effect model	en
dc.relation.page	48	-
dc.identifier.doi	10.6342/NTU202301253	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2023-08-08	-
dc.contributor.author-college	College of Bioresources and Agriculture	-
dc.contributor.author-dept	Department of Agronomy	-
Appears in Collections:	Department of Agronomy

Files in This Item:

File	Size	Format
ntu-111-2.pdf	2.98 MB	Adobe PDF

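Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.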