Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88673
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 廖振鐸 | zh_TW
dc.contributor.advisor | Chen-Tuo Liao | en
dc.contributor.author | 宋文修 | zh_TW
dc.contributor.author | Wen-Hsiu Sung | en
dc.date.accessioned | 2023-08-15T17:18:59Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-15 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-07 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88673 | -
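dc.description.abstract | With advances in molecular biology, genomic selection (GS) is widely used in animal and crop breeding programs and has become an important tool. Although the cost of genotyping has fallen, phenotyping still requires relatively high cost and time, so phenotypes are predicted from genotypes in order to accelerate breeding programs. Genomic selection builds a statistical model from genetic markers spread across the whole genome and the known phenotypes of continuous traits, then uses genotypes to obtain genomic estimated breeding values (GEBVs), from which suitable inbred lines or hybrid combinations in a breeding program are selected.
In building the statistical model, an important issue in genomic selection is how to choose, from genotype data alone, suitable individuals as the training set to be phenotyped, so that a well-performing prediction model can be built. This study analyzes two criteria, A-optimality and D-optimality, whose principle is to pick the individuals with the greatest variation as the training set. We use four crop genomic datasets, with both simulated results and real data, and compare against other methods from previous studies; both criteria perform better than random training sets.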
zh_TW
dc.description.abstract | Genomic selection (GS) has become a powerful tool in plant and animal breeding as molecular genetic technology has advanced and become cheaper. Despite substantial reductions in genotyping costs, phenotyping remains time-consuming and expensive. Predicting phenotypes from genotypic information can therefore accelerate the breeding cycle. In GS, markers across the whole genome are used to compute genomic estimated breeding values (GEBVs) through statistical models built from genotype and phenotype data. These GEBVs facilitate the selection of desirable inbred lines or hybrids for further breeding programs.
A crucial question in genomic selection is how to select, from genotype data alone, the individuals that form the training set, so that an effective prediction model can be built. In this study, we evaluated two criteria, A-optimality and D-optimality, which aim to select the individuals with the highest level of variation. We used four crop genomic datasets, with both simulated and real data, and compared the results with methods from previous studies. Both A-optimality and D-optimality performed better than random training sets.
en
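The abstract names A-optimality and D-optimality without giving formulas. As a rough illustration only, the sketch below uses the classical experimental-design definitions on a ridge-regularised information matrix X'X + ridge·I, where X holds the marker scores of a proposed training set. This is an assumption, not the thesis's GBLUP-based formulation; the function names, the `ridge` parameter, and the greedy forward search (swapped in here for simplicity; the record's keywords mention a genetic algorithm) are all hypothetical.

```python
import math

def information_matrix(X, ridge=1.0):
    """Return X'X + ridge * I for a list-of-rows marker matrix X (assumed form)."""
    p = len(X[0])
    M = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for row in X:
        for i in range(p):
            for j in range(p):
                M[i][j] += row[i] * row[j]
    return M

def cholesky(M):
    """Lower-triangular L with M = L L' (M must be symmetric positive definite)."""
    p = len(M)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def d_criterion(X, ridge=1.0):
    """D-criterion: log det(X'X + ridge * I); larger is better."""
    L = cholesky(information_matrix(X, ridge))
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))

def a_criterion(X, ridge=1.0):
    """A-criterion: trace((X'X + ridge * I)^-1); smaller is better."""
    L = cholesky(information_matrix(X, ridge))
    p = len(L)
    total = 0.0
    for j in range(p):
        # Solve L y = e_j by forward substitution (column j of L^-1).
        y = [0.0] * p
        for i in range(j, p):
            rhs = (1.0 if i == j else 0.0) - sum(L[i][k] * y[k] for k in range(j, i))
            y[i] = rhs / L[i][i]
        # trace(M^-1) equals the squared Frobenius norm of L^-1.
        total += sum(v * v for v in y)
    return total

def greedy_select(X, n_train, score, maximize):
    """Forward greedy search: repeatedly add the candidate that best improves score."""
    chosen, remaining = [], list(range(len(X)))
    for _ in range(n_train):
        values = [score([X[c] for c in chosen] + [X[i]]) for i in remaining]
        best = values.index(max(values) if maximize else min(values))
        chosen.append(remaining.pop(best))
    return chosen

# Toy candidate set: lines 0 and 1 carry identical marker scores.
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
d_set = greedy_select(candidates, 2, d_criterion, maximize=True)
a_set = greedy_select(candidates, 2, a_criterion, maximize=False)
print(sorted(d_set), sorted(a_set))
```

In this toy case both criteria avoid picking the two identical lines together, which matches the abstract's description of favouring the most varied individuals. A real marker matrix would call for a linear-algebra library and a proper search strategy.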
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:18:59Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-15T17:18:59Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Contents
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
Abstract
Chapter 1 Introduction
Chapter 2 Materials and Methods
2.1 Genome datasets
2.2 GBLUP model
2.3 A-optimality and D-optimality
2.4 Training set evaluation
2.5 Scenarios of this study
2.6 Simulation study for validating A-opt and D-opt methods
2.7 Analysis of phenotypic values
2.8 Comparison with other optimality criteria
Chapter 3 Results
3.1 The variance pattern of a candidate set in A-optimality
3.2 Simulation results
3.3 Results of phenotypic value analysis
3.4 Comparison results of different training set optimality criteria
Chapter 4 Discussion
4.1 Coding of marker score matrix
4.2 Normalization of the marker score matrix
4.3 The influence of subpopulation
4.4 The influence of heritability in phenotypic analysis
4.5 Robustness in different estimation methods
Chapter 5 Conclusion
Appendix 1
Appendix 2 Source code in R
Bibliography

 
List of Figures
Figure 1. Bar chart representing the variance in A-opt.
Figure 2. The average NDCG values for the tropical rice dataset across three heritability levels and various values of k.
Figure 3. The average NDCG values for the wheat dataset across three heritability levels and various values of k.
Figure 4. The average NDCG values for the sorghum dataset across three heritability levels and various values of k.
Figure 5. The average NDCG values for the 44K rice dataset across three heritability levels and various values of k.
Figure 6. The average mean of NDCGk@10 for the phenotypic data in the tropical rice dataset.
Figure 7. The average mean of NDCGk@10 for the phenotypic data in the wheat dataset.
Figure 8. The average mean of NDCGk@10 for the phenotypic data in the sorghum dataset.
Figure 9. The average mean of NDCGk@10 for the phenotypic data in the 44K rice dataset.
Figure 10. The average mean of NDCGk@10 for the datasets with subpopulation structure (the sorghum dataset and the 44K rice dataset).
Figure 11. The comparison between different estimation methods for the phenotypic data of the tropical rice dataset.
Figure 12. The comparison between different estimation methods for the phenotypic data of the wheat dataset.
Figure 13. The comparison between different estimation methods for the phenotypic data of the sorghum dataset.
Figure 14. The comparison between different estimation methods for the phenotypic data of the 44K rice dataset.

 
List of Tables
Table 1. The summary of the datasets.
Table 2. The training set size for each dataset.
Table 3. The comparison between optimality criteria with different training set sizes across each dataset.
Table 4. The heritability of each phenotypic dataset.
-
dc.language.iso | en | -
dc.title | 用於從候選族群中選拔最佳基因型A-最適與 D-最適訓練集之研究 | zh_TW
dc.title | A-optimal and D-optimal training sets for identifying the best genotypes for a candidate population | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 蔡欣甫; 高振宏 | zh_TW
dc.contributor.oralexamcommittee | Shin-Fu Tsai; Chen-Hung Kao | en
dc.subject.keyword | 基因體選拔, 訓練集選擇, 植物育種, 基因演算法, 混合線性模型 | zh_TW
dc.subject.keyword | genomic selection, training set selection, plant breeding, genetic algorithm, linear mixed effect model | en
dc.relation.page | 48 | -
dc.identifier.doi | 10.6342/NTU202301253 | -
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2023-08-08 | -
dc.contributor.author-college | College of Bioresources and Agriculture (生物資源暨農學院) | -
dc.contributor.author-dept | Department of Agronomy (農藝學系) | -
Appears in collections: Department of Agronomy (農藝學系)

Files in this item:
File | Size | Format
ntu-111-2.pdf | 2.98 MB | Adobe PDF