A Study on the Selection of Core Collection based on Multiple Correspondence Analysis (多重對應分析應用於核心種原篩選之研究)

Nien-Lun Wu; 吳念倫

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21614

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	蔡政安(Chen-An Tsai)
dc.contributor.author	Nien-Lun Wu	en
dc.contributor.author	吳念倫	zh_TW
dc.date.accessioned	2021-06-08T03:39:49Z	-
dc.date.copyright	2019-07-17
dc.date.issued	2019
dc.date.submitted	2019-07-08
dc.identifier.citation	黃麗蒨 (2000) 臺灣地區地下水品質之統計研究 [A statistical study of groundwater quality in Taiwan]. Master's thesis, Graduate Institute of Statistics, National Central University.
王群山、張心怡、胡凱康 (2017) 種苗產業新趨勢研討會專刊 [Proceedings of the Symposium on New Trends in the Seed and Seedling Industry].
Abdi H., Valentin D. (2007) Multiple Correspondence Analysis. Encyclopedia of Measurement and Statistics. DOI: 10.4135/9781412952644.n299
De Beukelaer H., Smykal P., Davenport G.F., Fack V. (2012) Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search. BMC Bioinformatics 13:312. DOI: 10.1186/1471-2105-13-312
De Beukelaer H., Davenport G.F., Fack V. (2018) Core Hunter 3: flexible core subset selection. BMC Bioinformatics 19:203. DOI: 10.1186/s12859-018-2209-z
Franco J., Crossa J., Taba S., Shands H. (2005) A Sampling Strategy for Conserving Genetic Diversity when Forming Core Subsets. Crop Sci. 45:1035–1044. DOI: 10.2135/cropsci2004.0292
Gouesnard B., Bataillon T.M., Decoux G., Rozale C., Schoen D.J., David J.L. (2001) MSTRAT: An Algorithm for Building Germ Plasm Core Collections by Maximizing Allelic or Phenotypic Richness. Journal of Heredity 92(1):93–94. DOI: 10.1093/jhered/92.1.93
Greenacre M. (1993) Correspondence analysis in practice (Second Edition), 137–144. DOI: 10.1201/9781420011234
Jeong S., Kim J.Y., Jeong S.C., Kang S.T., Moon J.K., Kim N. (2017) GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets. PLoS ONE 12(7). DOI: 10.1371/journal.pone.0181420
Kim K.W., Chung H.K., Cho G.T., Ma K.H., Chandrabalan D., Gwag J.G., Kim T.S., Cho E.G., Park Y.J. (2007) PowerCore: a program applying the advanced M strategy with a heuristic search for establishing core sets. Bioinformatics 23(16):2155–2162. DOI: 10.1093/bioinformatics/btm313
Kohonen T. (1990) The self-organizing map. Proceedings of the IEEE 78(9). DOI: 10.1109/5.58325
Kohonen T. (1980) Self-Organizing Maps. Springer, pages 106–115. DOI: 10.1007/978-3-642-56927-2
Marler R.T., Arora J.S. (2005) Function-transformation methods for multi-objective optimization. Engineering Optimization 37(6):551–570. DOI: 10.1080/03052150500114289
Odong T.L., Jansen J., van Eeuwijk F.A., van Hintum T.J. (2013) Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. Theoretical and Applied Genetics 126(2):289–305. DOI: 10.1007/s00122-012-1971-y
Olteanu M., Villa-Vialaneix N. (2015) Using SOMbrero for clustering and visualizing graphs. Journal de la Societe Française de Statistique 156(3):95–119. HAL Id: hal-01232672
Reif J.C., Melchinger A.E., Frisch M. (2005) Genetical and Mathematical Properties of Similarity and Dissimilarity Coefficients Applied in Plant Breeding and Seed Bank Management. Crop Sci. 45:1–7. DOI: 10.2135/cropsci2005.0001
Rousseeuw P.J. (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53–65. DOI: 10.1016/0377-0427(87)90125-7
Thachuk C., Crossa J., Franco J., Dreisigacker S., Warburton M., Davenport G.F. (2009) Core Hunter: an algorithm for sampling genetic resources based on multiple genetic measures. BMC Bioinformatics 10:243. DOI: 10.1186/1471-2105-10-243
Vargas A.M., de Andrés M.T., Ibáñez J. (2016) Maximization of minority classes in core collections designed for association studies. Tree Genetics & Genomes 12:28. DOI: 10.1007/s11295-016-0988-9
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21614	-
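dc.description.abstract	Since Frankel introduced the concept of the core collection in 1984, researchers have continued to propose ideas and methods for selecting a core collection that captures the maximum genetic diversity with a small number of accessions. With the rapid recent development of next-generation sequencing, the volume of genotype data to be handled during selection keeps growing, so that some formerly feasible methods have become difficult to run effectively. The aim of this thesis is to find a selection method applicable to large genotype datasets. It combines cluster analysis based on the multidimensional coordinates computed by multiple correspondence analysis (MCA) with the selection procedure of GenoCore (Jeong et al., 2017), which has the same goal, to form a new method for selecting core collections. To evaluate the method, four indicators of core collection quality are used: coverage, Shannon's diversity index, the mean modified Rogers value, and the minimum pairwise modified Rogers value; the time required for the analysis is also included as part of the evaluation. Four datasets of different sizes are used: rice with 1.5K SNPs, wheat with 14K SNPs, rice with 37K SNPs, and wheat with 820K SNPs. They are analyzed with two common core collection selection methods, GenoCore and Core Hunter 3, and the results are compared with the core collections produced by the proposed method. Judging from the results on the simulated data used in this thesis, the goal can be considered achieved: a core collection selection method applicable to large genotype datasets was found. Compared with GenoCore, the proposed method improves the other indicators of the core collection while maintaining coverage of 99%; compared with Core Hunter 3, it reaches high coverage more efficiently, and it completes the computation for all four datasets within a reasonable time. As for future improvements, the current results suggest that the quality of the core collection may be affected by the accuracy of the cluster analysis, so the clustering step is one place to start. In addition, given the advantages Core Hunter 3 shows in the results, the selection process could perhaps incorporate two further criteria, the distance between each entry and its nearest entry and the distance between each accession and its nearest entry, in order to further increase the distances within the core collection and its genetic diversity.	zh_TW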
dc.description.abstract	Since Frankel proposed the concept of the core collection in 1984, many researchers have proposed methods for selecting a core collection, which aims to capture maximum genetic diversity with a small number of accessions. With the rapid development of next-generation sequencing technology, genotype datasets easily become enormous, and some methods that were effective in the past are now difficult to run. The goal of this thesis is to propose an algorithm that selects a core set of lines from large genotype data and maximizes genetic diversity for a user-defined number of lines. We use multiple correspondence analysis as the basis for cluster analysis and borrow from the algorithm of GenoCore, which has the same goal, to build a new selection method for core collections. We evaluate the proposed method with four criteria for core collection quality: coverage rate, Shannon's diversity index, mean modified Rogers value, and minimum modified Rogers value. Computing time is also included as an evaluation criterion. We compare our approach with two previously developed methods, GenoCore and Core Hunter 3, on four SNP datasets of 1.5K, 14K, 37K, and 820K SNPs. In the simulation results reported in this thesis, the proposed method often performs well on the evaluation criteria. While maintaining coverage of 99%, our method scores higher than GenoCore on the quality of the core collection, and it efficiently achieves a higher coverage rate than Core Hunter 3. In addition, our method finishes the analyses within a reasonable time for all four datasets. Future methodological improvements include more precise cluster analysis. The selection process could also add two criteria, the distance between each entry and its nearest entry and the distance between each accession and its nearest entry, to further increase both the distances between entries in the core collection and its genetic diversity.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T03:39:49Z (GMT). No. of bitstreams: 1 ntu-108-R06621202-1.pdf: 4491409 bytes, checksum: 3db96ac0153251d0ecbfd53bea07a488 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Chinese Abstract
ABSTRACT
Table of Contents
List of Tables
List of Figures
Chinese-English Glossary of Terms
Chapter 1 Introduction
  Section 1 Research Background and Objectives
  Section 2 Previous Methods
    1. GenoCore
    2. Core Hunter 3
Chapter 2 Materials and Methods
  Section 1 Datasets
  Section 2 Methods
    1. Multiple Correspondence Analysis
    2. Average Silhouette Method
    3. Clustering Procedure
    4. Selection Procedure
  Section 3 Evaluation Criteria
    1. Coverage
    2. Shannon's Diversity Index
    3. Mean Modified Rogers Value
    4. Minimum Pairwise Modified Rogers Value
Chapter 3 Results
  Section 1 1.5K SNP Rice
  Section 2 14K SNP Wheat
  Section 3 37K SNP Rice
  Section 4 820K SNP Wheat
  Section 5 Summary
Chapter 4 Discussion
References
dc.language.iso	zh-TW
dc.title	多重對應分析應用於核心種原篩選之研究	zh_TW
dc.title	A Study on the Selection of Core Collection based on Multiple Correspondence Analysis	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	劉力瑜,蔡欣甫,陳凱儀
dc.subject.keyword	核心種原,多重對應分析,集群分析,基因型數據,巨量資料	zh_TW
dc.subject.keyword	core collection,multiple correspondence analysis,cluster analysis,genotype data,big data	en
dc.relation.page	79
dc.identifier.doi	10.6342/NTU201901294
dc.rights.note	Not authorized
dc.date.accepted	2019-07-09
dc.contributor.author-college	College of Bioresources and Agriculture	zh_TW
dc.contributor.author-dept	Graduate Institute of Agronomy	zh_TW
Appears in Collections:	Department of Agronomy
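
The four quality criteria named in the abstract (coverage, Shannon's diversity index, mean and minimum modified Rogers value) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes biallelic SNP genotypes coded 0/1/2 (copies of one allele), counts coverage over the genotype states observed per marker, averages Shannon's index over markers, and uses the textbook modified Rogers distance; the thesis may define these quantities differently. All function names are hypothetical.

```python
import numpy as np

def coverage(full, core_idx):
    """Share of per-marker genotype states in the full set that the core retains."""
    core = full[core_idx]
    kept = total = 0
    for j in range(full.shape[1]):
        all_states = np.unique(full[:, j])
        core_states = np.unique(core[:, j])
        total += all_states.size
        kept += np.intersect1d(all_states, core_states).size
    return kept / total

def shannon(full, core_idx):
    """Shannon index of genotype-state frequencies in the core, averaged over markers."""
    core = full[core_idx]
    h = 0.0
    for j in range(core.shape[1]):
        _, counts = np.unique(core[:, j], return_counts=True)
        p = counts / counts.sum()
        h -= np.sum(p * np.log(p))
    return h / core.shape[1]

def modified_rogers_matrix(geno):
    """Pairwise modified Rogers distance for 0/1/2-coded biallelic markers.

    With allele frequency f = g / 2 per accession, the general formula
    sqrt(sum_l sum_a (p_xla - p_yla)^2 / (2L)) reduces to sqrt(mean((f_x - f_y)^2)).
    """
    freq = geno / 2.0
    diff = freq[:, None, :] - freq[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=2))

def rogers_summary(full, core_idx):
    """Mean and minimum pairwise modified Rogers distance (needs >= 2 entries)."""
    d = modified_rogers_matrix(full[core_idx])
    upper = np.triu_indices(d.shape[0], k=1)  # each unordered pair once
    return d[upper].mean(), d[upper].min()

if __name__ == "__main__":
    # Toy data: 4 accessions x 3 SNPs; entries 0 and 2 form a candidate core.
    toy = np.array([[0, 0, 2], [0, 2, 2], [2, 2, 0], [1, 0, 0]])
    core = [0, 2]
    print(coverage(toy, core), shannon(toy, core), rogers_summary(toy, core))
```

The pairwise distance matrix is built by broadcasting, which is convenient for a small core but needs memory proportional to (entries)² × (markers); for a dataset on the scale of 820K SNPs a blocked computation would be more practical.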

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	4.39 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.