Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21614
Full metadata record
DC field: value [language]
dc.contributor.advisor: 蔡政安 (Chen-An Tsai)
dc.contributor.author: Nien-Lun Wu [en]
dc.contributor.author: 吳念倫 [zh_TW]
dc.date.accessioned: 2021-06-08T03:39:49Z
dc.date.copyright: 2019-07-17
dc.date.issued: 2019
dc.date.submitted: 2019-07-08
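dc.identifier.citation:
黃麗蒨 (2000) A statistical study of groundwater quality in Taiwan. Master's thesis, Graduate Institute of Statistics, National Central University.
王群山, 張心怡, 胡凱康 (2017) Proceedings of the Symposium on New Trends in the Seed and Seedling Industry.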
Abdi H., Valentin D. (2007) Multiple Correspondence Analysis. Encyclopedia of Measurement and Statistics. DOI: 10.4135/9781412952644.n299
De Beukelaer H., Smykal P., Davenport G.F., Fack V. (2012) Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search. BMC Bioinformatics. 13:312. DOI: 10.1186/1471-2105-13-312
De Beukelaer H., Davenport G.F., Fack V. (2018) Core Hunter 3: flexible core subset selection. BMC Bioinformatics. 19:203. DOI: 10.1186/s12859-018-2209-z
Franco J., Crossa J., Taba S., Shands H. (2005) A Sampling Strategy for Conserving Genetic Diversity when Forming Core Subsets. Crop Sci. 45: 1035-1044. DOI: 10.2135/cropsci2004.0292
Gouesnard B., Bataillon T. M., Decoux G., Rozale C., Schoen D. J., David J. L. (2001) MSTRAT: An Algorithm for Building Germ Plasm Core Collections by Maximizing Allelic or Phenotypic Richness. Journal of Heredity. Volume 92, Issue 1, Pages 93–94. DOI: 10.1093/jhered/92.1.93
Greenacre M. (1993) Correspondence analysis in practice (Second Edition), 137–144. DOI: 10.1201/9781420011234
Jeong S., Kim J.Y., Jeong S.C., Kang S.T., Moon J.K., Kim N. (2017) GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets. PLoS ONE 12(7). DOI: 10.1371/journal.pone.0181420
Kim K.W., Chung H.K., Cho G.T., Ma K.H., Chandrabalan D., Gwag J.G., Kim T.S., Cho E.G., Park Y.J. (2007) PowerCore: a program applying the advanced M strategy with a heuristic search for establishing core sets. Bioinformatics. Volume 23, Issue 16, Pages 2155–2162. DOI: 10.1093/bioinformatics/btm313
Kohonen T. (1990) The self-organizing map. Proceedings of the IEEE, Volume 78, Issue 9. DOI: 10.1109/5.58325
Kohonen T. (1980) Self-Organizing Maps. Springer, pages 106–115. DOI: 10.1007/978-3-642-56927-2
Marler R. T., Arora J. S. (2005) Function-transformation methods for multi-objective optimization. Engineering Optimization, 37:6, 551-570. DOI: 10.1080/03052150500114289
Odong T.L., Jansen J., van Eeuwijk F.A., van Hintum T.J. (2013) Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 126(2), 289–305. DOI: 10.1007/s00122-012-1971-y
Olteanu M., Villa-Vialaneix N. (2015) Using SOMbrero for clustering and visualizing graphs. Journal de la Société Française de Statistique, Société Française de Statistique et Société Mathématique de France. 156 (3), pp. 95-119. HAL Id: hal-01232672
Reif J. C., Melchinger A. E., Frisch M. (2005) Genetical and Mathematical Properties of Similarity and Dissimilarity Coefficients Applied in Plant Breeding and Seed Bank Management. Crop Sci. 45:1-7. DOI:10.2135/cropsci2005.0001
Rousseeuw P.J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, pages 53-65. DOI: 10.1016/0377-0427(87)90125-7
Thachuk C., Crossa J., Franco J., Dreisigacker S., Warburton M., Davenport G.F. (2009) Core Hunter: an algorithm for sampling genetic resources based on multiple genetic measures. BMC Bioinformatics. 10:243. DOI: 10.1186/1471-2105-10-243
Vargas A. M., de Andrés M. T., Ibáñez J. (2016) Maximization of minority classes in core collections designed for association studies. Tree Genetics & Genomes, 12:28. DOI: 10.1007/s11295-016-0988-9
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21614
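dc.description.abstract: Since Frankel proposed the concept of the core collection in 1984, researchers have continually put forward ideas and procedures for selecting a core collection so that a small number of accessions captures the greatest possible genetic diversity. With the rapid development of next-generation sequencing in recent years, the volume of genotype data to be screened keeps growing, and some previously workable approaches have become hard to run effectively. The aim of this thesis is to find a selection method that can be applied to large genotype datasets. It combines cluster analysis on the multi-dimensional coordinates computed in multiple correspondence analysis (MCA) with the selection procedure of GenoCore (Jeong et al., 2017), which has the same goal, to form a new method for selecting a core collection. To compare its performance, four indices are used to assess core collection quality: coverage, Shannon's diversity index, the mean modified Rogers value, and the minimum pairwise modified Rogers value. The time required for the analysis is also included in the evaluation. Four datasets of different sizes are used (1.5K-SNP rice, 14K-SNP wheat, 37K-SNP rice, and 820K-SNP wheat). They are analysed with GenoCore and Core Hunter 3, two commonly used core collection selection methods, and the results are compared with the core collections obtained in this study. Judging from the results on the simulated data used in this thesis, the goal has been met: a core collection selection method applicable to large genotype datasets has been found. Compared with GenoCore, the proposed method improves the other indices of the core collection while keeping coverage at 99%. Compared with Core Hunter 3, it reaches high coverage more efficiently, and it completes the computation for all four datasets within a reasonable time. As for future improvements, the current results suggest that core collection quality may be affected by the accuracy of the cluster analysis, so improving the clustering is one direction. In addition, given the advantages Core Hunter 3 shows in the results, the selection process might incorporate two further criteria, the distance between each entry and its nearest entry and the distance between each accession and its nearest entry, with the aim of further increasing the distances within the core collection and its genetic diversity. [zh_TW, translated]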
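The MCA coordinates mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation. It assumes SNP genotypes coded as the categories 0/1/2, builds the indicator matrix, and computes row principal coordinates with the standard correspondence-analysis SVD. The function name `mca_row_coordinates` and the toy data are hypothetical.

```python
import numpy as np

def mca_row_coordinates(genotypes, n_dims=2):
    """Row principal coordinates from MCA of a categorical genotype matrix.

    genotypes: (n_accessions, n_markers) integer array of category codes.
    Hypothetical helper for illustration only.
    """
    n, m = genotypes.shape
    # One block of 0/1 indicator columns per marker, one column per observed category.
    blocks = []
    for j in range(m):
        cats = np.unique(genotypes[:, j])
        blocks.append((genotypes[:, j][:, None] == cats[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses (all positive by construction)
    # Standardized residuals, then SVD.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U Sigma.
    F = (U * sing) / np.sqrt(r)[:, None]
    return F[:, :n_dims]

# Toy data: 5 accessions x 3 SNP markers (assumed 0/1/2 coding).
genotypes = np.array([
    [0, 0, 2],
    [0, 0, 2],
    [2, 2, 0],
    [1, 2, 0],
    [2, 1, 1],
])
coords = mca_row_coordinates(genotypes, n_dims=2)
print(coords)
```

Accessions with identical genotypes receive identical coordinates, so the coordinates could be passed to any clustering routine. Which clustering algorithm the thesis uses is not stated in this record beyond the table of contents.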
dc.description.abstract: Since Frankel proposed the concept of the core collection in 1984, researchers have proposed many methods for choosing a core collection, which aims to capture maximum genetic diversity with a small number of accessions. With the rapid development of next-generation sequencing technology, genotype datasets easily grow to an enormous size, and some methods that were effective in the past are now difficult to run. The goal of this thesis is to propose an algorithm that selects, from a large genotype dataset, a core set of lines of a user-defined size that maximizes genetic diversity. We apply multiple correspondence analysis for cluster analysis and draw on the algorithm of GenoCore, which has the same goal, to build a new selection method for core collections. We assess the proposed method with four criteria for core collection quality: coverage rate, Shannon's diversity index, mean modified Rogers value, and minimum modified Rogers value. Computing time is also included as an evaluation criterion. We compare our approach with two previously developed methods, GenoCore and Core Hunter 3, on four SNP datasets of 1.5K, 14K, 37K, and 820K SNPs, respectively.
On the simulated data used in this thesis, the proposed method often performs well on the evaluation criteria. While maintaining a coverage of 99%, our method scores higher than GenoCore on the other quality criteria, and it reaches a high coverage rate more efficiently than Core Hunter 3. In addition, it completes the analysis of all four datasets within a reasonable time. Future improvements to the method include more precise cluster analysis. The selection process could also add two criteria, the distance between each entry and its nearest entry and the distance between each accession and its nearest entry, to further increase both the distances between entries in the core collection and its genetic diversity. [en]
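The four quality criteria named in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's code: genotypes are assumed to be biallelic SNPs coded as alternate-allele dosages 0/1/2, coverage is taken as the average per-marker fraction of genotype classes retained in the core, Shannon's index is averaged over markers, and the modified Rogers distance uses the standard allele-frequency formula. The exact definitions used in the thesis may differ. All function names and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def coverage(full, core_idx):
    """Mean per-marker fraction of genotype classes in `full` that the core retains."""
    core = full[core_idx]
    ratios = []
    for j in range(full.shape[1]):
        all_classes = set(full[:, j].tolist())
        kept = set(core[:, j].tolist())
        ratios.append(len(kept) / len(all_classes))
    return float(np.mean(ratios))

def shannon_index(core):
    """Shannon's diversity index of allele frequencies, averaged over markers."""
    p_alt = core.mean(axis=0) / 2.0            # alternate-allele frequency per marker
    freqs = np.stack([p_alt, 1.0 - p_alt])     # both alleles of each marker
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(freqs > 0, freqs * np.log(freqs), 0.0)  # 0 * log(0) := 0
    return float(-terms.sum(axis=0).mean())

def modified_rogers(a, b):
    """Modified Rogers distance between two accessions (dosage vectors)."""
    # For biallelic markers, sqrt(1/(2L) * sum over alleles of squared frequency
    # differences) reduces to sqrt(mean((p_a - p_b)^2)) with p = dosage / 2.
    diff = (a - b) / 2.0
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_and_min_mr(core):
    """Mean and minimum pairwise modified Rogers distance within the core."""
    dists = [modified_rogers(core[i], core[j])
             for i, j in combinations(range(len(core)), 2)]
    return float(np.mean(dists)), float(np.min(dists))

# Toy data: 4 accessions x 3 SNP markers (assumed 0/1/2 dosage coding).
full = np.array([
    [0, 0, 2],
    [0, 2, 2],
    [2, 2, 0],
    [1, 0, 0],
])
core_idx = [0, 2, 3]
core = full[core_idx]

print("coverage:", coverage(full, core_idx))
print("Shannon:", shannon_index(core))
print("mean / min modified Rogers:", mean_and_min_mr(core))
```

Higher values are better for all four criteria. The minimum pairwise distance penalizes cores that contain near-duplicate accessions, which the mean alone can hide.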
dc.description.provenance: Made available in DSpace on 2021-06-08T03:39:49Z (GMT). No. of bitstreams: 1
ntu-108-R06621202-1.pdf: 4491409 bytes, checksum: 3db96ac0153251d0ecbfd53bea07a488 (MD5)
Previous issue date: 2019 [en]
dc.description.tableofcontents:
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Glossary of Chinese and English Terms
Chapter 1 Introduction
Section 1 Research Background and Objectives
Section 2 Previous Methods
1. GenoCore
2. Core Hunter 3
Chapter 2 Materials and Methods
Section 1 Datasets
Section 2 Methods
1. Multiple Correspondence Analysis
2. Average Silhouette Method
3. Clustering Approach
4. Selection Method
Section 3 Evaluation Criteria
1. Coverage
2. Shannon's Diversity Index
3. Mean Modified Rogers Value
4. Minimum Pairwise Modified Rogers Value
Chapter 3 Results
Section 1 1.5K SNP Rice
Section 2 14K SNP Wheat
Section 3 37K SNP Rice
Section 4 820K SNP Wheat
Section 5 Summary
Chapter 4 Discussion
References
dc.language.iso: zh-TW
dc.title: 多重對應分析應用於核心種原篩選之研究 [zh_TW]
dc.title: A Study on the Selection of Core Collection based on Multiple Correspondence Analysis [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 劉力瑜, 蔡欣甫, 陳凱儀
dc.subject.keyword: 核心種原, 多重對應分析, 集群分析, 基因型數據, 巨量資料 [zh_TW]
dc.subject.keyword: core collection, multiple correspondence analysis, cluster analysis, genotype data, big data [en]
dc.relation.page: 79
dc.identifier.doi: 10.6342/NTU201901294
dc.rights.note: Not authorized
dc.date.accepted: 2019-07-09
dc.contributor.author-college: College of Bioresources and Agriculture [zh_TW, translated]
dc.contributor.author-dept: Graduate Institute of Agronomy [zh_TW, translated]
Appears in collections: Department of Agronomy

Files in this item:
File: ntu-108-1.pdf (currently not authorized for public access), Size: 4.39 MB, Format: Adobe PDF
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms specifically indicate otherwise.
