Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7612
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 曹承礎 | |
dc.contributor.author | Chung-Yen Chen | en |
dc.contributor.author | 陳中彥 | zh_TW |
dc.date.accessioned | 2021-05-19T17:47:52Z | - |
dc.date.available | 2023-03-01 | |
dc.date.available | 2021-05-19T17:47:52Z | - |
dc.date.copyright | 2018-03-01 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2018-02-19 | |
dc.identifier.citation | [1] Hayssam Soueidan, Macha Nikolski, 'Machine learning for metagenomics: methods and tools,' Quantitative Biology, 2016. https://arxiv.org/abs/1510.06621v2
[2] Wang Q, Garrity GM, Tiedje JM, Cole JR. 'Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy.' Appl. Environ. Microbiol., 73, 5261-7, 2007
[3] Robert C. Edgar. 'SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences,' bioRxiv 074161, 2016. doi: https://doi.org/10.1101/074161
[4] Nikhil Chaudhary, Ashok K. Sharma, Piyush Agarwal, Ankit Gupta, Vineet K. Sharma. '16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets' PLoS ONE 10, e0116106, 2015.
[5] Cole, J. R., Q. Wang, J. A. Fish, B. Chai, D. M. McGarrell, Y. Sun, C. T. Brown, A. Porras-Alfaro, C. R. Kuske, and J. M. Tiedje. 'Ribosomal Database Project: data and tools for high throughput rRNA analysis' Nucl. Acids Res. 42(Database issue):D633-D642, 2013. doi: 10.1093/nar/gkt1244
[6] Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO 'The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.' Nucl. Acids Res. 41 (D1): D590-D596, 2012.
[7] DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu, and G. L. Andersen. 'Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB.' Appl Environ Microbiol 72:5069-72, 2006.
[8] Hilde Vinje, Kristian Hovde Liland, Trygve Almøy and Lars Snipen. 'Comparing K-mer based methods for improved classification of 16S sequences' BMC Bioinformatics 16:205, 2015. DOI 10.1186/s12859-015-0647-4.
[9] Monika Balvočiūtė, Daniel H. Huson. 'SILVA, RDP, Greengenes, NCBI and OTT - how do these taxonomies compare?' BMC Genomics 18(Suppl 2):114, 2017. DOI: 10.1186/s12864-017-3501-4
[10] Robert G. Beiko. 'Microbial Malaise: How Can We Classify the Microbiome?' Trends Microbiol. 23, 671-679, 2015. DOI:10.1016/j.tim.2015.08.009
[11] Francisco J. Valverde-Albacete, Carmen Peláez-Moreno. '100% Classification Accuracy Considered Harmful: The Normalized Information Transfer Factor Explains the Accuracy Paradox', PLoS ONE 9:e84217. 10.1371/journal.pone.0084217, 2014.
[12] X. Zhu, Davidson I 'Knowledge discovery and data mining: challenges and realities', IGI Global, pp. 118-119, 2007.
[13] Oded Maimon, Lior Rokach. 'Data Mining and Knowledge Discovery Handbook' Springer, p858, 2010.
[14] Schloss, P.D., et al., 'Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities.' Appl Environ Microbiol. 75(23):7537-41, 2009
[15] Jiawei Han, Micheline Kamber, Jian Pei, 'Data Mining: Concepts and Techniques', Third Edition, Elsevier Inc, p371, 2012
[16] George Forman, Martin Scholz. 'Apples-to-Apples in Cross-Validation Studies: Pitfalls in Classifier Performance Measurement', SIGKDD Explorations, Volume 12, Issue 1, pp49-59, 2010
[17] Kuan-Liang Liu, Andrea Porras-Alfaro, Cheryl R. Kuske, Stephanie A. Eichorst, and Gary Xie. 'Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes,' Appl. Environ. Microbiol., vol. 78 (pg. 1523-1533), 2012 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7612 | - |
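dc.description.abstract | Metagenomics experiments typically infer microbial communities by sequencing 16S and 18S rRNA. Taxonomic assignment is a fundamental step in these studies. The accuracy and other metrics that previous studies used to measure the performance of existing taxonomic assignment methods have two main problems: they are based on sequence counts, and they measure error in binary terms. These problems make evaluation results misleading and incomplete.
In this study, we investigate the adverse effects of the two problems and then propose new performance metrics, Average Taxonomy Distance (ATD) and ATD_by_Taxa, together with the ATD plot, to address them. Comparing evaluation results under the old and new metrics, we find that the new metrics give more informative, comparable, and reliable results on three test data sets. | zh_TW |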
dc.description.abstract | Metagenomics experiments often infer microbial communities by sequencing 16S and 18S rRNA. Taxonomic assignment is a fundamental step in such studies. The accuracy and other metrics used by previous studies to measure the performance of existing taxonomic assignment methods have two major problems: they are based on sequence counts, and they measure error in binary terms. These problems make the evaluation results misleading and less informative.
In this study, we investigate the adverse effects of these two problems and then propose new performance metrics, Average Taxonomy Distance (ATD) and ATD_by_Taxa, together with the ATD plot, to address them. By comparing the evaluation results under the old metrics and under our new metrics, we found the new metrics' results to be more informative, comparable, and robust across three test data sets. | en |
dc.description.provenance | Made available in DSpace on 2021-05-19T17:47:52Z (GMT). No. of bitstreams: 1 ntu-106-R04725046-1.pdf: 15267455 bytes, checksum: 5dfad41935e1e19648cd92570fccb82c (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Acknowledgements i
ABSTRACT ii
CONTENTS iii
LIST OF FIGURES v
LIST OF TABLES vi
Chapter 1 Background of Research 1
1.1 Diversity Profiling and Taxonomic Assignment Using 16S and 18S rRNA gene Classification 1
1.2 Challenges in Evaluating Classification Method Performance 2
1.3 Two Problems for Performance Metrics Used by Previous Studies 5
1.3.1 Sequence Count Based Metrics 6
1.3.2 Binary Error Measurement 11
1.3.3 Summary for Two Problems in Previous Performance Metrics 19
Chapter 2 Methods 20
2.1 Performance Metrics 20
2.1.1 Taxonomy Distance 20
2.1.2 Taxa count based metrics 20
2.2 Data 21
2.3 Stratified 10-fold Cross-Validation and Aggregating 22
2.4 Classification Methods 23
Chapter 3 Results and Discussion 25
3.1 The Effects of Taxa count based metrics and Taxonomy Distance When Estimating Method's Performance 25
3.2 Best Performance and Method Performance 28
3.3 Method Performance Comparison 30
Chapter 4 Conclusion and Future Work 34
4.1 Conclusion 34
4.1.1 Advantages of Taxa count based metrics and Taxonomy Distance 34
4.2 Future Work 34
4.2.1 Potential Risks and limitations 34
4.2.2 Taxonomic Assignment Using Other Methods or Biomarkers and Detailed Interpretation 35
REFERENCE 37
Supplementary Figures 40
Part I Plot tables for validating other methods on other databases 40
Part II ATD plots for plateau-other method difference on other databases 53 | |
dc.language.iso | en | |
dc.title | 以基於生物分類之效能評估 16S 和 18S rRNA 基因分類方法 | zh_TW |
dc.title | Evaluating 16S and 18S rRNA Gene Classification Methods Using Taxonomy Based Performance Metrics | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-1 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 湯森林,盧信銘 | |
dc.subject.keyword | Metagenomics, Classification, Performance Metrics, Data Analysis | zh_TW |
dc.subject.keyword | Metagenomics, Classification, Performance Evaluation, Data analysis | en |
dc.relation.page | 59 | |
dc.identifier.doi | 10.6342/NTU201800519 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2018-02-20 | |
dc.contributor.author-college | College of Management | zh_TW |
dc.contributor.author-dept | Graduate Institute of Information Management | zh_TW |
Appears in Collections: | Department of Information Management |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf | 14.91 MB | Adobe PDF | View/Open |
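Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.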