Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7612
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 曹承礎
dc.contributor.author: Chung-Yen Chen (en)
dc.contributor.author: 陳中彥 (zh_TW)
dc.date.accessioned: 2021-05-19T17:47:52Z
dc.date.available: 2023-03-01
dc.date.available: 2021-05-19T17:47:52Z
dc.date.copyright: 2018-03-01
dc.date.issued: 2017
dc.date.submitted: 2018-02-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7612
dc.description.abstract: Metagenomics experiments often infer the composition of microbial communities by sequencing 16S and 18S rRNA genes. Taxonomic assignment is a fundamental step in such studies. The accuracy and other metrics that previous studies used to measure the performance of existing taxonomic assignment methods have two major problems: they are based on sequence counts, and they measure error in a binary way. These problems make evaluation results misleading and less informative.
In this study, we investigate the adverse effects of these two problems and then propose new performance metrics, Average Taxonomy Distance (ATD) and ATD_by_Taxa, together with the ATD plot, to address them. Comparing evaluation results under the old metrics and under the new metrics, we found that the new metrics give more informative, comparable, and robust results across three test data sets. (en)
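The abstract names the two metrics but this record does not give their formulas, so the following is only a minimal sketch of how such metrics might be computed. It assumes (hypothetically) that the taxonomy distance between a true and a predicted lineage is the path length between the two nodes in the taxonomy tree, that ATD averages this distance over sequences, and that ATD_by_Taxa first averages within each true taxon and then across taxa. The function names and the toy lineages are illustrative and are not taken from the thesis.

```python
from collections import defaultdict

def taxonomy_distance(true_lineage, predicted_lineage):
    """Path length between two taxa in the taxonomy tree (assumed definition).

    Each lineage is a sequence of rank names from the root downwards.
    """
    shared = 0
    for true_rank, predicted_rank in zip(true_lineage, predicted_lineage):
        if true_rank != predicted_rank:
            break
        shared += 1
    # Steps up from the true taxon to the shared ancestor, then down to the prediction.
    return (len(true_lineage) - shared) + (len(predicted_lineage) - shared)

def average_taxonomy_distance(pairs):
    """Sequence-count based average: every sequence has equal weight."""
    distances = [taxonomy_distance(t, p) for t, p in pairs]
    return sum(distances) / len(distances)

def atd_by_taxa(pairs):
    """Taxa-count based average: every true taxon has equal weight."""
    per_taxon = defaultdict(list)
    for t, p in pairs:
        per_taxon[tuple(t)].append(taxonomy_distance(t, p))
    taxon_means = [sum(d) / len(d) for d in per_taxon.values()]
    return sum(taxon_means) / len(taxon_means)

# Toy data: one well-sampled taxon classified correctly, one rare taxon misclassified.
BACILLI = ("Bacteria", "Firmicutes", "Bacilli")
GAMMA = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")
ALPHA = ("Bacteria", "Proteobacteria", "Alphaproteobacteria")

pairs = [
    (BACILLI, BACILLI),
    (BACILLI, BACILLI),
    (BACILLI, BACILLI),
    (GAMMA, ALPHA),  # wrong at the class rank only: distance 2, not a flat "error"
]

print(average_taxonomy_distance(pairs))  # 0.5
print(atd_by_taxa(pairs))                # 1.0
```

In this toy case the per-sequence average is dominated by the abundant taxon, while the per-taxon average gives the rare, misclassified taxon equal weight. A graded distance also separates a near miss from a prediction that is wrong at a high rank, which a binary correct/incorrect count cannot do.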
dc.description.provenance: Made available in DSpace on 2021-05-19T17:47:52Z (GMT). No. of bitstreams: 1
ntu-106-R04725046-1.pdf: 15267455 bytes, checksum: 5dfad41935e1e19648cd92570fccb82c (MD5)
Previous issue date: 2017 (en)
dc.description.tableofcontents:
Acknowledgements
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Background of Research
1.1 Diversity Profiling and Taxonomic Assignment Using 16S and 18S rRNA gene Classification
1.2 Challenges in Evaluating Classification Method Performance
1.3 Two Problems for Performance Metrics Used by Previous Studies
1.3.1 Sequence Count Based Metrics
1.3.2 Binary Error Measurement
1.3.3 Summary for Two Problems in Previous Performance Metrics
Chapter 2 Methods
2.1 Performance Metrics
2.1.1 Taxonomy Distance
2.1.2 Taxa count based metrics
2.2 Data
2.3 Stratified 10-fold Cross-Validation and Aggregating
2.4 Classification Methods
Chapter 3 Results and Discussion
3.1 The Effects of Taxa count based metrics and Taxonomy Distance When Estimating Method's Performance
3.2 Best Performance and Method Performance
3.3 Method Performance Comparison
Chapter 4 Conclusion and Future Work
4.1 Conclusion
4.1.1 Advantages of Taxa count based metrics and Taxonomy Distance
4.2 Future Work
4.2.1 Potential Risks and limitations
4.2.2 Taxonomic Assignment Using Other Methods or Biomarkers and Detailed Interpretation
REFERENCE
Supplementary Figures
Part I Plot tables for validating other methods on other databases
Part II ATD plots for plateau-other method difference on other databases
dc.language.iso: en
dc.title: 以基於生物分類之效能評估 16S 和 18S rRNA 基因分類方法 (zh_TW)
dc.title: Evaluating 16S and 18S rRNA Gene Classification Methods Using Taxonomy Based Performance Metrics (en)
dc.type: Thesis
dc.date.schoolyear: 106-1
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 湯森林, 盧信銘
dc.subject.keyword: 總體基因體學, 分類, 效能量度, 資料分析 (zh_TW)
dc.subject.keyword: Metagenomics, Classification, Performance Evaluation, Data analysis (en)
dc.relation.page: 59
dc.identifier.doi: 10.6342/NTU201800519
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2018-02-20
dc.contributor.author-college: College of Management (zh_TW)
dc.contributor.author-dept: Graduate Institute of Information Management (zh_TW)
Appears in collections: Department of Information Management

Files in this item:
File: Size, Format
ntu-106-1.pdf: 14.91 MB, Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
