NTU Theses and Dissertations Repository › College of Bioresources and Agriculture › Department of Agronomy
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52139
Full metadata record (DC field, value, language)
dc.contributor.advisor: 蔡政安
dc.contributor.author: Yu-Shiang Zeng (en)
dc.contributor.author: 曾禹翔 (zh_TW)
dc.date.accessioned: 2021-06-15T16:08:32Z
dc.date.available: 2015-08-25
dc.date.copyright: 2015-08-25
dc.date.issued: 2015
dc.date.submitted: 2015-08-19
dc.identifier.citation:
[1] Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U. S. A. 74, 5463–5467 (1977).
[2] Heiger, D. N., Cohen, A. S. & Karger, B. L. Separation of DNA restriction fragments by high performance capillary electrophoresis with low and zero crosslinked polyacrylamide using continuous and pulsed electric fields. J. Chromatogr. 516, 33–48 (1990).
[3] Mani, U., Mukund, S. & Ravisankar, S. Sanger method of DNA sequencing (2014). URL http://www.bioindians.org/index.html.
[4] Drmanac, R., Labat, I., Brukner, I. & Crkvenjakov, R. Sequencing of megabase plus DNA by hybridization: theory of the method. Genomics 4, 114–128 (1989).
[5] Maxam, A. M. & Gilbert, W. A new method for sequencing DNA. Proc. Natl. Acad. Sci. U. S. A. 74, 560–564 (1977).
[6] Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008).
[7] Barba, M., Czosnek, H. & Hadidi, A. Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses 6, 106–136 (2014).
[8] Branton, D. et al. The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26, 1146–1153 (2008).
[9] Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
[10] Wang, L., Li, P. & Brutnell, T. P. Exploring plant transcriptomes using ultra high-throughput sequencing. Briefings Funct. Genomics Proteomics 9, 118–128 (2010).
[11] Oshlack, A., Robinson, M. D. & Young, M. D. From RNA-seq reads to differential expression results. Genome Biol. 11, 220 (2010).
[12] Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
[13] Robinson, M. D. & Smyth, G. K. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881–2887 (2007).
[14] Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
[15] Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
[16] Glaus, P., Honkela, A. & Rattray, M. Identifying differentially expressed transcripts from RNA-seq data with biological variation. Bioinformatics 28, 1721–1728 (2012).
[17] Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970).
[18] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
[19] Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
[20] Feng, J. et al. GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics 28, 2782–2788 (2012).
[21] Seesi, S. A., Tiagueu, Y. T. & Zelikovsky, A. Bootstrap-based differential gene expression analysis for RNA-Seq data with and without replicates. BMC Genomics 15, 1–6 (2014).
[22] Wu, H., Wang, C. & Wu, Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics 14, 232–243 (2013).
[23] Lu, J., Tomfohr, J. K. & Kepler, T. B. Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach. BMC Bioinformatics 6, 165 (2005).
[24] Robinson, M. D. & Smyth, G. K. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321–332 (2008).
[25] Landau, W. M. & Liu, P. Dispersion estimation and its effect on test performance in RNA-seq data analysis: a simulation-based comparison of methods. PLoS One 8 (2013).
[26] Pickrell, J. K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010).
[27] Hammer, P. et al. mRNA-seq with agnostic splice site discovery for nervous system transcriptomics tested in chronic pain. Genome Res. 20, 847–860 (2010).
[28] ’t Hoen, P. et al. Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res. 36 (2008).
[29] Lai, Y. Differential expression analysis of digital gene expression data: RNA-tag filtering, comparison of t-type tests and their genome-wide co-expression based adjustments. Changes 29, 997–1003 (2012).
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52139
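dc.description.abstract: As next-generation sequencing technology has developed rapidly and matured, it has come into wide use in fields such as medicine, agriculture and biotechnology. Next-generation sequencing can be used for whole-genome sequencing and for resequencing known species, and it can also be used to explore biological theories. One of its important applications is transcriptome sequencing (RNA-seq). RNA-seq data are commonly used to test gene expression levels, and in recent years RNA-seq has gradually replaced microarray data as a standard measure for studying gene expression. However, RNA-seq data are discrete counts whose variance exceeds their mean, a phenomenon called over-dispersion. Over-dispersion is usually handled with a negative binomial model, but estimating the model's parameters involves many statistical choices. Commonly used recent methods include DESeq, edgeR and DSS, but all of them estimate the parameters by point estimation and do not take uncertainty into account. In this thesis we build two models, a log-linear model and a Bayesian hierarchical model, and use Markov chain Monte Carlo (MCMC) methods to obtain the parameters of interest, from which differentially expressed genes can be identified. Finally, we evaluate DESeq, edgeR, DSS and our methods on both simulated and real data. We find that when the numbers of replicates in the groups are close or equal, our log-linear model performs better than the other methods; when the numbers of replicates are extremely unbalanced, we recommend testing with the median estimator.
(zh_TW)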
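Over-dispersion, and the way a negative binomial model absorbs it, can be illustrated with a Gamma-Poisson mixture. Section 3.1 of the table of contents carries this name, but the record does not give the thesis's own parameterization. The sketch below therefore assumes the common form in which a count has mean μ and variance μ + φμ². That form arises when a Poisson rate is first drawn from a Gamma distribution with shape 1/φ and scale μφ. The sketch uses only the Python standard library, and the helper names `poisson_draw` and `gamma_poisson_counts` are hypothetical.

```python
import math
import random
import statistics
from typing import List


def poisson_draw(rng: random.Random, lam: float) -> int:
    """One Poisson(lam) variate via Knuth's product-of-uniforms method.

    Adequate for moderate rates; very large lam (hundreds) would need a
    different algorithm because exp(-lam) eventually underflows.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def gamma_poisson_counts(rng: random.Random, mu: float, phi: float, n: int) -> List[int]:
    """Draw n counts whose Poisson rate is itself Gamma distributed.

    The rate has shape 1/phi and scale mu*phi (mean mu, variance phi*mu**2),
    so each count is marginally negative binomial with mean mu and
    variance mu + phi*mu**2 (assumed parameterization).
    """
    shape, scale = 1.0 / phi, mu * phi
    return [poisson_draw(rng, rng.gammavariate(shape, scale)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2015)
    mu, phi, n = 20.0, 0.5, 20_000
    pois = [poisson_draw(rng, mu) for _ in range(n)]
    nb = gamma_poisson_counts(rng, mu, phi, n)
    # Theory: Poisson variance = mu = 20; NB variance = 20 + 0.5 * 20**2 = 220.
    print("Poisson:       mean %.1f, variance %.1f" % (statistics.fmean(pois), statistics.variance(pois)))
    print("Gamma-Poisson: mean %.1f, variance %.1f" % (statistics.fmean(nb), statistics.variance(nb)))
```

With μ = 20 and φ = 0.5, both samples should have means near 20. The Poisson variance should sit near 20, while the Gamma-Poisson variance should sit near 220. The extra φμ² term is the "variance larger than the mean" behaviour that the abstract calls over-dispersion.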
dc.description.abstract: With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture and biotechnology have advanced considerably. NGS makes whole-genome sequencing and de novo sequencing possible and supports the exploration of biological theory. One of its core applications is RNA sequencing (RNA-seq). RNA-seq data are used to measure gene expression levels and to test whether specific genes are differentially expressed. RNA-seq has gradually replaced microarray technology as the main benchmark for gene expression studies. However, RNA-seq read counts are discrete and exhibit over-dispersion, meaning that the variance of the data is larger than the mean. A negative binomial model is commonly applied to handle over-dispersion, but estimating its parameters is a further issue. Widely used RNA-seq analysis packages such as DESeq, edgeR and DSS rely only on point estimates of the parameters and do not account for uncertainty. Here, we use Markov chain Monte Carlo (MCMC) methods to estimate the parameters relevant to detecting differentially expressed genes. We then compare the performance of DESeq, edgeR, DSS and our method on both simulated and real RNA-seq data. Our log-linear model performs better than DESeq, edgeR and DSS when the numbers of replicates in the groups are similar or equal. When the numbers of replicates between groups are extremely unbalanced, we suggest the median estimator as the appropriate method for detecting differentially expressed genes.
(en)
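Both abstracts state that MCMC is used to estimate the parameters of a log-linear model. They do not give the model's exact form, its priors or the sampler, so the sketch below is a hypothetical illustration and not the thesis's method. It assumes:

* a single gene compared across two groups;
* a negative binomial likelihood with a known, fixed dispersion φ;
* known size factors s_j;
* a log-linear mean log μ_j = log s_j + β0 + β1·x_j, where x_j = 0 or 1 marks the group;
* independent Normal(0, 10²) priors on β0 and β1;
* a random-walk Metropolis sampler, the symmetric-proposal case of references [17] and [18].

The function names and the toy counts are invented for illustration.

```python
import math
import random
from typing import List, Sequence


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log P(Y = y) for a negative binomial with mean mu and variance mu + phi*mu**2."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def log_posterior(beta: Sequence[float], counts: Sequence[int], groups: Sequence[int],
                  size_factors: Sequence[float], phi: float, prior_sd: float = 10.0) -> float:
    """Unnormalized log posterior of (beta0, beta1) for one gene (assumed model).

    log mu_j = log s_j + beta0 + beta1 * x_j, with Normal(0, prior_sd**2) priors.
    """
    b0, b1 = beta
    lp = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    for y, x, s in zip(counts, groups, size_factors):
        mu = s * math.exp(b0 + b1 * x)
        lp += nb_logpmf(y, mu, phi)
    return lp


def metropolis_beta1(counts, groups, size_factors, phi,
                     n_iter: int = 20_000, step: float = 0.15, seed: int = 1) -> List[float]:
    """Random-walk Metropolis; returns beta1 draws after dropping the first half as burn-in."""
    rng = random.Random(seed)
    beta = [math.log(sum(counts) / len(counts) + 0.5), 0.0]  # crude starting point
    current = log_posterior(beta, counts, groups, size_factors, phi)
    draws: List[float] = []
    for it in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, counts, groups, size_factors, phi)
        # Symmetric proposal: accept with probability min(1, exp(candidate - current)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        if it >= n_iter // 2:
            draws.append(beta[1])
    return draws


if __name__ == "__main__":
    # Made-up toy counts for one gene: three samples per group, equal library sizes.
    counts = [45, 52, 38, 110, 95, 130]
    groups = [0, 0, 0, 1, 1, 1]
    draws = metropolis_beta1(counts, groups, [1.0] * 6, phi=0.05)
    ordered = sorted(draws)
    lo, hi = ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered))]
    print("posterior mean of beta1: %.2f" % (sum(draws) / len(draws)))
    print("95%% credible interval: (%.2f, %.2f)" % (lo, hi))
    print("P(beta1 > 0 | data) ~ %.3f" % (sum(d > 0 for d in draws) / len(draws)))
```

A point-estimation tool reports a single value of β1. The retained draws here approximate its whole posterior, so a credible interval or the posterior probability that the gene is up-regulated can be read off directly. This is one way to express the parameter uncertainty that the abstract says point-estimation methods leave out.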
dc.description.provenance: Made available in DSpace on 2021-06-15T16:08:32Z (GMT). No. of bitstreams: 1
ntu-104-R02621208-1.pdf: 4843119 bytes, checksum: dd113e6a98dcda218806bc2966988583 (MD5)
Previous issue date: 2015
(en)
dc.description.tableofcontents:
Chinese Abstract ii
Abstract iv
1 Introduction 1
1.1 Brief Overview of RNA-seq Studies 1
1.2 Challenges in Analysis Methods for RNA-seq Data 5
1.3 Contributions of Our Proposed Method 7
2 Review of Current Methods 9
2.1 The Dispersion Shrinkage for Sequencing (DSS) 9
2.2 Moderated Statistical Tests for Assessing Differences in Tag Abundance (edgeR) 11
2.3 Differential Expression Analysis for Sequence Count Data (DESeq) 14
3 The Proposed Methods 18
3.1 Gamma-Poisson Model 18
3.2 Log-Linear Model 20
4 Numerical Studies 23
4.1 Simulation Studies 24
4.2 Control of Type I Error Rate 25
4.3 Estimation of ϕ 26
4.4 Accuracy of DE Detection 28
4.5 FDR 29
4.6 Test Performance 31
4.7 Real Data Analysis 32
5 Discussion and Conclusions 34
Bibliography 81
dc.language.iso: zh-TW
dc.subject: RNA-seq data (zh_TW)
dc.subject: gene expression level (zh_TW)
dc.subject: Bayesian analysis (zh_TW)
dc.subject: log-linear model (zh_TW)
dc.subject: Gene expression (en)
dc.subject: Log-linear model (en)
dc.subject: Bayesian inference (en)
dc.subject: RNA-seq (en)
dc.title: 以貝氏分析方法來偵測轉錄體定序資料之顯著基因 [Detecting significant genes in RNA-seq data using Bayesian analysis methods] (zh_TW)
dc.title: Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches (en)
dc.type: Thesis
dc.date.schoolyear: 103-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 劉仁沛, 劉力瑜, 謝叔蓉
dc.subject.keyword: RNA-seq data, gene expression level, Bayesian analysis, log-linear model (zh_TW)
dc.subject.keyword: RNA-seq, Gene expression, Bayesian inference, Log-linear model (en)
dc.relation.page: 84
dc.rights.note: Paid authorization
dc.date.accepted: 2015-08-19
dc.contributor.author-college: College of Bioresources and Agriculture (zh_TW)
dc.contributor.author-dept: Graduate Institute of Agronomy (zh_TW)
Appears in Collections: Department of Agronomy

Files in this item:
ntu-104-1.pdf (4.73 MB, Adobe PDF); not authorized for public access


Unless their copyright terms are specified otherwise, all items in this system are protected by copyright, with all rights reserved.

© NTU Library All Rights Reserved