Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98038
Full metadata record
DC Field / Value / Language
dc.contributor.advisor莊曜宇zh_TW
dc.contributor.advisorEric Y. Chuangen
dc.contributor.author莊聰賢zh_TW
dc.contributor.authorTsung-Hsien Chuangen
dc.date.accessioned2025-07-23T16:33:09Z-
dc.date.available2025-07-24-
dc.date.copyright2025-07-23-
dc.date.issued2025-
dc.date.submitted2025-07-14-
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98038-
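dc.description.abstractSingle-cell RNA sequencing has become a transformative tool for dissecting cellular heterogeneity and the complexity of RNA transcripts. By providing high-resolution transcriptomic data, it has markedly advanced the understanding of cell biology. However, cell type annotation of single-cell RNA sequencing data still relies mainly on traditional manual methods, which are time-consuming, laborious, and susceptible to the researcher's subjective judgment. These limitations highlight the importance of developing automated, efficient, and accurate annotation methods to meet the growing demand for single-cell RNA sequencing analysis.
This study proposes an integrated framework that combines a Light Gradient Boosting Machine with a self-attention graph neural network to automatically annotate cell types in single-cell RNA sequencing data, with high accuracy and good computational efficiency. Specifically, single-cell RNA sequencing datasets are retrieved from the Gene Expression Omnibus (GEO) database and undergo standardized preprocessing (such as removing low-quality cells and genes, selecting highly variable genes, and normalizing the data) to improve data quality. A denoising variational autoencoder combined with contrastive learning is then applied to extract latent features, which are fused with marker genes identified through differential gene expression analysis. The fused features serve as input to the Light Gradient Boosting Machine to generate initial cell type predictions, and the self-attention graph neural network then captures relational information between cells to further refine the predictions.
Experimental results show that the proposed framework accurately annotates major immune and non-immune cell types across five independent testing datasets, with an average accuracy of 0.962, outperforming most automated annotation tools. In cancer datasets, for the non-immune cells that are harder to annotate, the framework performs especially well, reaching an average accuracy of 0.951 and significantly exceeding earlier annotation tools (the highest of which reached 0.901). Notably, the framework can accurately classify seven subtypes of cancer-associated fibroblasts. These cells play important roles in tumor proliferation, metastasis, and immune evasion, but because they are extremely difficult to classify, few annotation tools have managed to do so. In addition, because the framework relies mainly on the Light Gradient Boosting Machine model, its training efficiency is better than that of comparable automated annotation tools.
Overall, the integrated framework provides a stable, efficient, and highly accurate cell type annotation tool for single-cell RNA sequencing data and shows strong cross-dataset generalization across multiple GEO datasets. As an applied tool in the field of single-cell analysis, it is expected to support cancer research and the development of precision medicine.
zh_TW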
dc.description.abstractSingle-cell RNA sequencing (scRNA-seq) technology has emerged as a transformative tool for investigating cellular heterogeneity and the intricate complexity of RNA transcripts at the single-cell level. By providing high-resolution transcriptomic data, scRNA-seq has significantly advanced our understanding of cellular biology. However, cell type annotation in scRNA-seq data predominantly relies on traditional manual methods, which are time-consuming, labor-intensive, and prone to researcher bias. These limitations underscore the need for automated, efficient, and accurate annotation approaches to meet the growing demands of single-cell analysis.
This study introduces LightGBM-GAT, an integrated framework that combines Light Gradient Boosting Machine (LightGBM) and Graph Attention Network (GAT) to automatically annotate cell types in scRNA-seq data with high accuracy and computational efficiency. Specifically, scRNA-seq datasets from the Gene Expression Omnibus (GEO) database undergo standardized preprocessing, including low-quality cell removal and the selection of highly variable genes. Latent features are extracted using a Contrastive Denoising Variational Autoencoder (C-DVAE) with a contrastive learning setting and enriched with marker genes identified through differential gene expression analysis. The fused feature set is used as input to LightGBM to generate initial cell-type annotation predictions. These predictions are then refined by GAT, which incorporates cell-to-cell relational information to produce final annotations.
Experimental results demonstrate that the LightGBM-GAT framework effectively annotates major immune and non-immune cell types in five independent testing datasets with an average accuracy of 0.962, outperforming most existing automated annotation tools. In cancer datasets, where non-immune cells are particularly challenging to annotate accurately, the framework achieves an average accuracy of 0.961, significantly surpassing previous benchmarks (highest of 0.901). Notably, the framework successfully identifies seven subtypes of cancer-associated fibroblasts (CAFs), which are crucial for tumor proliferation, metastasis, and immune evasion but notoriously difficult to classify. Furthermore, because LightGBM serves as the classification model, the integrated framework trains more efficiently than similar automated annotation tools.
Overall, LightGBM-GAT provides a robust, efficient, and highly accurate tool for cell-type annotation in scRNA-seq data across diverse GEO datasets. Its strong cross-dataset generalization capability positions it as a valuable support tool for advancing cancer research and precision medicine in single-cell analysis.
en
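The refinement idea in the abstract, where initial per-cell predictions are adjusted using information from related cells, can be illustrated with a small sketch. This is not the thesis's GAT: it has no learned weights, and the attention score is assumed here to be a plain dot product between feature vectors, normalized with a softmax over each cell's neighborhood. The function names, the toy features, the initial probabilities, and the neighbor lists are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def refine_predictions(features, probs, neighbors):
    """Mix each cell's class probabilities with those of its neighbors.

    Assumption: attention weights are softmax-normalized dot products of
    feature vectors (a stand-in for a trained attention layer).
    """
    refined = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + [j for j in nbrs if j != i]  # include the cell itself
        weights = softmax([dot(features[i], features[j]) for j in group])
        n_classes = len(probs[i])
        refined.append([
            sum(w * probs[j][c] for w, j in zip(weights, group))
            for c in range(n_classes)
        ])
    return refined


def predict_labels(probs):
    """Index of the most probable class for each cell."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Toy data (hypothetical): five cells, two classes.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]
neighbors = [[1, 2], [0, 2], [0, 1], [4], [3]]

refined = refine_predictions(features, probs, neighbors)
labels = predict_labels(refined)
```

In this toy input the third cell starts with a weak vote for class 1, but its two neighbors strongly favor class 0, so the attention-weighted mix moves it to class 0. A trained graph attention layer would learn the scoring function instead of using a fixed dot product.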
dc.description.provenanceSubmitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-23T16:33:09Z
No. of bitstreams: 0
en
dc.description.provenanceMade available in DSpace on 2025-07-23T16:33:09Z (GMT). No. of bitstreams: 0en
dc.description.tableofcontentsAcknowledgements
Abstract (Chinese)
Abstract
Contents
List of Abbreviations
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Single Cell Analysis in Cancer Biology
1.1.1 Dissecting the Tumor Microenvironment with Single-Cell Analysis
1.1.2 Applications of Single Cell Analysis in Cancer Research
1.1.3 Cancer-Associated Fibroblasts and Tumor Progression
1.1.4 Challenges and Future Directions in Single Cell Analysis of Cancer
1.2 Single Cell Analysis
1.2.1 Quality Filtering and Noise Removal
1.2.2 Data Normalization and Scaling
1.2.3 Addressing Batch Effects and Technical Variability
1.2.4 Feature Selection and Dimensionality Reduction
1.2.5 Cell Clustering and Annotation
1.2.6 Downstream Analysis
1.3 Artificial Intelligence in Single Cell Analysis
1.3.1 Deep Learning and Machine Learning Models for Cell Type Classification
1.3.2 Leveraging Tree-Based Models for Feature Selection and Classification
1.3.3 Leveraging Variational Autoencoder for Feature Extraction
1.3.4 Exploring Cellular Relationships with Graph Neural Networks (GNNs)
1.4 Motivation
1.5 Specific Aims
Chapter 2 Materials and Methods
2.1 General Description of the LightGBM-GAT Framework
2.2 Framework Overview
2.2.1 LightGBM-GAT Framework Architecture and Key Components
2.2.2 Data Integration and Sampling Strategy
2.2.3 Feature Fusion Strategy in LightGBM-GAT Framework
2.2.4 Hyperparameter Optimization in LightGBM-GAT Framework
2.2.5 Hardware and Software
2.3 Materials
2.3.1 Gene Expression Omnibus
2.3.2 Data Integration for Training Dataset
2.3.3 Independent Testing Dataset
2.3.4 Data Specific to CAFs
2.3.5 PanglaoDB
2.4 Single Cell Analysis Packages
2.4.1 Scanpy
2.4.2 Scrublet
2.4.3 Harmony
2.4.4 Celltypist
2.5 Single Cell Data Preprocessing
2.5.1 Initial Quality Assessment
2.5.2 Doublet Detection and Removal
2.5.3 Normalization and Scaling
2.5.4 Identification of Highly Variable Genes (HVGs)
2.5.5 Feature Selection in Differential Gene Expression (DGE)
2.5.6 Batch Effect Correction with Harmony
2.5.7 Cell Clustering with Leiden Algorithm
2.5.8 Refining Cell Type Annotations
2.6 LightGBM in the LightGBM-GAT Framework
2.6.1 LightGBM: A Leaf-Wise Gradient Boosting Decision Tree Model for High-Dimensional scRNA-seq Data
2.7 Neural Network Modules in the LightGBM-GAT Framework
2.7.1 C-DVAE: Contrastive Denoising Variational Autoencoder for Robust Feature Extraction
2.7.2 GAT: Cell Type Refinement in the LightGBM-GAT Framework
Chapter 3 Results
3.1 Results of Data Preprocessing
3.1.1 Results of Quality Control
3.1.2 Results of Doublet Analysis
3.2 Batch Effect Correction
3.2.1 Batch Effect Correction in Training Dataset
3.2.2 Batch Effect Correction in Testing Datasets
3.3 Gene Ranking and Cell Type Annotation Results
3.3.1 Gene Ranking Results for Training Dataset
3.3.2 Leiden Clustering Results for Testing Datasets
3.3.3 Final Cell Type Annotation for Testing Dataset
3.4 Latent Feature Representation with Contrastive DVAE
3.4.1 Latent Feature Visualization
3.4.2 Training Dynamics of Contrastive DVAE
3.4.3 Comparison of Contrastive DVAE Implementations
3.5 Cell Type Initial Prediction with LightGBM
3.5.1 Performance of LightGBM
3.5.2 Feature Importance Analysis
3.6 Results of GAT Refinement
3.6.1 Refinement Performance
3.6.2 Training Dynamics of GAT
3.7 LightGBM-GAT Specific Performance
3.7.1 Performance on Cell Subtype
3.7.2 CAF-Specific Performance
3.7.3 Independent Validation of LightGBM-GAT on CAF Subtype Prediction
3.8 Comparison with Existing Annotation Methods
3.8.1 Summary for Comparative Performance
3.8.2 Comparative Details on Independent Testing Datasets
3.8.3 Computational Efficiency Comparison
Chapter 4 Discussion
4.1.1 Characteristics of LightGBM-GAT
4.1.2 Feature Extraction Optimization and the Development of C-DVAE
4.1.3 Evaluating the Computational Trade-Off of Integrating C-DVAE
4.1.4 Evaluating the Performance and Efficiency of LightGBM-GAT for CAF Classification
4.1.5 Comparison with Existing Methods
4.1.6 Limitations
4.1.7 Framework Extension
4.1.8 Application in Cancer Research
Chapter 5 Conclusion
References
-
dc.language.isozh_TW-
dc.subject細胞類型註釋zh_TW
dc.subject圖神經網絡zh_TW
dc.subject癌症相關纖維母細胞zh_TW
dc.subject輕型梯度提升機zh_TW
dc.subject單細胞RNA測序zh_TW
dc.subject降噪變分自動編碼器zh_TW
dc.subject自注意力機制zh_TW
dc.subject對比式學習zh_TW
dc.subjectGraph Neural Networksen
dc.subjectCancer-associated Fibroblastsen
dc.subjectCell Type Annotationen
dc.subjectSelf-attention Mechanismen
dc.subjectContrastive Learningen
dc.subjectDenoising Variational Autoencoderen
dc.subjectSingle-cell RNA Sequencingen
dc.subjectLightGBMen
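dc.titleDevelopment of an Integrated Framework for Cell Type Annotation of Single-Cell RNA Sequencing Data Based on a Self-Attention Graph Neural Network and a Light Gradient Boosting Machinezh_TW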
dc.titleLightGBM-GAT: An Integrative Framework for Single-Cell RNA-Seq Annotation Tasksen
dc.typeThesis-
dc.date.schoolyear113-2-
dc.description.degreeMaster's-
dc.contributor.coadvisor陳翔瀚zh_TW
dc.contributor.coadvisorHsiang-Han Chenen
dc.contributor.oralexamcommittee賴亮全;李建樂;陳弘昕zh_TW
dc.contributor.oralexamcommitteeLiang-Chuan Lai;Chien-Yueh Lee;Hung-Hsin Chenen
dc.subject.keyword單細胞RNA測序,輕型梯度提升機,圖神經網絡,降噪變分自動編碼器,對比式學習,自注意力機制,細胞類型註釋,癌症相關纖維母細胞,zh_TW
dc.subject.keywordSingle-cell RNA Sequencing,LightGBM,Graph Neural Networks,Denoising Variational Autoencoder,Contrastive Learning,Self-attention Mechanism,Cell Type Annotation,Cancer-associated Fibroblasts,en
dc.relation.page149-
dc.identifier.doi10.6342/NTU202501699-
dc.rights.noteNot authorized-
dc.date.accepted2025-07-16-
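dc.contributor.author-collegeCollege of Electrical Engineering and Computer Science-
dc.contributor.author-deptGraduate Institute of Biomedical Electronics and Bioinformatics-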
dc.date.embargo-liftN/A-
Appears in Collections: Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:
File / Size / Format
ntu-113-2.pdf
  Not authorized for public access
23.55 MB Adobe PDF


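Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.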
