Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98038
Title: 基於自注意力圖神經網路與輕型梯度提升機的單細胞RNA定序細胞類型註釋之整合框架開發
LightGBM-GAT: An Integrative Framework for Single-Cell RNA-Seq Annotation Tasks
Authors: 莊聰賢
Tsung-Hsien Chuang
Advisor: 莊曜宇
Eric Y. Chuang
Co-Advisor: 陳翔瀚
Hsiang-Han Chen
Keyword: Single-cell RNA Sequencing, LightGBM, Graph Neural Networks, Denoising Variational Autoencoder, Contrastive Learning, Self-attention Mechanism, Cell Type Annotation, Cancer-associated Fibroblasts
Publication Year: 2025
Degree: Master's
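Abstract: Single-cell RNA sequencing has become a transformative tool for dissecting cellular heterogeneity and the complexity of RNA transcripts. By providing high-resolution transcriptomic data, it has significantly advanced our understanding of cell biology. However, cell type annotation of single-cell RNA sequencing data still relies mainly on traditional manual methods, which are time-consuming, laborious, and susceptible to the researcher's subjective judgment. These limitations highlight the importance of developing automated, efficient, and accurate annotation methods to meet the growing demand for single-cell RNA sequencing analysis.
This study proposes an integrated framework that combines a Light Gradient Boosting Machine with a self-attention graph neural network to annotate cell types in single-cell RNA sequencing data automatically, with high accuracy and good computational efficiency. Specifically, single-cell RNA sequencing datasets are retrieved from the Gene Expression Omnibus (GEO) database and put through standardized preprocessing (such as removing low-quality cells and genes, selecting highly variable genes, and normalizing the data) to improve data quality. A denoising variational autoencoder combined with contrastive learning is then applied to extract latent features, which are fused with marker genes identified by differential gene expression analysis. The fused features serve as input to the Light Gradient Boosting Machine, which generates preliminary cell type predictions; the self-attention graph neural network then captures relational information between cells to further refine the predictions.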
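The preprocessing steps named above (removing low-quality cells and genes, normalizing, and selecting highly variable genes) can be illustrated with a minimal NumPy sketch. It is not the thesis code: the function name `preprocess`, every threshold, and the variance-based gene ranking are assumptions chosen only for illustration, and a real analysis would more likely use a dedicated single-cell toolkit.

```python
import numpy as np

def preprocess(counts, min_genes=2, min_cells=2, n_top_genes=3, target_sum=1e4):
    """Filter, normalise and subset a cells x genes matrix of raw counts.

    All thresholds are illustrative assumptions, not values from the thesis.
    """
    counts = np.asarray(counts, dtype=float)

    # Drop low-quality cells: those expressing fewer than `min_genes` genes.
    cell_mask = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[cell_mask]

    # Drop genes detected in fewer than `min_cells` of the remaining cells.
    gene_mask = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, gene_mask]

    # Library-size normalisation followed by a log1p transform.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against cells left with no counts
    norm = np.log1p(counts / totals * target_sum)

    # Keep the highest-variance genes (a simple stand-in for HVG selection).
    n_top = min(n_top_genes, norm.shape[1])
    hvg_idx = np.sort(np.argsort(norm.var(axis=0))[::-1][:n_top])
    kept_genes = np.flatnonzero(gene_mask)[hvg_idx]  # indices into the input
    return norm[:, hvg_idx], cell_mask, kept_genes

# Toy input: 5 cells x 4 genes; cell 1 is empty, gene 3 appears in one cell.
demo_counts = np.array([
    [5, 0, 3, 0],
    [0, 0, 0, 0],
    [2, 1, 4, 0],
    [1, 0, 0, 7],
    [3, 2, 1, 0],
])
X, cell_mask, kept_genes = preprocess(demo_counts, n_top_genes=2)
print(X.shape, cell_mask, kept_genes)
```

The function returns the mask of retained cells and the original indices of the retained genes so that labels and gene names can be subset consistently afterwards.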
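Experimental results show that the proposed framework accurately annotates the major immune and non-immune cell types in five independent test datasets, with an average accuracy of 0.962, outperforming most automated annotation tools. On the cancer datasets, the framework performs particularly well on the non-immune cells, which are harder to annotate, reaching an average accuracy of 0.951 and significantly exceeding previous annotation tools (the best of which reached 0.901). Notably, the framework can accurately classify seven subtypes of cancer-associated fibroblasts. These cells play important roles in tumor proliferation, metastasis, and immune evasion, but because they have long been extremely challenging to classify, few annotation tools have managed to do so. In addition, because the framework relies mainly on the Light Gradient Boosting Machine model, its training efficiency is better than that of comparable automated annotation tools.
In summary, the integrated framework provides a stable, efficient, and highly accurate cell type annotation tool for single-cell RNA sequencing data, and it demonstrates strong cross-dataset generalization on multiple GEO datasets. As an applied tool for single-cell analysis, it is expected to support the development of cancer research and precision medicine.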
Single-cell RNA sequencing (scRNA-seq) technology has emerged as a transformative tool for investigating cellular heterogeneity and the intricate complexity of RNA transcripts at the single-cell level. By providing high-resolution transcriptomic data, scRNA-seq has significantly advanced our understanding of cellular biology. However, cell type annotation in scRNA-seq data predominantly relies on traditional manual methods, which are time-consuming, labor-intensive, and prone to researcher bias. These limitations underscore the need for automated, efficient, and accurate annotation approaches to meet the growing demands of single-cell analysis.
This study introduces LightGBM-GAT, an integrated framework that combines a Light Gradient Boosting Machine (LightGBM) with a Graph Attention Network (GAT) to annotate cell types in scRNA-seq data automatically, with high accuracy and computational efficiency. Specifically, scRNA-seq datasets from the Gene Expression Omnibus (GEO) database undergo standardized preprocessing, including low-quality cell removal and the selection of highly variable genes. Latent features are extracted with a Contrastive Denoising Variational Autoencoder (C-DVAE) and enriched with marker genes identified through differential gene expression analysis. The fused feature set is passed to LightGBM to generate initial cell-type predictions. These predictions are then refined by the GAT, which incorporates cell-to-cell relational information to produce the final annotations.
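The two-stage idea described above, in which initial per-cell predictions are refined using cell-to-cell relations, can be sketched in NumPy under strong simplifying assumptions. The initial probabilities below are hard-coded stand-ins for a classifier's output, and the attention scores are untrained scaled dot products over a k-nearest-neighbour graph, whereas a trained GAT would learn them. The names `knn_graph` and `attention_refine` and the parameters `k` and `self_weight` are hypothetical and do not come from the thesis.

```python
import numpy as np

def knn_graph(features, k):
    """Indices of the k nearest neighbours (Euclidean) of each cell, excluding itself."""
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

def attention_refine(features, probs, k=3, self_weight=0.5):
    """Blend each cell's class probabilities with an attention-weighted
    average of its neighbours' probabilities.

    Attention here is a softmax over scaled dot products of feature vectors,
    an untrained simplification of what a graph attention layer learns.
    """
    nbrs = knn_graph(features, k)                       # (n_cells, k)
    scale = np.sqrt(features.shape[1])
    scores = np.einsum("id,ikd->ik", features, features[nbrs]) / scale
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)             # rows sum to 1
    neighbour_probs = np.einsum("ik,ikc->ic", attn, probs[nbrs])
    return self_weight * probs + (1 - self_weight) * neighbour_probs

# Two well-separated groups of cells in a 2-D feature space.
features = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1], [0.1, 1.0],
])
# Stand-in for first-stage output; cell 3 is initially assigned to class 1.
initial = np.array([
    [0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.4, 0.6],
    [0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9],
])
refined = attention_refine(features, initial)
print(initial.argmax(axis=1), refined.argmax(axis=1))
```

In this toy input, cell 3 sits among class-0 cells but starts with a class-1 prediction; pooling its neighbours' probabilities moves it back to class 0, which is the kind of correction a graph-based refinement stage aims for.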
Experimental results demonstrate that the LightGBM-GAT framework annotates major immune and non-immune cell types in five independent testing datasets with an average accuracy of 0.962, outperforming most prior studies. In cancer datasets, where non-immune cells are particularly difficult to annotate accurately, the framework achieves an average accuracy of 0.961, significantly surpassing previous benchmarks (the highest of which is 0.901). Notably, the framework identifies seven subtypes of cancer-associated fibroblasts (CAFs), which are crucial for tumor proliferation, metastasis, and immune evasion but notoriously difficult to classify. Furthermore, because LightGBM serves as the classification model, the framework trains more efficiently than similar automated annotation tools.
Overall, LightGBM-GAT provides a robust, efficient, and highly accurate tool for cell-type annotation in scRNA-seq data across diverse GEO datasets. Its strong cross-dataset generalization makes it a valuable supporting tool for single-cell analysis in cancer research and precision medicine.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98038
DOI: 10.6342/NTU202501699
Fulltext Rights: Not authorized
Embargo Lift Date: N/A
Appears in Collections: Graduate Institute of Biomedical Electronics and Bioinformatics (生醫電子與資訊學研究所)

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 23.55 MB
Format: Adobe PDF

