Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100942
| Title: | DeePFAS: Deep Learning-Enabled Rapid Annotation of PFAS for Enhancing Non-Targeted Screening through Spectral Encoding and Latent Space Analysis |
| Author: | 王恆 Heng Wang |
| Advisor: | 曾宇鳳 Yufeng Jane Tseng |
| Keywords: | PFAS (Per- and Polyfluoroalkyl Substances), deep learning, unsupervised learning, LC-HRMS, liquid chromatography-tandem mass spectrometry, NTS (non-targeted screening), chemical latent space |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | Comprehensively detecting all per- and polyfluoroalkyl substances (PFAS) remains a considerable analytical challenge because of their structural diversity, the limited availability of reference standards, the complexity of environmental and biological sample matrices, and the need for highly sensitive instrumentation capable of quantifying ultra-trace concentrations. These challenges are compounded by the risk of background contamination and the vast number of PFAS substances, which makes it extremely difficult to develop a single, universal detection method applicable across diverse environmental samples.
Liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is currently the predominant technique for PFAS analysis, as it enables the detection of a broad spectrum of compounds within complex matrices such as water, soil, and biological tissues, and is widely adopted by regulatory agencies for compliance monitoring. However, LC-HRMS workflows face several limitations, including susceptibility to contamination, the need for meticulous sample preparation, and stringent requirements for low detection limits. Furthermore, the data processing stage is labor-intensive and time-consuming, requiring advanced computational tools and domain-specific expertise to accurately distinguish structurally similar PFAS compounds.
To overcome these limitations, we introduce DeePFAS, a novel deep learning-based approach for rapidly annotating PFAS compounds. DeePFAS employs a spectral encoder that integrates convolutional neural networks (CNNs) and transformer architectures to project raw MS2 spectra into a latent feature space that captures chemical structure information. This latent representation comes from a feature extraction model trained by unsupervised learning on a large corpus of unlabeled compounds. The model infers structural similarity by comparing the spectral embedding against the embeddings of multiple candidate molecules, enabling rapid annotation of MS2 spectra in large-scale, non-targeted PFAS screening. This approach substantially reduces analytical complexity and improves the scalability and efficiency of PFAS identification workflows. The implementation is available at https://github.com/CMDM-Lab/DeePFAS. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100942 |
| DOI: | 10.6342/NTU202504611 |
| Full-text authorization: | Authorized (open access worldwide) |
| Electronic full-text release date: | 2025-11-27 |
| Appears in department collection: | Department of Computer Science and Information Engineering |
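
The abstract describes annotating an MS2 spectrum by comparing its embedding against the embeddings of candidate molecules. The following is a minimal sketch of that ranking step only, not the DeePFAS implementation: it assumes cosine similarity as the comparison metric (the abstract does not name one), and the function names, candidate labels, and toy three-dimensional vectors are all hypothetical. In the actual method the vectors would come from the trained spectral encoder and the molecular feature extraction model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_candidates(spectrum_embedding, candidate_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar candidate first."""
    scored = [
        (name, cosine_similarity(spectrum_embedding, embedding))
        for name, embedding in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data standing in for encoder outputs.
spectrum_embedding = [0.9, 0.1, 0.3]
candidate_embeddings = {
    "candidate_A": [0.8, 0.2, 0.4],
    "candidate_B": [-0.5, 0.9, 0.0],
    "candidate_C": [0.1, 0.1, 0.9],
}

ranking = rank_candidates(spectrum_embedding, candidate_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `candidate_A` ranks first because its embedding points in nearly the same direction as the spectrum embedding. A real screening run would score far more candidates, so a vectorized or approximate nearest-neighbor search would likely replace the plain loop.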
Files in this item:
| File | Size | Format | |
|---|---|---|---|
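| ntu-114-1.pdf | 2.83 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.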
