Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98900

| Title: | 提升機器學習模型在資料稀缺下預測反應活化能及混合物性質的表現 Improving machine learning models for activation energy and mixture property prediction under data scarcity |
| Authors: | 蔡明軒 Ming-Hsuan Tsai |
| Advisor: | 李奕霈 Yi-Pei Li |
| Keyword: | data scarcity, machine learning, delta learning, feature engineering, transfer learning, chemical property prediction, fuel mixtures |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | The scarcity of high-quality chemical data severely limits the practical application of machine learning (ML) models. This thesis presents an integrated framework built on three key strategies (feature engineering, transfer learning, and delta learning) to address two representative tasks: predicting reaction activation energies and predicting the properties of multi-component fuel mixtures. We demonstrate that accurate predictions can still be achieved under data-constrained conditions.
Feature engineering incorporates chemical knowledge into the model through fast, low-cost descriptors. This approach is particularly effective for mixture property prediction, where macroscopically relevant features such as molecular weight and flexibility substantially reduce prediction errors for flash point and viscosity. In the activation energy task, however, only reaction thermodynamic descriptors yield an improvement, and a limited one, suggesting that simple molecular features are insufficient to capture the details of transition states.
Transfer learning improves generalization by pretraining on abundant low-fidelity data and then fine-tuning on a small amount of high-fidelity experimental data. Its effectiveness, however, depends strongly on the similarity between the pretraining and fine-tuning domains: errors can be reduced substantially when the reaction types or mixture compositions of the two datasets are closely aligned, but performance may degrade through negative transfer when they diverge.
Delta learning delivers the most consistent and significant benefits across both tasks. The model either predicts the difference between the high- and low-fidelity values or takes the low-fidelity estimate as an auxiliary input, and in doing so corrects the systematic bias of the physical approximation. In activation energy prediction, a delta-learning model trained on only 10% of the high-level data outperforms baseline models trained on the full dataset. In mixture property prediction, it achieves accuracy on par with or better than that of the dataset's original authors, at almost no additional computational cost at inference.
In summary, by combining chemical intuition with multi-fidelity learning strategies, this study establishes models that are both accurate and data-efficient when high-level data are limited. The framework can be extended to other domains such as catalytic activity and environmental property prediction. Future work may incorporate active learning to prioritize computation of the most informative high-fidelity samples, or adopt more advanced graph neural networks and self-supervised pretraining to further narrow the gap between data availability and model requirements, accelerating progress in chemical engineering and sustainable fuel development.
Keywords: data scarcity, machine learning, delta learning, feature engineering, transfer learning, chemical property prediction, fuel mixtures |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98900 |
| DOI: | 10.6342/NTU202504073 |
| Fulltext Rights: | Authorization granted (access limited to campus) |
| Embargo Lift Date: | 2025-08-21 |
| Appears in Collections: | Department of Chemical Engineering |
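
The delta-learning idea summarized in the abstract (predicting the difference between high- and low-fidelity values rather than the high-fidelity value itself) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the functions `low_fidelity` and `high_fidelity` are hypothetical toy formulas, the linear fit stands in for whatever model the thesis actually uses, and none of the numbers relate to the thesis data. The sketch covers only the first variant (learning the difference), not the variant that feeds the low-fidelity value in as an auxiliary input.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def low_fidelity(x):
    # Hypothetical cheap estimate (stand-in for a fast physical approximation).
    return x ** 2


def high_fidelity(x):
    # Hypothetical expensive reference: the cheap estimate plus a systematic bias.
    return x ** 2 + 0.5 * x + 1.0


rng = random.Random(0)
train_x = [rng.uniform(0.0, 3.0) for _ in range(10)]   # scarce high-fidelity samples
test_x = [rng.uniform(0.0, 3.0) for _ in range(200)]

# Baseline: learn the high-fidelity value directly from the few samples.
a_direct, b_direct = fit_line(train_x, [high_fidelity(x) for x in train_x])

# Delta learning: learn only the high-minus-low difference from the same samples.
deltas = [high_fidelity(x) - low_fidelity(x) for x in train_x]
a_delta, b_delta = fit_line(train_x, deltas)


def predict_direct(x):
    return a_direct * x + b_direct


def predict_delta(x):
    # Final prediction = cheap estimate + learned correction.
    return low_fidelity(x) + a_delta * x + b_delta


def mae(predict, xs):
    """Mean absolute error against the high-fidelity reference."""
    return mean(abs(predict(x) - high_fidelity(x)) for x in xs)


mae_direct = mae(predict_direct, test_x)
mae_delta = mae(predict_delta, test_x)
print(f"direct MAE: {mae_direct:.4f}, delta MAE: {mae_delta:.2e}")
```

In this toy setup the correction term is simpler than the target itself, so the same small model fits it far better from the same ten samples. That is the intuition behind the data efficiency the abstract reports, though real systematic biases are rarely this clean.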
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (access limited to NTU IP range) | 5.9 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
