Application of Machine Learning for Predicting Pollutant Concentration in Soil Vapor Extraction Processes

吳立芹; Li-Chin Wu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98345

Title:	機器學習應用於土壤氣體抽除法污染物濃度之預測 Application of Machine Learning for Predicting Pollutant Concentration in Soil Vapor Extraction Processes
Author:	吳立芹 Li-Chin Wu
Advisor:	駱尚廉 Shang-Lien Lo
Keywords:	Machine Learning, Support Vector Machine, Random Forest, Soil Remediation, Soil Vapor Extraction
Year of publication:	2025
Degree:	Master's
Abstract:	This study applies machine learning to evaluate the effectiveness of soil vapor extraction (SVE) at contaminated sites. A predictive model was developed to estimate residual contaminant concentrations after SVE remediation, with the goal of supporting remediation strategy decisions and site management. Three machine learning models, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM), were used for both regression and classification tasks. SHAP (SHapley Additive exPlanations) analysis and Pearson correlation coefficients were used to assess the contribution and influence of each feature on the model predictions. For the regression task, after parameter tuning, the XGBoost model performed best, achieving an R² of 0.86 and an RMSE of 137.73 ppm, and outperforming the Random Forest model overall. Feature analysis indicated that total remediation duration and water levels in nearby monitoring wells were key factors influencing model predictions. For the classification task, contaminant concentration thresholds of 150 ppm and 750 ppm were used. With 150 ppm as the classification threshold, the model achieved an accuracy of 83.3%. Although this is slightly lower than the accuracy of the 750 ppm model, the 150 ppm model was more stable and more practical overall.
All three models showed moderate to strong performance, though there remains room for improvement. Based on the findings, the study proposes several improvements. For the regression task, more features need to be collected, such as soil porosity, particle size, and natural organic matter content, because these are key factors affecting soil remediation. In the classification task, a fairly severe imbalance in the data distribution was identified as the cause of the weaker classification results. For both regression and classification, a larger volume of data is needed so that the models can learn and discriminate more effectively.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98345
DOI:	10.6342/NTU202502204
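Full-text authorization:	Not authorized
Electronic full-text release date:	N/A
Appears in department:	Graduate Institute of Environmental Engineering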
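
The abstract reports regression quality as R² and RMSE and classification quality as accuracy at 150 ppm and 750 ppm thresholds. As a minimal sketch of how such metrics are typically computed, the snippet below uses only the Python standard library. The function names, the made-up concentration values, and the choice of a strict "greater than" comparison at the threshold are assumptions for illustration; none of them come from the thesis.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the inputs (ppm here)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def exceeds(values, threshold_ppm):
    """Label each concentration by whether it is above the threshold.
    Assumption: strict '>' comparison; the record does not say how ties are handled."""
    return [v > threshold_ppm for v in values]

def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(a == b for a, b in zip(labels_true, labels_pred))
    return hits / len(labels_true)

# Hypothetical residual concentrations in ppm (illustrative only, not thesis data).
observed = [100.0, 200.0, 800.0, 50.0]
predicted = [120.0, 140.0, 760.0, 60.0]

print("RMSE (ppm):", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
for threshold in (150.0, 750.0):
    acc = accuracy(exceeds(observed, threshold), exceeds(predicted, threshold))
    print(f"accuracy at {threshold:.0f} ppm:", acc)
```

Deriving class labels by thresholding a predicted concentration is only one possible setup; the abstract describes separate regression and classification models, so a dedicated classifier trained on the thresholded labels would be scored with the same `accuracy` function.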

Files in this document:

File	Size	Format
ntu-113-2.pdf (not authorized for public access)	8.35 MB	Adobe PDF

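Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.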
