Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90028
Title: | 以機器學習方法預測颱風生成及其SHAP詮釋 Prediction of Tropical Cyclogenesis Based on Machine Learning Methods and its SHAP interpretation |
Authors: | 呂智樂 Loi Chi Lok |
Advisor: | 吳俊傑 Chun-Chieh Wu |
Co-Advisor: | 梁禹喬 Yu-Chiao Liang |
Keyword: | Tropical Cyclones, Tropical Cyclone Genesis, Machine Learning, SHAP values |
Publication Year: | 2023 |
Degree: | Master's |
Abstract: | Predicting Tropical Cyclone Genesis (TCG) events has long been a challenging research topic because no conclusive theory unifies the competing hypotheses about TCG mechanisms. In practice, dynamical models are used to forecast TCG occurrence, but given their limitations, machine learning has recently been proposed as a low-cost alternative that can exploit the abundance of reanalysis data. In this study, we train three machine learning models of varying complexity (Random Forest, Support Vector Machine, and Artificial Neural Network) on atmospheric and oceanic, dynamic and thermodynamic variables extracted from reanalysis data. The models predict, at a forecast lead time of 24 hours, whether candidate tropical disturbances identified by an optimized Kalman Filter algorithm will develop into tropical cyclones. Overall performance is comparable to previous studies of the same kind, with F1-scores of about 0.8; recall (~0.9) is generally higher than precision (~0.7). Operational analysis data are then used to further test the practicality of the models.
An assessment using SHapley Additive exPlanations (SHAP) values reveals that mid-level (500 hPa) vorticity is the most influential factor in deriving the genesis probability at a lead time of 24 hours. Wind shear and vortex tilting are also found to be of considerable importance. A sensitivity test confirms that mid-level vorticity and tilting matter more than their lower-level counterparts. These results encourage further experiments that use physical models to explore the dynamical, mid-level pathway to TCG. Nevertheless, some thermodynamic variables are also influential, with outer-core humidity becoming significant when the forecast lead time is extended to 48 hours. SHAP values also make the machine learning models more interpretable by listing the contribution of each feature to the output genesis probability, as illustrated by a case study of Typhoon Halong. This makes the models more reliable, and forecasters can use such information to issue tropical cyclone formation warnings more accurately.
Finally, several caveats of current machine learning applications to TCG, including this work, are discussed. One of the main problems is the neglect of samples, presumably negative, from developing tropical disturbances that reach tropical cyclone status only long after the required forecast lead time. Potential improvements for future research are suggested accordingly. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90028 |
DOI: | 10.6342/NTU202303630 |
Fulltext Rights: | Authorization granted (publicly available worldwide) |
Appears in Collections: | Department of Atmospheric Sciences |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-111-2.pdf | 2.25 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.