Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99220
| Title: | A Two-Phase Deterministic Algorithm for Bayesian Network Structure Learning Using Information-Theoretic and Correlation-Based Criteria (基於資訊理論與相關性準則之兩階段確定性演算法以用於貝氏網路結構學習) |
| Author: | 支昱丹 Yu-Tan Chih |
| Advisor: | 藍俊宏 Jakey Blue |
| Keywords: | Causal-Effect Inference, Bayesian Network Structure Learning, Mutual Information, Partial Correlation Coefficient, Bayesian Information Criterion (BIC) |
| Year of publication: | 2025 |
| Degree: | Master's |
| Abstract: | This thesis addresses three challenges in Bayesian network structure learning: limited accuracy, poor reproducibility, and low learning efficiency on high-dimensional data. Bayesian networks offer intuitive visualization and strong probabilistic reasoning, and are widely applied in machine fault diagnosis, bioinformatics, and explainable artificial intelligence. Structure learning, however, is an NP-hard problem; with high-dimensional data and limited samples, learning algorithms often produce spurious edges or miss true relationships. In addition, traditional hill-climbing algorithms are non-deterministic, so their results are hard to reproduce and to interpret.
To improve the stability and accuracy of the learned structure, this study proposes a two-phase deterministic structure learning algorithm. The first phase builds a core model from information-theoretic and statistical association criteria: mutual information, chi-squared p-value tests, and K2 scores are used to evaluate candidate connections, and d-separation rules are used to determine causal directions, so that the retained connections are statistically significant and inferentially sound. The second phase augments and prunes the core network using partial correlation coefficients and the Bayesian Information Criterion (BIC), removing redundant edges and recovering potentially missing causal edges to improve structural fidelity and interpretability.
Four algorithmic variants were also designed by combining different edge selection strategies (e.g., p-value with mutual information, Spearman correlation) with different network refinement methods (e.g., hill climbing, partial correlation analysis). They were compared in benchmark experiments on synthetic and real-world datasets (ALARM, HailFinder, Win95pts, and ai4i2020). Evaluation with structural metrics (T/R/M/F) and runtime analysis shows that the proposed two-phase algorithm achieves higher accuracy and F1 scores than conventional hill climbing and is significantly better in structural reproducibility.
In summary, the proposed method balances interpretability and performance while mitigating long-standing problems in Bayesian network structure learning, and it shows promise for the design and practical application of causal inference models, including in data-constrained scenarios. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99220 |
| DOI: | 10.6342/NTU202503379 |
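| Full-text authorization: | Authorized (access limited to campus) |
| Electronic full-text release date: | 2030-08-01 |
| Appears in collections: | Nano Engineering and Science Degree Program (奈米工程與科學學位學程) |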
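
The abstract names mutual information as one of the criteria used to evaluate candidate connections in the first phase. As a rough illustration only, the sketch below computes empirical mutual information between discrete variables and ranks variable pairs by it, breaking ties by name so the ordering is deterministic. The function names (`mutual_information`, `rank_pairs`) and the toy data are assumptions made for this example; the thesis's actual procedure also involves chi-squared p-values, K2 scores, and d-separation, none of which are shown here.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_pairs(data):
    """Return (score, a, b) tuples sorted by decreasing mutual information.

    Ties are broken by variable name, so repeated runs on the same data
    always give the same order (hypothetical helper, not from the thesis).
    """
    scores = [
        (mutual_information(data[a], data[b]), a, b)
        for a, b in combinations(sorted(data), 2)
    ]
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


# Toy data (made up): B copies A, while C is independent of both.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranking = rank_pairs(data)
for score, a, b in ranking:
    print(f"{a}-{b}: {score:.3f}")
```

In this toy case the pair A-B ranks first because B fully determines A, while the pairs involving C score zero. A real pipeline would combine such a ranking with significance tests before adding any edge.
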
Files in this item:
| File | Size | Format | |
|---|---|---|---|
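| ntu-113-2.pdf (not authorized for public access) | 4.43 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically stated, all items in this system are protected by copyright, with all rights reserved.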
