Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99669

| Title: | Fair-and-Robust Learning from Biased Labels: A Minimax Statistical Framework |
| Author: | 劉衡謙 Heng-Chien Liou |
| Advisor: | 王奕翔 I-Hsiang Wang |
| Keywords: | Machine Learning, Statistical Learning, Fairness, Robustness, Label Noise |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | As machine learning applications are increasingly deployed in high-stakes domains, ensuring their trustworthiness and accountability, particularly fairness with respect to sensitive attributes such as race, gender, and disability status, is critical. Although a large body of work has addressed algorithmic fairness, most approaches evaluate performance on potentially biased datasets, often yielding an undesirable fairness–accuracy trade-off. In this work, we explicitly model data bias by assuming that examples are drawn from an ideal, unbiased distribution but are then corrupted by group-dependent, class-conditional label noise. Rather than optimizing accuracy on the observed biased data, we evaluate the performance of predictors on the underlying unbiased distribution, aligning with ethical commitments to fair decision-making. To account for unknown noise rates, we introduce a minimax formulation that replaces the standard risk with a novel fair-and-robust (FaR) risk, which simultaneously enforces fairness constraints and guards against worst-case noise scenarios. Under a group–label independence assumption on the true distribution, we show that the FaR risk decomposes exactly into the observed risk plus a regularization term that captures both fairness and robustness. Building on this insight, we develop two efficient, data-driven algorithms for finite samples: a pre-processing intervention that estimates optimal reweighting factors via simple statistics when one group's labels are predominantly noisy, and an in-processing intervention that expresses the FaR risk minimax problem as a smooth saddle-point problem and solves it with the Saddle Point Mirror Prox algorithm. We theoretically compare our minimax solution against two baselines, a predictor trained directly on biased labels and an oracle with access to the unbiased distribution, laying the groundwork for future bounds on generalization and fairness guarantees. Empirically, on synthetic, semi-synthetic, and real-world datasets, we demonstrate that our methods improve both accuracy and fairness over unconstrained training and perform comparably to or better than existing fairness interventions when evaluated on the unbiased distribution. This work thus offers a general, principled framework for fair learning under biased labels and challenges the presumed inevitability of fundamental fairness–accuracy trade-offs. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99669 |
| DOI: | 10.6342/NTU202503227 |
| Full-Text Permission: | Not authorized |
| Electronic Full-Text Release Date: | N/A |
| Appears in Collections: | Graduate Institute of Communication Engineering |
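The abstract describes a group-dependent, class-conditional label-noise model and a minimax fair-and-robust (FaR) objective. The sketch below is a minimal illustration of that kind of setup only; the noise rates $\rho_{a,y}$, group attribute $A$, loss $\ell$, uncertainty set $\mathcal{P}$, fairness penalty $\Phi$, and weight $\lambda$ are assumed notation for illustration and are not the thesis's actual definitions.

```latex
% Sketch only (assumed notation): group-dependent, class-conditional label noise.
% The observed label \tilde{Y} flips the clean label Y = y at a rate that depends
% on both the group A = a and the class y.
\Pr\bigl(\tilde{Y} \neq y \mid Y = y,\ A = a\bigr) = \rho_{a,y},
\qquad a \in \mathcal{A},\quad y \in \{0, 1\}.

% Sketch of a minimax fair-and-robust objective: the predictor f is evaluated on
% the clean distribution D, the inner maximum guards against the worst noise rates
% in an uncertainty set \mathcal{P}, and \Phi penalizes unfairness across groups.
\min_{f}\ \max_{\rho \in \mathcal{P}}
\Bigl[\,
  \mathbb{E}_{(X, Y, A) \sim D}\, \ell\bigl(f(X), Y\bigr)
  \;+\; \lambda\, \Phi\bigl(f; \rho\bigr)
\,\Bigr].
```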
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (Restricted Access) | 14.69 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
