NTU Theses and Dissertations Repository / 電機資訊學院 / 電信工程學研究所
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99669

Full metadata record (DC field: value, with language tag where given)
dc.contributor.advisor: 王奕翔 (zh_TW)
dc.contributor.advisor: I-Hsiang Wang (en)
dc.contributor.author: 劉衡謙 (zh_TW)
dc.contributor.author: Heng-Chien Liou (en)
dc.date.accessioned: 2025-09-17T16:19:23Z
dc.date.available: 2025-09-18
dc.date.copyright: 2025-09-17
dc.date.issued: 2025
dc.date.submitted: 2025-08-05
dc.identifier.citation:
[1] J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias. In Ethics of data and analytics, pages 254–264. Auerbach Publications, 2022.
[2] S. Barocas, M. Hardt, and A. Narayanan. Fairness and machine learning: Limitations and opportunities. MIT Press, 2023.
[3] A. Bell, L. Bynum, N. Drushchak, T. Zakharchenko, L. Rosenblatt, and J. Stoyanovich. The possibility of fairness: Revisiting the impossibility theorem in practice. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 400–422, 2023.
[4] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79:151–175, 2010.
[5] A. Blum and K. Stangl. Recovering from biased data: Can fairness constraints improve accuracy? In 1st Symposium on Foundations of Responsible Computing, 2020.
[6] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29, 2016.
[7] Z. I. Botev and D. P. Kroese. The generalized cross entropy method, with applications to probability density estimation. Methodology and Computing in Applied Probability, 13(1):1–27, 2011.
[8] S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
[9] T. Calders, F. Kamiran, and M. Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18. IEEE, 2009.
[10] C.-J. Chen. Catharine A. MacKinnon and equality theory. In Research handbook on feminist jurisprudence, pages 44–64. Edward Elgar Publishing, 2019.
[11] E. Chzhen, C. Denis, M. Hebiri, L. Oneto, and M. Pontil. Fair regression with Wasserstein barycenters. Advances in Neural Information Processing Systems, 33:7321–7331, 2020.
[12] E. Chzhen and N. Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statistics, 50(4):2416–2442, 2022.
[13] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley & Sons, 2006.
[14] S. Dutta, D. Wei, H. Yueksel, P.-Y. Chen, S. Liu, and K. Varshney. Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing. In International Conference on Machine Learning, pages 2803–2813. PMLR, 2020.
[15] M. P. Friedlander and M. R. Gupta. On minimizing distortion and relative entropy. IEEE Transactions on Information Theory, 52(1):238–245, 2005.
[16] S. A. Friedler, C. Scheidegger, and S. Venkatasubramanian. The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making. Communications of the ACM, 64(4):136–143, 2021.
[17] T. L. Gouic, J.-M. Loubes, and P. Rigollet. Projection to fairness in statistical learning. arXiv preprint arXiv:2005.11720, 2020.
[18] B. Green. Escaping the impossibility of fairness: From formal to substantive algorithmic fairness. Philosophy & Technology, 35(4):90, 2022.
[19] M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 2016.
[20] P. J. Huber. A robust version of the probability ratio test. The Annals of Mathematical Statistics, pages 1753–1758, 1965.
[21] P. J. Huber and E. M. Ronchetti. Robust statistics. John Wiley & Sons Hoboken, NJ, USA, 2009.
[22] H. Jiang and O. Nachum. Identifying and correcting label bias in machine learning. In International Conference on Artificial Intelligence and Statistics, pages 702–712. PMLR, 2020.
[23] F. Kamiran and T. Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
[24] E. L. Lehmann and J. P. Romano. Testing statistical hypotheses. Springer, 4th edition, 2022.
[25] N. Levit and R. R. Verchick. Feminist legal theory: A primer, volume 74 of Critical America. NYU Press, 2016.
[26] A. Menon, B. Van Rooyen, C. S. Ong, and B. Williamson. Learning from corrupted binary labels via class-probability estimation. In International Conference on Machine Learning, pages 125–134. PMLR, 2015.
[27] N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari. Learning with noisy labels. Advances in Neural Information Processing Systems, 26, 2013.
[28] A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
[29] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005.
[30] A. Nichani, H. Hsu, and H. Jeong. Can we catch the two birds of fairness and privacy? In Advances in Financial AI: Opportunities, Innovations, and Responsible AI, International Conference on Learning Representations, 2025.
[31] S. Niu, Y. Liu, J. Wang, and H. Song. A decade survey of transfer learning (2010–2020). IEEE Transactions on Artificial Intelligence, 1(2):151–166, 2021.
[32] Y. Polyanskiy and Y. Wu. Information theory: From coding to learning. Cambridge University Press, 2025.
[33] I. Redko, E. Morvant, A. Habrard, M. Sebban, and Y. Bennani. Advances in domain adaptation theory. Elsevier, 2019.
[34] C. Scott, G. Blanchard, and G. Handy. Classification with asymmetric label noise: Consistency and maximal denoising. In Conference on Learning Theory, pages 489–511. PMLR, 2013.
[35] A. D. Selbst, D. Boyd, S. A. Friedler, S. Venkatasubramanian, and J. Vertesi. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 59–68, 2019.
[36] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
[37] K. R. Varshney. Trustworthy machine learning. Independently published, 2022.
[38] R. Xian, L. Yin, and H. Zhao. Fair and optimal classification via post-processing. In Proceedings of the 40th International Conference on Machine Learning, 2023.
[39] R. Xian and H. Zhao. A unified post-processing framework for group fairness, 2024.
[40] H. Zhao and G. J. Gordon. Inherent tradeoffs in learning fair representations. Journal of Machine Learning Research, 23(57):1–26, 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99669
dc.description.abstract: 隨著機器學習應用日益拓展至高風險領域,確保其可信度與可問責性——特別是在種族、性別與身心障礙等敏感屬性上的公平性——已成為一項關鍵議題。儘管已有大量文獻探討演算法公平性,多數方法仍仰賴可能存在偏誤的資料進行模型效能評估,從而導致公平性與準確率之間的不可避免取捨。本研究藉由明確建模資料偏誤的生成過程,假設觀察資料原先來自一理想且無偏的分布,後受到與群體(group)及類別(class)標記相關的噪音所污染。我們主張,應在無偏分布下評估預測器表現,而非僅針對觀察到的偏誤資料進行準確率最佳化,以回應公平決策所依循的倫理承諾。為因應未知的標記偏誤率,我們提出一套最小化最大風險的學習架構,將傳統風險函數替換為新穎的「公平且穩健風險(FaR risk)」。該風險同時納入公平性約束,並對最壞情境下的偏誤具備抵抗能力。在假設真實分布中群體與標記類別相互獨立的前提下,我們證明 FaR risk 可精確地分解為在偏誤資料下的風險函數加上一修正項,後者綜合反映了公平性與穩健性。基於此理論基礎,我們進一步設計兩種適用於有限樣本情境的高效率資料導向演算法:第一為前處理方法,當某一群體受到偏誤影響顯著高於其他群體時,可透過簡單統計量估計最適重加權係數;第二為內處理方法,將 FaR risk 表示為一個平滑的鞍點問題,並以 Saddle Point Mirror Prox 演算法求解。我們在理論上將所提出的方法與兩種基準模型進行比較:其一為僅使用偏誤標記進行訓練的預測器,其二為可直接存取理想無偏分布的預測器,此比較為後續建立泛化能力與公平性保證奠定基礎。實驗方面,我們於合成、半合成與真實資料集上進行評估,結果顯示在無偏分布下的評估標準中,所提方法在準確率與公平性皆優於未加限制的訓練策略,且其表現可與現有公平性方法相當甚至更為出色。本研究提出一套具普遍性且具理論基礎的統計學習框架,專為處理偏誤標記下的公平學習問題設計,並進一步挑戰傳統對於公平性與準確率之間不可避免取捨的既定觀點。 (zh_TW)
dc.description.abstract: As machine learning applications are increasingly deployed in high-stakes domains, ensuring their trustworthiness and accountability—particularly fairness to sensitive attributes such as race, gender, and disability status—is critical. Although a large body of work has addressed algorithmic fairness, most approaches evaluate performance on potentially biased datasets, often yielding an undesirable fairness–accuracy trade-off. In this work, we explicitly model data bias by assuming that examples are drawn from an ideal, unbiased distribution but are then corrupted by group-dependent, class-conditional label noise. Rather than optimizing accuracy on the observed biased data, we evaluate the performance of predictors on the underlying unbiased distribution, aligning with ethical commitments to fair decision-making. To account for unknown noise rates, we introduce a minimax formulation that replaces the standard risk with a novel fair-and-robust (FaR) risk, which simultaneously enforces fairness constraints and guards against worst-case noise scenarios. Under a group–label independence assumption on the true distribution, we show that the FaR risk decomposes exactly into the observed risk plus a regularization term that captures both fairness and robustness. Building on this insight, we develop two efficient, data-driven algorithms for finite samples—first, a pre-processing intervention that estimates optimal reweighting factors via simple statistics when one group's labels are predominantly noisy; and second, an in-processing intervention that solves the FaR risk minimax problem via a smooth saddle-point representation. We theoretically compare our minimax solution against two baselines—a predictor trained directly on biased labels and an oracle with access to the unbiased distribution—laying the groundwork for future bounds on generalization and fairness guarantees. Empirically, on synthetic, semi-synthetic, and real-world datasets, we demonstrate that our methods improve both accuracy and fairness over unconstrained training and perform comparably to or better than existing fairness interventions when evaluated on the unbiased distribution. This work thus offers a general, principled framework for fair learning under biased labels and challenges the presumed inevitability of fundamental fairness–accuracy trade-offs. (en)
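To make the model and objective described in the abstract easier to parse, the LaTeX fragment below sketches one possible way to write them down for binary labels. All notation here (the noise rates \rho_{a,y}, the uncertainty set \mathcal{U}, the fairness penalty \Delta_{\mathrm{fair}}, and the weight \lambda) is assumed for illustration and is not taken from the thesis itself.

% Illustrative sketch only; notation assumed, not quoted from the thesis.
% Group-dependent, class-conditional label noise: for group a and true binary
% label y, the observed label \tilde{Y} is flipped with probability
\[
  \Pr\bigl[\tilde{Y} = 1 - y \mid Y = y,\, A = a\bigr] = \rho_{a,y}.
\]
% A minimax "fair-and-robust" objective over predictors f and unknown noise
% rates \rho ranging over an uncertainty set \mathcal{U} could then read
\[
  \min_{f \in \mathcal{F}} \;\max_{\rho \in \mathcal{U}}\;
  \Bigl[\, R_{\rho}(f) + \lambda\, \Delta_{\mathrm{fair}}(f) \,\Bigr],
\]
% where R_{\rho}(f) denotes the risk of f under the unbiased distribution
% implied by the observed (biased) data and the noise rates \rho, and
% \Delta_{\mathrm{fair}}(f) measures violation of the chosen group-fairness
% criterion.

Under the group–label independence assumption stated in the abstract, the thesis reports that its fair-and-robust risk decomposes exactly into the risk on the observed biased data plus a single correction term capturing both fairness and robustness; the sketch above is only meant to indicate the general shape of such a minimax formulation.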
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-17T16:19:23Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-09-17T16:19:23Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Denotation xv
Chapter 1 Introduction 1
1.1 Overview of Our Contributions 4
1.2 How to Read this Thesis 5
Chapter 2 Scientific Background 7
2.1 Statistical Learning 7
2.2 Fairness in Statistical Learning 8
2.3 Limitations of Existing Approaches 11
Chapter 3 Modeling and Problem Formulations 13
3.1 Modeling Biases as Labeling Noise 14
3.2 A Minimax Formulation to Learn with Biased Data 17
3.3 Social and Ethical Implications 19
3.4 Related Works and Problems 19
Chapter 4 Theoretical Analysis of the Minimax Problem 25
4.1 Revisiting the Minimax Problem 25
4.2 Analysis of the Fair-and-Robust Risk 27
4.3 Interpreting the Fair-and-Robust Risk 30
Chapter 5 Algorithmic Interventions and Experiments 35
5.1 Pre-processing Intervention 36
5.2 In-processing Intervention 37
5.3 Experiments on Synthetic Data 38
5.4 Experiments on Real-world Data 43
5.5 Experiments on Semi-synthetic Data 44
Chapter 6 Conclusion and Discussion 49
6.1 Main Contributions 49
6.2 Limitations 50
6.3 Future Works 51
References 53
Appendix A — On the Guarantees of the Minimax Solution 59
A.1 Comparing Risk between Predictors 59
A.2 Comparing Unfairness between Predictors 62
Appendix B — Proofs 65
Appendix C — Additional Experiments 73
dc.language.iso: en
dc.subject: 機器學習 (zh_TW)
dc.subject: 統計學習 (zh_TW)
dc.subject: 公平性 (zh_TW)
dc.subject: 穩健性 (zh_TW)
dc.subject: 標記噪音 (zh_TW)
dc.subject: Robustness (en)
dc.subject: Fairness (en)
dc.subject: Machine Learning (en)
dc.subject: Statistical Learning (en)
dc.subject: Label Noise (en)
dc.title: 從偏誤標記中實現公平且穩健之學習——最小化最大風險的統計框架 (zh_TW)
dc.title: Fair-and-Robust Learning from Biased Labels: A Minimax Statistical Framework (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳尚澤;于天立 (zh_TW)
dc.contributor.oralexamcommittee: Shang-Tse Chen;Tian-Li Yu (en)
dc.subject.keyword: 機器學習,統計學習,公平性,穩健性,標記噪音 (zh_TW)
dc.subject.keyword: Machine Learning, Statistical Learning, Fairness, Robustness, Label Noise (en)
dc.relation.page: 77
dc.identifier.doi: 10.6342/NTU202503227
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2025-08-11
dc.contributor.author-college: 電機資訊學院
dc.contributor.author-dept: 電信工程學研究所
dc.date.embargo-lift: N/A
Appears in Collections: 電信工程學研究所

Files in This Item:
File: ntu-113-2.pdf (not authorized for public access), 14.69 MB, Adobe PDF