Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101796

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 黃乾綱 | zh_TW |
| dc.contributor.advisor | Chien-Kang Huang | en |
| dc.contributor.author | 張凱惇 | zh_TW |
| dc.contributor.author | Kai-Tun Chang | en |
| dc.date.accessioned | 2026-03-04T16:38:09Z | - |
| dc.date.available | 2026-03-05 | - |
| dc.date.copyright | 2026-03-04 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-02-09 | - |
| dc.identifier.citation | [1] Ministry of Digital Affairs, R.O.C. (2024). Guidelines for the application of privacy-enhancing technologies. Taipei: Ministry of Digital Affairs. Retrieved from https://moda.gov.tw/press/press-releases/6497
[2] Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570.
[3] Machanavajjhala, A., Kifer, D., Gehrke, J., & Venkitasubramaniam, M. (2007). l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 3-es.
[4] Dwork, C. (2006). Differential privacy. In International Colloquium on Automata, Languages, and Programming (pp. 1-12). Springer.
[5] Dwork, C., & Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4), 211-407.
[6] Erlingsson, Ú., Pihur, V., & Korolova, A. (2014). RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (pp. 1054-1067).
[7] McMahan, H. B., Ramage, D., Talwar, K., & Zhang, L. (2018). Learning differentially private recurrent language models. In International Conference on Learning Representations (ICLR).
[8] Zhai, F., Liang, X., Qin, Y., Li, B., Shen, L., & Xie, J. (2024). Privacy-preserving method for sensitive partitions of electricity consumption data based on hybrid differential privacy and k-anonymity. Journal of Physics: Conference Series, 2806(1), 012010.
[9] Lin, Y., Fang, H., & Yang, P. (2022). A framework combining differential privacy and k-anonymity for distributed databases. Information Sciences, 589, 634-649.
[10] Kobayashi, R., Shishido, H., & Sugiyama, M. (2024). On the relationship between probabilistic k-anonymity and differential privacy. Journal of Privacy and Confidentiality, 14(2), 45-67.
[11] Bayardo, R. J., & Agrawal, R. (2005). Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering (pp. 217-228). IEEE.
[12] Meyerson, A., & Williams, R. (2004). On the complexity of optimal k-anonymity. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (pp. 223-228).
[13] Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29.
[14] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1-35.
[15] Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. John Wiley & Sons.
[16] El Emam, K., & Dankar, F. K. (2008). Protecting privacy using k-anonymity. Journal of the American Medical Informatics Association, 15(5), 627-637.
[17] Mendes, R., & Vilela, J. P. (2017). Privacy-preserving data mining: Methods, metrics, and applications. IEEE Access, 5, 10562-10582.
[18] Kohavi, R., & Becker, B. (1996). Adult data set. UCI Machine Learning Repository.
[19] Bagdasaryan, E., Poursaeed, O., & Shmatikov, V. (2019). Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems (Vol. 32).
[20] Tang, Y., Wang, K., Chen, Z., & Zhang, Y. (2022). Investigating the fairness impacts of differential privacy. In Proceedings of the ACM Web Conference 2022 (pp. 2284-2293).
[21] Gentry, C. (2009). Fully homomorphic encryption using ideal lattices. In STOC (pp. 169-178).
[22] Yao, A. C. (1982). Protocols for secure computations. In FOCS (pp. 160-164).
[23] Goodfellow, I., et al. (2014). Generative adversarial nets. In NIPS (pp. 2672-2680).
[24] de Oliveira, A. S., et al. (2023). An empirical analysis of fairness notions under differential privacy. arXiv preprint arXiv:2302.02910.
[25] Majeed, A., & Hwang, S. O. (2024). Differential privacy and k-anonymity-based privacy preserving data publishing scheme with minimal loss of statistical information. IEEE Transactions on Computational Social Systems, 11(3), 3753-3765.
[26] Bargh, M. S., & Choenni, S. (2022). Towards an integrated approach for preserving data utility, privacy and fairness. In 2022 International Conference on Multidisciplinary Research (pp. 290-306).
[27] Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT Press. https://www.fairmlbook.org/
[28] Financial Supervisory Commission (2024). Consultation document on data governance for data sharing among financial institutions. Retrieved from https://www.fsc.gov.tw/ch/home.jsp?id=96&parentpath=0,2&mcustomize=news_view.jsp&dataserno=202405160001&dtable=News
[29] Bird, S., Dudík, M., Edgar, R., Horn, B., Lutz, R., Milan, V., ... & Walker, K. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft Technical Report MSR-TR-2020-32. https://fairlearn.org/
[30] Wood, A., Altman, M., Bembenek, A., Bun, M., Gaboardi, M., Honaker, J., & Vadhan, S. (2018). Differential privacy: A primer for a non-technical audience. Vanderbilt Journal of Entertainment & Technology Law, 21(1), 209-276.
[31] Xiong, X., Liu, S., Li, D., Cai, Z., & Niu, X. (2025). Differential privacy configurations in the real world: A comparative analysis. IEEE Transactions on Knowledge and Data Engineering. DOI: 10.1109/TKDE.2025.3603731
[32] Fung, B. C., Wang, K., Chen, R., & Yu, P. S. (2010). Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys (CSUR), 42(4), 1-53. https://doi.org/10.1145/1749603.1749605
[33] Apple Inc. (2017). Differential privacy overview. https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf
[34] Ding, B., Kulkarni, J., & Yekhanin, S. (2017). Collecting telemetry data privately. In Advances in Neural Information Processing Systems (pp. 3571-3580).
[35] Enhancing data privacy: A comprehensive survey of privacy-enabling technologies. IEEE Access, 2024. https://ieeexplore.ieee.org/document/10908383 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101796 | - |
| dc.description.abstract | 在數位時代,資料分析能帶來有價值的洞察,但也可能侵犯個人隱私。傳統隱私保護技術面臨兩難:k-匿名雖直觀易懂,但難以抵禦具有背景知識的攻擊;差分隱私雖提供嚴謹的數學保障,但可能過度降低資料效用性。此外,現有研究多聚焦於隱私與效用的雙維度平衡,較少關注隱私保護機制對不同社會群體的差異性影響。由於資料收集過程中的固有不平衡,隱私保護機制可能無意中放大這種不平衡,對少數群體造成不成比例的負面影響,進而引發公平性問題。
針對此挑戰,本研究提出一套創新的三維隱私分析框架,目標在於:(1)整合 k-匿名與差分隱私的互補優勢;(2)量化隱私參數對隱私保護性、資料效用性與資料公平性的影響;(3)尋找三個維度間的最佳平衡點。 本研究採用 Adult 資料集進行實驗驗證。首先,透過泛化處理建立目標 k=5 的匿名群組;其次,對等價類計數應用拉普拉斯機制,測試 11 個不同的隱私參數 ε 值(0.1 至 10.0);最後,建立標準化評分體系,以總變異距離(TVD)衡量資訊損失,以均等勝算差異評估公平性,並透過權重敏感度分析驗證參數選擇的穩健性。 實驗結果顯示:(1)在均衡權重配置(0.4/0.3/0.3)下,ε=1.0 獲得最高整合評分(0.764),在隱私保護性(評分 0.800,k=5)、資料效用性(評分 0.899,TVD=0.004)與資料公平性(評分 0.582)三個維度間達到最佳平衡;(2)與純 k-匿名或純差分隱私方法相比,本研究的混合機制在種族公平性方面改善約 20.1%,同時維持 99.09% 的下游應用準確率保持率;(3)權重敏感度分析證實 ε=1.0 在多數應用情境下表現穩健。本研究不僅整合了 k-匿名的直觀性與差分隱私的理論保障,更首次將公平性系統性地納入隱私保護評估,為需要兼顧多重目標的隱私保護應用提供了全面且靈活的決策支援框架。 | zh_TW |
| dc.description.abstract | In the digital era, data analysis provides valuable insights but may also lead to privacy violations. Traditional privacy protection techniques face a dilemma: k-anonymity, while intuitive and easy to understand, struggles to defend against attackers with background knowledge; differential privacy offers rigorous mathematical guarantees but may excessively reduce data utility. Moreover, existing research primarily focuses on the privacy-utility trade-off, with limited attention to the differential impacts of privacy mechanisms across social groups. Due to inherent imbalances in data collection, privacy protection mechanisms may inadvertently amplify these disparities, causing disproportionate negative effects on minority groups and raising fairness concerns.
To address these challenges, this study proposes an innovative three-dimensional privacy analysis framework aimed at: (1) integrating the complementary strengths of k-anonymity and differential privacy; (2) quantifying the impact of privacy parameters on privacy protection, data utility, and data fairness; and (3) identifying the optimal balance among these three dimensions. This research employs the Adult dataset for experimental validation. First, generalization processing establishes anonymization groups with a target k=5. Second, the Laplace mechanism is applied to equivalence class counts, testing 11 different privacy parameters ε (ranging from 0.1 to 10.0). Finally, a standardized scoring system is established, using Total Variation Distance (TVD) to measure information loss, Equalized Odds difference to assess fairness, and conducting weight sensitivity analysis to verify the robustness of parameter selection. Experimental results demonstrate: (1) Under balanced weight configuration (0.4/0.3/0.3), ε=1.0 achieves the highest integrated score (0.764), attaining optimal balance across privacy protection (score 0.800, k=5), data utility (score 0.899, TVD=0.004), and data fairness (score 0.582); (2) Compared with pure k-anonymity or pure differential privacy methods, the proposed hybrid mechanism improves race fairness by approximately 20.1% while maintaining 99.09% accuracy retention in downstream applications; (3) Weight sensitivity analysis confirms that ε=1.0 demonstrates robust performance across most application scenarios. This research not only integrates the intuitiveness of k-anonymity with the theoretical guarantees of differential privacy, but also systematically incorporates fairness into privacy protection evaluation for the first time, systematically incorporates fairness into privacy protection evaluation, providing a comprehensive and flexible decision support framework for privacy protection applications requiring multiple objectives. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-04T16:38:09Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-03-04T16:38:09Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Abstract (in Chinese) i
Abstract (in English) ii
Table of Contents iv
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 1
Chapter 2 Literature Review 4
2.1 Overview of Privacy Protection Techniques 4
2.1.1 Classification and Development of Privacy-Enhancing Technologies 5
2.2 k-Anonymity 6
2.3 Differential Privacy 9
2.4 Hybrid Privacy Protection Mechanisms 11
2.4.1 Recent Advances in Hybrid Privacy Protection Mechanisms 14
2.4.2 Rationale for Technique Selection and Applicability Analysis 14
2.4.3 Differences Between This Study and Existing Methods 15
2.5 Fairness Metrics 16
2.5.1 Fairness as a Core Principle of Privacy Protection 16
2.5.2 Classification and Selection of Fairness Metrics 18
Chapter 3 Research Methodology 21
3.1 Data Preprocessing 22
3.2 Differential Privacy 23
3.3 Metric Analysis 25
3.3.1 Privacy Protection 25
3.3.2 Data Utility 26
3.3.3 Data Fairness 29
Chapter 4 Experimental Results and Discussion 32
4.1 Dataset 32
4.1.1 Rationale for Dataset Selection 32
4.1.2 Data Preprocessing 33
4.1.3 Identification and Selection of Quasi-Identifiers 34
4.1.4 Quasi-Identifier Selection Results and Analysis 34
4.2 Experimental Design and Parameter Settings 37
4.3 Relationship Between Privacy Parameters and Data Utility 41
4.4 Relationship Between Privacy Parameters and Data Fairness 46
4.4.1 Fairness Measurement Results and Initial Observations 46
4.4.2 Visualization of Fairness Trends 49
4.4.3 Root-Cause Analysis of Group Disparities 53
4.4.4 Practical Implications 54
4.5 Integrated Evaluation of Data Privacy, Data Utility, and Data Fairness 55
4.6 Comparative Analysis with Other Privacy Protection Methods 62
4.6.1 Description of Compared Methods 63
4.6.2 Comparison Results 63
4.7 Downstream Application Performance Evaluation 65
4.7.1 Evaluation Method Design 65
4.7.2 Experimental Result Analysis 65
4.7.3 Practical Implications 67
Chapter 5 Conclusions and Future Outlook 68
5.1 Conclusions 68
5.2 Future Outlook 69
5.2.1 Research Limitations 69
5.2.2 Future Research Directions 69
5.2.3 Closing Remarks 70
References 71
Appendix: Glossary 74 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 差分隱私 | - |
| dc.subject | k-匿名 | - |
| dc.subject | 資料隱私 | - |
| dc.subject | 隱私保護機制 | - |
| dc.subject | 均等勝算 | - |
| dc.subject | Differential privacy | - |
| dc.subject | k-anonymity | - |
| dc.subject | Data privacy | - |
| dc.subject | Privacy protection mechanism | - |
| dc.subject | Equalized odds | - |
| dc.title | 隱私、效用與公平:基於差分隱私與k-匿名的混合隱私保護 | zh_TW |
| dc.title | Privacy, Utility and Fairness: A Hybrid Privacy Protection with Differential Privacy and k-Anonymity | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 張信宏;張恆華 | zh_TW |
| dc.contributor.oralexamcommittee | Shin-Hung Chang;Herng-Hua Chang | en |
| dc.subject.keyword | 差分隱私,k-匿名,資料隱私,隱私保護機制,均等勝算 | zh_TW |
| dc.subject.keyword | Differential privacy,k-anonymity,Data privacy,Privacy protection mechanism,Equalized odds | en |
| dc.relation.page | 75 | - |
| dc.identifier.doi | 10.6342/NTU202600729 | - |
| dc.rights.note | Authorized for release (worldwide open access) | - |
| dc.date.accepted | 2026-02-10 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Department of Engineering Science and Ocean Engineering | - |
| dc.date.embargo-lift | 2026-03-05 | - |
| Appears in Collections: | Department of Engineering Science and Ocean Engineering | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-1.pdf | 3.22 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
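The hybrid pipeline described in the abstract (Laplace noise added to equivalence-class counts, Total Variation Distance for utility loss, equalized-odds difference for fairness, and a 0.4/0.3/0.3 weighted integrated score) can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis code: all function names are hypothetical, and the non-negativity clamp on noisy counts is an assumption about post-processing.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_counts(counts, epsilon, sensitivity=1.0):
    """Laplace mechanism on a vector of equivalence-class counts.

    Scale = sensitivity / epsilon; noisy counts are clamped to be
    non-negative (an assumed post-processing step).
    """
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(counts))
    return np.maximum(np.asarray(counts, dtype=float) + noise, 0.0)

def total_variation_distance(p_counts, q_counts):
    """TVD between two count vectors, each normalized to a distribution."""
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()

def equalized_odds_diff(y_true, y_pred, group):
    """Equalized-odds difference: the larger of the max TPR gap and
    max FPR gap across groups (0 = perfectly fair)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs, fprs = [], []
    for g in np.unique(group):
        member = group == g
        pos = member & (y_true == 1)
        neg = member & (y_true == 0)
        tprs.append(y_pred[pos].mean() if pos.any() else 0.0)
        fprs.append(y_pred[neg].mean() if neg.any() else 0.0)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

def integrated_score(privacy, utility, fairness, w=(0.4, 0.3, 0.3)):
    """Weighted aggregate of the three normalized dimension scores."""
    return w[0] * privacy + w[1] * utility + w[2] * fairness
```

With the abstract's reported per-dimension scores at ε=1.0 (privacy 0.800, utility 0.899, fairness 0.582), `integrated_score` under the balanced 0.4/0.3/0.3 weights reproduces the reported integrated score of about 0.764, which suggests the aggregation is a simple weighted sum of normalized scores.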
