NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99667
Title: 多類別神經網路的訓練資料重建方法:針對表格式資料的實證研究
Data Reconstruction from Multi-Class Neural Networks: An Empirical Study on Tabular Data
Authors: 廖明祐
Ming-You Liao
Advisor: 陳由常
Yu-Chang Chen
Co-Advisor: 盧信銘
Hsin-Min Lu
Keyword: 機器學習,資料重建,資料隱私,表格資料,KKT-based 重建方法,神經網路模型
Machine Learning, Data Reconstruction, Data Privacy, Tabular Data, KKT-based Reconstruction, Neural Network Model
Publication Year: 2025
Degree: Master's
Abstract: Recent studies have pointed out that even when the original input data cannot be accessed directly, a neural network may still implicitly retain and leak training samples through its parameters. For tabular data, which often contain sensitive information, this poses a serious privacy threat. Although a variety of data reconstruction attacks have been proposed, most rely on extensive model-internal information such as gradients, logits, or predictions, which limits their practical applicability.

This thesis examines a core question: can the training samples of a trained model be recovered using only its final weights? Our work builds on a recent theoretical result showing that when a ReLU neural network is trained with gradient descent, the model implicitly solves a maximum-margin problem, which makes it possible to reconstruct training data from the model parameters via the Karush-Kuhn-Tucker (KKT) conditions. We apply this KKT-based method to the tabular data setting and systematically evaluate its effectiveness.
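To make the theory referenced above concrete, the implicit optimization problem and its optimality conditions can be stated as follows. This is the standard binary-classification form, with labels y_i in {+1, -1} and network output f(θ; x); it is an illustrative restatement of the cited result, not the thesis's exact multi-class formulation.

\min_{\theta} \; \tfrac{1}{2}\lVert\theta\rVert^{2}
\quad \text{subject to} \quad
y_i \, f(\theta; x_i) \ge 1 \quad \text{for all } i,

with KKT stationarity and complementary slackness at the solution:

\theta = \sum_{i} \lambda_i \, y_i \, \nabla_{\theta} f(\theta; x_i),
\qquad
\lambda_i \ge 0,
\qquad
\lambda_i \bigl( y_i \, f(\theta; x_i) - 1 \bigr) = 0.

Reconstruction then treats the x_i and λ_i as unknowns and searches for values that make the stationarity equation hold for the observed weights θ.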

Experiments on synthetic and real-world datasets show that some training points can indeed be reconstructed accurately. Further analysis reveals that reconstruction quality is affected by several key factors, including the signal variance of the data, the number of output classes, and the depth and width of the neural network. In particular, the attack is most effective when the signal variance is low, the number of classes is large, and the model architecture is relatively deep.

In addition, we use statistical methods to characterize the structural features of the leakage, showing that although the reconstructions approximate the original data, they still exhibit systematic bias. We also demonstrate the operational feasibility of this attack in realistic settings: even without access to the true data, an attacker can exploit the model's classification margin to effectively select high-accuracy reconstructed samples. Overall, this study examines the feasibility of applying KKT-based data reconstruction to tabular data, evaluates its effectiveness and limitations through systematic experiments, and highlights the latent risk of data leakage in the final weights of neural networks.
Recent studies have shown that neural network models can memorize and leak training data, even when the original inputs are not directly accessible. This poses a serious privacy concern for tabular datasets, which often contain sensitive information. While existing works have demonstrated various reconstruction attacks, many of these approaches require access to extensive model information, such as gradients or logits.

This thesis investigates whether training records can be recovered using only the final weights of a trained model. We build on a recent theoretical framework positing that training a ReLU neural network with gradient descent implicitly solves a maximum-margin problem. This connection enables the use of the Karush-Kuhn-Tucker (KKT) conditions of that optimization problem to reconstruct potential training data from the model parameters. We adapt this KKT-based approach to the tabular data setting and systematically evaluate its effectiveness.
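A minimal sketch of how such a reconstruction can be set up is given below. It assumes PyTorch (the thesis does not specify a framework), a trained two-class ReLU network `model` with a single scalar output, and illustrative choices of candidate labels, initialization, and optimizer settings; the multi-class formulation actually studied in the thesis is not reproduced here.

import torch

def kkt_residual(model, x_cand, y_cand, lam):
    # Squared norm of  theta - sum_i lam_i * y_i * grad_theta f(theta; x_i),
    # i.e. how far the candidates are from satisfying KKT stationarity.
    params = list(model.parameters())
    terms = [torch.zeros_like(p) for p in params]
    for i in range(x_cand.shape[0]):
        logit = model(x_cand[i:i + 1]).squeeze()         # scalar output f(theta; x_i)
        grads = torch.autograd.grad(logit, params, create_graph=True)
        terms = [t + torch.relu(lam[i]) * y_cand[i] * g  # relu(.) keeps lam_i >= 0
                 for t, g in zip(terms, grads)]
    return sum(((p.detach() - t) ** 2).sum() for p, t in zip(params, terms))

def reconstruct(model, input_dim, num_candidates, steps=5000, lr=0.01):
    # Candidate labels are guessed (half +1, half -1); x_cand and lam are the
    # variables optimized so that the KKT residual above becomes small.
    y_cand = torch.tensor([1.0 if i % 2 == 0 else -1.0 for i in range(num_candidates)])
    x_cand = torch.randn(num_candidates, input_dim, requires_grad=True)
    lam = torch.rand(num_candidates, requires_grad=True)
    opt = torch.optim.Adam([x_cand, lam], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = kkt_residual(model, x_cand, y_cand, lam)
        loss.backward()
        opt.step()
    return x_cand.detach()

In practice, additional priors on the candidate points (for example, value-range penalties for tabular features) are typically added to the objective; they are omitted here for brevity.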

Through controlled experiments on both synthetic and real-world datasets, we demonstrate that a subset of training instances can indeed be reconstructed with high fidelity. Our analysis reveals several key factors influencing this vulnerability: reconstruction is most effective when data has low signal variance, the number of output classes is high, and, notably, when the network architecture is deep rather than wide.

Furthermore, we characterize the nature of the leakage through statistical testing, finding that the reconstructions are close approximations but contain systematic bias. Finally, we demonstrate the practical viability of this attack by showing that an attacker can use the model's classification margin to reliably identify these high-fidelity reconstructions without access to the ground truth. Overall, this study investigates the applicability of KKT-based data reconstruction methods to tabular data. Through systematic experiments, we evaluate both the feasibility and the limitations of this approach, highlighting the potential risks of data leakage embedded in the final weights of neural networks.
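The margin-based selection step mentioned above can be sketched as follows. The sketch again assumes PyTorch and defines a candidate's margin as the gap between its top two logits, keeping the candidates with the smallest gaps on the reasoning that, under the maximum-margin view, true training points lie on or near the decision margin. The ranking rule and the keep_fraction parameter are illustrative assumptions, not the thesis's exact criterion.

import torch

def margin_scores(model, x_cand):
    # Gap between the top-1 and top-2 logits for each candidate point;
    # a small gap means the point sits close to the model's decision margin.
    with torch.no_grad():
        logits = model(x_cand)                      # shape (num_candidates, num_classes)
        top2 = logits.topk(2, dim=1).values
        return top2[:, 0] - top2[:, 1]

def select_candidates(model, x_cand, keep_fraction=0.1):
    # Keep the candidates closest to the margin (an illustrative selection rule).
    scores = margin_scores(model, x_cand)
    k = max(1, int(keep_fraction * x_cand.shape[0]))
    idx = scores.argsort()[:k]                      # smallest margin gaps first
    return x_cand[idx]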
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99667
DOI: 10.6342/NTU202501907
Fulltext Rights: Authorized (access restricted to campus)
Embargo Lift Date: 2030-08-01
Appears in Collections: Department of Economics

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 10.36 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
