Unlocking high-accuracy differentially private image classification
A recent DeepMind paper on the ethical and social risks of language models identified the leakage of sensitive information about training data by large language models as a potential risk that organizations working on these models have an obligation to address. Another recent paper showed that similar privacy risks can also arise in standard image classification models: a fingerprint of each individual training image can be embedded in the model parameters, and malicious parties can exploit such fingerprints to reconstruct the training data from the model.
Privacy-enhancing technologies such as differential privacy (DP) can be implemented during training to mitigate these risks, but they often result in significant reductions in model performance. In this work, we make significant progress toward unlocking high-accuracy training of image classification models under differential privacy.
Figure 1: (left) Illustration of training data leakage from GPT-2 [credit: Carlini et al. “Extracting Training Data from Large Language Models”, 2021]. (right) CIFAR-10 training examples reconstructed from a 100k-parameter convolutional neural network [credit: Balle et al. “Reconstructing Training Data with Informed Adversaries”, 2022].
Differential privacy was proposed as a mathematical framework for capturing the requirement to protect individual records during statistical data analysis (including the training of machine learning models). DP algorithms protect individuals from any inferences about the features that make them unique (including complete or partial reconstruction) by injecting carefully calibrated noise during the computation of the desired statistic or model. The use of DP algorithms provides robust and rigorous privacy guarantees both in theory and in practice, and has become a de facto gold standard adopted by a number of public and private organizations.
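For reference, the standard (ε, δ) formulation of this guarantee is as follows: a randomized algorithm M is (ε, δ)-differentially private if, for every pair of datasets D and D′ that differ in a single record and every set of possible outputs S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.

Smaller values of ε and δ correspond to stronger guarantees, since the output distribution of the algorithm can then depend only weakly on any single record.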
The most popular DP algorithm for deep learning is differentially private stochastic gradient descent (DP-SGD), a modification of standard SGD obtained by clipping the gradients of individual examples and adding enough noise to mask the contribution of any individual to each model update:
Figure 2: Illustration of how DP-SGD processes the gradients of individual examples and adds noise to generate model updates with privatized gradients.
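To make these mechanics concrete, below is a minimal sketch of a single DP-SGD update in JAX. It is not the exact training pipeline from the paper: the toy linear model, loss function, and hyperparameter values (clipping norm, noise multiplier, learning rate) are illustrative placeholders. Per-example gradients are computed with vmap, clipped to a fixed L2 norm, aggregated, and perturbed with Gaussian noise whose standard deviation is proportional to the clipping norm, before a standard SGD step is applied.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Squared-error loss for a single example under a toy linear model.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return 0.5 * (pred - y) ** 2


def dp_sgd_step(params, batch_x, batch_y, key,
                l2_clip=1.0, noise_multiplier=1.1, learning_rate=0.1):
    """One DP-SGD update: clip per-example gradients, then add Gaussian noise."""
    batch_size = batch_x.shape[0]

    # Per-example gradients: vmap the single-example gradient over the batch.
    per_example_grads = jax.vmap(
        jax.grad(loss_fn), in_axes=(None, 0, 0))(params, batch_x, batch_y)

    # Global L2 norm of each example's gradient across all parameter leaves.
    grad_norms = jnp.sqrt(sum(
        jnp.sum(jnp.reshape(g, (batch_size, -1)) ** 2, axis=1)
        for g in jax.tree_util.tree_leaves(per_example_grads)))

    # Scale each example's gradient so its norm is at most l2_clip.
    clip_factor = jnp.minimum(1.0, l2_clip / (grad_norms + 1e-12))

    def clip_and_sum(g):
        scale = clip_factor.reshape((batch_size,) + (1,) * (g.ndim - 1))
        return jnp.sum(g * scale, axis=0)

    clipped_sum = jax.tree_util.tree_map(clip_and_sum, per_example_grads)

    # Add Gaussian noise calibrated to the clipping norm, then average.
    leaves, treedef = jax.tree_util.tree_flatten(clipped_sum)
    noise_keys = jax.random.split(key, len(leaves))
    noisy_leaves = [
        (leaf + noise_multiplier * l2_clip * jax.random.normal(k, leaf.shape))
        / batch_size
        for leaf, k in zip(leaves, noise_keys)]
    noisy_grads = jax.tree_util.tree_unflatten(treedef, noisy_leaves)

    # Standard SGD step with the privatized gradient estimate.
    return jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g, params, noisy_grads)
```

In practice, a privacy accountant is run alongside training to track how the overall (ε, δ) guarantee accumulates across all noisy updates.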
Unfortunately, prior work has shown that in practice, the privacy protection provided by DP-SGD often comes at the cost of significantly less accurate models, which is a serious obstacle to the widespread adoption of differential privacy in the machine learning community. Consistent with empirical evidence from prior work, this utility degradation of DP-SGD becomes more severe for larger neural network models, including the ones regularly used to achieve top performance on challenging image classification benchmarks.
The following chart summarizes two of our main results: an improvement of roughly 10% on CIFAR-10 over previous work when training privately without additional data, and 86.7% accuracy on ImageNet when privately fine-tuning a model pre-trained on a different dataset, almost closing the gap with the best non-private performance.
Figure 3: (left) Our best results in training WideResNet models on CIFAR-10 without additional data. (right) Our best results in fine-tuning NFNet models on ImageNet. The best performing model was pre-trained on an internal dataset disjoint from ImageNet.
These results were obtained with ε=8, a standard setting for calibrating the strength of protection offered by differential privacy in machine learning applications. We refer to the paper for a discussion of this parameter, as well as additional experimental results with other values of ε and on other datasets. Along with the paper, we also open-source our implementation so that other researchers can verify our findings and build on them. We hope this contribution will help others interested in making practical DP training a reality.
Download our JAX implementation on GitHub.