Paper: Antidistillation Sampling
Authors: Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter (CMU)
Published: April 17, 2025 (arXiv)
Problem Background
Detailed reasoning traces generated by large language models (LLMs), while powerful, create a vulnerability: competitors can harvest these publicly exposed traces for model distillation, cheaply replicating a powerful model's capabilities. This leads to intellectual property leakage and potential security risks (e.g., bypassing safety alignments).
Proposed Method: Antidistillation Sampling
- Core Idea: Poison the reasoning traces generated by the original (teacher) model to hinder distillation, without significantly compromising the teacher model’s own performance.
- Implementation: Applied as a sampling strategy during token generation:
  - At each decoding step, it combines the teacher model's original next-token probabilities with an "antidistillation" adjustment term.
  - This adjustment term uses a small proxy model and the gradient of a loss on a downstream task to estimate which tokens would be most useful to a distilling student; such tokens are penalized so that selecting them degrades distillation.
  - The next token is then sampled from this adjusted probability distribution.
- Key Aspect: The original teacher model is not modified. The adjustment happens only during inference sampling, and the poisoning strength is controlled to minimize impact on the teacher’s utility.
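The adjusted sampling step described above can be sketched numerically. This is a minimal illustration, not the paper's exact estimator: the `penalty` vector is a hypothetical stand-in for the proxy-gradient adjustment term, `lam` is the poisoning strength, and the logits are toy values rather than real model outputs.

```python
import numpy as np

def antidistillation_sample(teacher_logits, penalty, lam=0.5, temperature=1.0, rng=None):
    """Sample the next token from an adjusted distribution.

    teacher_logits : next-token logits from the (unmodified) teacher model.
    penalty        : per-token estimate of how useful each candidate token
                     would be to a distilling student (hypothetical stand-in
                     for the paper's proxy-gradient term).
    lam            : poisoning strength; lam=0 recovers ordinary sampling.
    """
    rng = rng or np.random.default_rng()
    adjusted = teacher_logits / temperature - lam * penalty
    adjusted = adjusted - adjusted.max()            # numerical stability
    probs = np.exp(adjusted) / np.exp(adjusted).sum()
    return rng.choice(len(probs), p=probs), probs

# Toy example with a 5-token vocabulary: token 0 is the teacher's favorite
# but is also (by assumption) the most useful one for a distilling student.
teacher_logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
penalty = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
tok, probs = antidistillation_sample(
    teacher_logits, penalty, lam=1.0, rng=np.random.default_rng(0)
)
# With lam=1.0, probability mass shifts away from token 0.
```

Note how the teacher itself is untouched: only the sampling distribution is reshaped, and `lam` lets one trade off teacher utility against distillation resistance.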
Experimental Results
- Effectiveness: Antidistillation sampling sharply reduces the accuracy of student models distilled from the poisoned traces, while largely preserving the teacher model's own accuracy on benchmarks such as GSM8K and MATH.
- Superiority: Compared with simply raising the sampling temperature (which drastically degrades teacher performance), antidistillation sampling offers a much better trade-off between teacher utility and distillation resistance.
- Overhead: The main cost is two additional forward passes through the small proxy model for each generated token.
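One way an adjustment term can fit this two-pass budget is a finite difference of proxy log-probabilities between the proxy's original weights and weights perturbed along a downstream-loss gradient direction. The sketch below illustrates only that cost structure with a toy one-layer "proxy"; the `proxy_logprobs` function and the random stand-in gradient are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, DIM = 5, 8

def proxy_logprobs(weights, hidden):
    """Toy proxy 'model': one linear layer followed by log-softmax."""
    logits = hidden @ weights
    logits = logits - logits.max()                  # numerical stability
    return logits - np.log(np.exp(logits).sum())

theta = rng.normal(size=(DIM, VOCAB))      # proxy weights
grad_dir = rng.normal(size=(DIM, VOCAB))   # stand-in for a downstream-loss gradient
grad_dir /= np.linalg.norm(grad_dir)
hidden = rng.normal(size=DIM)              # current decoding context (toy)
eps = 1e-3

# The two extra proxy forward passes per generated token:
lp_base = proxy_logprobs(theta, hidden)                   # pass 1: original weights
lp_pert = proxy_logprobs(theta + eps * grad_dir, hidden)  # pass 2: perturbed weights

# Per-token finite-difference adjustment over the vocabulary.
delta = (lp_pert - lp_base) / eps
```

Because the proxy is small relative to the teacher, these two extra passes add only modest latency per generated token.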