Daily Paper: Antidistillation Sampling

Proposes Antidistillation Sampling, a method to poison LLM reasoning traces during generation, hindering model distillation while preserving the original model's performance.

Paper: Antidistillation Sampling

Authors: Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter (CMU)

Published: April 17, 2025 (arXiv)


Problem Background

Detailed reasoning traces generated by large language models (LLMs) are a powerful capability, but publishing them creates a vulnerability: competitors can use the public traces for model distillation, cheaply replicating much of the original model's capability. This leads to intellectual property leakage and potential security risks (e.g., distilled models that bypass safety alignment).

Proposed Method: Antidistillation Sampling

  • Core Idea: Poison the reasoning traces generated by the original (teacher) model to hinder distillation, without significantly compromising the teacher model’s own performance.
  • Implementation: This is a sampling strategy applied during token generation:
    • It starts from the teacher model’s original next-token probabilities and adds an “antidistillation” adjustment term.
    • The adjustment term uses a small proxy model and the gradient of a loss on a downstream task to estimate which tokens are “harmful” to distillation (i.e., selecting them reduces the effectiveness of distillation).
    • The next token is then sampled from this adjusted probability distribution (see the sketch after this list).
  • Key Aspect: The original teacher model is not modified. The adjustment happens only during inference sampling, and the poisoning strength is controlled to minimize impact on the teacher’s utility.
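
Below is a minimal sketch of what an adjusted sampling step of this kind could look like; it is illustrative only, not the paper's exact algorithm. The names `teacher_logits`, `poison_scores` (a per-token penalty assumed to be precomputed from the proxy model and the downstream-loss gradient), and the strength coefficient `lam` are hypothetical.

```python
import torch

def antidistillation_sample(teacher_logits: torch.Tensor,
                            poison_scores: torch.Tensor,
                            lam: float = 1.0,
                            temperature: float = 1.0) -> int:
    """Sample the next token from the teacher's distribution, down-weighting
    tokens that a proxy model estimates would help a would-be distiller."""
    # Keep the teacher's preferences, subtract a scaled per-token penalty.
    adjusted = teacher_logits / temperature - lam * poison_scores
    probs = torch.softmax(adjusted, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# Toy usage with random tensors standing in for real model outputs.
vocab_size = 8
teacher_logits = torch.randn(vocab_size)   # teacher's next-token logits
poison_scores = torch.randn(vocab_size)    # proxy-derived per-token scores
next_token = antidistillation_sample(teacher_logits, poison_scores, lam=0.5)
print("sampled token id:", next_token)
```

Setting `lam = 0` recovers ordinary temperature sampling from the teacher, which is why the poisoning strength can be dialed down to protect the teacher's own utility.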

Experimental Results

  • Effectiveness: Antidistillation sampling sharply reduces the accuracy of student models distilled from the poisoned traces, while maintaining the teacher model’s accuracy on benchmarks such as GSM8K and MATH.
  • Superiority: Compared to simply increasing the sampling temperature (which drastically degrades teacher performance), antidistillation sampling offers a better trade-off between performance and distillation resistance.
  • Overhead: The main cost is two additional forward passes through the small proxy model for each generated token.
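
One plausible reading of where this cost comes from (a sketch assuming a finite-difference estimator; the symbols $\theta$, $\epsilon$, $\lambda$, and $\mathcal{L}$ are notation introduced here, not quoted from the paper): the adjustment for each candidate token can be approximated by comparing the proxy model's log-probabilities at two nearby parameter settings, one of them perturbed along the downstream-loss gradient, which requires two proxy forward passes per generated token:

$$
\Delta_t(x) \;\approx\; \frac{\log p_{\theta + \epsilon\,\nabla_\theta \mathcal{L}}(x \mid x_{<t}) - \log p_{\theta}(x \mid x_{<t})}{\epsilon},
\qquad
x_t \sim \operatorname{softmax}\big(\log p_{\text{teacher}}(\cdot \mid x_{<t}) - \lambda\,\Delta_t(\cdot)\big).
$$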