This guide walks you through configuring and running a distillation job using DistilKitPlus.

Step 1: Configure Your Distillation

Start by creating a configuration file:

  1. Navigate to the config/ directory
  2. Copy a template config (e.g., default_config.json or config_online_qwq_phi4_uld.json)
  3. Edit the configuration parameters according to your needs

If you are providing a path for "logits_file", you must first generate the teacher logits by following the steps in the Generating Teacher Logits section.

Here’s an example of the key sections you’ll need to modify:

{
  "project_name": "my-first-distillation",
  "dataset": {
    "name": "tatsu-lab/alpaca",  // Your dataset path
    "logits_file": null,         // Set to path of .tfrecord if using pre-computed logits
    "num_samples": 10000         
  },
  "models": {
    "teacher": "meta-llama/Llama-3.1-70B-Instruct",  // Your teacher model
    "student": "meta-llama/Llama-3.1-8B-Instruct",   // Your student model
    "teacher_vocab_size": 128256                     
  },
  "tokenizer": {
    "max_length": 2048  // Adjust based on your GPU memory
  },
  "training": {
    "output_dir": "./distilled_model",
    "per_device_train_batch_size": 1,
    "num_train_epochs": 3,
    "learning_rate": 2e-5
  },
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1,
    "loss_type": "fkl"  // Try "uld" or "multi-ot" 
  },
  "lora": {
    "enable_training": true,
    "r": 16,
    "alpha": 32
  },
  "quantization": {
    "enabled": true  // Set to false if you have sufficient GPU memory
  }
}
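
To make the distillation block concrete: temperature softens both the teacher and student distributions before they are compared, and alpha weights the distillation term against the ordinary task (cross-entropy) loss. The snippet below is a minimal sketch of the conventional forward-KL formulation under those assumptions, not the exact implementation inside DistilKitPlus:

import torch.nn.functional as F

def combined_loss(student_logits, teacher_logits, task_loss, temperature=2.0, alpha=0.1):
    # Soften both distributions with the temperature before comparing them.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Forward KL from teacher to student, scaled by T^2 as is conventional.
    kd_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
    # alpha balances the distillation signal against the task loss.
    return alpha * kd_loss + (1 - alpha) * task_loss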

Using Modal for Cloud Execution: If you plan to run your jobs using Modal, you’ll need to upload the Accelerate and DeepSpeed configuration files to a Modal Volume (e.g., distillation-volume). The corresponding Modal scripts (scripts/modal/distill_logits.py or scripts/modal/generate_logits.py) must be configured to mount this volume and access the files from the volume path.
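
The exact wiring lives in the scripts under scripts/modal/, but the general Modal pattern looks roughly like the sketch below; the app name, GPU type, and mount path /root/configs are placeholder assumptions:

import modal

app = modal.App("distilkitplus")
# The volume is created and populated beforehand, e.g.:
#   modal volume put distillation-volume accelerate_config.yaml
config_volume = modal.Volume.from_name("distillation-volume")

@app.function(gpu="A100", volumes={"/root/configs": config_volume})
def distill():
    # Accelerate/DeepSpeed config files are now readable under /root/configs/
    ...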

Refer to the Configuration page for detailed explanations of all available parameters.

Step 2: Run the Distillation Script

Execute the distill_logits.py script with your configuration:

# Running with local resources
python scripts/local/distill_logits.py --config config/my_config.json

# Running with Accelerate for multi-GPU training
accelerate launch scripts/local/distill_logits.py --config config/my_config.json

# Running with Modal for cloud execution
modal run scripts/modal/distill_logits.py --config config/my_config.json

Step 3: Monitor Training Progress

The script will output training logs to the console, showing:

  • Loss values (combined, distillation, and task losses)
  • Learning rate changes
  • Training speed (samples/second)

If you’ve configured Weights & Biases integration (wired through the Hugging Face Trainer integration by default), you can also monitor these metrics in real time on the WandB dashboard.
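
For example, you can point the Trainer's WandB callback at a specific project through standard environment variables set before launching the script (shown in Python here; exporting them in your shell works the same). The project name is illustrative, and DistilKitPlus may already derive it from project_name in your config:

import os

# Picked up by the Hugging Face Trainer's WandB callback.
os.environ["WANDB_PROJECT"] = "my-first-distillation"
# Optional: disable W&B entirely for a quick local run.
# os.environ["WANDB_MODE"] = "disabled"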

Step 4: Use Your Distilled Model

Once training completes, the final model is saved under the directory specified in training.output_dir, in a subdirectory named final-distilled-checkpoint. You can load it like any Hugging Face model:

from transformers import AutoModelForCausalLM, AutoTokenizer

# For full model distillation
model_path = "path/to/training.output_dir/final-distilled-checkpoint"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# For LoRA adapter
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.1-8B-Instruct"  # Your student model
adapter_path = "path/to/training.output_dir/final-distilled-checkpoint"

model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
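
As a quick sanity check, you can generate from the loaded model; the prompt below is just an example, and in the LoRA case you can optionally call model.merge_and_unload() first to fold the adapter into the base weights:

prompt = "Explain knowledge distillation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))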

Generating Teacher Logits (Optional)

For more memory-efficient training, especially with very large teacher models, you can pre-compute the teacher logits. Note that this feature is currently under development, and only the forward_kl (fkl) loss type is supported when distilling from pre-computed logits:

# Running with local resources
python scripts/local/generate_logits.py \
  --config config/logit_generation_config.json \
  --output_file path/to/save/teacher_logits.tfrecord

# Running with Modal for cloud execution
modal run scripts/modal/generate_logits.py \
  --config config/logit_generation_config.json \
  --output_file path/to/save/teacher_logits.tfrecord

Then update your distillation config to use these pre-computed logits:

{
  "dataset": {
    "logits_file": "path/to/save/teacher_logits.tfrecord"
  },
  "models": {
    "teacher": null,  // No need to load teacher model
    "teacher_vocab_size": 128256  // Must match the teacher model that generated logits
  }
}
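
Before kicking off training with pre-computed logits, you can confirm the .tfrecord file is readable by counting its records; the record schema itself is internal to DistilKitPlus and is not parsed in this sketch:

import tensorflow as tf

# Iterate the TFRecord file and count serialized examples without parsing their payload.
logits_path = "path/to/save/teacher_logits.tfrecord"
num_records = sum(1 for _ in tf.data.TFRecordDataset(logits_path))
print(f"{num_records} records found in {logits_path}")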