Running a Distillation Job
This guide walks you through configuring and running a distillation job using DistilKitPlus.
Step 1: Configure Your Distillation
Start by creating a configuration file:
- Navigate to the `config/` directory
- Copy a template config (e.g., `default_config.json` or `config_online_qwq_phi4_uld.json`), as shown below
- Edit the configuration parameters according to your needs
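For example, to work on a copy of a bundled template (the destination filename `my_config.json` is just an example name):

```bash
# Copy a template so you can edit it without touching the original
cp config/default_config.json config/my_config.json
```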
If you are providing a path for `logits_file`, you must first generate the teacher logits by following the steps in the Generating Teacher Logits section.
Here’s an example of the key sections you’ll need to modify:
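The sketch below is illustrative only: the exact key names and nesting vary by template and DistilKitPlus version, so cross-check it against the template you copied. Model names, dataset name, and paths are placeholders.

```json
{
  "models": {
    "teacher": "teacher-org/teacher-model",
    "student": "student-org/student-model"
  },
  "dataset": {
    "name": "your-dataset-name",
    "logits_file": null
  },
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.5
  },
  "training": {
    "output_dir": "./results",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "learning_rate": 2e-5
  }
}
```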
Using Modal for Cloud Execution:
If you plan to run your jobs using Modal, you’ll need to upload the Accelerate and DeepSpeed configuration files to a Modal Volume (e.g., `distillation-volume`). The corresponding Modal scripts (`scripts/modal/distill_logits.py` or `scripts/modal/generate_logits.py`) must be configured to mount this volume and access the files from the volume path.
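One way to do that upload, assuming the Modal CLI is installed and authenticated (the local config filenames below are examples; substitute your own):

```bash
# Create the volume once (skip if distillation-volume already exists)
modal volume create distillation-volume

# Upload your Accelerate and DeepSpeed config files to the volume
modal volume put distillation-volume accelerate_config.yaml /accelerate_config.yaml
modal volume put distillation-volume deepspeed_config.json /deepspeed_config.json
```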
Refer to the Configuration page for detailed explanations of all available parameters.
Step 2: Run the Distillation Script
Execute the `distill_logits.py` script with your configuration:
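A local run would look something like the sketch below; the local script path and the `--config` flag are assumptions, so check the script's argument parser in your checkout.

```bash
# Local run (script path and --config flag assumed; adjust to your repo layout)
python scripts/local/distill_logits.py --config config/my_config.json

# Or launch the Modal variant of the script when running in the cloud
modal run scripts/modal/distill_logits.py
```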
Step 3: Monitor Training Progress
The script will output training logs to the console, showing:
- Loss values (combined, distillation, and task losses)
- Learning rate changes
- Training speed (samples/second)
If you’ve configured Weights & Biases integration, you can also monitor these metrics in real time on the WandB dashboard (reporting goes through the Hugging Face Trainer’s built-in WandB integration by default).
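If you use WandB, the standard environment variables are enough to route the Trainer's logs to your account before launching the job (the project name below is just an example):

```bash
# Authenticate and pick a WandB project for the run
export WANDB_API_KEY=<your-api-key>
export WANDB_PROJECT=distilkitplus-runs   # example project name
```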
Step 4: Use Your Distilled Model
Once training completes, the final model will be saved to the directory specified in `training.output_dir`, in a subdirectory called `final-distilled-checkpoint`. You can load it like any Hugging Face model:
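For example, with `transformers` (the path below assumes `training.output_dir` was set to `./results`; if you trained LoRA adapters you may need `peft` to load them instead):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path assumes training.output_dir was "./results"; adjust to your config
checkpoint_dir = "./results/final-distilled-checkpoint"

model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)

# Quick smoke test: generate a short completion with the distilled student
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```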
Generating Teacher Logits (Optional)
Run the logit generation script (e.g., `scripts/modal/generate_logits.py` when using Modal) to produce the teacher logits file. Use the `forward_kl` (`fkl`) loss type for distillation when using pre-computed logits. Then update your distillation config to use these pre-computed logits:
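A sketch of the relevant fields: the key names and nesting (e.g., `logits_file` under `dataset`, `loss_type` under `distillation`) are assumptions to verify against your template, and the path is a placeholder.

```json
{
  "dataset": {
    "logits_file": "/path/to/teacher_logits"
  },
  "distillation": {
    "loss_type": "fkl"
  }
}
```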