This guide walks you through configuring and running a distillation job using DistilKitPlus.
Start by creating a configuration file in the `config/` directory (e.g., `default_config.json` or `config_online_qwq_phi4_uld.json`).

If you are providing a path for `"logits_file"`, you must first generate the teacher logits by following the steps in the Generating Teacher Logits section.
Here’s an example of the key sections you’ll need to modify:
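For orientation only, here is a hypothetical sketch of the shape such a file might take. Apart from `"logits_file"` and `training.output_dir`, which appear elsewhere in this guide, the section names, field names, and model IDs below are illustrative rather than DistilKitPlus's actual schema; consult the Configuration page for the real parameters:

```json
{
  "models": {
    "teacher": "Qwen/QwQ-32B-Preview",
    "student": "microsoft/phi-4"
  },
  "dataset": {
    "name": "your-dataset",
    "logits_file": null
  },
  "training": {
    "output_dir": "./results",
    "num_train_epochs": 1,
    "per_device_train_batch_size": 1
  }
}
```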
Using Modal for Cloud Execution:
If you plan to run your jobs using Modal, you’ll need to upload the Accelerate and DeepSpeed configuration files to a Modal Volume (e.g., `distillation-volume`). The corresponding Modal scripts (`scripts/modal/distill_logits.py` or `scripts/modal/generate_logits.py`) must be configured to mount this volume and access the files from the volume path.
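As a sketch, assuming the Modal CLI is installed and using placeholder local file names, the upload could look like this:

```bash
# Create the volume once, then upload the configs to it.
# (Local file names are illustrative; use your actual Accelerate/DeepSpeed configs.)
modal volume create distillation-volume
modal volume put distillation-volume ./accelerate_config.yaml
modal volume put distillation-volume ./deepspeed_config.json
```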
Refer to the Configuration page for detailed explanations of all available parameters.
Execute the `distill_logits.py` script with your configuration:
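For example (the script location and flag name here are assumptions; adjust them to match the repository layout):

```bash
# Hypothetical invocation: run the distillation script with a chosen config file
python scripts/local/distill_logits.py --config config/default_config.json
```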
The script will output training logs to the console, showing the training loss and other Trainer metrics as the run progresses. If you’ve configured Weights & Biases integration, you can also monitor these metrics in real time via the WandB dashboard (reporting goes through the Hugging Face Trainer integration by default).
Once training completes, the final model will be saved to the directory specified in `training.output_dir`, in a subdirectory called `final-distilled-checkpoint`. You can load it like any Hugging Face model:
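For example, with the standard Transformers loaders (assuming `training.output_dir` was set to `./results`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# <training.output_dir>/final-distilled-checkpoint; "./results" is an example value
checkpoint_dir = "./results/final-distilled-checkpoint"

model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
```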
Use the `forward_kl` (`fkl`) loss type for distillation when using pre-computed logits. Then update your distillation config to use these pre-computed logits:
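A minimal sketch of that update (the section names are assumptions and the path is a placeholder; `"logits_file"` and the `fkl` loss type are the parts taken from this guide):

```json
{
  "dataset": {
    "logits_file": "/path/to/generated_teacher_logits"
  },
  "distillation": {
    "loss_type": "fkl"
  }
}
```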