This guide walks you through configuring and running a distillation job using DistilKitPlus.
Step 1: Create Your Configuration File
Start by creating a configuration file:
- Navigate to the `config/` directory
- Copy a template config (e.g., `default_config.json` or `config_online_qwq_phi4_uld.json`)
- Edit the configuration parameters according to your needs
If you are providing a path for `logits_file`, you must first generate the teacher logits by following the steps in the Generating Teacher Logits section.
Here’s an example of the key sections you’ll need to modify:
{
"project_name": "my-first-distillation",
"dataset": {
"name": "tatsu-lab/alpaca", // Your dataset path
"logits_file": null, // Set to path of .tfrecord if using pre-computed logits
"num_samples": 10000
},
"models": {
"teacher": "meta-llama/Llama-3.1-70B-Instruct", // Your teacher model
"student": "meta-llama/Llama-3.1-8B-Instruct", // Your student model
"teacher_vocab_size": 128256
},
"tokenizer": {
"max_length": 2048 // Adjust based on your GPU memory
},
"training": {
"output_dir": "./distilled_model",
"per_device_train_batch_size": 1,
"num_train_epochs": 3,
"learning_rate": 2e-5
},
"distillation": {
"temperature": 2.0,
"alpha": 0.1,
"loss_type": "fkl" // Try "uld" or "multi-ot"
},
"lora": {
"enable_training": true,
"r": 16,
"alpha": 32
},
"quantization": {
"enabled": true // Set to false if you have sufficient GPU memory
}
}
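To build intuition for the `distillation` parameters, here is a minimal sketch of how a forward-KL (`fkl`) loss is commonly combined with the task loss using `temperature` and `alpha`. This is illustrative only, not DistilKitPlus's exact implementation, and the weighting convention for `alpha` may differ in the library.

```python
# Illustrative forward-KL distillation loss (not the library's exact code).
# `temperature` softens both distributions; `alpha` trades off the
# distillation term against the ordinary task (cross-entropy) loss.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, task_loss,
                      temperature=2.0, alpha=0.1):
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t ** 2)  # rescale so gradients stay comparable across temperatures
    return alpha * kd + (1 - alpha) * task_loss
```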
Using Modal for Cloud Execution:
If you plan to run your jobs using Modal, you’ll need to upload the Accelerate and DeepSpeed configuration files to a Modal Volume (e.g., `distillation-volume`). The corresponding Modal scripts (`scripts/modal/distill_logits.py` or `scripts/modal/generate_logits.py`) must be configured to mount this volume and access the files from the volume path.
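For illustration, a Modal script can mount such a volume roughly as follows. This is a hypothetical sketch: the app name, GPU type, and mount path are placeholders, and the real scripts in `scripts/modal/` define their own app, image, and paths.

```python
# Hypothetical sketch; the actual scripts in scripts/modal/ define their own
# app, image, GPU type, and mount path.
import modal

app = modal.App("distilkitplus-distill")
vol = modal.Volume.from_name("distillation-volume")  # upload your Accelerate/DeepSpeed configs here first

@app.function(gpu="A100", volumes={"/vol": vol}, timeout=6 * 60 * 60)
def distill(config_path: str):
    # Files uploaded to the volume are visible under /vol,
    # e.g. /vol/accelerate_config.yaml and /vol/deepspeed_config.json
    ...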
Refer to the Configuration page for detailed explanations of all available parameters.
Step 2: Run the Distillation Script
Execute the `distill_logits.py` script with your configuration:
# Running with local resources
python scripts/local/distill_logits.py --config config/my_config.json
# Running with Accelerate for multi-GPU training
accelerate launch scripts/local/distill_logits.py --config config/my_config.json
# Running with Modal for cloud execution
modal run scripts/modal/distill_logits.py --config config/my_config.json
Step 3: Monitor Training Progress
The script will output training logs to the console, showing:
- Loss values (combined, distillation, and task losses)
- Learning rate changes
- Training speed (samples/second)
If you’ve configured Weights & Biases, you can also monitor these metrics in real time on the WandB dashboard; logging goes through the Hugging Face Trainer’s built-in integration by default.
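If you are wiring this up yourself, WandB reporting with the Hugging Face Trainer is usually just a matter of installing `wandb` and setting `report_to`. The snippet below is a generic example; the exact keys DistilKitPlus exposes in its JSON config may differ.

```python
# Generic Hugging Face Trainer + WandB setup (illustrative; not DistilKitPlus-specific).
import os
from transformers import TrainingArguments

os.environ["WANDB_PROJECT"] = "my-first-distillation"  # mirrors project_name above

args = TrainingArguments(
    output_dir="./distilled_model",
    report_to=["wandb"],   # stream loss, learning rate, and throughput to WandB
    logging_steps=10,
)
```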
Step 4: Use Your Distilled Model
Once training completes, the final model is saved to the directory specified in `training.output_dir`, in a subdirectory called `final-distilled-checkpoint`. You can load it like any Hugging Face model:
from transformers import AutoModelForCausalLM, AutoTokenizer
# For full model distillation
model_path = "path/to/training.output_dir/final-distilled-checkpoint"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# For LoRA adapter
from peft import PeftModel
base_model_id = "meta-llama/Llama-3.1-8B-Instruct" # Your student model
adapter_path = "path/to/training.output_dir/final-distilled-checkpoint"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
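As a quick sanity check, you can generate a short completion from the loaded model (for the LoRA case you can optionally merge the adapter into the base weights first with `model = model.merge_and_unload()`):

```python
# Generate a short completion from the distilled model (prompt is just an example).
prompt = "Explain knowledge distillation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```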
Generating Teacher Logits (Optional)
This feature is currently under development and only supports the forward KL (`fkl`) loss type for distillation when using pre-computed logits.
For more memory-efficient training, especially with very large teacher models, you can pre-compute teacher logits:
python scripts/local/generate_logits.py \
--config config/logit_generation_config.json \
--output_file path/to/save/teacher_logits.tfrecord
# Running with Modal for cloud execution
modal run scripts/modal/generate_logits.py \
--config config/logit_generation_config.json \
--output_file path/to/save/teacher_logits.tfrecord
Then update your distillation config to use these pre-computed logits:
{
"dataset": {
"logits_file": "path/to/save/teacher_logits.tfrecord"
},
"models": {
"teacher": null, // No need to load teacher model
"teacher_vocab_size": 128256 // Must match the teacher model that generated logits
}
}
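Before launching, a quick check like the one below (not part of DistilKitPlus) can catch common mistakes. It assumes your real config file is plain JSON, i.e. without the inline // comments shown in the examples above.

```python
# Optional pre-flight check for a pre-computed-logits run.
import json
import os

with open("config/my_config.json") as f:  # path used in the run commands above
    cfg = json.load(f)

assert os.path.exists(cfg["dataset"]["logits_file"]), "logits_file not found"
assert cfg["models"]["teacher"] is None, "teacher should be null when using pre-computed logits"
assert cfg["distillation"]["loss_type"] == "fkl", "pre-computed logits currently support only fkl"
```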