# Configuration

## Detailed Configuration Parameters

Here's a complete reference for all available configuration options:

### Top-Level Configuration

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">project\_name</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Name for the distillation project (used for logging)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">dataset</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Dataset configuration (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">models</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Model configuration (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">tokenizer</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Tokenizer configuration (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">training</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Training arguments (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">distillation</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Distillation-specific settings (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">model\_config</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Model loading options (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">lora</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">LoRA/PEFT configuration (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">quantization</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Quantization settings (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">execution</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Execution environment settings (see below)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">hf\_token</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Hugging Face API token for private models/datasets</td>
      </tr>
    </tbody>
  </table>
</div>

### Dataset Configuration (`dataset`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">name</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path or Hugging Face dataset name</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">split</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Dataset split to use (e.g., "train", "validation")</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">logits\_file</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path to TFRecord file with pre-computed logits (null for on-the-fly)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">num\_samples</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Maximum number of samples to use (null for all)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">select\_range</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">\[number, number] | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Range of samples to select \[start, end] (null for all)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">format\_function</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Name of formatter function (see Formatters section)</td>
      </tr>
    </tbody>
  </table>
</div>

### Models Configuration (`models`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">teacher</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Teacher model path/ID (needed if logits\_file is null)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">student</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Student model path/ID</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">student\_adapter</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path to pre-trained student adapter (e.g., LoRA)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">teacher\_adapter</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path to pre-trained teacher adapter</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">teacher\_vocab\_size</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Vocabulary size of teacher model (required if using logits\_file)</td>
      </tr>
    </tbody>
  </table>
</div>
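The two conditional requirements above (`teacher` is needed when `dataset.logits_file` is null, and `teacher_vocab_size` is needed when a logits file is used) can be checked before launching a run. The following is a minimal sketch rather than part of the library: `check_teacher_source` is a hypothetical helper name, and it assumes the configuration has already been loaded into a Python dict.

```python
def check_teacher_source(config: dict) -> list[str]:
    """Return a list of problems with the teacher-related settings.

    Hypothetical helper, not part of the library: it only encodes the two
    conditions stated in the dataset and models tables.
    """
    problems = []
    logits_file = config.get("dataset", {}).get("logits_file")
    models = config.get("models", {})

    if logits_file is None:
        # On-the-fly logits: a teacher model must be available.
        if not models.get("teacher"):
            problems.append("models.teacher is required when dataset.logits_file is null")
    else:
        # Pre-computed logits: the teacher's vocabulary size must be given.
        if models.get("teacher_vocab_size") is None:
            problems.append("models.teacher_vocab_size is required when dataset.logits_file is set")
    return problems
```

An empty list means both conditions hold; otherwise each entry names the setting that needs attention.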

### Tokenizer Configuration (`tokenizer`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">max\_length</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Maximum sequence length for truncation/filtering</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">chat\_template</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Optional Jinja chat template string</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">student\_pad\_token\_id</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Pad token ID for student tokenizer</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">teacher\_pad\_token\_id</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Pad token ID for teacher tokenizer</td>
      </tr>
    </tbody>
  </table>
</div>

### Training Configuration (`training`)

This section accepts standard Hugging Face `TrainingArguments` parameters. The most common ones are:

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">output\_dir</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Directory to save model checkpoints and results</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">num\_train\_epochs</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Number of training epochs</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">per\_device\_train\_batch\_size</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Training batch size per device (e.g., per GPU)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">gradient\_accumulation\_steps</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Number of steps to accumulate gradients over before each optimizer update</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">save\_steps</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Save checkpoint every N steps</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">logging\_steps</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Log metrics every N steps</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">learning\_rate</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Initial learning rate</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">warmup\_ratio</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Ratio of steps for learning rate warmup</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">lr\_scheduler\_type</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">LR scheduler (e.g., "cosine", "linear")</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">resume\_from\_checkpoint</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path to checkpoint to resume from</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">bf16</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Enable bfloat16 mixed precision training</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">fp16</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Enable float16 mixed precision training</td>
      </tr>
    </tbody>
  </table>
</div>
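`per_device_train_batch_size` and `gradient_accumulation_steps` together determine how many samples contribute to each optimizer update. The sketch below illustrates the arithmetic; `effective_batch_size` and its `num_devices` argument are hypothetical names introduced here and are not configuration keys.

```python
def effective_batch_size(training: dict, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update.

    `num_devices` is an assumption of this sketch (it is not a config key);
    pass the number of GPUs taking part in data-parallel training.
    """
    return (
        training["per_device_train_batch_size"]
        * training["gradient_accumulation_steps"]
        * num_devices
    )


training = {"per_device_train_batch_size": 1, "gradient_accumulation_steps": 8}
print(effective_batch_size(training))                 # 8
print(effective_batch_size(training, num_devices=4))  # 32
```

When memory forces a small per-device batch, raising `gradient_accumulation_steps` keeps the effective batch size constant at the cost of more steps per update.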

### Distillation Configuration (`distillation`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">temperature</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Temperature for softening distributions (typically 2.0-4.0)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">alpha</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Weight for the distillation loss (between 0 and 1)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">loss\_type</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Distillation loss type: "fkl", "kld", "uld", "multi-ot"</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">student\_response\_template</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Template for student response (used in uld/multi-ot)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">teacher\_response\_template</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Template for teacher response (used in uld/multi-ot)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">k</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Top-k parameter for "uld" and "multi-ot" losses</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">loss\_kwargs</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">object</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Additional parameters for "multi-ot" loss type. Parameters: "log\_loss\_weight", "sikhorn\_loss\_weight".</td>
      </tr>
    </tbody>
  </table>
</div>
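To make the roles of `temperature` and `alpha` concrete, here is a small, dependency-free sketch. It is illustrative only: `soften` and `combine_losses` are hypothetical names, and the `alpha * distillation + (1 - alpha) * task` weighting is a common convention assumed here, not a formula confirmed by this page.

```python
import math


def soften(logits: list[float], temperature: float) -> list[float]:
    """Softmax of the logits after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def combine_losses(distill_loss: float, task_loss: float, alpha: float) -> float:
    """Assumed weighting: alpha on the distillation term, 1 - alpha on the task term."""
    return alpha * distill_loss + (1.0 - alpha) * task_loss


logits = [4.0, 2.0, 1.0]
print(soften(logits, temperature=1.0))  # sharper distribution
print(soften(logits, temperature=2.0))  # flatter distribution
```

A higher temperature spreads probability mass across more tokens, so the student sees more of the teacher's relative preferences among less likely tokens.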

### Model Configuration (`model_config`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">use\_flash\_attention</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Enable Flash Attention 2 during model loading</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">trust\_remote\_code</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Set trust\_remote\_code for model loading</td>
      </tr>
    </tbody>
  </table>
</div>

### LoRA Configuration (`lora`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">enable\_training</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Enable LoRA training for the student model</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">r</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">LoRA rank (typically 8-64)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">alpha</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">LoRA alpha scaling factor (typically 2×r)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">dropout</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">number</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Dropout probability in LoRA layers</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">bias</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">LoRA bias type: "none", "all", "lora\_only"</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">task\_type</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Type of task (usually "CAUSAL\_LM")</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">target\_modules</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">array of strings</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">List of modules to apply LoRA to (e.g., "q\_proj", "k\_proj")</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">modules\_to\_save</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">array of strings</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Additional modules to make trainable</td>
      </tr>
    </tbody>
  </table>
</div>
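The interaction between `r` and `alpha` is easier to see with numbers. The sketch below assumes the standard LoRA formulation, in which the low-rank update is scaled by `alpha / r` and each adapted linear layer gains two matrices of rank `r`. The function names and the 4096-wide layer are illustrative assumptions, not values taken from the library.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    """Trainable weights added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


print(lora_scale(r=16, alpha=32))                # 2.0
print(lora_params_per_linear(4096, 4096, r=16))  # 131072
```

With `alpha` set to twice `r`, the scale stays at 2.0 regardless of rank, which is one reason the 2×r rule of thumb is convenient when experimenting with different ranks.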

### Quantization Configuration (`quantization`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">enabled</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Enable 4-bit quantization (BitsAndBytes NF4)</td>
      </tr>
    </tbody>
  </table>
</div>

### Execution Configuration (`execution`)

<div className="overflow-x-auto mt-4">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Parameter</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Description</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">use\_accelerate</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">boolean</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Whether HF Accelerate is used (for distributed training)</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm font-medium text-gray-900 dark:text-gray-100">accelerate\_config</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">string | null</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Path to Accelerate config file (only required when using Modal)</td>
      </tr>
    </tbody>
  </table>
</div>

## Sample Configuration

Here's a complete example configuration file for a typical distillation scenario:

```json
{
  "project_name": "llama-3.1-70b-to-8b-distillation",
  "dataset": {
    "name": "tatsu-lab/alpaca",
    "split": "train",
    "logits_file": "/vol/logits/llama-3.1-70b-alpaca.tfrecord",
    "num_samples": 10000,
    "select_range": null,
    "format_function": "default_format"
  },
  "models": {
    "teacher": null,
    "student": "meta-llama/Llama-3.1-8B-Instruct",
    "student_adapter": null,
    "teacher_adapter": null,
    "teacher_vocab_size": 128256
  },
  "tokenizer": {
    "max_length": 2048,
    "chat_template": null,
    "student_pad_token_id": 128001,
    "teacher_pad_token_id": 128001
  },
  "training": {
    "output_dir": "/vol/distilled_model",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "save_steps": 500,
    "logging_steps": 10,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.03,
    "lr_scheduler_type": "cosine",
    "resume_from_checkpoint": null,
    "fp16": false,
    "bf16": true
  },
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1,
    "loss_type": "fkl",
    "student_response_template": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "teacher_response_template": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "k": 100,
    "loss_kwargs": {}
  },
  "model_config": {
    "use_flash_attention": true,
    "trust_remote_code": false
  },
  "lora": {
    "enable_training": true,
    "r": 16,
    "alpha": 32,
    "dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
      "q_proj", "k_proj", "v_proj", "o_proj",
      "gate_proj", "up_proj", "down_proj"
    ],
    "modules_to_save": []
  },
  "quantization": {
    "enabled": true
  },
  "execution": {
    "use_accelerate": true,
    "accelerate_config": null
  },
  "hf_token": null
}
```

<Note>
  **Important:** Paths specified in the configuration (e.g., `dataset.logits_file`, `training.output_dir`, model paths) should point to locations within your accessible storage volume. In the example above, paths like `/vol/logits/...` and `/vol/distilled_model` assume your data and output directories are mapped to `/vol` inside your execution environment (like a container or VM).
</Note>
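Before starting a long run, it can help to load the file and confirm that the expected sections are present. The following is a minimal sketch, not the library's own loader: `load_config` and `EXPECTED_SECTIONS` are hypothetical names, and treating every listed section as required is an assumption (`hf_token` is left out because it may be null).

```python
import json

# Top-level keys taken from the reference table in this page.
EXPECTED_SECTIONS = [
    "project_name", "dataset", "models", "tokenizer", "training",
    "distillation", "model_config", "lora", "quantization", "execution",
]


def load_config(path: str) -> dict:
    """Read a JSON config and report missing top-level sections early."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in EXPECTED_SECTIONS if key not in config]
    if missing:
        raise KeyError(f"missing top-level sections: {', '.join(missing)}")
    return config
```

Calling `load_config("config.json")` (the filename is only an example) either returns the parsed dict or raises a `KeyError` naming the sections that are absent.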

## Configuration Tips

1. **Memory Optimization:**
   * Enable quantization (`quantization.enabled: true`) for large models
   * Use LoRA (`lora.enable_training: true`) instead of full model fine-tuning
   * Adjust `tokenizer.max_length` based on your GPU memory

2. **Training Speed:**
   * Enable Flash Attention 2 with `model_config.use_flash_attention: true`
   * Use bfloat16 mixed precision with `training.bf16: true` on compatible hardware
   * Increase `training.per_device_train_batch_size` if memory allows
   * Set up distributed training with `execution.use_accelerate: true`

3. **Result Quality:**
   * Experiment with `distillation.temperature` (typically between 1.0 and 4.0)
   * Adjust `distillation.alpha` to balance the distillation loss against the task loss
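
The tips above use dotted names such as `quantization.enabled`; each one refers to a nested key in the JSON file. As a rough illustration, the hypothetical helper below (not part of the library) applies dotted-path settings to a config dict, which can be handy when scripting experiments.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with dotted-path overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


base = {"quantization": {"enabled": False}, "lora": {"enable_training": False, "r": 16}}
tuned = apply_overrides(base, {
    "quantization.enabled": True,
    "lora.enable_training": True,
    "model_config.use_flash_attention": True,
})
print(tuned["quantization"]["enabled"])  # True
```

The original dict is left untouched, so several variants can be derived from one base configuration.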
