> ## Documentation Index
> Fetch the complete documentation index at: https://distillkitplus.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Introduction

# Welcome to DistilKitPlus

`DistilKitPlus` is an open-source toolkit designed for high-performance knowledge distillation, specifically tailored for Large Language Models (LLMs). It enables you to create smaller, faster, and more efficient models that retain the capabilities of larger teacher models.

Inspired by [`acree-ai/DistillKit`](https://github.com/arcee-ai/DistillKit), this project provides an easy-to-use framework with a particular focus on supporting **offline distillation** (using pre-computed teacher logits) and **Parameter-Efficient Fine-Tuning (PEFT)** methods like LoRA, making it suitable for environments with limited computational resources.

<div className="mx-auto px-4 pt-2 pb-4 bg-gradient-to-r from-green-50 to-gray-50 dark:from-green-900/30 dark:to-gray-900/30 border border-green-100 dark:border-green-900/50 rounded-lg">
  <div className="flex flex-col md:flex-row gap-6 items-center">
    <div className="flex-1">
      <h3 className="text-lg font-medium text-green-800 dark:text-green-400 mb-2">Why DistilKitPlus?</h3>

      <ul className="list-inside space-y-1">
        <li>Lower deployment costs with smaller, quantized models</li>
        <li>Improve inference speed for production applications</li>
        <li>Maintain high-quality outputs with proven distillation techniques</li>
        <li>Fine-tune efficiently with minimal computational resources</li>
      </ul>
    </div>

    <div className="flex-1">
      <img src="https://mintcdn.com/distillkitplus/o8_yzrzo2hFiRmoM/images/distill.png?fit=max&auto=format&n=o8_yzrzo2hFiRmoM&q=85&s=288b89cf8c0269ad511dd71f9b203fd1" alt="Knowledge distillation process" className="block dark:hidden rounded-lg shadow-md w-full max-w-2xl" width="2048" height="850" data-path="images/distill.png" />

      <img src="https://mintcdn.com/distillkitplus/o8_yzrzo2hFiRmoM/images/distill.png?fit=max&auto=format&n=o8_yzrzo2hFiRmoM&q=85&s=288b89cf8c0269ad511dd71f9b203fd1" alt="Knowledge distillation process" className="hidden dark:block rounded-lg shadow-md w-full max-w-2xl" width="2048" height="850" data-path="images/distill.png" />
    </div>
  </div>
</div>

## Key Features

* **Logit Distillation**: Supports standard knowledge distillation using logits, primarily for same-architecture teacher/student pairs.
* **Pre-Computed Logits**: Train efficiently by generating teacher logits beforehand, saving memory during the student training phase (See [Generating Teacher Logits](./running-distillation#generating-teacher-logits-optional)).
* **PEFT Integration**: Leverages the `peft` library for efficient fine-tuning techniques like LoRA.
* <span title="4-bit Quantization" /> **Quantization Support**: Load base models in 4-bit precision using `bitsandbytes` for reduced memory footprint and potentially faster execution.
* **Distributed Training**: Integrates with Hugging Face `accelerate` and `deepspeed` for scaling training across multiple GPUs or nodes.
* **Flexible Loss Functions**: Includes various distillation loss options like Forward KL Divergence (`fkl`), Universal Logit Distillation (`uld`), and Multi-Level Optimal Transport (`multi-ot`).

## Getting Started

Ready to distill your own model? Head over to the [Installation](./installation) guide to set up the library and then follow the steps in [Running Distillation](./running-distillation) to run your first distillation job.

## Dive Deeper

Explore the **Essentials** section in the sidebar to understand the core components:

* **[Losses](./essentials/losses)**: Explore the available distillation loss functions.
* **[Configuration](./essentials/configuration)**: Learn how to set up your distillation runs.
* **[Formatters](./essentials/formatters)**: Learn how raw data is prepared for the models.
