# Losses

<div className="overflow-x-auto">
  <table className="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
    <thead>
      <tr>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Loss Type</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Best For</th>
        <th className="px-4 py-3 text-left text-sm font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider">Special Requirements</th>
      </tr>
    </thead>

    <tbody className="divide-y divide-gray-200 dark:divide-gray-700">
      <tr>
        <td className="px-4 py-3 text-sm text-gray-900 dark:text-gray-100">KL Divergence (fkl, kld)</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Same-tokenizer distillation</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">None</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm text-gray-900 dark:text-gray-100">Universal Logit Distillation (uld)</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Cross-tokenizer distillation</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Requires teacher\_labels</td>
      </tr>

      <tr>
        <td className="px-4 py-3 text-sm text-gray-900 dark:text-gray-100">Multi-Level Optimal Transport (multi-ot)</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Cross-tokenizer distillation</td>
        <td className="px-4 py-3 text-sm text-gray-600 dark:text-gray-300">Requires teacher\_labels, additional parameters</td>
      </tr>
    </tbody>
  </table>
</div>
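To make the same-tokenizer versus cross-tokenizer split concrete, here is a minimal pure-Python sketch of two of these losses for a single token position. It is an illustration under stated assumptions, not DistillKitPlus's implementation: the function names (`softmax`, `forward_kl`, `sorted_l1`) are hypothetical, the forward KL follows the temperature-softened formulation of Hinton et al., and `sorted_l1` follows the sort-and-pad idea described in the Universal Logit Distillation paper. Sequence-level averaging, masking, and any weighting are left out.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) at one position over a shared vocabulary."""
    if len(teacher_logits) != len(student_logits):
        # Index i must mean the same token on both sides.
        raise ValueError("forward KL needs the same vocabulary on both sides")
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def sorted_l1(teacher_logits, student_logits):
    """ULD-style distance: sort both distributions, zero-pad, sum |diffs|."""
    p = sorted(softmax(teacher_logits), reverse=True)
    q = sorted(softmax(student_logits), reverse=True)
    size = max(len(p), len(q))
    # Pad the smaller vocabulary with zero-probability entries.
    p += [0.0] * (size - len(p))
    q += [0.0] * (size - len(q))
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```

`forward_kl` compares probabilities index by index, so it only makes sense when teacher and student share a vocabulary. `sorted_l1` compares the sorted shapes of the two distributions instead, so the vocabularies may differ in both size and token identity.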

## References

1. **Distilling the Knowledge in a Neural Network**\
   Geoffrey Hinton, Oriol Vinyals, Jeff Dean\
   *arXiv preprint arXiv:1503.02531*, 2015.\
   [https://arxiv.org/abs/1503.02531](https://arxiv.org/abs/1503.02531)

2. **Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs**\
   Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo\
   *arXiv preprint arXiv:2402.12030*, 2024.\
   [https://arxiv.org/abs/2402.12030](https://arxiv.org/abs/2402.12030)

3. **Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models**\
   Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Houqiang Li\
   *arXiv preprint arXiv:2412.14528*, 2024.\
   [https://arxiv.org/abs/2412.14528](https://arxiv.org/abs/2412.14528)
