# Losses

| Loss Type | Best For | Special Requirements |
| --- | --- | --- |
| KL Divergence (`fkl`, `kld`) | Same tokenizer distillation | None |
| Universal Logit Distillation (`uld`) | Cross-tokenizer distillation | Requires `teacher_labels` |
| Multi-Level Optimal Transport (`multi-ot`) | Cross-tokenizer distillation | Requires `teacher_labels`, additional parameters |
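
To make the same-tokenizer versus cross-tokenizer distinction concrete, here is a minimal pure-Python sketch of the two simplest ideas for a single token position. It is an illustration under stated assumptions, not the library's implementation: the function names (`log_softmax`, `forward_kl`, `uld_distance`) and the `temperature` parameter are hypothetical, and the second function only approximates the sorted-probability comparison described in the ULD paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one position.

    Compares entries index by index, so both models must share a vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("forward KL needs identical vocabularies")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def uld_distance(teacher_logits, student_logits):
    """Illustrative ULD-style term: L1 distance between sorted probabilities.

    Sorting discards token identity, so the vocabularies may differ in size;
    the shorter distribution is zero-padded (an assumption of this sketch).
    """
    p = sorted((math.exp(v) for v in log_softmax(teacher_logits)), reverse=True)
    q = sorted((math.exp(v) for v in log_softmax(student_logits)), reverse=True)
    size = max(len(p), len(q))
    p += [0.0] * (size - len(p))
    q += [0.0] * (size - len(q))
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```

The index-by-index comparison in `forward_kl` is why the KL-based losses only make sense when teacher and student share a tokenizer, while the sorted comparison in `uld_distance` still works when the vocabularies differ.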

## References

1. **Distilling the Knowledge in a Neural Network**\
   Geoffrey Hinton, Oriol Vinyals, Jeff Dean\
   *arXiv preprint arXiv:1503.02531*, 2015.\
   [https://arxiv.org/abs/1503.02531](https://arxiv.org/abs/1503.02531)

2. **Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs**\
   Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo\
   *arXiv preprint arXiv:2402.12030*, 2025.\
   [https://arxiv.org/abs/2402.12030](https://arxiv.org/abs/2402.12030)

3. **Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models**\
   Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Houqiang Li\
   *arXiv preprint arXiv:2412.14528*, 2025.\
   [https://arxiv.org/abs/2412.14528](https://arxiv.org/abs/2412.14528)