| Loss Type | Best For | Special Requirements |
|---|---|---|
| KL Divergence (`fkl`, `kld`) | Same-tokenizer distillation | None |
| Universal Logit Distillation (`uld`) | Cross-tokenizer distillation | Requires `teacher_labels` |
| Multi-Level Optimal Transport (`multi-ot`) | Cross-tokenizer distillation | Requires `teacher_labels` and additional parameters |
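The split between same-tokenizer and cross-tokenizer losses comes from how each loss compares the two output distributions. The sketch below is a hypothetical, dependency-free illustration for a single token position, not this library's implementation. It assumes plain Python lists of logits, no temperature, and no batching or reduction. The function names `forward_kl` and `uld_loss` are illustrative only. Forward KL compares probabilities index by index, so index `i` must mean the same token for teacher and student. The ULD-style loss follows the idea from the Universal Logit Distillation paper as we understand it: sort each distribution's probabilities, zero-pad to equal length, and sum the absolute differences. This removes the need for a shared vocabulary. Whatever sequence alignment the `teacher_labels` requirement supports is omitted here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher_logits: List[float], student_logits: List[float]) -> float:
    """Forward KL(teacher || student) at one token position.

    Assumes a shared vocabulary (same tokenizer): position i in both
    logit lists must refer to the same token.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("forward KL needs teacher and student to share a vocabulary")
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    # Skip zero-probability teacher entries, which contribute nothing to the sum.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def uld_loss(teacher_logits: List[float], student_logits: List[float]) -> float:
    """ULD-style loss at one token position (illustrative sketch).

    Sorts each probability vector in descending order, zero-pads the
    shorter one, and sums absolute differences. Vocabularies may differ
    in size and content, because token identities are never compared.
    """
    p = sorted(softmax(teacher_logits), reverse=True)
    q = sorted(softmax(student_logits), reverse=True)
    n = max(len(p), len(q))
    p += [0.0] * (n - len(p))
    q += [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))
```

For example, `forward_kl([1.0, 2.0], [1.0, 2.0, 3.0])` raises an error because the vocabularies differ in size. `uld_loss` accepts the same inputs. The trade-off is that the sorted comparison ignores which token carries which probability. That information loss is the price of working across tokenizers.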