Available Formatters (dataset.format_function)

Formatter NameDescriptionInput FormatOutput Format
default_formatStandard chat formatExamples with messages containing chat turns (role and content)Formatted chat using the tokenizer’s template
sharegpt_formatFor ShareGPT-style dataData with conversations containing turns (from: human/gpt/system, value)Standard chat format with system message if missing
comparison_formatFor comparing two responsesData with prompt, response_a, response_b, rationale, and winnerStructured chat showing comparison and result
format_for_tokenizationSimple text extractionAny dataJust the text content or full example if no text field

Notes:

  • All formatters except format_for_tokenization use the tokenizer’s chat template
  • default_format is used if an unknown formatter is specified