Essentials
Formatters
Available Formatters (dataset.format_function
)
Formatter Name | Description | Input Format | Output Format |
---|---|---|---|
default_format | Standard chat format | Examples with messages containing chat turns (role and content) | Formatted chat using the tokenizer’s template |
sharegpt_format | For ShareGPT-style data | Data with conversations containing turns (from: human/gpt/system, value) | Standard chat format with system message if missing |
comparison_format | For comparing two responses | Data with prompt , response_a , response_b , rationale , and winner | Structured chat showing comparison and result |
format_for_tokenization | Simple text extraction | Any data | Just the text content or full example if no text field |
Notes:
- All formatters except
format_for_tokenization
use the tokenizer’s chat template default_format
is used if an unknown formatter is specified