CSV to Cucumber Data Table Converter
Convert CSV data into perfectly aligned Cucumber-style data tables for BDD testing. Supports quoted fields, auto-delimiter detection, and column alignment.
About
Cucumber data tables follow the pipe-delimited format defined by the Gherkin syntax. Converting CSV by hand is error-prone. A missed delimiter, a mishandled quoted field, or an unescaped pipe produces rows with the wrong number of cells, which the Gherkin parser rejects and which can break step definitions at runtime. Inconsistent column widths also make tables hard to review. This tool parses CSV input with an RFC 4180-compliant finite state machine that correctly handles double-quoted fields, escaped quotes (""), and embedded newlines. It auto-detects whether your CSV uses commas, semicolons, or tabs, then outputs a column-aligned Cucumber DataTable in which the pipe characters of each column line up vertically. The parser reads each character once with no backtracking, so it runs in time linear in the length of the input.
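The auto-detection step can be approximated with a simple heuristic. The sketch below is an assumption, not the tool's documented rule. It counts each candidate delimiter outside double quotes in the first record, picks the most frequent one, and falls back to a comma on ties or when no candidate appears. `detect_delimiter` is a hypothetical name. Python's standard library also offers `csv.Sniffer` for the same job.

```python
def detect_delimiter(text: str, candidates: str = ",;\t") -> str:
    """Guess the delimiter from the first record (hypothetical heuristic).

    Candidates are counted only outside double quotes. Ties resolve to the
    earliest candidate in `candidates`, so a comma wins by default.
    """
    counts = {c: 0 for c in candidates}
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped "" toggles twice, so it leaves the quote state unchanged.
            in_quotes = not in_quotes
        elif not in_quotes and ch in ("\n", "\r"):
            break  # stop at the end of the first record
        elif not in_quotes and ch in counts:
            counts[ch] += 1
    # max() returns the first maximal item, which gives the comma priority on ties.
    best = max(candidates, key=lambda c: counts[c])
    return best if counts[best] > 0 else ","
```

Inspecting only the first record keeps detection cheap, but a single-column file has no delimiters in its header and will always fall back to a comma.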
Note: this tool assumes well-formed CSV. Malformed input with mismatched quotes will be handled gracefully by treating unmatched quotes as literal characters, but the result may not match your intent. Pro tip: if your CSV originates from Excel on European locales, expect semicolon delimiters rather than commas. The auto-detection handles this, but verify the output on the first run.
Formulas
The conversion follows a two-phase pipeline: parse, then format.
$$R = \mathrm{parse}(\mathit{csv})$$
Where csv is the raw input string and R is a two-dimensional array of m rows and n columns. The parser operates as a finite state machine with three states: FIELD_START, UNQUOTED, and QUOTED.
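A minimal Python sketch of the three-state machine follows. The transition rules are an assumption based on RFC 4180, not the tool's source. A quote at FIELD_START enters QUOTED. Inside QUOTED, a doubled quote emits one literal quote and a lone quote ends quoting. A quote in the middle of an unquoted field is kept as a literal character. Detecting `""` uses a one-character lookahead rather than a fourth state, so no input is re-read. `parse_csv` and its `delim` parameter are hypothetical names. Unlike the fallback described in the note above, an unterminated quoted field here simply runs to the end of the input.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()  # at the first character of a field
    UNQUOTED = auto()     # inside a field that did not start with a quote
    QUOTED = auto()       # inside a double-quoted field


def parse_csv(text: str, delim: str = ",") -> list[list[str]]:
    """Parse CSV text into rows of string cells in a single pass."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START
    i, n = 0, len(text)

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    while i < n:
        ch = text[i]
        if state is State.QUOTED:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes -> one literal quote
                    i += 1
                else:
                    state = State.UNQUOTED  # closing quote
            else:
                field.append(ch)  # delimiters and line breaks are data here
        elif ch == delim:
            end_field()
            state = State.FIELD_START
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # CRLF counts as a single line break
            end_row()
            state = State.FIELD_START
        elif state is State.FIELD_START and ch == '"':
            state = State.QUOTED
        else:
            field.append(ch)  # includes a stray quote mid-field, kept literally
            state = State.UNQUOTED
        i += 1

    # Flush the last record unless the input ended on a line break.
    if field or row or state is not State.FIELD_START:
        end_row()
    return rows
```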
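For each column j, compute the maximum cell width w_j over all m rows, where |R_{ij}| is the length of the cell in row i and column j:
$$w_j = \max_{1 \le i \le m} \lvert R_{ij} \rvert$$
Each row i is then emitted with every cell right-padded with spaces to its column width:
$$\mathrm{line}_i = \texttt{|}\ \mathrm{pad}(R_{i1}, w_1)\ \texttt{|}\ \cdots\ \texttt{|}\ \mathrm{pad}(R_{in}, w_n)\ \texttt{|}$$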
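Where pad(s, w) is a left-aligning space-fill function that appends w − |s| spaces to s. Any literal pipe character | inside a cell value is escaped to \| per the Gherkin specification, so the parser does not read it as a cell boundary. Both phases are linear: parsing reads each input character once, and formatting makes two passes over the m ⋅ n cells (one to measure the widths w_j, one to emit the padded rows), for O(m ⋅ n) cell operations.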
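A minimal sketch of the format phase in Python, assuming rows produced by a parser like the one described above. Following the reference table below, it turns embedded line breaks into spaces and pads short rows with empty cells. One design choice is an assumption: pipes are escaped before widths are measured, so w_j counts the added backslash and the column separators stay aligned. `len` counts code points, so wide characters such as CJK or emoji may still misalign on screen. `format_cell` and `to_data_table` are hypothetical names.

```python
def format_cell(value: str) -> str:
    """Flatten embedded line breaks to spaces and escape literal pipes."""
    flat = value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    return flat.replace("|", "\\|")


def to_data_table(rows: list[list[str]]) -> str:
    """Render rows as a Cucumber data table with space-padded, aligned columns."""
    if not rows:
        return ""
    n_cols = max(len(r) for r in rows)
    # Escape before measuring so each width w_j includes any added backslashes;
    # short rows are padded with empty cells up to n_cols.
    cells = [[format_cell(c) for c in r] + [""] * (n_cols - len(r)) for r in rows]
    widths = [max(len(r[j]) for r in cells) for j in range(n_cols)]
    # pad(s, w) corresponds to str.ljust: left-align s, fill with spaces to width w.
    return "\n".join(
        "| " + " | ".join(c.ljust(w) for c, w in zip(r, widths)) + " |"
        for r in cells
    )
```

For the rows `name,note`, `Smith, J,a|b`, and `x`, the widths are 8 and 4, and the escaped `a\|b` occupies all four characters of the second column.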
Reference Data
| CSV Feature | RFC 4180 Rule | This Tool |
|---|---|---|
| Comma delimiter | Default separator | Auto-detected |
| Semicolon delimiter | Not in spec (locale variant) | Auto-detected |
| Tab delimiter | Not in spec (TSV variant) | Auto-detected |
| Double-quoted fields | Fields MAY be enclosed in " | Fully supported |
| Escaped quotes | "" within quoted field | Unescaped to single " |
| Embedded newlines | Allowed inside quoted fields | Converted to a space in output |
| Trailing CRLF | Optional on last record | Trimmed |
| Empty fields | Allowed (,,) | Rendered as empty padded cell |
| Header row | Optional first record | Treated as first data row |
| Whitespace padding | Spaces are part of the field | Trimmed unless quoted |
| BOM (Byte Order Mark) | Not addressed | Stripped if present |
| Mixed line endings | CRLF required | Accepts CR, LF, or CRLF |
| Column count mismatch | Should be uniform | Pads short rows with empty cells |
| Cucumber pipe escaping | N/A (Gherkin spec) | Literal pipe escaped with a backslash |
| Max columns | No limit | No limit |
| Max rows | No limit | Tested up to 50,000 rows |
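Several rows of the table above describe input normalization rather than parsing. Below is a minimal Python sketch of three of them: stripping a byte order mark, trimming trailing line breaks, and padding short rows with empty cells. The function names are hypothetical, and accepting CR, LF, or CRLF is assumed to be left to the parser itself.

```python
BOM = "\ufeff"  # U+FEFF, the byte order mark as it appears after UTF-8 decoding


def prepare_input(text: str) -> str:
    """Strip a leading byte order mark and any trailing CR/LF characters."""
    if text.startswith(BOM):
        text = text[len(BOM):]
    return text.rstrip("\r\n")


def pad_rows(rows: list[list[str]]) -> list[list[str]]:
    """Right-pad short rows with empty cells so every row has the same length."""
    width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]
```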