
About

Cucumber data tables follow a strict pipe-delimited format defined by the Gherkin syntax specification. Manual conversion from CSV introduces alignment errors, broken quoting, and inconsistent column widths that cause step definition failures at runtime. This tool parses CSV input using an RFC 4180-compliant finite state machine, correctly handling double-quoted fields, escaped quotes (""), embedded newlines, and mixed delimiters. It auto-detects whether your CSV uses commas, semicolons, or tabs, then outputs a column-aligned Cucumber DataTable where every pipe character sits in a vertical line. The parser processes each character in O(n) time with no backtracking.

Note: this tool assumes well-formed CSV. Malformed input with mismatched quotes will be handled gracefully by treating unmatched quotes as literal characters, but the result may not match your intent. Pro tip: if your CSV originates from Excel on European locales, expect semicolon delimiters rather than commas. The auto-detection handles this, but verify the output on the first run.
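The overall conversion can be sketched in a few lines of Python. This illustration leans on the standard library's csv module rather than the tool's own RFC 4180 state machine, and the function name csv_to_datatable is hypothetical, not the tool's API:

```python
import csv
import io


def csv_to_datatable(text: str) -> str:
    # Sketch only: stdlib csv parsing stands in for the tool's state machine.
    # Sanitize cells: embedded newlines become spaces, literal | becomes \|.
    rows = [[c.replace("\n", " ").replace("|", r"\|") for c in rec]
            for rec in csv.reader(io.StringIO(text))]
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]  # pad short rows
    # Column widths: longest cell in each column determines the padding.
    widths = [max(len(r[j]) for r in rows) for j in range(ncols)]
    return "\n".join(
        "| " + " | ".join(r[j].ljust(widths[j]) for j in range(ncols)) + " |"
        for r in rows
    )
```

Feeding it a two-record input such as `name,role` followed by a quoted cell containing a comma yields a two-row table with every pipe vertically aligned.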


Formulas

The conversion follows a two-phase pipeline: parse, then format.

parse(csv) → R (m × n)

Where csv is the raw input string, R is a two-dimensional array of m rows and n columns. The parser operates as a finite state machine with three states: FIELD_START, UNQUOTED, and QUOTED.
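A minimal sketch of such a three-state machine, assuming a fixed comma delimiter (the real tool auto-detects it) and treating unmatched quotes as literal characters:

```python
def parse(text: str, delim: str = ",") -> list[list[str]]:
    FIELD_START, UNQUOTED, QUOTED = range(3)  # the three parser states
    rows, row, field = [], [], []
    state = FIELD_START
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if state == QUOTED:
            if ch == '"':
                if text[i + 1:i + 2] == '"':   # escaped quote "" -> "
                    field.append('"')
                    i += 1
                else:
                    state = UNQUOTED           # closing quote
            else:
                field.append(ch)               # delimiters/newlines are literal here
        elif ch == '"' and state == FIELD_START:
            state = QUOTED
        elif ch == delim:
            row.append("".join(field))
            field = []
            state = FIELD_START
        elif ch in "\r\n":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1                         # accept CR, LF, or CRLF
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
            state = FIELD_START
        else:
            field.append(ch)                   # stray " in UNQUOTED stays literal
            state = UNQUOTED
        i += 1
    if field or row:                           # flush last record (no trailing newline)
        row.append("".join(field))
        rows.append(row)
    return rows
```

Each character is examined exactly once with no backtracking, which is what gives the O(n) bound.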

w_j = max_{i=0..m-1} len(R[i][j])

For each column j, compute the maximum cell width w_j. Each cell in column j is then right-padded with spaces to width w_j.
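The width computation is a straightforward maximum over each column; a sketch in Python, assuming rows have already been normalized to equal length:

```python
def column_widths(rows: list[list[str]]) -> list[int]:
    # w_j = max over i of len(R[i][j]); rows are assumed equal length
    return [max(len(row[j]) for row in rows) for j in range(len(rows[0]))]
```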

row(i) = | R[i][0].pad(w_0) | R[i][1].pad(w_1) | ... | R[i][n-1].pad(w_{n-1}) |

Where pad is a left-aligned space-fill function. Any literal pipe character | inside a cell value is escaped to \| per the Gherkin specification to prevent parser ambiguity. Time complexity is O(m·n) for both phases.
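A sketch of the row formatter, with escaping applied before padding; for exact alignment the widths passed in are assumed to have been computed from the already-escaped cell values:

```python
def format_row(cells: list[str], widths: list[int]) -> str:
    # Escape literal pipes first so they cannot be read as column separators,
    # then left-align each cell to its column width.
    escaped = [c.replace("|", r"\|") for c in cells]
    return "| " + " | ".join(c.ljust(w) for c, w in zip(escaped, widths)) + " |"
```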

Reference Data

| CSV Feature            | RFC 4180 Rule                | This Tool                        |
| Comma delimiter        | Default separator            | Auto-detected                    |
| Semicolon delimiter    | Not in spec (locale variant) | Auto-detected                    |
| Tab delimiter          | Not in spec (TSV variant)    | Auto-detected                    |
| Double-quoted fields   | Fields MAY be enclosed in "  | Fully supported                  |
| Escaped quotes         | "" within quoted field       | Unescaped to single "            |
| Embedded newlines      | Allowed inside quoted fields | Preserved as space in output     |
| Trailing CRLF          | Optional on last record      | Trimmed                          |
| Empty fields           | Allowed (,,)                 | Rendered as empty padded cell    |
| Header row             | Optional first record        | Treated as first data row        |
| Whitespace padding     | Significant inside quotes    | Trimmed unless quoted            |
| BOM (Byte Order Mark)  | Not addressed                | Stripped if present              |
| Mixed line endings     | CRLF required                | Accepts CR, LF, or CRLF          |
| Column count mismatch  | Should be uniform            | Pads short rows with empty cells |
| Cucumber pipe escaping | N/A (Gherkin spec)           | Literal "|" escaped to "\|"      |
| Max columns            | No limit                     | No limit                         |
| Max rows               | No limit                     | Tested to 50,000 rows            |

Frequently Asked Questions

How is the delimiter auto-detected?

The parser counts occurrences of comma, semicolon, and tab characters in the first 5 lines of unquoted text. The character with the highest consistent count across those lines is selected as the delimiter. If all counts are zero or tied, comma is used as the RFC 4180 default.
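A sketch of that heuristic in Python; for brevity it counts over raw lines rather than restricting itself to unquoted text the way the tool does:

```python
def detect_delimiter(text: str, sample_lines: int = 5) -> str:
    # Count candidate delimiters over the first few non-empty lines.
    lines = [line for line in text.splitlines()[:sample_lines] if line]
    counts = {d: [line.count(d) for line in lines] for d in (",", ";", "\t")}
    best, best_count = ",", 0  # comma is the RFC 4180 default (also wins ties)
    for delim, per_line in counts.items():
        # "Consistent" means the same nonzero count on every sampled line.
        if per_line and min(per_line) == max(per_line) and per_line[0] > best_count:
            best, best_count = delim, per_line[0]
    return best
```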
How are newlines inside quoted fields handled?

Per RFC 4180, newlines within double-quoted fields are part of the field value, not record separators. This tool replaces embedded newlines with a single space in the Cucumber output, because Gherkin data tables do not support multi-line cell values.
Why are pipe characters escaped?

Literal pipe characters (|) are escaped to \| in the output. The Gherkin parser interprets unescaped pipes as column delimiters, so failing to escape them would break the data table structure and cause step definition binding errors.
Is the first row treated as a header?

All rows are output identically. In Cucumber, the first row of a data table is conventionally treated as a header by step definitions, but the table format itself does not distinguish headers. The tool preserves row order exactly as provided.
What happens if rows have different numbers of columns?

The tool normalizes all rows to the length of the longest row. Short rows are padded with empty cells on the right. This prevents alignment errors and ensures the Cucumber parser does not reject the table.
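That normalization is a one-liner per row; a sketch:

```python
def normalize_rows(rows: list[list[str]]) -> list[list[str]]:
    # Pad every row on the right with empty cells to the longest row's length.
    width = max(len(r) for r in rows)
    return [r + [""] * (width - len(r)) for r in rows]
```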
Does the tool handle a byte order mark (BOM)?

Yes. The parser strips the UTF-8 BOM (byte order mark, U+FEFF) if present at position 0 of the input. Excel on Windows adds this by default when saving as "CSV UTF-8". Failing to strip it would corrupt the first cell value.
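The check itself is trivial; a sketch:

```python
def strip_bom(text: str) -> str:
    # Remove a leading U+FEFF, e.g. from Excel's "CSV UTF-8" export, so it
    # does not end up glued to the first cell value.
    return text[1:] if text.startswith("\ufeff") else text
```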