About

TSV (Tab-Separated Values) files encode tabular data using the horizontal tab character (U+0009) as a field delimiter and a newline (U+000A) as a record terminator. Miscounting entries in a TSV file can lead to silent data truncation during database imports, broken ETL pipelines, and corrupted analytics reports. A stray trailing newline inflates a naive line count by 1. An inconsistent column count across rows often signals corrupted records, which naive parsers tend to pass through without warning. This tool parses raw TSV input, counts total lines, non-empty rows, columns, and individual cells, and flags structural anomalies such as ragged rows, where the column count deviates from the header. It assumes the first non-empty row defines the column schema. No data leaves your browser.

Formulas

TSV entry counting relies on deterministic string splitting. The total line count is derived from splitting the input on the normalized newline character:

totalLines = split(input, \n).length
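A minimal Python sketch of this formula, assuming the input has already had its line endings normalized to LF. Python's str.split("\n") keeps empty strings, which matches the split semantics above: a trailing newline adds one empty line, and empty input still counts as one line.

```python
def total_lines(text: str) -> int:
    """Count lines by splitting on LF (line endings assumed already normalized)."""
    # str.split("\n") keeps empty strings, so a trailing newline yields one extra
    # empty element, and an empty input still yields [""] (one line).
    return len(text.split("\n"))

print(total_lines("id\tname\n1\tAda\n"))  # 3: header, one record, trailing empty line
```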

A line is empty if it contains no non-whitespace character; emptyRows is the number of such lines, and every remaining line is non-empty:

nonEmpty = totalLines − emptyRows
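The same empty/non-empty split can be sketched in Python, again assuming LF-normalized input. The function name count_rows is illustrative, not the tool's own API. Note that a line made only of tabs is whitespace-only and therefore counts as empty.

```python
def count_rows(text: str) -> tuple[int, int, int]:
    """Return (total_lines, empty_rows, non_empty_rows) for LF-normalized text."""
    lines = text.split("\n")
    # str.strip() with no argument removes all whitespace (spaces, tabs, ...),
    # so a line is empty when nothing remains after stripping.
    empty_rows = sum(1 for line in lines if not line.strip())
    return len(lines), empty_rows, len(lines) - empty_rows

print(count_rows("a\tb\n\n \t \nc\td\n"))  # (5, 3, 2)
```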

The column count is extracted from the header (first non-empty line):

cols = split(headerRow, \t).length
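A Python sketch of header detection, assuming LF-normalized input. Returning 0 when every line is empty is an assumption made for this example; the tool's behavior in that case is not specified here. Splitting strictly on the tab character means a trailing tab produces an extra, empty field that still counts toward cols.

```python
def header_columns(text: str) -> int:
    """Return the field count of the first non-empty line, or 0 if there is none."""
    for line in text.split("\n"):
        if line.strip():
            # Split strictly on U+0009; adjacent or trailing tabs yield empty
            # fields, which still count as columns.
            return len(line.split("\t"))
    return 0  # assumption: no header found

print(header_columns("\nid\tname\t\n1\tAda\t\n"))  # 3
```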

Total cell count across all non-empty rows:

$$\text{totalCells} = \sum_{i=1}^{N} \operatorname{split}(\text{row}_i, \texttt{\textbackslash t}).\text{length}$$
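In Python, this sum over the N non-empty rows might look like the sketch below (LF-normalized input assumed). Each row contributes its own field count, so ragged rows are counted as they are rather than padded or truncated to the header width.

```python
def total_cells(text: str) -> int:
    """Sum the tab-delimited field counts of every non-empty line."""
    rows = [line for line in text.split("\n") if line.strip()]
    # Ragged rows contribute their actual field count, not the header width.
    return sum(len(row.split("\t")) for row in rows)

print(total_cells("a\tb\tc\n1\t2\n3\t4\t5\t6\n"))  # 3 + 2 + 4 = 9
```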

A row is classified as ragged when its field count deviates from the header column count:

isRagged(row) = split(row, \t).length ≠ cols

Where N = number of non-empty rows, cols = header-derived column count, \t = horizontal tab character (U+0009).
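The isRagged test can be applied row by row as in this Python sketch, assuming LF-normalized input. Returning 1-based line numbers instead of a bare count is a choice made for this example; the ragged-row count is simply the length of the returned list. The header can never be ragged, because it defines cols.

```python
def ragged_rows(text: str) -> list[int]:
    """Return 1-based line numbers of non-empty rows whose field count differs from the header."""
    cols = None
    ragged = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # empty rows are skipped, not classified as ragged
        n = len(line.split("\t"))
        if cols is None:
            cols = n  # the first non-empty row defines the schema
        elif n != cols:
            ragged.append(line_no)
    return ragged

print(ragged_rows("a\tb\tc\n1\t2\n\n3\t4\t5\n6\t7\t8\t9"))  # [2, 5]
```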

Reference Data

Metric	Description	Typical Range
Total Lines	All lines, including empty trailing lines	1–10⁶
Non-Empty Rows	Lines with at least one non-whitespace character	1–10⁶
Empty Rows	Lines containing only whitespace or nothing	0–100
Data Rows	Non-empty rows excluding the header row	0–10⁶
Columns (from header)	Tab-delimited fields in the first non-empty row	1–500
Total Cells	Sum of tab-delimited fields over all non-empty rows (equals Columns × Non-Empty Rows only when no row is ragged)	1–10⁸
Filled Cells	Cells containing at least one non-whitespace character	Varies
Empty Cells	Cells that are blank or whitespace-only	Varies
Ragged Rows	Non-empty rows whose field count ≠ header column count	0 (ideal)
Duplicate Rows	Extra copies of a non-empty row already seen earlier	Varies
Max Row Length	Highest number of fields in any non-empty row	1–1000
Min Row Length	Lowest number of fields in any non-empty row	1–1000
Delimiter	TSV uses horizontal tab (U+0009)	Fixed
Line Ending	LF (\n), CR+LF (\r\n), or CR (\r)	Platform-dependent
File Size Limit (browser)	Practical limit for in-memory text processing	< 100 MB
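The Filled Cells and Empty Cells metrics partition Total Cells. The Python sketch below assumes LF-normalized input and uses the same whitespace test as for rows. A whitespace-only line is treated as an empty row, not as a row of empty cells, so it contributes to neither count.

```python
def cell_fill_counts(text: str) -> tuple[int, int]:
    """Return (filled_cells, empty_cells) across all non-empty rows."""
    filled = empty = 0
    for line in text.split("\n"):
        if not line.strip():
            continue  # whitespace-only lines are empty rows, not rows of empty cells
        for cell in line.split("\t"):
            if cell.strip():
                filled += 1
            else:
                empty += 1
    return filled, empty

print(cell_fill_counts("id\tname\tnote\n1\tAda\t\n2\t \tx\n"))  # (7, 2)
```

The two numbers always add up to Total Cells (here 9), which makes a handy sanity check.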

Frequently Asked Questions

The parser normalizes line endings (CR, LF, CR+LF) to LF, then splits on LF. Trailing empty strings produced by a final newline are counted as empty rows and excluded from the non-empty row metric. The "Total Lines" metric includes them for transparency, so you can spot the discrepancy.
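The normalization step described above can be sketched in Python as follows. The order of the replacements matters: CR+LF must be converted before lone CR, or each CR+LF would turn into two LFs and create phantom empty rows.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR+LF and lone CR line endings to LF."""
    # Replace CR+LF first; otherwise each CR+LF would become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")

for sample in ("a\tb\r\nc\td\r\n", "a\tb\rc\td\r", "a\tb\nc\td\n"):
    lines = normalize_newlines(sample).split("\n")
    print(len(lines), sum(1 for line in lines if line.strip()))  # 3 2 for each sample
```

All three line-ending styles give 3 total lines (including the trailing empty one) and 2 non-empty rows.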

The tool reports the number of ragged rows: rows whose tab-delimited field count differs from the header row's field count. It also shows the minimum and maximum field counts across all non-empty rows, so you can identify structural inconsistencies before importing into a database.
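Minimum and maximum row lengths can be computed as in this Python sketch, assuming LF-normalized input. Returning None when there are no non-empty rows is an assumption made for this example.

```python
from typing import Optional, Tuple

def row_length_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (min_fields, max_fields) over non-empty rows, or None if there are none."""
    lengths = [len(line.split("\t")) for line in text.split("\n") if line.strip()]
    if not lengths:
        return None  # assumption: no rows means no range
    return min(lengths), max(lengths)

print(row_length_range("a\tb\tc\n1\t2\n3\t4\t5\t6\n"))  # (2, 4)
```

Because the header is itself a non-empty row, min equal to max implies every row matches the header width, and so there are no ragged rows.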

Yes. The first non-empty row is treated as the header, and the "Data Rows" metric equals non-empty rows minus 1. If your TSV has no header, the first record is still consumed as the header, so the true record count is Data Rows + 1 (the same as Non-Empty Rows), and the column count is taken from that first record.
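The header adjustment is simple arithmetic. The sketch below uses a hypothetical has_header flag (the tool itself always assumes a header) to show both interpretations side by side.

```python
def record_count(non_empty_rows: int, has_header: bool = True) -> int:
    """Number of data records given the non-empty row count (has_header is hypothetical)."""
    if non_empty_rows == 0:
        return 0
    # With a header, the first non-empty row is schema, not data.
    # Without one, every non-empty row is a record (Data Rows + 1).
    return non_empty_rows - 1 if has_header else non_empty_rows

print(record_count(5), record_count(5, has_header=False))  # 4 5
```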

Unlike CSV, TSV defines no quoting mechanism in its specification (IANA media type text/tab-separated-values), so fields containing literal tabs or newlines violate the format. This tool splits strictly on tab and newline characters, which means a quoted field containing a tab or newline is counted as extra cells or extra rows. If your data uses CSV-style quoting, convert it to properly escaped TSV first.
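One way to do that conversion is sketched below in Python. The standard csv module parses the quoted CSV, including fields that span lines. Each field is then written using a backslash-escape convention (\\, \t, \n, \r), which is one common choice but an assumption here, so check what your importer expects. Function names are illustrative.

```python
import csv
import io

def escape_field(value: str) -> str:
    """Escape a field with backslash sequences (one common TSV convention)."""
    # Escape backslashes first so pre-existing backslashes stay unambiguous.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def csv_to_escaped_tsv(csv_text: str) -> str:
    """Parse quoted CSV with the csv module and emit one escaped TSV line per record."""
    reader = csv.reader(io.StringIO(csv_text))
    return "\n".join("\t".join(escape_field(f) for f in record) for record in reader)

tsv = csv_to_escaped_tsv('id,note\n1,"two\nlines"\n2,"has\ttab"\n')
print(tsv.split("\n"))  # ['id\tnote', '1\ttwo\\nlines', '2\thas\\ttab']
```

After conversion, each record occupies exactly one line and has exactly one tab between fields, so the row and cell counts reflect the real data.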

Processing occurs entirely in your browser's memory. Files under 50 MB parse in under 2 seconds on modern hardware. Files of 50 to 100 MB may cause brief UI pauses, and files larger than 100 MB risk hitting browser memory limits. For very large datasets, consider command-line tools like wc or awk, which process input as a stream.
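The same streaming idea can be sketched in Python, as an alternative to wc or awk, for files too large to load at once. Python's text mode with the default newline=None already translates CR+LF and CR to LF. File iteration, however, does not yield a final empty line after a trailing newline, so this sketch reports only non-empty rows and cells, not the tool's Total Lines. UTF-8 encoding is assumed.

```python
import os
import tempfile

def stream_counts(path: str) -> tuple[int, int]:
    """Count (non_empty_rows, total_cells) by reading one line at a time."""
    non_empty = cells = 0
    # Default newline=None translates CR+LF and CR to LF on read; iterating the
    # file object avoids loading the whole file into memory.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")  # strip only LF so trailing tabs keep their fields
            if line.strip():
                non_empty += 1
                cells += len(line.split("\t"))
    return non_empty, cells

# Usage: write a small CR+LF file to a temporary path and count it.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                 encoding="utf-8", newline="") as tmp:
    tmp.write("a\tb\r\n1\t2\r\n\r\n3\t\r\n")
result = stream_counts(tmp.name)
os.remove(tmp.name)
print(result)  # (3, 6)
```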

Each non-empty row's raw (untrimmed) string is added to a hash-based Set. If the same string has already been seen, the row counts as a duplicate. Whitespace differences (e.g., trailing spaces within a cell) make rows distinct. The duplicate count excludes the first occurrence: it counts only the extra copies.
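A Python sketch of this duplicate count, with Python's built-in set standing in for the Set described above and LF-normalized input assumed. Rows are compared untrimmed, so a trailing space makes a row distinct.

```python
def duplicate_rows(text: str) -> int:
    """Count extra copies of non-empty rows, comparing raw (untrimmed) strings."""
    seen: set[str] = set()
    duplicates = 0
    for line in text.split("\n"):
        if not line.strip():
            continue  # empty rows are never counted as duplicates
        if line in seen:
            duplicates += 1  # only extra copies count, not the first occurrence
        else:
            seen.add(line)
    return duplicates

print(duplicate_rows("id\tv\n1\ta\n1\ta\n1\ta \n1\ta\n"))  # 2
```

Here "1\ta " (with a trailing space) is a new row, while the second and third plain "1\ta" rows are the two duplicates.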