Category Text Formatting

About

Lists degrade. Copy-paste cycles introduce duplicate entries, inconsistent numbering, trailing whitespace, and phantom blank lines. A 200-line vendor list may contain 40 duplicates you cannot spot by eye. This tool applies deterministic compression: exact-match deduplication via hash set, regex-based numbering removal, and whitespace normalization. The output preserves original casing and order (unless sorting is enabled). Compression ratio R = 1 − L_out / L_in is reported as a percentage. Note: deduplication is case-sensitive by default. Enable case-insensitive mode if your list mixes capitalization for identical entries.

Formulas

The compression ratio quantifies how much the list was reduced:

R = (L_in − L_out) / L_in × 100%

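where L_in = number of input lines and L_out = number of output lines. For example, a 200-line list that shrinks to 160 lines gives R = 20%. Deduplication uses a hash-set approach with O(n) expected time complexity. Each line is optionally normalized (lowercased, trimmed) before insertion into the set. The numbering strip regex pattern is: match(/^[\s]*(\d+[.)\-:]?\s*|[-*•·►▸▹]\s*)/). This captures ordered markers (1., 2), 3-) and unordered bullets (-, *, •). Because the delimiter after the digits is optional, a line that simply begins with a number also loses it: "2024 report" becomes "report".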
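The hash-set deduplication and the ratio formula can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function names `dedupe` and `compressionRatio` and the option names are hypothetical, and returning 0% for an empty input is an assumption.

```javascript
// Hypothetical helpers; only the hash-set idea and the ratio formula come from the text.

// Remove duplicate lines, keeping the first occurrence in its original form.
function dedupe(lines, { caseInsensitive = false, trim = false } = {}) {
  const seen = new Set();
  const out = [];
  for (const line of lines) {
    // Build the comparison key; the stored line itself is left untouched.
    let key = trim ? line.trim() : line;
    if (caseInsensitive) key = key.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      out.push(line);
    }
  }
  return out;
}

// Compression ratio as a percentage: (L_in - L_out) / L_in * 100.
function compressionRatio(lIn, lOut) {
  if (lIn === 0) return 0; // assumption: report 0% for an empty input
  return ((lIn - lOut) * 100) / lIn;
}

const input = ["Apple", "APPLE", "apple", "Banana", "Apple"];
const exact = dedupe(input);
const folded = dedupe(input, { caseInsensitive: true });

console.log(exact);  // ["Apple", "APPLE", "apple", "Banana"]
console.log(folded); // ["Apple", "Banana"]
console.log(compressionRatio(input.length, folded.length)); // 60
```

A `Set` gives constant expected time per lookup and remembers insertion order, which is why a single pass is enough to keep first occurrences in their original positions.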

Reference Data

Operation	Description	Effect on Line Count	Preserves Order
Remove Duplicates	Eliminates exact-match repeated lines	Reduces by duplicate count	Yes (keeps first occurrence)
Case-Insensitive Dedup	Treats "Apple" and "apple" as identical	Reduces further	Yes (keeps first occurrence)
Remove Empty Lines	Strips lines containing only whitespace	Reduces by blank count	Yes
Trim Whitespace	Removes leading/trailing spaces per line	No change	Yes
Strip Numbering	Removes leading numbers, dots, dashes, bullets	No change	Yes
Strip Prefix	Removes user-defined prefix from each line	No change	Yes
Strip Suffix	Removes user-defined suffix from each line	No change	Yes
Sort A→Z	Alphabetical ascending sort	No change	No (reordered)
Sort Z→A	Alphabetical descending sort	No change	No (reordered)
Reverse Order	Flips the entire list upside down	No change	No (reversed)
Collapse Blank Lines	Replaces consecutive blanks with one	Slight reduction	Yes
Remove Short Lines	Removes lines below character threshold	Reduces	Yes

Frequently Asked Questions

Does case-insensitive deduplication preserve the original casing?

Yes. When case-insensitive mode is enabled, the tool keeps the first occurrence of each unique line in its original casing. Subsequent duplicates (regardless of case) are discarded. For example, if your list contains "Apple", "APPLE", "apple" in that order, only "Apple" (the first) is retained.

What happens when numbering removal and deduplication are both enabled?

Operations are applied sequentially: trim and strip run first, then deduplication checks the resulting lines. So if "1. Banana" and "2. Banana" both become "Banana" after numbering is stripped, deduplication will keep only the first occurrence. This is the expected behavior for cleaning numbered lists with repeated entries.
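The ordering described above (strip first, deduplicate second) can be illustrated as follows. The regex is the one quoted in the Formulas section; the function names `stripNumbering` and `cleanList` are hypothetical, and the sketch is not the tool's actual code.

```javascript
// Numbering/bullet pattern as quoted in the Formulas section.
const NUMBERING = /^[\s]*(\d+[.)\-:]?\s*|[-*•·►▸▹]\s*)/;

// Remove one leading ordered marker or bullet from a line.
function stripNumbering(line) {
  return line.replace(NUMBERING, "");
}

// Hypothetical pipeline: strip and trim every line, then deduplicate the results.
function cleanList(text) {
  const stripped = text.split("\n").map((line) => stripNumbering(line).trim());
  // A Set keeps insertion order, so first occurrences stay in place.
  return [...new Set(stripped)];
}

console.log(cleanList("1. Banana\n2. Banana\n- Cherry\n3) Apple"));
// ["Banana", "Cherry", "Apple"]
```

Running deduplication before stripping would treat "1. Banana" and "2. Banana" as distinct lines and keep both, which is why the order matters.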

Is there a limit on list size?

The tool runs entirely in your browser, so practical limits depend on your device's RAM. Lists up to 100,000 lines (roughly 5 MB of text) process in under 1 second on modern hardware. Beyond 500,000 lines, you may notice a brief delay. There is no server-side limit since no data leaves your browser.

Can I use this tool with CSV data?

This tool operates on line-delimited lists (one item per line). If your CSV has one column, paste it directly. For multi-column CSVs, extract the target column first. Tab-separated single-column data works as-is since each row is treated as one line.

How does sorting handle numbers?

The default sort is lexicographic (alphabetical), meaning "9" sorts after "10" because character-by-character comparison puts "9" > "1". For purely numeric lists, the tool detects numeric-only lines and applies natural numeric sorting automatically, placing 2 before 10 before 100.
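The difference between the two sort modes can be sketched like this. How the tool actually detects numeric lines is not documented here, so the `isNumericLine` test (optional sign, digits, optional decimal part) is an assumption, and `sortLines` is a hypothetical name.

```javascript
// Assumption: a "numeric-only" line is an optionally signed integer or decimal.
function isNumericLine(line) {
  return /^[+-]?\d+(\.\d+)?$/.test(line.trim());
}

// Sort numerically when every line is numeric, otherwise lexicographically.
function sortLines(lines, descending = false) {
  const numeric = lines.length > 0 && lines.every(isNumericLine);
  const compare = numeric
    ? (a, b) => Number(a) - Number(b)
    : (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  const sorted = [...lines].sort(compare); // copy so the input is not mutated
  return descending ? sorted.reverse() : sorted;
}

console.log(sortLines(["10", "9", "100", "2"]));    // ["2", "9", "10", "100"]
console.log(sortLines(["10", "9", "item", "100"])); // ["10", "100", "9", "item"]
```

In the second call a single non-numeric line switches the whole list to character-by-character comparison, so "9" lands after "100".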

How does the Min Line Length option work?

When enabled, any line with fewer characters than the specified threshold (after trimming) is removed. A threshold of 2 removes single-character lines and empty lines. This is useful for cleaning lists with stray punctuation marks or single-letter artifacts from copy-paste operations.
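A minimal sketch of this filter, assuming length is measured with JavaScript's `String.length` after trimming (the function name `removeShortLines` is hypothetical):

```javascript
// Keep only lines whose trimmed length meets the threshold.
// Note: String.length counts UTF-16 code units, so some emoji count as 2.
function removeShortLines(lines, minLength) {
  return lines.filter((line) => line.trim().length >= minLength);
}

console.log(removeShortLines(["Apple", "", " x ", ",", "OK"], 2));
// ["Apple", "OK"]
```

With a threshold of 2, the empty line, the padded single letter, and the stray comma are all dropped, while the original (untrimmed) text of the kept lines is returned unchanged.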