About

Lists degrade. Copy-paste cycles introduce duplicate entries, inconsistent numbering, trailing whitespace, and phantom blank lines. A 200-line vendor list may contain 40 duplicates you cannot spot by eye. This tool applies deterministic compression: exact-match deduplication via hash set, regex-based numbering removal, and whitespace normalization. The output preserves original casing and order (unless sorting is enabled). Compression ratio $R = 1 - L_{\text{out}} / L_{\text{in}}$ is reported as a percentage. Note: deduplication is case-sensitive by default. Enable case-insensitive mode if your list mixes capitalization for identical entries.

Formulas

The compression ratio quantifies how much the list was reduced:

$$R = \frac{L_{\text{in}} - L_{\text{out}}}{L_{\text{in}}} \times 100\%$$

where $L_{\text{in}}$ is the number of input lines and $L_{\text{out}}$ is the number of output lines.

Deduplication uses a hash-set approach with O(n) time complexity. Each line is optionally normalized (lowercased, trimmed) before insertion into the set.

The numbering strip regex pattern is: match(/^[\s]*(\d+[.)\-:]?\s*|[-*•·►▸▹]\s*)/). This captures ordered markers (1., 2), 3-) and unordered bullets (-, *, •).
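
As a rough illustration of the hash-set deduplication and the ratio formula, here is a minimal Python sketch. It is not the tool's own code (the tool runs in the browser); the function names `dedupe` and `compression_ratio` and the `case_insensitive` and `trim` parameters are hypothetical.

```python
def dedupe(lines, case_insensitive=False, trim=False):
    """Keep the first occurrence of each line, preserving order.

    The comparison key is optionally normalized (trimmed, lowercased),
    but the line that is kept retains its original form.
    """
    seen = set()   # hash set of keys already emitted
    kept = []
    for line in lines:
        key = line.strip() if trim else line
        if case_insensitive:
            key = key.lower()
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


def compression_ratio(n_in, n_out):
    """Return R = (n_in - n_out) / n_in as a percentage."""
    if n_in == 0:
        return 0.0  # assumption: an empty input counts as 0% compression
    return 100.0 * (n_in - n_out) / n_in


items = ["Apple", "apple", "Apple", "APPLE"]
print(dedupe(items))                         # ['Apple', 'apple', 'APPLE']
print(dedupe(items, case_insensitive=True))  # ['Apple']
print(compression_ratio(200, 160))           # 20.0
```

Each line costs one set lookup and at most one insertion, which is where the linear running time comes from. The last call mirrors the 200-line list with 40 duplicates: 160 lines remain, so the ratio is 20%.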

Reference Data

| Operation | Description | Effect on Line Count | Preserves Order |
|---|---|---|---|
| Remove Duplicates | Eliminates exact-match repeated lines | Reduces by duplicate count | Yes (keeps first occurrence) |
| Case-Insensitive Dedup | Treats "Apple" and "apple" as identical | Reduces further | Yes (keeps first occurrence) |
| Remove Empty Lines | Strips lines containing only whitespace | Reduces by blank count | Yes |
| Trim Whitespace | Removes leading/trailing spaces per line | No change | Yes |
| Strip Numbering | Removes leading numbers, dots, dashes, bullets | No change | Yes |
| Strip Prefix | Removes user-defined prefix from each line | No change | Yes |
| Strip Suffix | Removes user-defined suffix from each line | No change | Yes |
| Sort A→Z | Alphabetical ascending sort | No change | No (reordered) |
| Sort Z→A | Alphabetical descending sort | No change | No (reordered) |
| Reverse Order | Flips the entire list upside down | No change | No (reversed) |
| Collapse Blank Lines | Replaces consecutive blanks with one | Slight reduction | Yes |
| Remove Short Lines | Removes lines below character threshold | Reduces | Yes |
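
The difference between removing empty lines and collapsing blank lines is easy to miss, so here is a small Python sketch of both. The function names are hypothetical, and a "blank" line is assumed to be one that is empty or contains only whitespace.

```python
def remove_empty_lines(lines):
    """Drop every line that is empty or whitespace-only."""
    return [line for line in lines if line.strip()]


def collapse_blank_lines(lines):
    """Replace each run of consecutive blank lines with its first blank line."""
    result = []
    previous_blank = False
    for line in lines:
        blank = not line.strip()
        if blank and previous_blank:
            continue  # skip the second and later blanks in a run
        result.append(line)
        previous_blank = blank
    return result


sample = ["a", "", "  ", "", "b", "", "c"]
print(remove_empty_lines(sample))    # ['a', 'b', 'c']
print(collapse_blank_lines(sample))  # ['a', '', 'b', '', 'c']
```

Removing empty lines drops all four blanks here, while collapsing keeps one separator per gap, which is why the table lists only a slight reduction for it.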

Frequently Asked Questions

Yes. When case-insensitive mode is enabled, the tool keeps the first occurrence of each unique line in its original casing. Subsequent duplicates (regardless of case) are discarded. For example, if your list contains "Apple", "APPLE", "apple" in that order, only "Apple" (the first) is retained.
Operations are applied sequentially: trim and strip run first, then deduplication checks the resulting lines. So if "1. Banana" and "2. Banana" both become "Banana" after numbering is stripped, deduplication will keep only the first occurrence. This is the expected behavior for cleaning numbered lists with repeated entries.
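
The ordering described above (trim, strip numbering, then deduplicate) can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation: `clean` is a hypothetical name, and the bullet characters in the pattern are an assumed reading of the regex given in the Formulas section.

```python
import re

# Leading ordered markers ("1.", "2)", "3-", "4:") or a single bullet character.
# The bullet set is an assumption made for this sketch.
NUMBERING = re.compile(r"^\s*(\d+[.)\-:]?\s*|[-*•·►▸▹]\s*)")


def clean(lines):
    """Trim each line, strip its numbering, then keep first occurrences only."""
    seen = set()
    result = []
    for raw in lines:
        item = NUMBERING.sub("", raw.strip(), count=1)
        if item not in seen:  # dedup sees the already-stripped text
            seen.add(item)
            result.append(item)
    return result


print(clean(["1. Banana", "2. Banana", "3) Cherry", "- Cherry", "Apple"]))
# ['Banana', 'Cherry', 'Apple']
```

If deduplication ran before stripping, "1. Banana" and "2. Banana" would count as different lines and both would survive.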
The tool runs entirely in your browser. Practical limits depend on your device's RAM. Lists up to 100,000 lines (roughly 5 MB of text) process in under 1 second on modern hardware. Beyond 500,000 lines, you may notice a brief delay. There is no server-side limit since no data leaves your browser.
This tool operates on line-delimited lists (one item per line). If your CSV has one column, paste it directly. For multi-column CSVs, extract the target column first. Tab-separated single-column data works as-is since each row is treated as one line.
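
For multi-column CSVs, one way to extract the target column before pasting is a few lines of Python with the standard `csv` module. The helper name `extract_column` and its parameters are hypothetical, and the sample data is invented for illustration.

```python
import csv
import io


def extract_column(csv_text, column, skip_header=False):
    """Return the values of one zero-based column, one list item per row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]
    # Rows that are too short to contain the column are skipped.
    return [row[column] for row in rows if len(row) > column]


data = 'vendor,city\n"Acme, Inc.",Berlin\nGlobex,Paris\n'
print("\n".join(extract_column(data, 0, skip_header=True)))
# Acme, Inc.
# Globex
```

Using a CSV parser instead of splitting on commas keeps quoted values such as "Acme, Inc." intact. The printed output is one item per line and can be pasted into the tool directly.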
The default sort is lexicographic (alphabetical), meaning "9" sorts after "10" because character-by-character comparison puts "9" > "1". For purely numeric lists, the tool detects numeric-only lines and applies natural numeric sorting automatically, placing 2 before 10 before 100.
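
The difference between the two orderings can be reproduced with a short Python sketch. How the tool detects numeric lists is not specified here, so the rule below (every line is a plain integer or decimal) and the name `smart_sort` are assumptions.

```python
import re

# Assumed definition of a "numeric-only" line: optional sign, digits, optional decimals.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def smart_sort(lines, descending=False):
    """Sort numerically if every line is a number, otherwise lexicographically."""
    if lines and all(NUMERIC.fullmatch(line.strip()) for line in lines):
        return sorted(lines, key=lambda line: float(line.strip()), reverse=descending)
    return sorted(lines, reverse=descending)


values = ["10", "9", "2", "100"]
print(sorted(values))      # ['10', '100', '2', '9']  (plain lexicographic order)
print(smart_sort(values))  # ['2', '9', '10', '100']
```

A single non-numeric line switches the whole list back to lexicographic order in this sketch, so mixed lists such as "10", "apple", "9" sort character by character.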
When enabled, any line with fewer characters than the specified threshold (after trimming) is removed. A threshold of 2 removes single-character lines and empty lines. This is useful for cleaning lists with stray punctuation marks or single-letter artifacts from copy-paste operations.
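
A minimal Python sketch of the threshold rule, assuming the length is measured after trimming as described above; `remove_short_lines` and `min_length` are hypothetical names.

```python
def remove_short_lines(lines, min_length):
    """Keep only lines whose trimmed length is at least min_length."""
    return [line for line in lines if len(line.strip()) >= min_length]


print(remove_short_lines(["a", "", "ok", " ; ", "Banana"], 2))
# ['ok', 'Banana']
```

With a threshold of 2, the empty line, the single letter, and the stray semicolon (one character once trimmed) are all dropped.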