About

Splitting a string into equal-length segments is a routine operation in data formatting, serial number generation, cryptographic key display, and transmission protocol compliance. A naive substring loop slices by UTF-16 code units and can silently break characters outside the Basic Multilingual Plane - emoji and rarer CJK ideographs get severed into lone surrogates. This tool chunks by code point count using Array.from, so a chunk size of 4 means four code points, not four UTF-16 code units or four bytes. Sequences built from several code points, such as combining diacritics or ZWJ emoji, still count as multiple characters unless grapheme segmentation is available. The tool handles edge cases: chunk size larger than the string, chunk size of 1, and empty input. The last chunk may be shorter than n; padding options let you normalize it.
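The difference between slicing by UTF-16 code units and slicing by code points can be sketched as follows. This is a minimal illustration, not the tool's actual source; the function names naiveChunks and codePointChunks are hypothetical.

```javascript
// Naive approach: slices by UTF-16 code units and can split a surrogate pair.
function naiveChunks(str, n) {
  const out = [];
  for (let i = 0; i < str.length; i += n) {
    out.push(str.slice(i, i + n));
  }
  return out;
}

// Code-point approach: Array.from iterates the string by code point,
// so a character outside the BMP (e.g. an emoji) stays in one piece.
function codePointChunks(str, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("chunk size must be a positive integer");
  }
  const points = Array.from(str);
  const out = [];
  for (let i = 0; i < points.length; i += n) {
    out.push(points.slice(i, i + n).join(""));
  }
  return out;
}

const sample = "ab😀cd";
console.log(naiveChunks(sample, 3));     // first chunk ends in a lone surrogate
console.log(codePointChunks(sample, 3)); // [ 'ab😀', 'cd' ]
```

Empty input yields an empty array here, and a chunk size larger than the input yields one chunk holding the whole string.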

Practical applications include formatting credit card numbers (4-digit groups), displaying SHA-256 hashes (8-char blocks), preparing data for fixed-width file formats, and segmenting DNA/RNA sequences for readability. The tool performs no biological or cryptographic function - it is a deterministic string slicer with zero data loss.

Formulas

The chunking operation is a deterministic partitioning of an ordered sequence of length L into segments of fixed width n.

k = ceil(L / n)

where k is the total number of chunks produced, L is the input string length in codepoints, and n is the chunk size. Each chunk C_i is extracted as:

C_i = S[i ⋅ n .. min(i ⋅ n + n, L)]

for i ∈ {0, 1, …, k − 1}, where the range is half-open (the end index is excluded). Let r = L mod n. When r = 0, all chunks are uniform and the last chunk has length n. When r ≠ 0, the last chunk has length r; if padding is enabled, it is right-padded with the pad character to length n.
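As a worked instance of the formulas above, take a 10-character input with n = 4. The sketch below computes k, r, and the bounds of each chunk; the helper names chunkCount and chunkBounds are illustrative only.

```javascript
// Number of chunks: k = ceil(L / n), with L counted in code points.
function chunkCount(L, n) {
  return Math.ceil(L / n);
}

// Half-open bounds [start, end) of chunk i: S[i*n .. min(i*n + n, L)].
function chunkBounds(i, n, L) {
  return [i * n, Math.min(i * n + n, L)];
}

const S = Array.from("ABCDEFGHIJ"); // L = 10
const n = 4;
const L = S.length;
const k = chunkCount(L, n); // ceil(10 / 4) = 3
const r = L % n;            // 10 mod 4 = 2

const chunks = [];
for (let i = 0; i < k; i++) {
  const [start, end] = chunkBounds(i, n, L);
  chunks.push(S.slice(start, end).join(""));
}
console.log(k, r, chunks); // 3 2 [ 'ABCD', 'EFGH', 'IJ' ]
```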

Reference Data

Use Case	Typical Chunk Size	Separator	Standard / Context
Credit Card Display	4	Space	ISO/IEC 7812
IBAN Formatting	4	Space	ISO 13616
MAC Address	2	Colon :	IEEE 802
IPv6 Address	4	Colon :	RFC 4291
SHA-256 Hash Display	8	Space	NIST FIPS 180-4
UUID Segments	8-4-4-4-12	Hyphen -	RFC 4122
Binary Octets	8	Space	Digital Logic
Hex Dump (Word)	4	Space	Memory Inspection
Hex Dump (DWord)	8	Space	Memory Inspection
DNA Codon Triplets	3	Space	Molecular Biology
RNA Codon Triplets	3	Space	Molecular Biology
Base64 Line Wrap	76	Newline	RFC 2045 (MIME)
PEM Certificate Lines	64	Newline	RFC 7468
Fixed-Width Data Field	Variable	None	COBOL / Mainframe
QR Code Data Segments	Variable	None	ISO/IEC 18004
Morse Code Groups	5	Space	ITU-R M.1677
NATO Message Groups	5	Space	ACP 131
Telephone Number (US)	3-3-4	Hyphen -	NANP
Serial Key (Software)	5	Hyphen -	Industry Convention
Barcode Data (EAN-13)	1-6-6	Space	GS1
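
Several rows above (UUID, US telephone number, EAN-13) use a pattern of group widths rather than a single chunk size. The tool as described takes one chunk size n, so the following is only a sketch of how such patterns could be handled; chunkByPattern is a hypothetical helper, not a documented feature.

```javascript
// Hypothetical helper: split by a list of group widths, e.g. [8, 4, 4, 4, 12].
// Characters left over after the last group are returned as a final chunk.
function chunkByPattern(str, widths) {
  const points = Array.from(str);
  const out = [];
  let pos = 0;
  for (const w of widths) {
    if (pos >= points.length) break;
    out.push(points.slice(pos, pos + w).join(""));
    pos += w;
  }
  if (pos < points.length) out.push(points.slice(pos).join(""));
  return out;
}

const uuidHex = "123e4567e89b12d3a456426614174000";
console.log(chunkByPattern(uuidHex, [8, 4, 4, 4, 12]).join("-"));
// 123e4567-e89b-12d3-a456-426614174000
console.log(chunkByPattern("2025550123", [3, 3, 4]).join("-"));
// 202-555-0123
```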

Frequently Asked Questions

The tool uses Array.from() to split the input string by Unicode codepoints rather than UTF-16 code units. This means a chunk size of 4 will yield four whole characters even when the input contains emoji or supplementary-plane CJK ideographs (which occupy two UTF-16 code units each). Note: combined emoji sequences like 👨‍👩‍👧 (family) consist of multiple codepoints joined by ZWJ and will count as multiple characters, not one. True grapheme cluster segmentation requires the Intl.Segmenter API, which this tool uses when available.
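
The split between code points and grapheme clusters described above might look like this in practice. This is a sketch that assumes the feature check shown; toCharacters and graphemeChunks are illustrative names, and the tool's real detection logic is not documented here.

```javascript
// Split into user-perceived characters when Intl.Segmenter exists,
// otherwise split into code points via Array.from.
function toCharacters(str) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return Array.from(segmenter.segment(str), (s) => s.segment);
  }
  return Array.from(str);
}

// Chunk by whatever unit toCharacters produced (n assumed to be a positive integer).
function graphemeChunks(str, n) {
  const chars = toCharacters(str);
  const out = [];
  for (let i = 0; i < chars.length; i += n) {
    out.push(chars.slice(i, i + n).join(""));
  }
  return out;
}

// Family emoji: man + ZWJ + woman + ZWJ + girl.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(Array.from(family).length);   // 5 code points: 3 emoji + 2 ZWJ
console.log(toCharacters(family).length); // 1 where Intl.Segmenter is available
```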

When the chunk size exceeds the input length, the result is a single chunk containing the entire input string. If padding is enabled, that chunk is right-padded to the specified chunk size. No error is thrown - this is a valid degenerate case producing k = 1 chunk.

Yes, custom separators are supported. The separator field accepts any string including spaces, pipes, tabs (entered as literal tab or \t), and newlines. The separator is inserted between chunks in "Joined" and "Lines" output modes but is never part of the chunk data itself.
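
Joining chunks with a user-supplied separator could be sketched as below. The source only states that \t is accepted for tabs; treating \n, \r, and \\ as escapes as well is an assumption made for this example, and unescapeSeparator and joinChunks are hypothetical names.

```javascript
// Hypothetical helper: turn typed escape sequences into real characters.
// Handling \n, \r and \\ in addition to \t is an assumption of this sketch.
function unescapeSeparator(sep) {
  return sep.replace(/\\([tnr\\])/g, (_, c) =>
    ({ t: "\t", n: "\n", r: "\r", "\\": "\\" }[c])
  );
}

// Chunk by code points, then place the separator between chunks only
// (n assumed to be a positive integer).
function joinChunks(str, n, separator) {
  const points = Array.from(str);
  const chunks = [];
  for (let i = 0; i < points.length; i += n) {
    chunks.push(points.slice(i, i + n).join(""));
  }
  return chunks.join(unescapeSeparator(separator));
}

console.log(joinChunks("4111111111111111", 4, " ")); // 4111 1111 1111 1111
console.log(joinChunks("001A2B3C4D5E", 2, ":"));     // 00:1A:2B:3C:4D:5E
```

Because the separator is applied with join, it never appears before the first chunk or after the last one.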

The tool enforces a soft limit of 1,000,000 characters to prevent browser tab freezing. For a 1,000,000-character input with chunk size 4, the tool produces 250,000 chunks in under 50 ms on modern hardware. If you need to process larger payloads, consider a server-side script or a streaming approach.
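
For payloads beyond that limit, one possible streaming approach is a generator that yields chunks lazily. This is a sketch rather than part of the tool: streamChunks is a hypothetical name, and the input pieces are assumed to arrive already split on code point boundaries.

```javascript
// Lazily yield chunks of n code points from any iterable of strings,
// so the whole payload never has to be held as one array of characters.
// Assumes n is a positive integer and no piece ends mid surrogate pair.
function* streamChunks(pieces, n) {
  let buffer = [];
  for (const piece of pieces) {
    for (const ch of piece) { // string iteration is by code point
      buffer.push(ch);
      if (buffer.length === n) {
        yield buffer.join("");
        buffer = [];
      }
    }
  }
  if (buffer.length > 0) yield buffer.join(""); // short final chunk
}

// Usage: the input arrives in arbitrary pieces; chunk boundaries are unaffected.
for (const chunk of streamChunks(["abc", "defg", "hi"], 4)) {
  console.log(chunk); // abcd, then efgh, then i
}
```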

When padding is enabled, the last chunk is extended to exactly n characters by appending copies of the pad character (default: space, configurable to "0", "_", or any single character). This is useful for fixed-width record formats and mainframe data fields where every record must have the same length.
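
A minimal sketch of this padding step, assuming a single-character pad and a positive integer n; chunkAndPad is an illustrative name, not the tool's API.

```javascript
// Right-pad the final chunk to n code points. String.prototype.padEnd counts
// UTF-16 code units, so the missing width is computed from code points instead.
function chunkAndPad(str, n, padChar = " ") {
  const points = Array.from(str);
  const out = [];
  for (let i = 0; i < points.length; i += n) {
    const piece = points.slice(i, i + n);
    const missing = n - piece.length; // 0 for every full chunk
    out.push(piece.join("") + padChar.repeat(missing));
  }
  return out;
}

console.log(chunkAndPad("ABCDEFGHIJ", 4, "0")); // [ 'ABCD', 'EFGH', 'IJ00' ]
console.log(chunkAndPad("ab", 5, "_"));         // [ 'ab___' ]
```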

Yes, whitespace and line breaks are preserved. The chunking algorithm treats every codepoint - including \n, \r, \t, and spaces - as a character of width 1. A newline in position 3 of a chunk is preserved exactly there. If you want to strip whitespace before chunking, use the "Trim whitespace" option.