About

Automated line breaking at a fixed character width is a fundamental operation in text processing, used in code formatting, SMS segmentation, subtitle timing, and legacy system integration where field widths are rigid. Getting it wrong introduces orphaned words, mid-word splits in user-facing content, or silent data truncation in fixed-width database columns. This tool splits input text at a configurable character limit L using either a hard break (exact slice at position L) or a soft break that scans backward to the nearest whitespace boundary to preserve whole words. It handles edge cases: single words exceeding the limit are force-split, existing newlines are respected, and trailing whitespace is trimmed per line. The output is not approximated - each line is guaranteed to be ≤ L characters.

Formulas

The hard break algorithm partitions a string S of length n into segments of at most L characters:

line_i = S[i ⋅ L .. min((i + 1) ⋅ L, n)]

The total number of lines produced:

k = ⌈n / L⌉
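
As a minimal sketch of the hard break, assuming plain Python string slicing (the function name hard_wrap is hypothetical, not the tool's own code), the partition and the line count can be checked like this:

```python
import math


def hard_wrap(s: str, limit: int) -> list[str]:
    """Slice s into consecutive segments of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Segment i covers S[i*L .. min((i+1)*L, n)]; slicing clamps at n for us.
    return [s[i:i + limit] for i in range(0, len(s), limit)]


text = "abcdefghij"          # n = 10
lines = hard_wrap(text, 4)   # L = 4
print(lines)                 # ['abcd', 'efgh', 'ij']
print(len(lines) == math.ceil(len(text) / 4))  # True
```

Note that Python's len counts code points, so the counts can differ from a UTF-16 based length for characters outside the Basic Multilingual Plane.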

The soft (word-aware) break modifies this by scanning backward from position start + L, where start is the index at which the current segment begins, to find the last whitespace character. Let j be the index of that character. If j > start, break at j. Otherwise, fall back to a hard break at start + L (a forced split for words longer than the limit).

break position = j, if j > start
break position = start + L, otherwise (forced split)

Where S is the input string, L is the character limit per line, n = length of S, i is the line index, j is the backward-scanned whitespace position, and start is the starting index of the current segment.
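
The soft break rule can be sketched as follows. This is an illustrative reading of the description, not the tool's actual source: the name soft_wrap is hypothetical, whitespace is detected with str.isspace, and the input is assumed to be a single paragraph with no newlines.

```python
def soft_wrap(s: str, limit: int) -> list[str]:
    """Word-aware wrap of one paragraph; every line has at most `limit` chars."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    lines: list[str] = []
    start, n = 0, len(s)
    while start < n:
        # Skip leading whitespace so no line begins with a space.
        while start < n and s[start].isspace():
            start += 1
        if start >= n:
            break
        end = start + limit
        if end >= n:
            # The remainder already fits on one line.
            lines.append(s[start:].rstrip())
            break
        # Scan backward from start + L for the last whitespace character.
        j = end
        while j > start and not s[j].isspace():
            j -= 1
        if j > start:
            lines.append(s[start:j].rstrip())  # break at j, trim trailing spaces
            start = j
        else:
            lines.append(s[start:end])         # forced split of a long word
            start = end
    return lines


print(soft_wrap("the quick brown fox", 10))  # ['the quick', 'brown fox']
print(soft_wrap("abcdefghijkl mn", 5))       # ['abcde', 'fghij', 'kl mn']
```

The scan starts at index start + L itself, so a line of exactly L characters followed by a space is kept whole rather than broken one word early.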

Reference Data

Context	Typical Char Limit	Break Mode	Notes
SMS (GSM 7-bit)	160	Hard	Single-message limit; concatenated segments carry 153 each
SMS (Unicode/UCS-2)	70	Hard	Emoji & non-Latin text; concatenated segments carry 67 each
Twitter / X post	280	Soft	Character count includes spaces
Terminal width (standard)	80	Soft	Convention inherited from 80-column punch cards and the VT100
Terminal width (wide)	120	Soft	Modern widescreen terminals
Email body (RFC 5322)	78	Soft	SHOULD limit; 998 MUST limit
Git commit subject	50	Soft	Convention, not enforced
Git commit body	72	Soft	Wraps cleanly in git log
Subtitle (SRT/VTT)	42	Soft	Per line, max 2 lines per cue
COBOL record	80	Hard	Fixed-width card format
Mainframe fixed field	132	Hard	Line printer width
PEP 8 (Python)	79	Soft	Max line length recommendation
Google Java Style	100	Soft	Column limit
Markdown readability	80	Soft	Common convention
Facebook post preview	477	Soft	Before “See more” truncation
Instagram caption	2200	Soft	Full limit; preview ~125
Push notification (iOS)	178	Hard	Lock screen display limit
Push notification (Android)	240	Hard	Varies by launcher
Meta title (SEO)	60	Soft	Google truncates beyond this
Meta description (SEO)	160	Soft	Optimal snippet length

Frequently Asked Questions

In soft (word-aware) mode, the algorithm scans backward for whitespace. If no whitespace exists within the segment - meaning the entire segment is one continuous word - it falls back to a hard break at exactly position L, splitting the word. This guarantees every output line respects the limit. If preserving whole words is critical, increase L to accommodate your longest token.
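
To pick a limit that avoids forced splits, one option is to measure the longest whitespace-delimited token first. The helper below is a hypothetical sketch, assuming tokens are separated by any whitespace:

```python
def longest_token_length(text: str) -> int:
    """Length of the longest whitespace-delimited token, or 0 for empty input."""
    return max((len(token) for token in text.split()), default=0)


sample = "ab abcdefghij c"
print(longest_token_length(sample))  # 10
```

Any limit L at or above this value means soft mode never has to split a word in that text.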

Existing newlines (\n) in the source text are treated as explicit break points. The text is first split on existing newlines, then each resulting paragraph is independently wrapped at the character limit. A newline character itself is not counted toward the line length. Carriage returns (\r) are normalized to \n before processing.
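
The paragraph handling might look like the sketch below. It assumes that a \r\n pair is treated as one line ending (the description only says carriage returns are normalized), and it uses a plain hard slice as a stand-in for whichever break mode is selected. The name wrap_text is hypothetical.

```python
import re


def wrap_text(text: str, limit: int) -> str:
    """Normalize line endings, then wrap each paragraph independently."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Assumption: "\r\n" and a lone "\r" both become a single "\n".
    normalized = re.sub(r"\r\n?", "\n", text)
    out: list[str] = []
    for paragraph in normalized.split("\n"):
        # Stand-in wrapper: hard slice. Newlines never count toward the limit
        # because they were removed by the split above.
        pieces = [paragraph[i:i + limit] for i in range(0, len(paragraph), limit)]
        out.extend(pieces or [""])  # keep blank lines from the source
    return "\n".join(out)


print(repr(wrap_text("abcdef\r\ngh\rijklm", 4)))  # 'abcd\nef\ngh\nijkl\nm'
```

Because each paragraph is wrapped on its own, a short line in the source is never merged with the line that follows it.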

Trailing whitespace at the break point is trimmed from the end of the current line, and leading whitespace is trimmed from the start of the next line. This prevents lines that begin or end with invisible space characters, which would waste characters against the limit and cause misalignment in fixed-width displays.

The tool counts JavaScript string length, which is measured in UTF-16 code units. Characters in the Basic Multilingual Plane (Latin, Cyrillic, and most CJK) count as 1. Most emoji and other characters outside the Basic Multilingual Plane (the astral planes, U+10000 and above) count as 2 because they require a surrogate pair. For SMS segmentation using GSM 7-bit encoding, certain characters (such as braces and backslash) consume 2 slots because they come from the GSM extension table; this tool does not apply GSM-specific counting.
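
To reproduce the UTF-16 count outside JavaScript, the string can be encoded and its code units counted. This is a small sketch in Python, where the built-in len counts code points instead; the function name utf16_length is hypothetical.

```python
def utf16_length(s: str) -> int:
    """Number of UTF-16 code units, as JavaScript's String length reports."""
    # "utf-16-le" emits no byte-order mark, and each code unit is 2 bytes.
    return len(s.encode("utf-16-le")) // 2


print(utf16_length("abc"))           # 3
print(utf16_length("\u4e2d"))        # 1  (a CJK character in the BMP)
print(utf16_length("\U0001F600"))    # 2  (an emoji above U+10000)
print(len("\U0001F600"))             # 1  (Python counts code points)
```

When wrapping text that must match the tool's counts, a limit check based on utf16_length is closer to the described behavior than one based on len.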

String slicing and whitespace scanning are O(n) operations. The tool processes texts up to approximately 500,000 characters without noticeable delay in modern browsers. Beyond that, the DOM rendering of the output becomes the bottleneck rather than the algorithm itself. For texts exceeding 1,000,000 characters, consider processing in chunks or using a command-line tool like fold on Unix systems.
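
For very large inputs, chunked processing can be approximated by wrapping one source line at a time, since existing newlines are already break points. The generator below is a hypothetical sketch that uses a hard slice per line and never holds the full output in memory:

```python
import io
from typing import Iterable, Iterator


def wrap_stream(lines: Iterable[str], limit: int) -> Iterator[str]:
    """Lazily yield wrapped lines from an iterable of source lines."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for raw in lines:
        paragraph = raw.rstrip("\r\n")  # drop the line ending, keep the content
        if not paragraph:
            yield ""                    # preserve blank lines
            continue
        for i in range(0, len(paragraph), limit):
            yield paragraph[i:i + limit]


source = io.StringIO("abcdef\n\ngh\n")  # stands in for a large open file
print(list(wrap_stream(source, 4)))      # ['abcd', 'ef', '', 'gh']
```

Passing an open file object instead of the StringIO lets the same loop work on inputs that are too large to load at once.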

CSS word-wrap (overflow-wrap) is a rendering-time property: it does not modify the underlying string, so copying the text yields the original unwrapped version. The Unix fold -s -w 80 command applies similar soft-break logic, although it leaves the whitespace at the break point in place rather than trimming it, and it requires terminal access. This tool produces the actual broken text as a new string that you can copy, download, or paste into systems that do not support dynamic wrapping (fixed-width database fields, plain-text emails, subtitle files).