Calculate String Length
Calculate string length in characters, code points, grapheme clusters, bytes, words, and lines. Supports Unicode, emoji, and surrogate pairs.
About
JavaScript's String.length returns UTF-16 code units, not visible characters. A single emoji like 👨‍👩‍👧‍👦 reports 11 from .length but renders as 1 grapheme cluster. Database column limits, SMS segments (160 GSM-7 characters or 70 UCS-2), and API payload caps operate on different length definitions. Confusing code units with bytes or graphemes causes silent data truncation, broken emoji rendering, and failed validation. This tool computes all five canonical string metrics simultaneously: UTF-16 code units, Unicode code points, grapheme clusters, UTF-8 byte length, and word/line counts.
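The truncation risk described above can be illustrated with a minimal sketch. The helper name truncateToBytes and the sample strings are hypothetical, and TextEncoder is used here to measure UTF-8 size; the idea is only to show how cutting at grapheme boundaries under a byte budget differs from slicing by code units.

```javascript
// Hypothetical helper: trim a string to fit a UTF-8 byte budget without
// splitting a grapheme cluster. Names are illustrative, not part of the tool.
function truncateToBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  let used = 0;
  for (const { segment } of segmenter.segment(text)) {
    const size = encoder.encode(segment).length;
    if (used + size > maxBytes) break; // the next cluster would overflow the budget
    result += segment;
    used += size;
  }
  return result;
}

// Family emoji written with escapes: man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const sample = "Hi " + family;

console.log(family.length);               // 11 UTF-16 code units
console.log(truncateToBytes(sample, 10)); // "Hi " (the 25-byte emoji does not fit)
console.log(sample.slice(0, 4).length);   // 4, but the result ends in a lone surrogate
```

Slicing by code units (the last line) keeps half of a surrogate pair, which is one way the broken emoji rendering mentioned above can arise.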
Grapheme segmentation uses the Intl.Segmenter API (UAX #29) when available, which correctly handles zero-width joiners, combining marks, and regional indicator sequences. The UTF-8 byte count is derived via the Blob constructor, matching what your server receives over HTTP. Note: results assume well-formed input. Lone surrogates (unpaired UTF-16 halves) have no valid UTF-8 encoding and are typically replaced with U+FFFD (3 bytes each) during encoding, so their byte counts do not reflect the original code units.
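As a rough sketch of the approach described above, the following counts grapheme clusters with Intl.Segmenter and measures UTF-8 size with the Blob constructor. The function names are hypothetical, and the code-point fallback for environments without Intl.Segmenter is an assumption made for illustration, not documented behavior of this tool. It assumes a runtime where Blob is available globally (browsers, recent Node.js).

```javascript
// Count grapheme clusters with Intl.Segmenter when it exists.
// Fallback (assumption): count code points, which overcounts ZWJ
// sequences and combining marks.
function countGraphemes(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    let count = 0;
    for (const _ of segmenter.segment(text)) count += 1;
    return count;
  }
  return Array.from(text).length;
}

// UTF-8 size via the Blob constructor, as the text describes.
function utf8Bytes(text) {
  return new Blob([text]).size;
}

const flag = "\u{1F1FA}\u{1F1F8}"; // regional indicators U + S
const eAcute = "e\u0301";          // e + combining acute accent

console.log(countGraphemes(flag), utf8Bytes(flag));     // 1 8
console.log(countGraphemes(eAcute), utf8Bytes(eAcute)); // 1 3
```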
Formulas
Five distinct length metrics are computed from an input string S:
L_utf16: The number of UTF-16 code units, as returned by String.length. Surrogate pairs (code points above U+FFFF) count as 2.
L_cp: The spread operator iterates by Unicode code point, correctly counting astral plane characters (emoji, rare CJK) as 1 each.
L_grapheme: UAX #29 segmentation. A grapheme cluster is the smallest user-perceived character unit. Emoji sequences joined with Zero-Width Joiners (U+200D) and characters with combining marks each count as 1.
L_bytes: UTF-8 encoded byte length. ASCII characters use 1 byte. Non-ASCII Latin and Cyrillic characters mostly use 2. CJK and common symbols use 3. Emoji and astral characters use 4.
Where L_utf16 = UTF-16 code unit count, L_cp = Unicode code point count, L_grapheme = grapheme cluster count (visual characters), L_bytes = UTF-8 byte size, L_words = whitespace-delimited token count.
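The metrics defined above could be computed together along these lines. This is a minimal sketch, not the tool's actual source: the function name stringMetrics and the property names are hypothetical, and the line-counting rule (empty string is 0 lines, otherwise split on CRLF, CR, or LF) is an assumption, since the text does not define it.

```javascript
// Sketch of the metrics for an input string S.
function stringMetrics(S) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const trimmed = S.trim();
  return {
    utf16: S.length,                              // UTF-16 code units
    codePoints: [...S].length,                    // spread iterates by code point
    graphemes: [...segmenter.segment(S)].length,  // UAX #29 clusters
    bytes: new Blob([S]).size,                    // UTF-8 encoded size
    words: trimmed === "" ? 0 : trimmed.split(/\s+/).length, // whitespace-delimited tokens
    lines: S === "" ? 0 : S.split(/\r\n|\r|\n/).length,      // assumed line rule
  };
}

// "café", a space, a grinning face emoji, a newline, then "ok".
const metrics = stringMetrics("caf\u00E9 \u{1F600}\nok");
console.log(metrics);
// { utf16: 10, codePoints: 9, graphemes: 9, bytes: 13, words: 3, lines: 2 }
```

Trimming before splitting keeps leading or trailing whitespace from producing empty tokens, so a whitespace-only string reports 0 words.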
Reference Data
| Character / Sequence | Visual | UTF-16 Units | Code Points | Graphemes | UTF-8 Bytes |
|---|---|---|---|---|---|
| Latin letter A | A | 1 | 1 | 1 | 1 |
| Euro sign | € | 1 | 1 | 1 | 3 |
| CJK ideograph (water) | 水 | 1 | 1 | 1 | 3 |
| Emoji (grinning face) | 😀 | 2 | 1 | 1 | 4 |
| Emoji (flag US) | 🇺🇸 | 4 | 2 | 1 | 8 |
| Family emoji (ZWJ) | 👨‍👩‍👧‍👦 | 11 | 7 | 1 | 25 |
| e + combining acute | é (2 CP) | 2 | 2 | 1 | 3 |
| Precomposed é | é (1 CP) | 1 | 1 | 1 | 2 |
| Hindi syllable क्षि | क्षि | 4 | 4 | 1 | 12 |
| Korean syllable 한 | 한 | 1 | 1 | 1 | 3 |
| Musical symbol 𝄞 | 𝄞 | 2 | 1 | 1 | 4 |
| Newline (LF) | \n | 1 | 1 | 1 | 1 |
| Tab | \t | 1 | 1 | 1 | 1 |
| Empty string | (none) | 0 | 0 | 0 | 0 |
| Space | (space) | 1 | 1 | 1 | 1 |
| Skin-toned emoji 👍🏽 | 👍🏽 | 4 | 2 | 1 | 8 |
| Keycap sequence 1️⃣ | 1️⃣ | 3 | 3 | 1 | 7 |
| Zalgo text (10 combiners) | (base letter with 10 stacked combining marks) | 11 | 11 | 1 | 25 |
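Several rows of the table can be checked mechanically. The sketch below is illustrative only: the row selection, the measure helper, and the use of TextEncoder for byte counts are assumptions, and strings are written with escape sequences so the check does not depend on how this page is encoded.

```javascript
// Each row: [label, string, utf16, codePoints, graphemes, utf8Bytes]
const rows = [
  ["Latin letter A", "A", 1, 1, 1, 1],
  ["Euro sign", "\u20AC", 1, 1, 1, 3],
  ["CJK ideograph (water)", "\u6C34", 1, 1, 1, 3],
  ["Emoji (grinning face)", "\u{1F600}", 2, 1, 1, 4],
  ["Emoji (flag US)", "\u{1F1FA}\u{1F1F8}", 4, 2, 1, 8],
  ["Family emoji (ZWJ)", "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}", 11, 7, 1, 25],
  ["e + combining acute", "e\u0301", 2, 2, 1, 3],
  ["Precomposed e-acute", "\u00E9", 1, 1, 1, 2],
  ["Keycap sequence", "1\uFE0F\u20E3", 3, 3, 1, 7],
  ["Empty string", "", 0, 0, 0, 0],
];

const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const utf8Encoder = new TextEncoder();

// Returns [utf16, codePoints, graphemes, utf8Bytes] for one string.
function measure(s) {
  return [
    s.length,
    [...s].length,
    [...graphemeSegmenter.segment(s)].length,
    utf8Encoder.encode(s).length,
  ];
}

// Collect the labels of any rows whose measured values differ from the table.
const mismatches = rows
  .filter(([, s, ...expected]) => measure(s).join() !== expected.join())
  .map(([label]) => label);

console.log(mismatches.length === 0 ? "all rows match" : mismatches);
```

Grapheme counts for some scripts can vary with the Unicode version bundled in the runtime, which is why only rows with stable segmentation rules are included here.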