
About

Miscounting characters costs real money. API payloads truncate at byte limits, not character limits. SMS billing segments at 160 chars (GSM-7) or 70 chars (UCS-2 for Unicode). Meta descriptions get clipped beyond roughly 155 characters. Database VARCHAR columns silently truncate inserts. This tool computes 15+ metrics from raw text: characters (with and without whitespace), words, sentences, paragraphs, lines, unique words, average word length, reading time at 238 wpm (Brysbaert, 2019), speaking time at 150 wpm, and UTF-8 byte size. All calculations run locally in-browser with zero server round-trips.

Limitations: sentence detection uses punctuation heuristics and will miscount abbreviations like "U.S.A." as multiple sentences. Reading time assumes adult silent reading of English prose. Byte size reflects UTF-8 encoding only. For CJK text, word segmentation is approximate since Chinese and Japanese lack whitespace delimiters. The tool treats any whitespace-separated token as a word.


Formulas

Character count returns the full Unicode-aware length of the string. Characters without spaces strips all whitespace classes before counting.

chars = len(text)
chars_no_space = len(replace(text, /\s/g, ""))
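
A minimal sketch of these two formulas in JavaScript (illustrative function names, not the tool's actual implementation). Note that `text.length` counts UTF-16 code units, so a spread into code points is needed for a Unicode-aware count:

```javascript
// Unicode-aware character count: spreading iterates by code point,
// so an emoji such as "😀" counts as 1 (its .length would be 2).
function countChars(text) {
  return [...text].length;
}

// Strip every whitespace class (spaces, tabs, newlines) before counting.
function countCharsNoSpace(text) {
  return [...text.replace(/\s/g, "")].length;
}
```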

Word count splits on whitespace boundaries and filters empty tokens.

words = |split(text, /\s+/) βˆ– {""}|
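
A direct sketch of this formula, including the empty-token filter the text describes (`countWords` is an illustrative name):

```javascript
// Split on runs of whitespace and drop empty tokens, which appear
// when the string is empty or starts/ends with whitespace.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}
```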

Reading time divides word count by average adult silent reading speed.

t_read = words / (238 wpm)

Speaking time uses conversational pace.

t_speak = words / (150 wpm)
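
Both time formulas share one shape; a minimal sketch with illustrative names, returning minutes as the formulas are defined:

```javascript
const READING_WPM = 238;   // Brysbaert (2019) silent-reading average
const SPEAKING_WPM = 150;  // conversational English pace

// Estimated time in minutes for a given word count and rate.
function timeMinutes(words, wpm) {
  return words / wpm;
}
```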

Byte size is computed via UTF-8 encoding using the Blob API.

bytes = new Blob([text]).size
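
The Blob approach works in browsers; `TextEncoder` (available in browsers and Node.js) produces the same UTF-8 byte count and is easier to test outside a browser. A sketch:

```javascript
// TextEncoder always encodes to UTF-8, so the length of the
// resulting byte array is the string's UTF-8 byte size.
function byteSizeUtf8(text) {
  return new TextEncoder().encode(text).length;
}
```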

Where text = raw input string, chars = total character count, words = total word count, t_read = estimated reading time in minutes, t_speak = estimated speaking time in minutes, bytes = UTF-8 encoded byte size.

Reference Data

Platform / Context | Limit | Unit | Consequence of Exceeding
Twitter / X Post | 280 | chars | Post rejected
SMS (GSM-7) | 160 | chars | Split into multiple segments, doubled cost
SMS (UCS-2 / Unicode) | 70 | chars | Split into multiple segments
Google Meta Title | 60 | chars | Truncated with ellipsis in SERP
Google Meta Description | 155 | chars | Truncated, reduced CTR
Instagram Caption | 2200 | chars | Truncated after ~125 visible
YouTube Title | 100 | chars | Truncated at ~70 in search
LinkedIn Post | 3000 | chars | Post rejected
Facebook Post | 63206 | chars | Post rejected
Reddit Title | 300 | chars | Title rejected
Pinterest Pin Description | 500 | chars | Truncated
Slack Message | 40000 | chars | Message rejected
Email Subject Line (optimal) | 50 | chars | Clipped on mobile clients
MySQL VARCHAR max | 65535 | bytes | Silent truncation or error
PostgreSQL TEXT | 1 | GB | Performance degradation
JSON Web Token (URL) | 8192 | bytes | HTTP 414 URI Too Long
Average Reading Speed (adult) | 238 | wpm | Brysbaert 2019 meta-analysis
Average Speaking Speed | 150 | wpm | Conversational English pace
TikTok Caption | 2200 | chars | Truncated
WhatsApp Message | 65536 | chars | Message rejected
Push Notification (iOS) | 178 | chars | Truncated on lock screen
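
The two SMS rows interact: once a message exceeds a single segment, each concatenated segment reserves header bytes, dropping per-segment capacity to 153 (GSM-7) or 67 (UCS-2) characters. A rough sketch, assuming a deliberately simplified ASCII-only test for GSM-7 eligibility (the real GSM 03.38 alphabet also includes characters such as Γ©, Γ±, and Β£):

```javascript
// Estimate SMS segment count. The GSM-7 eligibility check here is a
// simplification: plain printable ASCII plus newlines only.
function smsSegments(text) {
  const isGsm7 = /^[\x20-\x7E\n\r]*$/.test(text);
  const single = isGsm7 ? 160 : 70; // limit for a one-segment message
  const multi = isGsm7 ? 153 : 67;  // per-segment limit once concatenated
  // UTF-16 code units, which is how UCS-2 segment length is counted.
  const len = text.length;
  if (len === 0) return 0;
  return len <= single ? 1 : Math.ceil(len / multi);
}
```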

Frequently Asked Questions

What is the difference between characters and bytes?

Characters and bytes are different units. ASCII characters (English letters, digits) use 1 byte each in UTF-8. Accented characters (Γ©, Γ±) use 2 bytes. CJK characters (δΈ­, ζ—₯) use 3 bytes. Emoji (πŸ˜€) use 4 bytes. A string of 10 emoji is 10 characters but 40 bytes. This matters for database VARCHAR columns defined in bytes, API payload limits, and network transfer costs.

How accurate is the sentence count?

The sentence counter splits on terminal punctuation (.!?) followed by whitespace or end-of-string. Abbreviations like "U.S.A." or "Dr. Smith" may inflate the count because each period followed by a space registers as a sentence boundary. For formal prose without heavy abbreviation use, accuracy is typically above 95%. For technical or legal text with many abbreviations, treat the count as an approximation.

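The heuristic described can be sketched with a single regular expression (`countSentences` is an illustrative name, not the tool's exact implementation):

```javascript
// Count runs of terminal punctuation followed by whitespace or
// end-of-string. "U.S.A. is big." yields 2: the "A. " boundary
// plus the final period, matching the inflation described above.
function countSentences(text) {
  return (text.match(/[.!?]+(?=\s|$)/g) || []).length;
}
```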
Does the reading time apply to languages other than English?

The 238 wpm rate is derived from Brysbaert's 2019 meta-analysis of English silent reading. Other languages differ: Finnish averages around 240 wpm, Arabic around 181 wpm, and Chinese around 260 characters per minute (not words). For non-English text, use the word count and divide by the appropriate rate for your language.

Does word counting work for Chinese or Japanese text?

This tool splits on whitespace. Chinese and Japanese do not use spaces between words, so an entire sentence without spaces counts as one token. For accurate CJK word segmentation, specialized tokenizers (like MeCab for Japanese or jieba for Chinese) are required. The character count and byte size metrics remain accurate regardless of language.

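For comparison, modern JavaScript runtimes also ship `Intl.Segmenter`, which performs dictionary-based CJK word segmentation; this is an alternative approach, not something the tool uses, and exact segment boundaries depend on the runtime's ICU data:

```javascript
// Segment CJK text into word-like pieces using the runtime's ICU
// dictionary, then count only the segments flagged as word-like.
function countWordsCjk(text, locale = "zh") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return [...segmenter.segment(text)].filter((s) => s.isWordLike).length;
}
```

A naive whitespace split on the same Chinese sentence would return 1.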
How are paragraphs counted?

A paragraph is defined as a block of text separated by two or more consecutive newline characters (a blank line). A single line break does not start a new paragraph. This matches the convention used in Markdown, most word processors, and HTML rendering. If your text uses single line breaks between paragraphs, the count will read as 1.

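That definition maps to a split on blank lines (`countParagraphs` is an illustrative name):

```javascript
// Two or more consecutive newlines (a blank line) end a paragraph;
// a single newline does not. Whitespace-only blocks are discarded.
function countParagraphs(text) {
  return text
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter(Boolean).length;
}
```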
What happens with empty or whitespace-only input?

All metrics return zero for empty or whitespace-only input. The tool does not count spaces, tabs, or newlines as words. Character count will still reflect whitespace characters present, but word count, sentence count, and paragraph count will be zero.

Can byte size exceed a limit even when the character count fits?

Yes. A 160-character SMS using emoji could be 640 bytes in UTF-8. A 100-character string in Chinese is 300 bytes. Always verify byte size independently of character count when working with byte-limited systems like SMS gateways, HTTP headers (8 KB limit), URL parameters, or binary protocols.
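
When a byte limit is the hard constraint, truncating by characters can split a multi-byte sequence mid-character. A sketch of byte-budget truncation that never splits a code point (`truncateToBytes` is a hypothetical helper; re-encoding per character is quadratic but fine for short strings):

```javascript
// Keep appending code points while the UTF-8 encoding fits the budget.
function truncateToBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  let out = "";
  for (const ch of text) {
    if (encoder.encode(out + ch).length > maxBytes) break;
    out += ch;
  }
  return out;
}
```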