AI vs Human Text Analyzer - Detect AI-Generated Content
Analyze text to detect whether it was written by AI or a human. Uses linguistic metrics like burstiness, TTR, and readability scoring.
About
Large language models produce text with statistically detectable patterns. Sentence length variance ($\sigma_{\text{len}}$, measured as the standard deviation of words per sentence) is typically 30-50% lower in AI-generated prose than in human writing. AI models optimize for coherence, which flattens the natural "burstiness" of human cognition: the tendency to alternate between short, punchy sentences and longer, complex ones. This tool quantifies that signal alongside 8 other linguistic features, including Type-Token Ratio, hapax legomena density, transition word overuse, punctuation diversity, syllable distribution, readability index, and n-gram repetition patterns. Each metric is scored independently, then combined into a weighted composite.
No detector is infallible. Heavily edited AI text or formulaic human writing (legal documents, technical manuals) can be misclassified in either direction. The composite score assumes standard English prose of at least 50 words, and accuracy degrades below that threshold. Mixed-origin text (human-written with AI-assisted edits) will score in the ambiguous middle range, which is the expected result. Treat scores below 30 or above 70 as directional signals, not verdicts.
Formulas
The composite human-likeness score H is computed as a weighted sum of normalized sub-scores:

$$H = \sum_{i=1}^{n} w_i \, s_i$$

Where $s_i$ is the normalized score (range 0 - 1) for each metric and $w_i$ is its weight such that $\sum_{i=1}^{n} w_i = 1$.
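A minimal Python sketch of the weighted sum, using the weights listed in the Reference Data table. The dictionary keys and the function name `composite_score` are hypothetical, and the sub-scores are assumed to be normalized to the 0 - 1 range before they are passed in. How each raw metric is normalized is not specified here, so that step is left out.

```python
# Weights for the nine weighted metrics, copied from the Reference Data table.
# The key names are illustrative assumptions, not identifiers from the tool.
WEIGHTS = {
    "sentence_length_variance": 0.20,
    "burstiness": 0.15,
    "ttr": 0.12,
    "hapax_ratio": 0.10,
    "transition_density": 0.12,
    "punctuation_diversity": 0.08,
    "flesch": 0.08,
    "readability_variance": 0.08,
    "bigram_repetition": 0.07,
}


def composite_score(sub_scores: dict) -> float:
    """Return H = sum(w_i * s_i) for sub-scores already normalized to [0, 1]."""
    # The definition requires the weights to sum to 1.
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for name, weight in WEIGHTS.items():
        if name not in sub_scores:
            raise KeyError(f"missing sub-score: {name}")
        score = sub_scores[name]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"sub-score out of range for {name}: {score}")
        total += weight * score
    return total
```

With sub-scores in 0 - 1 the result is also in 0 - 1. The thresholds of 30 and 70 mentioned in the About section suggest the displayed score is on a 0 - 100 scale, so multiplying the result by 100 is a plausible final step, but that scaling is an assumption.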
The Burstiness Index B is defined as:

$$B = \frac{\sigma_{\text{len}}^2}{\mu_{\text{len}}}$$

Where $\sigma_{\text{len}}^2$ is the variance of sentence lengths and $\mu_{\text{len}}$ is the mean sentence length in words.
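The variance-to-mean ratio can be sketched in Python as follows. Two choices here are assumptions rather than documented behavior of the tool: sentences are split naively on `.`, `!`, or `?` followed by whitespace, and the population variance is used instead of the sample variance. The function names are hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list:
    """Return the word count of each sentence in the text."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness_index(text: str) -> float:
    """Return the variance of sentence lengths divided by their mean."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no spread to measure
    mean = statistics.fmean(lengths)
    variance = statistics.pvariance(lengths)  # population variance (assumption)
    return variance / mean
```

For example, a text with sentence lengths of 3, 1, and 5 words has mean 3 and population variance 8/3, giving B = 8/9. Uniform sentence lengths give B = 0.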
The Flesch Reading Ease score:

$$\text{FRE} = 206.835 - 1.015 \cdot \frac{W}{S} - 84.6 \cdot \frac{Y}{W}$$

Where W = total words, S = total sentences, Y = total syllables.
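A short Python sketch of the Flesch Reading Ease calculation, using the standard coefficients (206.835, 1.015, 84.6). The syllable counter is a rough vowel-group heuristic chosen for illustration and will miscount some words; the tokenization and sentence splitting are likewise assumptions, and all function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(W: int, S: int, Y: int) -> float:
    """Compute the score from total words W, sentences S, and syllables Y."""
    if W <= 0 or S <= 0:
        raise ValueError("W and S must be positive")
    return 206.835 - 1.015 * (W / S) - 84.6 * (Y / W)


def flesch_from_text(text: str) -> float:
    """Count words, sentences, and syllables in text, then apply the formula."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

As a check on the arithmetic, 100 words in 5 sentences with 150 syllables gives 206.835 - 20.3 - 126.9 = 59.635.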
Type-Token Ratio:

$$\text{TTR} = \frac{|V|}{N}$$

Where |V| is the count of unique word types and N is the total number of word tokens.
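The ratio is straightforward to compute once the text is tokenized. The sketch below assumes case-insensitive tokens made of letters with an optional internal apostrophe; that tokenization rule and the function names are illustrative choices, not the tool's documented behavior.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and extract word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(text: str) -> float:
    """Return |V| / N: unique word types divided by total word tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For "the cat and the dog" there are 5 tokens and 4 unique types, so the ratio is 0.8. TTR tends to fall as a text gets longer, so values are best compared across texts of similar length.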
Reference Data
| Metric | What It Measures | Typical Human Range | Typical AI Range | Weight |
|---|---|---|---|---|
| Sentence Length Variance ($\sigma_{\text{len}}$) | Standard deviation of words per sentence | 6 - 14 | 2 - 6 | 20% |
| Burstiness Index | Ratio of variance to mean in sentence lengths | 0.5 - 1.5 | 0.1 - 0.4 | 15% |
| Type-Token Ratio (TTR) | Lexical diversity: unique words ÷ total words | 0.55 - 0.80 | 0.40 - 0.60 | 12% |
| Hapax Legomena Ratio | Words used exactly once ÷ total words | 0.40 - 0.65 | 0.25 - 0.40 | 10% |
| Transition Word Density | Discourse markers per 100 words | 0.5 - 2.0 | 2.5 - 5.0 | 12% |
| Punctuation Diversity | Unique punctuation types ÷ total punctuation | 0.30 - 0.70 | 0.15 - 0.30 | 8% |
| Flesch Reading Ease | Readability based on syllables and sentence length | 40 - 80 (varies) | 50 - 65 (narrow band) | 8% |
| Readability Variance | Per-sentence Flesch score standard deviation | 15 - 35 | 5 - 15 | 8% |
| Bigram Repetition Rate | Repeated word pairs ÷ total bigrams | 0.02 - 0.08 | 0.08 - 0.18 | 7% |
| Average Syllables per Word | Vocabulary complexity indicator | 1.3 - 1.8 | 1.5 - 1.7 (consistent) | - |
| Paragraph Length Variance | Consistency of paragraph sizing | High (irregular) | Low (uniform) | - |
| Contraction Usage Rate | Contractions per 100 words | 1.0 - 4.0 | 0.1 - 0.8 | - |
| Unique Sentence Starters | Variety in first words of sentences | 60 - 90% | 35 - 55% | - |
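
Three of the ratio metrics in the table can be sketched directly from their "What It Measures" definitions. The following Python is illustrative only: the function names and tokenization are assumptions, "repeated word pairs" is interpreted as bigram occurrences beyond the first appearance of each pair, and punctuation is limited to the ASCII set in `string.punctuation`.

```python
import re
import string
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and extract word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def hapax_ratio(tokens: list) -> float:
    """Words used exactly once divided by total words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(tokens)


def bigram_repetition_rate(tokens: list) -> float:
    """Repeated word pairs divided by total bigrams.

    Assumption: a "repeated" pair is any occurrence after the first one.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    return (len(bigrams) - len(set(bigrams))) / len(bigrams)


def punctuation_diversity(text: str) -> float:
    """Unique punctuation types divided by total punctuation marks."""
    marks = [ch for ch in text if ch in string.punctuation]
    if not marks:
        return 0.0
    return len(set(marks)) / len(marks)
```

For the tokens of "the cat saw the cat", only "saw" occurs once, so the hapax ratio is 1/5, and the pair ("the", "cat") repeats once among 4 bigrams, so the repetition rate is 1/4.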