AI Content Detector (Heuristic Analysis)
Advanced client-side text analyzer using statistical heuristics (burstiness, perplexity proxies) and n-gram pattern matching to identify patterns typical of AI-generated content.
About
This tool uses statistical linguistics and pattern recognition to evaluate text for characteristics common to Large Language Models (LLMs). Human writing typically exhibits high entropy and distinct structural variance (burstiness). AI models, by contrast, tend to favor high-probability word choices, which produces smoother, more uniform sentence structures.
The analysis relies on three core vectors: Perplexity Proxy (vocabulary complexity), Burstiness (sentence-length variance), and Semantic Fingerprinting (detection of overused AI transition phrases). By measuring how far a text deviates from the irregularity typical of human writing, we can estimate the likelihood of machine generation without sending any data to external servers.
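The Semantic Fingerprinting vector can be approximated by counting listed transition phrases per sentence. The sketch below makes several assumptions. The phrase list `TRANSITION_PHRASES` is hypothetical, because the tool's actual fingerprint list is not given here. The helper names are illustrative, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical fingerprint list; the tool's real phrase list is not published here.
TRANSITION_PHRASES = (
    "moreover",
    "furthermore",
    "additionally",
    "in conclusion",
    "it is important to note",
)


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def transition_density(text: str, phrases=TRANSITION_PHRASES) -> float:
    """Average number of listed transition phrases per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    lowered = text.lower()
    hits = 0
    for phrase in phrases:
        # Word boundaries stop a phrase from matching inside a longer word.
        hits += len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
    return hits / len(sentences)


print(transition_density("Moreover, the model works. Furthermore, it scales. It is cheap."))
```

Here two phrase hits across three sentences give a density of about 0.67. The splitter will mis-split abbreviations such as "e.g.", so a production version would need a proper sentence tokenizer.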
Formulas
The core heuristic for detecting "robotic" uniformity is the standard deviation of sentence lengths, often referred to as burstiness:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

where $x_i$ is the length of sentence $i$ (in words), $\bar{x}$ is the average sentence length, and $N$ is the total number of sentences.
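A minimal sketch of the burstiness calculation follows. It is the population standard deviation of per-sentence word counts, dividing by $N$ as in the definition above. The function names and the naive sentence splitting are illustrative assumptions, not the tool's own code.

```python
import math
import re


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, using a naive split on terminal punctuation (assumption)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(lengths: list[int]) -> float:
    """Population standard deviation of sentence lengths (divides by N)."""
    n = len(lengths)
    if n == 0:
        return 0.0
    mean = sum(lengths) / n
    return math.sqrt(sum((x - mean) ** 2 for x in lengths) / n)


print(sentence_lengths("One two three. Four five! Six?"))  # [3, 2, 1]
print(burstiness([2, 4, 4, 4, 5, 5, 7, 9]))                # 2.0
```

Python's `statistics.pstdev` returns the same value. The loop is written out here so that each term maps onto the formula.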
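Readability is calculated using the Flesch-Kincaid Grade Level formula:

$$\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$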
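The following sketch applies the Flesch-Kincaid Grade Level with its standard coefficients (0.39, 11.8 and 15.59). The syllable counter is an assumption: it is a rough vowel-group heuristic, and the tool may count syllables differently or use a dictionary. The helper names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic (assumption): count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable


def flesch_kincaid_grade(text: str) -> float:
    """0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


print(round(flesch_kincaid_grade("The cat sat."), 2))  # -2.62
```

Very short, monosyllabic text can produce a negative grade, as the example shows. This is a property of the formula itself, not a bug.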
Reference Data
| Metric | Definition | AI Characteristic | Human Characteristic |
|---|---|---|---|
| Burstiness (σ) | Standard Deviation of Sentence Lengths | Low (< 5 words). Consistent, rhythmic structure. | High (> 10 words). Chaotic mix of short/long sentences. |
| Perplexity Proxy | Vocabulary Entropy & Rare Word Density | Low. Prefers high-probability tokens. | High. Uses unexpected synonyms and idioms. |
| Repetition Rate | N-gram density (3-grams) | High repetition of structural phrases. | Low repetition, more organic flow. |
| Transition Density | Frequency of conjunctive adverbs | High (e.g., "Moreover", "Furthermore"). | Moderate to Low. |
| Sentence Length (Avg) | Mean words per sentence | 15-20 words (optimized for readability). | Varies wildly (5 to 40+ words). |
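Several table rows can be turned into simple computations. The sketch below is one possible reading of them, not the tool's documented method. It makes three assumptions:

- Repetition rate is the fraction of 3-gram occurrences that repeat an earlier 3-gram.
- The perplexity proxy is the Shannon entropy, in bits, of the word-frequency distribution.
- `burstiness_band` maps σ onto the table's < 5 and > 10 word thresholds.

All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (naive regex tokenizer, an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def repetition_rate(tokens: list[str], n: int = 3) -> float:
    """Share of n-gram occurrences that duplicate an earlier n-gram (assumed definition)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return 1 - len(set(grams)) / len(grams)


def vocabulary_entropy(tokens: list[str]) -> float:
    """Shannon entropy (bits) of word frequencies, a crude perplexity proxy."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def burstiness_band(sigma: float) -> str:
    """Table thresholds: sigma < 5 words looks AI-like, > 10 words looks human-like."""
    if sigma < 5:
        return "ai-like"
    if sigma > 10:
        return "human-like"
    return "inconclusive"


tokens = tokenize("a b c a b c")
print(repetition_rate(tokens))                # 0.25
print(vocabulary_entropy(["a", "b", "a", "b"]))  # 1.0
print(burstiness_band(7.5))                   # inconclusive
```

Raw entropy grows with text length, so comparing texts of different sizes would need some normalization. That choice is left open here, because the reference table does not specify one.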