# AI Text Detector (Linguistic Focus)
Advanced client-side linguistic analysis tool that uses Shannon Entropy and stylometric fingerprinting to flag potentially AI-generated text, without external APIs.
## About
This tool utilizes stylometric analysis and information theory to detect linguistic patterns often associated with Large Language Models (LLMs). Unlike semantic detectors that look for factual inconsistencies, this engine analyzes the structure and predictability of text.
AI models generate text by favoring high-probability tokens, which often results in lower Shannon Entropy and more uniform sentence structures than human prose. Human writing, conversely, exhibits "burstiness": high variance in sentence length, complexity, and lexical diversity. By calculating entropy density and lexical richness, the tool can flag text that lacks the chaotic nuances of human thought.
This detector operates entirely in your browser using local logic. It splits the input into constituent sentences, calculates the entropy H of each, and checks the lexical diversity against a built-in database of robotic fingerprints.
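A rough sketch of the sentence-splitting and fingerprint-matching steps of that pipeline. The regex splitter, the `FINGERPRINTS` list, and the hit-counting heuristic below are illustrative assumptions, not the tool's actual internals:

```typescript
// Sketch of the local pipeline: split into sentences, then count
// "robotic" connective fingerprints per sentence. The splitter regex
// and the fingerprint list below are illustrative assumptions.

const FINGERPRINTS = ["furthermore", "in conclusion", "moreover", "however"];

/** Naive sentence splitter on terminal punctuation. */
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

/** Count how many known fingerprints occur in one sentence. */
function fingerprintHits(sentence: string): number {
  const lower = sentence.toLowerCase();
  return FINGERPRINTS.filter((f) => lower.includes(f)).length;
}

// Usage: each sentence gets a hit count that later feeds the score.
const sentences = splitSentences("Furthermore, the output is uniform. However, humans vary wildly.");
sentences.forEach((s) => console.log(s, fingerprintHits(s)));
```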
## Formulas
The core mechanism relies on Shannon Entropy to measure the information density of a given text block X:

$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i)$$

where p(x_i) is the relative frequency of character x_i in X. We also utilize the Type-Token Ratio (TTR) for lexical diversity:

$$\mathrm{TTR} = \frac{\text{types (unique words)}}{\text{tokens (total words)}}$$
The Robotic Score R is a weighted composite of the entropy inverse and pattern matching, of the general form:

$$R = w_1 \cdot \frac{1}{H(X)} + w_2 \cdot P$$

where P is the density of matched stylistic fingerprints and w_1, w_2 are the component weights.
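A minimal TypeScript sketch of these formulas. Character-level probabilities for H, whitespace tokenization for TTR, and the default weights w1 = 0.6, w2 = 0.4 are assumptions, not the tool's calibrated values:

```typescript
// Sketch of H(X), TTR, and R. Character-level probabilities,
// whitespace tokenization, and the weights w1/w2 are assumptions.

/** Shannon entropy H(X) in bits, over character frequencies. */
function shannonEntropy(text: string): number {
  const counts = new Map<string, number>();
  for (const ch of text) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / text.length;
    h -= p * Math.log2(p);
  }
  return h;
}

/** Type-Token Ratio: unique words / total words. */
function typeTokenRatio(text: string): number {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  return tokens.length === 0 ? 0 : new Set(tokens).size / tokens.length;
}

/** Weighted composite R; patternDensity = fingerprint matches per sentence. */
function roboticScore(text: string, patternDensity: number, w1 = 0.6, w2 = 0.4): number {
  const h = shannonEntropy(text);
  return w1 * (h > 0 ? 1 / h : 1) + w2 * patternDensity;
}

console.log(shannonEntropy("hello world"), typeTokenRatio("the cat sat on the mat"));
```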
## Reference Data
| Metric | Human Baseline | AI / Synthetic Baseline | Significance |
|---|---|---|---|
| Shannon Entropy | High Variance (> 4.5 bits) | Low / Uniform (< 3.8 bits) | Measures the unpredictability of information content. |
| Lexical Diversity (TTR) | 0.55 - 0.75 | 0.35 - 0.50 | Ratio of unique words to total words. AI tends to reuse common tokens. |
| Sentence Variance | High (Burstiness) | Low (Flat) | Humans vary sentence length significantly; AI seeks an "average" length (see the sketch below the table). |
| Perplexity Proxy | Spikes on nouns/verbs | Smooth distribution | Measures how "surprised" a model is by the text. |
| Connective Density | Context-dependent | Over-utilized | Excessive use of "Furthermore", "In conclusion", "However". |
| Syllabic Complexity | Varied | Standardized | AI often prefers simpler, high-frequency token structures. |
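The "Sentence Variance" row can be computed as the plain variance of per-sentence word counts. A minimal sketch, assuming the same naive regex splitter as above:

```typescript
// Sketch of the "Sentence Variance" (burstiness) metric: variance of
// per-sentence word counts. The regex splitter is an assumption.

function sentenceLengthVariance(text: string): number {
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.trim().length > 0);
  const lengths = sentences.map((s) => s.split(/\s+/).filter(Boolean).length);
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  return lengths.reduce((acc, l) => acc + (l - mean) ** 2, 0) / lengths.length;
}

// Human-like text yields high variance; flat, uniform text scores near zero.
console.log(sentenceLengthVariance("Short one. Then a much longer, winding sentence follows it. Tiny."));
```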