Word Frequency Counter
Advanced text analysis suite: N-Gram (1-3 word) detection, keyword density, readability scoring, and sentiment analysis with CSV/JSON export.
About
In information retrieval and natural language processing, the frequency of lexical tokens approximately follows Zipf's Law, which states that the frequency of any word is inversely proportional to its rank in the frequency table. For SEO specialists and computational linguists, single-word counts alone are insufficient: search engines analyze semantic context through N-Grams (contiguous sequences of n items from a given text sample).
This tool surpasses standard counters by implementing a multi-layer analysis engine. It calculates the Type-Token Ratio (TTR) to assess lexical diversity, processes Bigrams (n=2) and Trigrams (n=3) to detect recurring phrases, and estimates readability metrics. The local database ships with a library of over 800 stop words, covering archaic forms and SEO filtering terms, ensuring a high signal-to-noise ratio during analysis.
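As an illustration of how such a pipeline can be wired together (a hedged sketch, not the tool's published source: the `STOP_WORDS` set is a tiny stand-in for the 800+ word library, and all function names are illustrative), the Python below tokenizes text, filters stop words, and counts Bigrams:

```python
import re
from collections import Counter

# Tiny stand-in for the tool's 800+ word stop-word library.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}

def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count contiguous n-word sequences (n=2 -> Bigrams, n=3 -> Trigrams)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

text = "The quick brown fox jumps over the lazy dog. The quick brown cat naps."
tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
print(ngram_counts(tokens, 2).most_common(3))
# [('quick brown', 2), ('brown fox', 1), ('fox jumps', 1)]
```

Whether stop words are removed before or after phrase extraction is a design choice: filtering first, as above, destroys phrases like "state of the art", so many analyzers keep stop words when building N-Grams and filter only the single-word table.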
Formulas
To identify the significance of a term beyond its raw count, we consider the frequency f of a term t relative to the total word count N. To analyze phrase patterns, however, we utilize the N-Gram probability approximation:

P(wₙ | wₙ₋₁) ≈ count(wₙ₋₁ wₙ) / count(wₙ₋₁)
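A minimal sketch of this estimate, assuming the standard maximum-likelihood form over raw counts (the `bigram_probability` helper is illustrative, not part of the tool):

```python
from collections import Counter

def bigram_probability(tokens, w_prev, w):
    """P(w | w_prev) ≈ count(w_prev w) / count(w_prev), the MLE over raw counts."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    word_counts = Counter(tokens)
    return pair_counts[(w_prev, w)] / word_counts[w_prev] if word_counts[w_prev] else 0.0

tokens = "the cat sat on the mat and the cat slept".split()
print(bigram_probability(tokens, "the", "cat"))  # "the cat" occurs 2 times, "the" 3 times: ≈ 0.667
```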
The Lexical Diversity, or Type-Token Ratio (TTR), serves as an indicator of writing quality. A low TTR suggests repetitive, simple text, while a high TTR indicates a varied vocabulary:

TTR = V / N

where V is the size of the vocabulary (unique words) and N is the total number of tokens.
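The computation follows directly from the definition; a minimal sketch:

```python
def type_token_ratio(tokens):
    """TTR = V / N: unique word types over total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(type_token_ratio("to be or not to be".split()))  # 4 types / 6 tokens ≈ 0.667
```

Because common words recur as a text grows, TTR drifts downward with length, so comparisons are most meaningful between samples of similar size.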
Reference Data
| Metric | Definition | Formula / Representation | Typical Value (SEO) |
|---|---|---|---|
| Keyword Density | Relative frequency of a term. | (count(t) / Total Words) × 100 | 1% - 2.5% |
| Lexical Diversity (TTR) | Variety of vocabulary used. | Unique Types / Total Tokens | > 0.45 |
| Bigram | Two consecutive words. | (wᵢ, wᵢ₊₁) | Context Dependent |
| Readability (Auto) | Sentence complexity proxy. | Avg(Words / Sentence) | 15 - 20 words |
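For completeness, here is a sketch of the density and readability rows, assuming a deliberately naive sentence splitter on `.`, `!`, and `?` (production readability formulas such as Flesch-Kincaid also weight syllable counts):

```python
import re

def keyword_density(tokens, term):
    """Occurrences of `term` as a percentage of all tokens."""
    return 100.0 * tokens.count(term) / len(tokens) if tokens else 0.0

def avg_words_per_sentence(text):
    """Readability proxy: total words divided by sentence count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences) if sentences else 0.0

text = "Counting words is easy. Ranking phrases is much harder!"
tokens = re.findall(r"[a-z']+", text.lower())
print(keyword_density(tokens, "is"))   # 2 of 9 tokens ≈ 22.2
print(avg_words_per_sentence(text))    # 9 words / 2 sentences = 4.5
```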