About

Email spam filters use multi-layered heuristic engines that score messages against hundreds of weighted rules. A single misplaced keyword like FREE combined with excessive capitalization can push your message past SpamAssassin's default threshold of 5.0. Legitimate marketing emails get flagged at rates between 10% and 20%, costing businesses measurable revenue per campaign. This tool runs your text through 150+ pattern detectors covering urgency triggers, phishing signatures, obfuscated characters, and statistical anomalies to produce a composite spam probability P_spam.

The analyzer does not connect to any external service. All processing happens in your browser. It approximates the behavior of Bayesian classifiers by using a curated rule dictionary with empirically assigned weights. Limitations: it cannot evaluate sender reputation, SPF/DKIM headers, or IP blacklists. It analyzes content signals only. For production mail campaigns, cross-reference results with tools like Mail-Tester that inspect delivery infrastructure.

Formulas

The composite spam probability is computed by summing weighted category scores and normalizing through a sigmoid function to bound the output between 0 and 100%.

S_{raw} = \sum_{i=1}^{n} w_i \cdot m_i

where n is the number of rules, w_i is the weight of the i-th rule, and m_i is its match count (typically 0 or 1, but frequency-scaled for repeated triggers).
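The weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names RULE_WEIGHTS and raw_score are hypothetical, the three weights are copied from the Reference Data table, and m_i is treated as a plain match count because the exact frequency scaling is not specified here.

```python
from typing import Dict

# Hypothetical subset of the rule dictionary; weights come from the Reference Data table.
RULE_WEIGHTS: Dict[str, float] = {
    "URGENCY_PHRASES": 2.5,
    "FREE_OFFERS": 2.0,
    "GENERIC_GREETING": 1.2,
}


def raw_score(match_counts: Dict[str, int],
              weights: Dict[str, float] = RULE_WEIGHTS) -> float:
    """Return S_raw, the sum of w_i * m_i over every known rule that fired.

    Assumption: m_i is the plain match count; the real frequency scaling
    for repeated triggers may differ.
    """
    return sum(weights[rule] * count
               for rule, count in match_counts.items()
               if rule in weights)


if __name__ == "__main__":
    # One urgency phrase plus one free-offer phrase.
    print(raw_score({"URGENCY_PHRASES": 1, "FREE_OFFERS": 1}))
```

Rules missing from the dictionary are ignored rather than raising an error, which keeps the sketch tolerant of partial dictionaries.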

P_{spam} = \frac{100}{1 + e^{-k(S_{raw} - T)}}

where k = 0.35 controls the steepness of the curve and T = 8.0 is the midpoint threshold calibrated against SpamAssassin defaults. A raw score of 5.0 yields approximately 25% probability. A score above 15 yields > 90%.
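As a quick check of these figures, the logistic mapping can be written as a short Python function. The function name spam_probability is an assumption made for illustration; the constants k = 0.35 and T = 8.0 are the ones stated above.

```python
import math

K = 0.35  # steepness of the curve, as stated in the text
T = 8.0   # midpoint threshold, as stated in the text


def spam_probability(s_raw: float, k: float = K, t: float = T) -> float:
    """Map a raw score to a percentage strictly between 0 and 100."""
    return 100.0 / (1.0 + math.exp(-k * (s_raw - t)))


if __name__ == "__main__":
    for score in (5.0, 8.0, 15.0):
        print(f"S_raw = {score:4.1f} -> P_spam = {spam_probability(score):.1f}%")
```

With these constants, a raw score of 5.0 evaluates to about 25.9%, 8.0 to exactly 50%, and 15.0 to about 92%.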

Additional heuristic signals are computed as ratios:

R_{caps} = \frac{N_{uppercase}}{N_{alpha}} \times 100\%

where R_caps > 30% triggers the ALL CAPS penalty. Similarly, R_digits measures numeric character density and R_special measures non-alphanumeric pollution.
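A minimal Python sketch of the three ratios follows. The function name char_ratios is hypothetical. R_caps uses the definition above (uppercase letters over all letters). The denominators for R_digits and R_special are not specified here, so the sketch assumes both are measured against all non-whitespace characters.

```python
from typing import Dict


def char_ratios(text: str) -> Dict[str, float]:
    """Return R_caps, R_digits, and R_special as percentages."""
    letters = [c for c in text if c.isalpha()]
    non_space = [c for c in text if not c.isspace()]

    n_alpha = len(letters)
    n_total = len(non_space)
    n_upper = sum(1 for c in letters if c.isupper())
    n_digits = sum(1 for c in non_space if c.isdigit())
    n_special = sum(1 for c in non_space if not c.isalnum())

    return {
        # Uppercase letters as a share of all letters (definition from the text).
        "caps": 100.0 * n_upper / n_alpha if n_alpha else 0.0,
        # Assumption: digit and special-character density use the
        # non-whitespace character count as the denominator.
        "digits": 100.0 * n_digits / n_total if n_total else 0.0,
        "special": 100.0 * n_special / n_total if n_total else 0.0,
    }


if __name__ == "__main__":
    ratios = char_ratios("FREE money")
    print(ratios["caps"] > 30.0)  # 4 of 9 letters are uppercase
```

Empty input returns zeros for all three ratios instead of dividing by zero.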

Reference Data

Spam Indicator Category	Example Triggers	Typical Weight	SpamAssassin Rule	Risk Level
Urgency / Pressure	"Act now", "Limited time", "Expires today"	2.5	URGENCY_PHRASES	High
Financial Bait	"$$$", "Million dollars", "Wire transfer"	3.0	MONEY_PHRASES	Critical
Pharmaceutical	"Viagra", "V1@gra", "Pharmacy", "Pills"	3.5	DRUGS_ERECTILE	Critical
Phishing / Identity	"Verify your account", "SSN", "Password"	3.8	PHISHING_PHRASES	Critical
Free Offers	"Free gift", "No cost", "Complimentary"	2.0	FREE_OFFERS	Medium
ALL CAPS Ratio	> 30% uppercase characters	1.8	UPPERCASE_50_75	Medium
Excessive Punctuation	"!!!", "???", "$$$", "***"	1.5	EXCL_MARKS	Medium
Suspicious URLs	IP-based links, URL shorteners, .xyz/.tk	2.8	SUSPICIOUS_URL	High
Character Obfuscation	"Fr33", "W1n", "C@sh", zero for O	3.2	OBFUSCATED_WORDS	Critical
Unsubscribe Absence	Marketing email without opt-out	1.0	NO_UNSUBSCRIBE	Low
Crypto / Investment	"Bitcoin", "NFT", "ROI guaranteed"	2.5	CRYPTO_SPAM	High
Adult Content	Explicit terms, dating scam phrases	3.5	ADULT_CONTENT	Critical
Lottery / Prize	"You've won", "Congratulations", "Claim"	3.0	LOTTERY_PRIZE	Critical
HTML Anomalies	Invisible text, tiny fonts, hidden divs	2.2	HTML_TRICKS	High
Sender Impersonation	"From: support@", "Dear Customer"	2.0	IMPERSONATION	High
Emotional Manipulation	"Help me", "Dying wish", "Orphan"	2.8	EMOTIONAL_MANIP	High
Malware Indicators	".exe", ".scr", "Download attachment"	3.5	MALWARE_ATTACH	Critical
Generic Greeting	"Dear Sir/Madam", "Dear Friend"	1.2	GENERIC_GREETING	Low
Digit Ratio	> 15% digits in text body	1.3	HIGH_DIGIT_RATIO	Low
Short Body + Link	Message under 20 words with URL	2.0	SHORT_BODY_URL	Medium
Encoding Tricks	Base64 body, Unicode homoglyphs	3.0	ENCODING_TRICKS	Critical

Frequently Asked Questions

The sigmoid curve with k = 0.35 and threshold T = 8.0 creates a gradual transition zone between scores of 3 and 13. A raw score of 5.0 maps to roughly 25% probability, while 8.0 maps to exactly 50%. This prevents a single medium-weight trigger from causing a dramatic jump. Messages with only 1-2 low-weight matches (like a generic greeting) will score under 15%, which reflects real-world filter behavior where isolated signals rarely cause rejection.

Marketing emails inherently contain spam-correlated patterns: promotional language ("special offer", "limited time"), calls to action ("click here", "buy now"), and HTML formatting. This is expected. Real spam filters offset this with sender reputation and authentication (SPF, DKIM, DMARC), which this content-only tool cannot evaluate. To reduce your score: replace urgency phrases with specific dates, use your company name instead of generic greetings, include a physical address, and ensure an unsubscribe link is mentioned in the text.

Modern filters detect obfuscation through Levenshtein distance matching, regex pattern libraries, and Unicode normalization. This tool checks for common substitutions: digits for letters (0→O, 1→I, 3→E, 4→A, 5→S), symbols (@→A, $→S), and Unicode homoglyphs (Cyrillic а vs Latin a). In practice, obfuscation now increases spam scores rather than decreasing them, because legitimate senders never obfuscate words. The tool assigns obfuscation a weight of 3.2, among the highest in the rule dictionary.
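The substitution check can be illustrated with a small Python sketch. It normalizes each token using the substitutions listed above and then looks the result up in a watchlist. This is plain normalize-and-lookup, not Levenshtein matching, and it is not the tool's actual implementation. The names SUBSTITUTIONS, WATCHLIST, and find_obfuscated are hypothetical, and the watchlist is a small sample built from the Reference Data examples.

```python
import re
from typing import List

# Substitutions listed in the text, mapped to lowercase Latin letters.
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
    "@": "a", "$": "s",
    "\u0430": "a",  # Cyrillic small letter a
})

# Hypothetical watchlist; a real dictionary would be far larger.
WATCHLIST = {"free", "win", "cash", "viagra"}


def find_obfuscated(text: str) -> List[str]:
    """Return tokens that match a watchlist word only after normalization."""
    hits = []
    for token in re.findall(r"\S+", text):
        stripped = token.strip(".,!?:;\"'()")
        lowered = stripped.lower()
        normalized = lowered.translate(SUBSTITUTIONS)
        # A token counts as obfuscated only if normalization changed it.
        if normalized != lowered and normalized in WATCHLIST:
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    print(find_obfuscated("Get Fr33 C@sh now, W1n big!"))
```

Plainly spelled words such as "free" are not reported here, since they would be caught by the ordinary keyword rules rather than the obfuscation rule.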

The current rule dictionary focuses on English-language spam indicators. Non-English text will score lower because fewer keyword matches fire, but structural heuristics (CAPS ratio, digit density, punctuation excess, URL analysis) remain language-agnostic and will still detect statistical anomalies. For accurate multilingual analysis, supplement results with a service that maintains localized dictionaries.

SpamAssassin's default rejection threshold is a raw score of 5.0, which our sigmoid maps to approximately 25%. Most enterprise filters reject at 5.0-7.0 (25-42% in our scale). Gmail's neural classifier operates differently but empirically rejects content-only signals around our 50-60% range. If your text scores above 40%, review the flagged indicators. Above 70%, the message will almost certainly be filtered by any major provider.
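To translate a percentage back into a raw score, the sigmoid can be inverted: S_raw = T − ln(100 / P_spam − 1) / k. The Python sketch below assumes the same k = 0.35 and T = 8.0; the function name raw_score_for is hypothetical.

```python
import math


def raw_score_for(probability_pct: float, k: float = 0.35, t: float = 8.0) -> float:
    """Invert the sigmoid: return the raw score that maps to the given percentage."""
    if not 0.0 < probability_pct < 100.0:
        raise ValueError("probability must be strictly between 0 and 100")
    return t - math.log(100.0 / probability_pct - 1.0) / k


if __name__ == "__main__":
    for pct in (40.0, 50.0, 70.0):
        print(f"{pct:.0f}% corresponds to S_raw of about {raw_score_for(pct):.2f}")
```

Under these constants, the 40% review level corresponds to a raw score of about 6.84 and the 70% level to about 10.42.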

URLs receive compound scoring. A bare domain scores 0. A URL shortened via bit.ly/tinyurl adds 1.5 weight. An IP-address URL (http://192.168.x.x) adds 2.8. Suspicious TLDs (.xyz, .tk, .top, .buzz) add 1.5. Multiple URLs in a short message trigger the SHORT_BODY_URL rule at weight 2.0. These compound because phishing emails typically combine shortened URLs with urgency language, and each signal reinforces the classification independently.
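The per-URL part of this scoring might look like the following Python sketch. It uses the weights quoted above (1.5 for shorteners, 2.8 for IP-address hosts, 1.5 for suspicious TLDs). The names SHORTENERS, SUSPICIOUS_TLDS, and url_score are hypothetical, the shortener list is a two-entry sample, and the sketch assumes each URL includes a scheme such as http://. The SHORT_BODY_URL rule depends on message length and is left out.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical sample lists; a real tool would maintain longer ones.
SHORTENERS = {"bit.ly", "tinyurl.com"}
SUSPICIOUS_TLDS = {"xyz", "tk", "top", "buzz"}


def url_score(url: str) -> float:
    """Return the summed weight of the URL signals described in the text."""
    host = (urlparse(url).hostname or "").lower()

    # An IP-address host has no TLD and cannot be a shortener, so score it alone.
    try:
        ipaddress.ip_address(host)
        return 2.8
    except ValueError:
        pass

    score = 0.0
    if host in SHORTENERS:
        score += 1.5
    if host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS:
        score += 1.5
    return score


if __name__ == "__main__":
    for u in ("https://example.com", "https://bit.ly/abc",
              "http://192.168.0.1/login", "http://cheap-deals.xyz/win"):
        print(u, url_score(u))
```

Summing url_score over every link in a message gives the compound effect described above, where each signal adds to the total independently.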