About

A single misclicked link can be expensive: breaches that begin with phishing cost organizations an average of $4.91 million (IBM, 2022). A common tactic is the Unicode homograph attack, which replaces Latin characters with visually identical Cyrillic glyphs. The domain apple.com rendered with Cyrillic а (U+0430) instead of Latin a (U+0061) passes casual inspection. This tool decomposes any URL into its structural components (protocol, hostname, port, path, query parameters) and runs them through 15 weighted heuristic checks. The scoring engine assigns threat weights: an IP-address hostname scores +25, mixed-script homographs score +30, and a data URI scores +35. The composite risk score R is clamped to the range [0, 100].
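As a rough illustration of the decomposition step, the TypeScript sketch below assumes the WHATWG `URL` parser that the FAQ says the tool relies on. The `UrlParts` interface and `decomposeUrl` function are hypothetical names, not the tool's actual code.

```typescript
// Hypothetical shape for a decomposed URL; field names are illustrative only.
interface UrlParts {
  protocol: string;
  hostname: string;
  port: string;
  pathSegments: string[];
  query: Array<[string, string]>;
}

// Split a URL string into the components the heuristics inspect, using the
// WHATWG URL parser available as a global in browsers and Node.js.
function decomposeUrl(raw: string): UrlParts {
  const url = new URL(raw); // throws a TypeError if the input is not a valid URL
  return {
    protocol: url.protocol, // includes the trailing colon, e.g. "http:"
    hostname: url.hostname, // lowercased; IDNs are returned in punycode form
    port: url.port, // empty string when the scheme's default port is used
    pathSegments: url.pathname.split("/").filter((s) => s.length > 0),
    query: Array.from(url.searchParams.entries()), // values are percent-decoded
  };
}

const parts = decomposeUrl("http://192.168.1.1:8080/a/b/login.php?user=x&next=%2Fhome");
console.log(parts.protocol, parts.hostname, parts.port);
console.log(parts.pathSegments, parts.query);
```

One consequence worth noting: the parser normalizes the host (lowercasing, percent-decoding, converting IDNs to punycode), so checks such as homograph or encoded-hostname detection need to look at the raw input string as well as the parsed result.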

This tool approximates threat level using client-side heuristics only. It does not replace server-side threat intelligence feeds or sandbox detonation. Zero-day phishing domains with clean structure will score low. Always cross-reference with your organization's DNS filtering and endpoint protection.

Formulas

The composite risk score is computed as the clamped sum of individual threat indicator weights:

$$
R = \min\left(100,\; \sum_{i=1}^{n} w_i \cdot t_i\right)
$$

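where R = composite risk score (0 to 100), w_i = weight assigned to threat indicator i, t_i ∈ {0, 1} = binary detection flag, and n = total number of heuristic checks (15). The 15 weights in the Reference Data table sum to 255, so several strong indicators together can exceed 100, which is why the cap is needed; since every weight is non-negative, the sum never falls below 0.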
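A direct translation of the formula, as a sketch: `Indicator` and `compositeScore` are hypothetical names, and the example flags are set by hand rather than produced by real checks. The weights come from the Reference Data table.

```typescript
// One heuristic result: w_i is `weight`, t_i is `detected` (true = 1, false = 0).
interface Indicator {
  name: string;
  weight: number;
  detected: boolean;
}

// R = min(100, sum of w_i * t_i). All weights are non-negative, so the sum is
// never below 0 and only the upper bound needs clamping.
function compositeScore(indicators: Indicator[]): number {
  const sum = indicators.reduce(
    (acc, ind) => acc + ind.weight * (ind.detected ? 1 : 0),
    0,
  );
  return Math.min(100, sum);
}

// Hand-picked flags for a URL like http://paypal-secure-login.tk/verify-account
const phishingExample: Indicator[] = [
  { name: "Brand Impersonation", weight: 25, detected: true },
  { name: "Suspicious TLD", weight: 15, detected: true },
  { name: "Phishing Keywords", weight: 15, detected: true },
  { name: "HTTP (No TLS)", weight: 10, detected: true },
  { name: "IP Address as Host", weight: 25, detected: false },
];

console.log(compositeScore(phishingExample)); // 65
```

Undetected indicators contribute 0, so the IP-address row above adds nothing even though its weight is listed.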

Risk classification thresholds:

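$$
\text{Risk} =
\begin{cases}
\text{LOW} & \text{if } R \le 30 \\
\text{MEDIUM} & \text{if } 30 < R \le 60 \\
\text{HIGH} & \text{if } 60 < R \le 85 \\
\text{CRITICAL} & \text{if } R > 85
\end{cases}
$$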
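The thresholds translate into a chain of comparisons. This is a sketch; `RiskLevel` and `classify` are illustrative names.

```typescript
type RiskLevel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Map a composite score R onto the thresholds; each upper bound is inclusive.
function classify(r: number): RiskLevel {
  if (r <= 30) return "LOW";
  if (r <= 60) return "MEDIUM";
  if (r <= 85) return "HIGH";
  return "CRITICAL";
}

for (const r of [0, 30, 31, 60, 61, 85, 86, 100]) {
  console.log(r, classify(r));
}
```

Testing the bounds in ascending order means each branch only needs its upper limit: the lower limit is implied by the previous branch having failed.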

Homograph detection uses Unicode codepoint range analysis. A character c in the hostname is flagged if c ∈ [U+0400, U+04FF] (Cyrillic block) while the majority script is Latin, or if c is Latin while the majority script is Cyrillic. The presence of any such mixed-script pair sets t_homograph = 1.
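A sketch of the mixed-script check, under two assumptions: it uses the codepoint ranges listed in the FAQ, and it follows the FAQ's rule that mixing must occur within a single hostname label. It takes a Unicode hostname (for example, extracted from the raw input) because the WHATWG `URL` parser returns internationalized hostnames in punycode (`xn--`) form, where the Cyrillic characters are no longer visible. `scriptOf` and `findHomographs` are hypothetical names.

```typescript
type Script = "latin" | "cyrillic" | "other";

// Codepoint ranges from the FAQ: Latin U+0041-U+007A and U+00C0-U+024F,
// Cyrillic U+0400-U+04FF. Everything else (digits, hyphens) is "other".
function scriptOf(ch: string): Script {
  const cp = ch.codePointAt(0) ?? 0;
  if ((cp >= 0x41 && cp <= 0x7a) || (cp >= 0xc0 && cp <= 0x24f)) return "latin";
  if (cp >= 0x400 && cp <= 0x4ff) return "cyrillic";
  return "other";
}

// Returns the minority-script characters of every label that mixes Latin and
// Cyrillic. An empty result corresponds to t_homograph = 0.
function findHomographs(hostname: string): string[] {
  const flagged: string[] = [];
  for (const label of hostname.split(".")) {
    const chars = Array.from(label); // split by code point, not UTF-16 unit
    const latin = chars.filter((c) => scriptOf(c) === "latin");
    const cyrillic = chars.filter((c) => scriptOf(c) === "cyrillic");
    if (latin.length > 0 && cyrillic.length > 0) {
      // Flag whichever script is in the minority; on a tie, flag the Cyrillic ones.
      flagged.push(...(latin.length >= cyrillic.length ? cyrillic : latin));
    }
  }
  return flagged;
}

// "\u0430pple.com" is apple.com with a Cyrillic U+0430 as its first letter.
console.log(findHomographs("\u0430pple.com"));
console.log(findHomographs("apple.com"));
```

Because the check is per label, a fully Cyrillic label next to a Latin TLD (such as a Cyrillic name under .com) is not flagged, while a single Cyrillic letter inside an otherwise Latin label is.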

Reference Data

Threat Indicator	Weight	Description	Example
IP Address as Host	+25	Hostname is a raw IPv4/IPv6 address instead of a domain	http://192.168.1.1/login
Homograph Characters	+30	Mixed Unicode scripts in hostname (Cyrillic/Latin lookalikes)	аpple.com (Cyrillic а)
Data URI Scheme	+35	URL uses data: protocol to embed executable content	data:text/html;base64,...
Suspicious TLD	+15	Top-level domain frequently abused in phishing campaigns	.tk, .ml, .ga, .cf, .gq, .xyz, .top, .buzz
Brand Impersonation	+25	Domain contains major brand name but is not the official domain	paypal-secure-login.tk
URL Shortener	+10	Link uses a URL shortening service hiding the real destination	bit.ly, tinyurl.com, t.co
Encoded Hostname Chars	+20	Percent-encoded characters in the hostname portion	%61%70%70%6C%65.com
@ Symbol in URL	+20	Credentials in URL can mask the real destination	http://google.com@evil.com
Excessive Subdomains	+10	More than 3 subdomain levels indicate obfuscation	login.secure.account.bank.evil.com
Suspicious Port	+15	Non-standard port (≠ 80, 443) used for web traffic	http://example.com:8443
HTTP (No TLS)	+10	Unencrypted connection exposes data to interception	http:// instead of https://
Punycode Domain	+15	Internationalized domain name with xn-- prefix	xn--pple-43d.com
Excessive URL Length	+5	URL exceeds 200 characters, common in obfuscation	Long query strings with encoded payloads
Deep Path Nesting	+5	Path depth exceeds 5 levels	/a/b/c/d/e/f/login.php
Phishing Keywords	+15	Path or query contains terms like login, verify, secure, account, update, confirm, suspend	/verify-account/secure-login
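Most rows in the table reduce to simple parser or string checks. The sketch below covers the structural rows and leaves out the homograph and brand-impersonation rows, which need script analysis or a brand dictionary. Assumptions, stated plainly: the word lists contain only the table's examples, the registered domain is naively taken to be the last two labels, and `structuralChecks` is a hypothetical name.

```typescript
// Partial lists: only the table's examples are known, so these are illustrative.
const SUSPICIOUS_TLDS = new Set(["tk", "ml", "ga", "cf", "gq", "xyz", "top", "buzz"]);
const SHORTENERS = new Set(["bit.ly", "tinyurl.com", "t.co"]);
const KEYWORDS = ["login", "verify", "secure", "account", "update", "confirm", "suspend"];

interface Hit {
  name: string;
  weight: number;
}

function structuralChecks(raw: string): Hit[] {
  const url = new URL(raw);
  const host = url.hostname; // already lowercased by the parser
  const labels = host.split(".");
  const hits: Hit[] = [];
  const add = (name: string, weight: number): void => {
    hits.push({ name, weight });
  };

  // Dotted-quad IPv4, or an IPv6 literal (the parser keeps the brackets).
  const isIp = /^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.startsWith("[");
  if (isIp) add("IP Address as Host", 25);
  if (url.protocol === "data:") add("Data URI Scheme", 35);
  if (!isIp && SUSPICIOUS_TLDS.has(labels[labels.length - 1])) add("Suspicious TLD", 15);
  if (SHORTENERS.has(host)) add("URL Shortener", 10);

  // The parser percent-decodes the host, so inspect the raw authority instead
  // (only the part after the last "@", i.e. excluding any userinfo).
  const authority = /^[a-z][a-z\d+.-]*:\/\/([^\/?#]*)/i.exec(raw)?.[1] ?? "";
  if (authority.slice(authority.lastIndexOf("@") + 1).includes("%")) {
    add("Encoded Hostname Chars", 20);
  }

  if (url.username !== "" || url.password !== "") add("@ Symbol in URL", 20);
  // Naive assumption: the registered domain is the last two labels.
  if (!isIp && labels.length - 2 > 3) add("Excessive Subdomains", 10);
  // url.port is "" when the scheme's default port is used.
  if (url.port !== "" && url.port !== "80" && url.port !== "443") add("Suspicious Port", 15);
  if (url.protocol === "http:") add("HTTP (No TLS)", 10);
  if (labels.some((l) => l.startsWith("xn--"))) add("Punycode Domain", 15);
  if (raw.length > 200) add("Excessive URL Length", 5);

  const depth = url.pathname.split("/").filter((s) => s.length > 0).length;
  if (depth > 5) add("Deep Path Nesting", 5);
  const pathAndQuery = (url.pathname + url.search).toLowerCase();
  if (KEYWORDS.some((k) => pathAndQuery.includes(k))) add("Phishing Keywords", 15);

  return hits;
}

console.log(structuralChecks("http://192.168.1.1:8080/a/b/c/d/e/f/login.php"));
```

Because the parser converts IDNs to punycode, a Unicode hostname such as one containing a Cyrillic letter also trips the Punycode Domain row here, even though the raw input never contained `xn--`.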

Frequently Asked Questions

The scanner iterates over each character in the hostname and checks its Unicode codepoint range. Latin characters fall within U+0041 - U+007A and U+00C0 - U+024F. Cyrillic characters occupy U+0400 - U+04FF. If characters from both script blocks appear in the same hostname label, the homograph flag triggers with a weight of +30. Common substitutions include Cyrillic а (U+0430) for Latin a (U+0061), Cyrillic е (U+0435) for Latin e (U+0065), and Cyrillic о (U+043E) for Latin o (U+006F).

Many legitimate legacy sites and internal tools still use HTTP. While the absence of TLS enables man-in-the-middle attacks, it does not by itself indicate malicious intent. The weight of +10 reflects a security hygiene concern rather than a phishing indicator. Combined with other flags, however, the cumulative score escalates: an http:// URL that impersonates a brand on a .tk domain accumulates at least 10 + 25 + 15 = 50 points, enough for MEDIUM.

No. This tool performs structural and heuristic analysis on the URL string itself. A freshly registered domain with a clean structure (valid HTTPS, no homographs, no suspicious keywords) will score low. Detection of zero-day domains requires server-side WHOIS age lookups, certificate transparency log monitoring, and real-time threat intelligence feeds. Always combine client-side checks with DNS-layer filtering such as Cisco Umbrella or Cloudflare Gateway.

The tool maintains a dictionary of the top 50 most-phished brands (PayPal, Apple, Microsoft, Amazon, Netflix, Google, Facebook, etc.) along with common misspelling variants. It checks whether the hostname contains any brand keyword as a substring while the registered domain does not match the brand's known official domains. For example, paypal-login.evil.com triggers the flag because paypal appears in the hostname but the registered domain is evil.com, not paypal.com.
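A minimal sketch of the substring-plus-registered-domain comparison described above. The four-entry `BRANDS` map is a hypothetical stand-in for the tool's 50-brand dictionary (misspelling variants are omitted), the last-two-labels rule for the registered domain is a simplifying assumption, and `registeredDomain` and `impersonatedBrand` are illustrative names.

```typescript
// Hypothetical, abbreviated dictionary: brand keyword -> official registered domains.
const BRANDS: Record<string, string[]> = {
  paypal: ["paypal.com"],
  apple: ["apple.com"],
  microsoft: ["microsoft.com"],
  amazon: ["amazon.com"],
};

// Naive registered-domain guess: the last two labels. This misreads hosts under
// multi-label public suffixes such as "co.uk"; real code should consult the
// Public Suffix List.
function registeredDomain(hostname: string): string {
  return hostname.toLowerCase().split(".").slice(-2).join(".");
}

// Returns the impersonated brand keyword, or null if none is found.
function impersonatedBrand(hostname: string): string | null {
  const host = hostname.toLowerCase();
  const domain = registeredDomain(host);
  for (const [brand, officialDomains] of Object.entries(BRANDS)) {
    if (host.includes(brand) && !officialDomains.includes(domain)) {
      return brand;
    }
  }
  return null;
}

console.log(impersonatedBrand("paypal-login.evil.com")); // paypal
console.log(impersonatedBrand("www.paypal.com")); // null
```

The naive rule shows why a suffix list matters: it reports amazon.co.uk as impersonating amazon, because it treats co.uk as the registered domain.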

The URL specification (RFC 3986) allows a userinfo component before the @ symbol, as in http://user:pass@host (the user:password form is deprecated by the RFC). Attackers exploit this by crafting URLs like http://google.com@evil.com. The browser treats everything before the @ as credentials and navigates to evil.com, while the user sees google.com at the start of the link and assumes it is legitimate. This technique carries a weight of +20 because it signals deliberate deception.
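The deception is easy to reproduce with the WHATWG `URL` parser. In this sketch (`splitUserinfo` and `hasUserinfo` are hypothetical names), the text before the @ lands in `username`, and only `hostname` determines where the browser connects.

```typescript
// Report how the WHATWG URL parser splits a URL carrying userinfo before the @.
function splitUserinfo(raw: string): { username: string; password: string; hostname: string } {
  const url = new URL(raw);
  return { username: url.username, password: url.password, hostname: url.hostname };
}

// What the user reads vs. where the browser actually connects.
const deceptive = splitUserinfo("http://google.com@evil.com/");
console.log(`looks like: ${deceptive.username}, actually goes to: ${deceptive.hostname}`);
// looks like: google.com, actually goes to: evil.com

// Flag on parsed credentials rather than on any "@" in the string, so a path
// such as /users/@alice does not trigger the check.
function hasUserinfo(raw: string): boolean {
  const url = new URL(raw);
  return url.username !== "" || url.password !== "";
}
```

Checking the parsed `username` and `password` is a variation on scanning the raw string for @: it avoids false positives on @ characters that appear in paths or query strings.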

The core heuristic engine runs entirely client-side with zero network requests. All 15 checks operate on the URL string using JavaScript's native URL parser and RegExp, so your URLs are never transmitted to any server. Scan history is stored exclusively in your browser's localStorage, which keeps even sensitive internal URLs on your own machine.