Category Security
Supports HTTP, HTTPS, FTP, and data URIs

About

A single misclicked link can be costly: breaches that start with phishing cost enterprises an average of $4.91 million each (IBM, 2023). One common technique is the Unicode homograph attack, in which Latin characters are replaced with visually identical Cyrillic glyphs. The domain apple.com rendered with Cyrillic а (U+0430) instead of Latin a (U+0061) passes casual inspection. This tool decomposes any URL into its structural components - protocol, hostname, port, path, query parameters - and runs them through 15 weighted heuristic checks. The scoring engine assigns a threat weight to each check: an IP-address hostname scores +25, mixed-script homographs score +30, a data URI scores +35. The composite risk score R is clamped to the range [0, 100].
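
The decomposition step described above can be sketched in a few lines. This is an illustration in Python using the standard library's `urllib.parse`, not the tool's own code; the function name `decompose` and the dictionary keys are assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qsl


def decompose(url: str) -> dict:
    """Split a URL into the structural components the checks operate on."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,  # lower-cased, without credentials or port
        "port": parts.port,          # None when the URL does not name a port
        "path": parts.path,
        # Query parameters as decoded (name, value) pairs
        "query": parse_qsl(parts.query, keep_blank_values=True),
    }


components = decompose("https://example.com:8443/a/b/login.php?user=alice&next=%2Fhome")
print(components)
```

Each heuristic can then look at just the component it cares about, for example the hostname for homograph checks or the path for keyword checks.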

This tool approximates threat level using client-side heuristics only. It does not replace server-side threat intelligence feeds or sandbox detonation. Zero-day phishing domains with clean structure will score low. Always cross-reference with your organization's DNS filtering and endpoint protection.

Formulas

The composite risk score is computed as the clamped sum of individual threat indicator weights:

$$R = \min\left(100,\ \sum_{i=1}^{n} w_i \cdot t_i\right)$$

where R = composite risk score (0 to 100), w_i = weight assigned to threat indicator i, t_i ∈ {0, 1} = binary detection flag, and n = total number of heuristic checks (15).
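
As a worked illustration of the clamped sum, the sketch below uses the fifteen weights listed under Reference Data. The dictionary keys and the function name `composite_score` are hypothetical labels chosen for this example; only the numeric weights come from this page.

```python
from typing import Mapping

# Weights w_i from the Reference Data table; key names are illustrative.
WEIGHTS = {
    "ip_address_host": 25,
    "homograph_characters": 30,
    "data_uri_scheme": 35,
    "suspicious_tld": 15,
    "brand_impersonation": 25,
    "url_shortener": 10,
    "encoded_hostname_chars": 20,
    "at_symbol": 20,
    "excessive_subdomains": 10,
    "suspicious_port": 15,
    "http_no_tls": 10,
    "punycode_domain": 15,
    "excessive_url_length": 5,
    "deep_path_nesting": 5,
    "phishing_keywords": 15,
}


def composite_score(flags: Mapping[str, bool]) -> int:
    """Sum the weights of every triggered check (t_i = 1), capped at 100."""
    raw = sum(weight for name, weight in WEIGHTS.items() if flags.get(name, False))
    return min(100, raw)


# Suspicious TLD + brand impersonation + plain HTTP: 15 + 25 + 10 = 50
example = composite_score(
    {"suspicious_tld": True, "brand_impersonation": True, "http_no_tls": True}
)
# Every check triggered: raw sum is 255, clamped to 100
clamped = composite_score({name: True for name in WEIGHTS})
print(example, clamped)
```

Because every weight is positive, the lower bound of 0 holds automatically and only the upper clamp is needed.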

Risk classification thresholds:

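$$\text{Risk level} = \begin{cases} \text{LOW} & \text{if } R \le 30 \\ \text{MEDIUM} & \text{if } 30 < R \le 60 \\ \text{HIGH} & \text{if } 60 < R \le 85 \\ \text{CRITICAL} & \text{if } R > 85 \end{cases}$$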
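
The thresholds translate directly into a small lookup. The function name `classify` is an assumption for this sketch; the boundaries are the ones stated above.

```python
def classify(score: int) -> str:
    """Map a composite risk score R (0 to 100) to its risk level."""
    if score <= 30:
        return "LOW"
    if score <= 60:
        return "MEDIUM"
    if score <= 85:
        return "HIGH"
    return "CRITICAL"


# Boundary values fall into the lower band because the upper bounds are inclusive.
for r in (0, 30, 31, 60, 61, 85, 86, 100):
    print(r, classify(r))
```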

Homograph detection uses Unicode codepoint range analysis. A character c in the hostname is flagged if c ∈ [U+0400, U+04FF] (Cyrillic block) while the majority script is Latin, or vice versa. The presence of any mixed-script pair triggers t_homograph = 1.
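
A minimal sketch of codepoint range analysis follows. It flags a hostname when any single label mixes Latin and Cyrillic characters, which is one plausible reading of the rule; the tool's actual majority-script logic is not shown on this page. The Latin ranges are the ones quoted in the FAQ, and the helper names `script_of` and `has_mixed_scripts` are hypothetical.

```python
from typing import Optional

# Codepoint ranges as quoted on this page
LATIN_RANGES = ((0x0041, 0x007A), (0x00C0, 0x024F))
CYRILLIC_RANGE = (0x0400, 0x04FF)


def script_of(ch: str) -> Optional[str]:
    """Return 'latin', 'cyrillic', or None for digits, hyphens, and other scripts."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in LATIN_RANGES):
        return "latin"
    if CYRILLIC_RANGE[0] <= cp <= CYRILLIC_RANGE[1]:
        return "cyrillic"
    return None


def has_mixed_scripts(hostname: str) -> bool:
    """True if any dot-separated label contains both Latin and Cyrillic characters."""
    for label in hostname.split("."):
        scripts = {script_of(ch) for ch in label}
        if "latin" in scripts and "cyrillic" in scripts:
            return True
    return False


spoofed = "\u0430pple.com"  # first character is Cyrillic U+0430, not Latin 'a'
print(has_mixed_scripts(spoofed), has_mixed_scripts("apple.com"))
```

Checking per label means a fully Cyrillic name under a Latin TLD is not flagged, while a single swapped glyph inside an otherwise Latin label is.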

Reference Data

Threat IndicatorWeightDescriptionExample
IP Address as Host+25Hostname is a raw IPv4/IPv6 address instead of a domainhttp://192.168.1.1/login
Homograph Characters+30Mixed Unicode scripts in hostname (Cyrillic/Latin lookalikes)Π°pple.com (Cyrillic Π°)
Data URI Scheme+35URL uses data: protocol to embed executable contentdata:text/html;base64,...
Suspicious TLD+15Top-level domain frequently abused in phishing campaigns.tk, .ml, .ga, .cf, .gq, .xyz, .top, .buzz
Brand Impersonation+25Domain contains major brand name but is not the official domainpaypal-secure-login.tk
URL Shortener+10Link uses a URL shortening service hiding the real destinationbit.ly, tinyurl.com, t.co
Encoded Hostname Chars+20Percent-encoded characters in the hostname portion%61%70%70%6C%65.com
@ Symbol in URL+20Credentials in URL can mask the real destinationhttp://[email protected]
Excessive Subdomains+10More than 3 subdomain levels indicate obfuscationlogin.secure.account.bank.evil.com
Suspicious Port+15Non-standard port (β‰  80, 443) used for web traffichttp://example.com:8443
HTTP (No TLS)+10Unencrypted connection exposes data to interceptionhttp:// instead of https://
Punycode Domain+15Internationalized domain name with xn-- prefixxn--pple-43d.com
Excessive URL Length+5URL exceeds 200 characters, common in obfuscationLong query strings with encoded payloads
Deep Path Nesting+5Path depth exceeds 5 levels/a/b/c/d/e/f/login.php
Phishing Keywords+15Path or query contains terms like login, verify, secure, account, update, confirm, suspend/verify-account/secure-login
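
Several of the purely structural rows above can be checked with the standard library alone. The sketch below is an assumption-laden illustration, not the tool's code: the function and key names are invented, the registered domain is naively taken to be the last two labels (which is wrong for suffixes such as .co.uk), and every non-empty path segment, including a trailing file name, counts toward path depth.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_host(hostname: str) -> bool:
    """True when the hostname parses as a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(hostname)
        return True
    except ValueError:
        return False


def structural_flags(url: str) -> dict:
    """Evaluate a handful of the table's structural checks for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    ip_host = is_ip_host(host)
    path_depth = len([segment for segment in parts.path.split("/") if segment])
    return {
        "ip_address_host": ip_host,
        "suspicious_port": parts.port is not None and parts.port not in (80, 443),
        "http_no_tls": parts.scheme == "http",
        "punycode_domain": any(label.startswith("xn--") for label in labels),
        # Naive assumption: the last two labels form the registered domain.
        "excessive_subdomains": not ip_host and len(labels) - 2 > 3,
        "excessive_url_length": len(url) > 200,
        "deep_path_nesting": path_depth > 5,
    }


flags = structural_flags("http://192.168.1.1/login")
print(flags)
```

A production implementation would typically consult the Public Suffix List to find the registered domain instead of counting labels.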

Frequently Asked Questions

The scanner iterates over each character in the hostname and checks its Unicode codepoint range. Latin characters fall within U+0041 - U+007A and U+00C0 - U+024F. Cyrillic characters occupy U+0400 - U+04FF. If characters from both script blocks appear in the same hostname label, the homograph flag triggers with a weight of +30. Common substitutions include Cyrillic а (U+0430) for Latin a (U+0061), Cyrillic е (U+0435) for Latin e (U+0065), and Cyrillic о (U+043E) for Latin o (U+006F).
Many legitimate legacy sites and internal tools still use HTTP. While the absence of TLS enables man-in-the-middle attacks, it does not by itself indicate malicious intent. The weight of +10 reflects a security hygiene concern rather than a phishing indicator. Combined with other flags (brand impersonation, suspicious TLD), the cumulative score will escalate appropriately.
No, this tool cannot detect zero-day phishing domains. It performs structural and heuristic analysis on the URL string itself, so a freshly registered domain with a clean structure (valid HTTPS, no homographs, no suspicious keywords) will score low. Detection of zero-day domains requires server-side WHOIS age lookups, certificate transparency log monitoring, and real-time threat intelligence feeds. Always combine client-side checks with DNS-layer filtering such as Cisco Umbrella or Cloudflare Gateway.
The tool maintains a dictionary of the top 50 most-phished brands (PayPal, Apple, Microsoft, Amazon, Netflix, Google, Facebook, etc.) along with common misspelling variants. It checks whether the hostname contains any brand keyword as a substring while the registered domain does not match the brand's known official domains. For example, paypal-login.evil.com triggers the flag because paypal appears in the hostname but the registered domain is evil.com, not paypal.com.
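
The substring-versus-registered-domain comparison can be sketched as follows. The brand table here is a tiny hypothetical subset, not the tool's 50-brand dictionary, and the registered domain is again naively assumed to be the last two hostname labels.

```python
# Hypothetical subset of brand keywords and their official registered domains
OFFICIAL_DOMAINS = {
    "paypal": {"paypal.com"},
    "apple": {"apple.com"},
    "microsoft": {"microsoft.com"},
}


def registered_domain(hostname: str) -> str:
    """Naive assumption: the registered domain is the last two labels."""
    return ".".join(hostname.lower().split(".")[-2:])


def impersonated_brands(hostname: str) -> list:
    """Brands named in the hostname whose official domains do not match it."""
    host = hostname.lower()
    domain = registered_domain(host)
    return [
        brand
        for brand, official in OFFICIAL_DOMAINS.items()
        if brand in host and domain not in official
    ]


print(impersonated_brands("paypal-login.evil.com"))  # brand present, domain is evil.com
print(impersonated_brands("www.paypal.com"))         # official domain, nothing flagged
```

Plain substring matching also flags unrelated names such as pineapple.com for "apple", so a real dictionary would likely need an allowlist or word-boundary rules.
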
The URL specification (RFC 3986) allows credentials before the @ symbol: http://user:pass@host. Attackers exploit this by crafting URLs like http://google.com@evil.com. The browser treats everything before the @ as credentials and navigates to evil.com. Visually, the user sees google.com and assumes legitimacy. This technique carries a weight of +20 because it signals a deliberate attempt to deceive.
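
To see how a parser splits such a URL, the sketch below uses Python's `urllib.parse.urlsplit` on an example of the kind described above. The helper name `has_userinfo` is an assumption, and it inspects only the authority component, whereas the tool's check may look for @ anywhere in the URL.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://google.com@evil.com/login")
# The text before '@' is parsed as a user name; the real host comes after it.
print(parts.username)  # google.com
print(parts.hostname)  # evil.com


def has_userinfo(url: str) -> bool:
    """True when the URL's authority carries a user name or password."""
    split = urlsplit(url)
    return split.username is not None or split.password is not None


print(has_userinfo("http://google.com@evil.com/login"))
print(has_userinfo("https://example.com/contact?email=a@b.c"))
```

The second call shows why checking the parsed authority is stricter than searching the raw string: an @ inside a query parameter does not change where the browser connects.
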
The core heuristic engine runs entirely client-side with zero network requests. All 15 checks operate on the URL string using JavaScript's native URL parser and RegExp. Your URLs are never transmitted to any server, and scan history is stored exclusively in your browser's localStorage. This keeps even sensitive internal URLs on your own machine.