Malicious Link Detector
Advanced heuristic URL analysis tool. Detects phishing patterns, typosquatting, homoglyph attacks, and obfuscation techniques without external API dependencies.
Heuristic Engine v3.4 • Client-Side Privacy
About
The Malicious Link Detector is a client-side heuristic analysis engine designed to deconstruct and evaluate Uniform Resource Locators (URLs) for security threats. Unlike basic scanners that rely solely on blacklists, this tool performs deep structural analysis to identify Typosquatting (e.g., g0ogle.com), Homoglyph Attacks (using non-Latin characters to mimic legitimate letters), and high-entropy obfuscation often used in Domain Generation Algorithms (DGAs).
By some industry estimates, phishing attacks account for over 80% of reported security incidents. Attackers often use URL shorteners or open redirects to mask the final destination. This tool splits the URL into its components (scheme, authority, path, and query) to expose hidden risks, unencrypted connections, and suspicious Top-Level Domains (TLDs). It produces a granular risk score from weighted indicators, so that users can check links from emails or SMS messages before opening a connection.
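The decomposition step can be sketched with Python's standard `urllib.parse` module. This is an illustration only, not the tool's own client-side code; the function name `decompose` and the sample URL are assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

def decompose(url: str) -> dict:
    """Split a URL into scheme, authority, path, and query."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,    # full authority, including any userinfo
        "userinfo": parts.username,   # text before "@", or None
        "host": parts.hostname,       # the host a client would actually contact
        "path": parts.path,
        "query": parse_qs(parts.query),
    }

# Hypothetical sample: the trusted-looking name is only userinfo.
info = decompose("http://login.example.com@203.0.113.7/reset?token=abc")
```

In this sample, `info["host"]` is `203.0.113.7` while `info["userinfo"]` is `login.example.com`, which shows how separating the authority exposes the real destination.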
Formulas
The scanner uses Shannon entropy to detect random, machine-generated subdomains of the kind produced by DGAs. The entropy H of a string S is calculated as:

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

Where $p_i$ is the relative frequency of character i in the string (its count divided by the length of S). High entropy values (> 4.5 bits per character) often indicate strings that are not human-readable.
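A minimal sketch of the entropy calculation, assuming character frequencies are taken over the whole string and the logarithm is base 2. The function name `shannon_entropy` is hypothetical and not part of the tool.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # p_i = count / n; the sum of p_i * log2(1 / p_i) equals -sum(p_i * log2(p_i)).
    return sum((count / n) * math.log2(n / count) for count in Counter(s).values())

low = shannon_entropy("aaaa")    # one repeated character
even = shannon_entropy("abcd")   # four equally frequent characters
```

Here `low` is 0.0 and `even` is 2.0. Because the entropy of a string of length n cannot exceed log2(n), a label needs at least 23 distinct characters to exceed 4.5 bits, so a fixed threshold like this is probably most meaningful for long labels.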
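Typosquatting detection uses the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn the input domain a into a popular target domain b (e.g., 'google'). A small non-zero distance, such as 1 between 'g0ogle' and 'google', suggests a lookalike domain.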
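The edit-distance comparison can be sketched with the standard dynamic-programming formulation that keeps only two rows of the table. The function name `levenshtein` is an assumption for illustration; the tool's actual implementation and its list of target domains are not shown in this document.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))       # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                       # deleting i characters reaches the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + cost,      # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]

distance = levenshtein("g0ogle", "google")
```

Here `distance` is 1, since a single substitution ("0" to "o") separates the two strings. A distance of 0 means the input is the target domain itself, so only small non-zero distances would be treated as suspicious.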
Reference Data
| Indicator | Risk Level | Description |
|---|---|---|
| Homoglyph (IDN) | CRITICAL | Use of Cyrillic/Greek characters that visually resemble Latin ones (e.g., Latin "a" vs Cyrillic "а"). |
| IP Hostname | HIGH | Direct IP usage (e.g., 192.168.1.1) instead of a domain name is rare for legitimate public services. |
| @ Symbol (Auth) | HIGH | Used to obscure the true domain. Browsers treat the text before "@" as credentials (userinfo) and connect to the host that follows it. |
| Risky TLD | MEDIUM | TLDs like .zip, .country, and .gq have statistically higher abuse rates than .com or .org. |
| Deep Subdomains | MEDIUM | Excessive nesting (e.g., paypal.verify.secure.com) attempts to push the actual domain off-screen on mobile. |
| HTTP Scheme | MEDIUM | Lack of SSL/TLS encryption. Data is transmitted in cleartext. |
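
The indicators in the table can be combined into a weighted score along the following lines. This is a rough sketch under stated assumptions: the numeric weights, the cap of 100, the subdomain threshold, and the names `WEIGHTS`, `RISKY_TLDS`, `find_indicators`, and `risk_score` are all hypothetical. The homoglyph check is also simplified: it flags any non-ASCII or Punycode host label rather than comparing characters against a confusables table.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical weights and TLD list, chosen only for illustration.
WEIGHTS = {"CRITICAL": 40, "HIGH": 25, "MEDIUM": 10}
RISKY_TLDS = {"zip", "country", "gq"}

def is_ip(host: str) -> bool:
    """True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def find_indicators(url: str) -> list:
    """Return (indicator, risk level) pairs from the table that apply to url."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    ip_host = is_ip(host)
    found = []
    # Simplified stand-in for homoglyph detection: any IDN label is flagged.
    if not host.isascii() or any(label.startswith("xn--") for label in labels):
        found.append(("Homoglyph (IDN)", "CRITICAL"))
    if ip_host:
        found.append(("IP Hostname", "HIGH"))
    if parts.username is not None:
        found.append(("@ Symbol (Auth)", "HIGH"))
    if not ip_host and labels[-1] in RISKY_TLDS:
        found.append(("Risky TLD", "MEDIUM"))
    if not ip_host and len(labels) > 3:   # hypothetical nesting threshold
        found.append(("Deep Subdomains", "MEDIUM"))
    if parts.scheme == "http":
        found.append(("HTTP Scheme", "MEDIUM"))
    return found

def risk_score(url: str) -> int:
    """Sum the weights of all matching indicators, capped at 100."""
    return min(100, sum(WEIGHTS[level] for _, level in find_indicators(url)))

score = risk_score("http://user:pw@192.168.1.1/login")
```

With these assumed weights, the sample URL matches IP Hostname, @ Symbol (Auth), and HTTP Scheme, giving a score of 60. A fixed label-count threshold will also flag some legitimate hosts under multi-part suffixes such as .co.uk, so a fuller version would likely consult a public suffix list.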