HTML Attribute and Style Cleaner
Sanitize messy HTML by stripping inline styles, classes, and unwanted attributes. Includes whitelisting and code formatting for CMS migration.
Cleaning Rules
Enter rules as a comma-separated list of attribute names. When 'Strip All' is enabled, only the listed attributes remain.
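The sketch below shows one way a comma-separated rule string could be turned into a set of attribute names. It assumes entries are trimmed and lowercased, since HTML attribute names are case-insensitive, and that empty entries such as a trailing comma are ignored. `parse_rules` is a hypothetical helper, not part of the tool.

```python
def parse_rules(raw: str) -> set[str]:
    """Turn a comma-separated rule string into a normalized set of attribute names."""
    # Assumption: names are trimmed and lowercased, and empty entries
    # (for example from ",," or a trailing comma) are dropped.
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


print(sorted(parse_rules("href, SRC ,alt,,title,")))  # ['alt', 'href', 'src', 'title']
```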
About
Content migrated from rich text editors or word processors often carries bloated markup. Inline styles, proprietary classes, and non-standard attributes increase page weight and conflict with site-wide CSS. This tool parses raw HTML strings and strips unwanted attributes while preserving the document structure. It runs client-side using the browser's DOM parser, so the data stays local.
Web developers and content managers use it to sanitize markup before pasting it into a content management system (CMS). Unlike simple regex replacements, which break on nested tags, the tool traverses the DOM tree. It supports attribute whitelisting, so essential data such as image sources and link targets stays intact while visual clutter is removed.
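The tool itself relies on the browser's DOM parser. Python's standard library has no DOMParser, so this sketch swaps in `html.parser.HTMLParser`. That parser tokenizes tags and attributes properly instead of pattern-matching them with a regex. The sketch then re-serializes each token, keeping only whitelisted attributes. Unlike a real DOM, it streams tokens and does not repair unbalanced markup. `AttributeCleaner` and `clean_html` are hypothetical names, and this is not the tool's own code.

```python
from html import escape
from html.parser import HTMLParser

RAW_TEXT_TAGS = {"script", "style"}  # content of these elements is emitted verbatim


class AttributeCleaner(HTMLParser):
    """Re-serialize HTML, keeping only whitelisted attributes on every element."""

    def __init__(self, whitelist):
        super().__init__(convert_charrefs=True)
        self.whitelist = {name.lower() for name in whitelist}
        self.out = []
        self._raw_tag = None  # set while inside <script> or <style>

    def _attrs(self, attrs):
        # html.parser lowercases attribute names and unescapes values;
        # boolean attributes (e.g. `disabled`) arrive with value None.
        parts = []
        for name, value in attrs:
            if name not in self.whitelist:
                continue
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "".join(" " + p for p in parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")
        if tag in RAW_TEXT_TAGS:
            self._raw_tag = tag

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == self._raw_tag:
            self._raw_tag = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser decoded entities such as &lt;, so re-escape text
        # (but not raw script/style code) to avoid creating live tags.
        self.out.append(data if self._raw_tag else escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(markup, whitelist):
    cleaner = AttributeCleaner(whitelist)
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html('<p class="MsoNormal" style="color:red">Hi <a href="/x" onclick="evil()">link</a></p>', ["href"]))
# <p>Hi <a href="/x">link</a></p>
```

Re-escaping text matters because `HTMLParser` decodes entities before handing text over. Emitting that text unchanged would turn an escaped `&lt;b&gt;` into a real `<b>` tag.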
Formulas
The cleaning logic is based on set operations. For every HTML element, let A be the set of its existing attributes and W the set of attributes whitelisted by the user. In "Strip All" mode, the tool keeps their intersection:

$$R = A \cap W$$

where R is the set of attributes preserved on the element. If "Strip All" is disabled, the logic inverts to a blacklist approach: specific attribute sets are subtracted instead. For the style set S this is:

$$R = A \setminus S$$

Further removal sets are subtracted the same way.
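Both formulas map directly onto Python's built-in set operators. This sketch treats attributes as bare names and ignores their values. `keep_whitelisted` and `drop_blacklisted` are hypothetical names used only for illustration.

```python
def keep_whitelisted(attrs: set[str], whitelist: set[str]) -> set[str]:
    """'Strip All' mode: R = A intersect W."""
    # Only attributes present on the element AND named in the whitelist survive.
    return attrs & whitelist


def drop_blacklisted(attrs: set[str], removal: set[str]) -> set[str]:
    """Blacklist mode: R = A minus S."""
    # Every attribute survives unless it appears in the removal set.
    return attrs - removal


A = {"href", "style", "class", "title"}
print(sorted(keep_whitelisted(A, {"href", "title", "src"})))  # ['href', 'title']
print(sorted(drop_blacklisted(A, {"style"})))  # ['class', 'href', 'title']
```

Listing `src` in W has no effect on this element because A does not contain it. An intersection can only shrink A; it never adds attributes.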
Reference Data
| Tag Group | Recommended Whitelist | Description |
|---|---|---|
| Links | href, target, title, rel | Essential for navigation and SEO anchors. |
| Images | src, alt, width, height, title | Maintains visual content and accessibility standards. |
| Forms | action, method, name, type, value, placeholder | Required for functional user input fields. |
| Tables | colspan, rowspan, scope | Preserves cell spans and header-to-cell associations. |
| Meta/Script | content, name, charset, src, type, async, defer | Needed for document metadata and script loading. |
| Global | id, class, data-*, lang, dir | Identifiers and language settings (often stripped in strict cleaning). |
| Embeds | src, width, height, allow, frameborder | Necessary for iframes (YouTube, Maps). |
| Accessibility | aria-*, role, tabindex | Maintains screen reader and keyboard navigation support. |
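Entries such as `data-*` and `aria-*` in the table are assumed here to be prefix wildcards. This sketch matches them with `fnmatch.fnmatchcase`, a swapped-in choice rather than the tool's documented behavior. Note that `fnmatch` also treats `?` and `[...]` as special characters. `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_allowed(attr: str, whitelist: list[str]) -> bool:
    """Return True if attr matches an exact name like 'href' or a pattern like 'data-*'."""
    # Lowercase both sides because HTML attribute names are case-insensitive.
    name = attr.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in whitelist)


GLOBAL = ["id", "class", "data-*", "lang", "dir"]
ACCESSIBILITY = ["aria-*", "role", "tabindex"]
whitelist = GLOBAL + ACCESSIBILITY

print([a for a in ["data-id", "aria-label", "style", "role", "onclick"] if is_allowed(a, whitelist)])
# ['data-id', 'aria-label', 'role']
```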