Cleaning Rules

Attribute Whitelist (Keep these)

Comma separated. If 'Strip All' is checked below, ONLY these will remain.

STRIP ALL ATTRIBUTES (except whitelist)

About

Content migration from rich text editors or word processors often introduces excessive markup bloat. Inline styles, proprietary classes, and non-standard attributes increase page weight and conflict with site-wide CSS. This tool parses raw HTML strings to strip unwanted attributes while preserving the document structure. It functions client-side using the browser's DOM parser, ensuring data remains local.

Web developers and content managers use this to clean up markup before pasting it into a Content Management System. Unlike simple regex replacements, which can break on nested tags or on attribute values that contain angle brackets, this tool traverses the DOM tree. It supports attribute whitelisting, so essential data such as image sources and link targets remains intact while visual clutter is removed.
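
The parser-based approach described above can be sketched outside the browser as well. The following is a minimal illustration in Python, using the standard library's `html.parser` as a stand-in for the browser's DOM parser; the names `AttributeCleaner` and `clean_html` are hypothetical and are not part of this tool.

```python
from html import escape
from html.parser import HTMLParser


class AttributeCleaner(HTMLParser):
    """Rebuilds markup tag by tag, keeping only whitelisted attributes."""

    def __init__(self, whitelist):
        # convert_charrefs=False passes entities in text through untouched.
        super().__init__(convert_charrefs=False)
        self.whitelist = {name.strip().lower() for name in whitelist}
        self.out = []

    def _tag(self, tag, attrs, close=">"):
        parts = [tag]
        for name, value in attrs:  # names arrive lowercased from the parser
            if name not in self.whitelist:
                continue
            # Attribute values arrive decoded, so re-escape them on output.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(raw, whitelist):
    cleaner = AttributeCleaner(whitelist)
    cleaner.feed(raw)
    cleaner.close()  # flush any buffered text
    return "".join(cleaner.out)


sample = '<p class="MsoNormal" style="margin:0"><a href="/x" onclick="t()">link</a></p>'
print(clean_html(sample, ["href"]))  # <p><a href="/x">link</a></p>
```

Because the parser tokenizes tags and attributes itself, nesting depth does not matter here, which is the property a regex replacement lacks.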

Formulas

The cleaning logic can be described with basic set operations. For every HTML element, let A be the set of existing attributes and W be the set of whitelisted attributes defined by the user. With "Strip All" enabled, the operation performed is an intersection:

R = A ∩ W

Where R is the resulting set of attributes preserved on the element. If the "Strip All" mode is disabled, the logic inverts to a blacklist approach, where a set of unwanted attributes (such as S for styles) is subtracted instead:

R = A − S
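
Both formulas map directly onto built-in set operators. The sketch below is illustrative only: the function name `keep_attributes` and the default contents of the blacklist S are assumptions, since the page does not list which attributes S contains.

```python
def keep_attributes(existing, whitelist, strip_all=True, blacklist=frozenset({"style"})):
    """Return the set R of attribute names to keep on one element.

    existing  -> A, the attributes currently on the element
    whitelist -> W, the user-defined whitelist
    blacklist -> S, assumed here to contain only "style"
    """
    a = set(existing)
    if strip_all:
        return a & set(whitelist)  # R = A ∩ W
    return a - set(blacklist)      # R = A − S


attrs = {"href", "style", "class"}
print(keep_attributes(attrs, {"href", "title"}))                   # {'href'}
print(sorted(keep_attributes(attrs, {"href"}, strip_all=False)))   # ['class', 'href']
```

Note that whitelisted names absent from the element (here, "title") never appear in R, which is what distinguishes an intersection from simply assigning W.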

Reference Data

Tag Group	Recommended Whitelist	Description
Links	href, target, title, rel	Essential for navigation and SEO anchors.
Images	src, alt, width, height, title	Maintains visual content and accessibility standards.
Forms	action, method, name, type, value, placeholder	Required for functional user input fields.
Tables	colspan, rowspan, scope	Preserves structural data relationships.
Meta/Script	content, name, charset, src, type, async, defer	Critical for document headers and logic loading.
Global	id, class, data-*, lang, dir	Identifiers and language settings (often stripped in strict cleaning).
Embeds	src, width, height, allow, frameborder	Necessary for iframes (YouTube, Maps).
Accessibility	aria-*, role, tabindex	Maintains screen reader compatibility.
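
The table uses wildcard entries such as data-* and aria-*. Whether this tool expands wildcards in the whitelist field is not stated on this page, so the sketch below treats that as an assumption. It shows one way a comma-separated whitelist could be parsed and matched with shell-style patterns; `parse_whitelist` and `is_whitelisted` are hypothetical names.

```python
from fnmatch import fnmatchcase


def parse_whitelist(text):
    """Split a comma-separated whitelist string into trimmed, non-empty patterns."""
    return [item.strip() for item in text.split(",") if item.strip()]


def is_whitelisted(attr_name, patterns):
    """True if the attribute name matches any pattern, ignoring case."""
    name = attr_name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


patterns = parse_whitelist("href, data-*, aria-*")
print(is_whitelisted("data-id", patterns))   # True
print(is_whitelisted("style", patterns))     # False
```

The hyphen in the pattern matters: data-* matches data-id but not an unrelated name such as database.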

Frequently Asked Questions

Why do some characters look different in the output?

The browser's DOM parser automatically decodes entities (such as &copy;) into their character equivalents (©) during parsing. When the HTML is regenerated, the tool re-encodes only the characters that must be escaped to keep the syntax valid (such as < and >, written as &lt; and &gt;), and may leave other characters as literals.
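
The decode and re-encode round trip can be reproduced with Python's standard `html` module. This is an analogy to the browser behavior described above, not the tool's own code, and `round_trip` is a hypothetical name.

```python
from html import escape, unescape


def round_trip(text):
    """Decode all entities, then re-escape only &, < and >."""
    decoded = unescape(text)
    return escape(decoded, quote=False)


print(round_trip("&copy; 2024 Tom &amp; Co &lt;b&gt;"))
# © 2024 Tom &amp; Co &lt;b&gt;
```

The copyright sign comes back as a literal character, while the characters that would otherwise be read as markup stay escaped.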

Does this tool remove <script> tags?

Not by default. The tool focuses on attributes (such as onclick or style) rather than on elements. A cleaner with a tag removal list (a feature available in more advanced parsers) would remove script elements if "script" were added to that list. This tool is designed for cleanup, not for strict security sanitization such as XSS prevention, although removing event handlers like "onload" does reduce risk.
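
As a rough illustration of the event handler case, inline handlers can be recognized by their "on" prefix. The sketch below is a heuristic under that assumption, with hypothetical function names; it is not a security filter and does not reflect how this tool decides what to remove.

```python
def is_event_handler(name):
    """Heuristic: treat any attribute whose name starts with "on" as a handler."""
    lowered = name.lower()
    return len(lowered) > 2 and lowered.startswith("on")


def drop_event_handlers(attrs):
    """Filter (name, value) pairs, removing inline event handlers."""
    return [(name, value) for name, value in attrs if not is_event_handler(name)]


attrs = [("href", "/x"), ("onclick", "track()"), ("ONLOAD", "init()")]
print(drop_event_handlers(attrs))  # [('href', '/x')]
```

A prefix check also removes any custom attribute that happens to start with "on", so a whitelist remains the more predictable choice when the set of needed attributes is known.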

Will stripping attributes affect how my content is styled?

Yes. If your layout relies on utility classes (e.g., Bootstrap or Tailwind), removing the "class" attribute removes that styling. Add "class" to the whitelist if you intend to preserve existing CSS mappings.

How does the output formatting work?

The beautifier recursively walks the cleaned DOM tree and indents each node according to its nesting depth, using an indent width of 2 or 4 spaces. Note that this adds whitespace text nodes to the document structure solely for readability.
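
Depth-based indentation can be sketched with a depth counter that rises on each opening tag and falls on each closing tag. The Python example below is a simplified stand-in for the tool's beautifier, with hypothetical names; it assumes well-formed input and does not special-case whitespace-sensitive elements such as pre, script, or style.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Beautifier(HTMLParser):
    def __init__(self, indent=2):
        super().__init__()  # entities in text are decoded by default
        self.unit = " " * indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.unit * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # original tag text, unchanged
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text
            self._emit(escape(text, quote=False))


def beautify(raw, indent=2):
    formatter = Beautifier(indent)
    formatter.feed(raw)
    formatter.close()  # flush any buffered text
    return "\n".join(formatter.lines)


print(beautify("<div><p>Hi</p></div>"))
```

Each emitted line break and run of leading spaces corresponds to the extra whitespace text nodes mentioned above, which is why beautified output is not byte-for-byte equivalent to the compact form.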