About

RFC 3986 defines which characters may appear literally in a URL. Only the 66 unreserved characters (A-Z, a-z, 0-9, and the symbols - _ . ~) are safe to leave unencoded in every URL component. Reserved delimiters such as & or = may appear literally only where they act as delimiters. When they are part of the data, and for every other character (including spaces and non-ASCII glyphs), each byte of the character's UTF-8 encoding must be percent-encoded as %HH, where HH is the byte's two-digit hexadecimal value, uppercase by convention. Failing to encode query parameters correctly causes broken links, corrupted form submissions, injection vulnerabilities, and silent data loss in analytics pipelines. This tool performs RFC 3986 percent-encoding on arbitrary input, including multi-byte UTF-8 sequences, and provides three encoding strictness modes.

Limitation: Component mode only approximates the browser-native encodeURIComponent. It is stricter: encodeURIComponent leaves ! ' ( ) * unencoded, whereas Component mode encodes them. Full URL mode preserves structural delimiters (: / ? # & = @) and encodes everything else. Encode All mode encodes every character, including unreserved ones. Pro tip: always encode individual parameter values, never the entire URL string. Otherwise the delimiters are encoded along with the data, and any %HH escapes already present are double-encoded.

Formulas

RFC 3986 defines the percent-encoding transformation. For each character c in the input string, the encoder determines whether c belongs to the unreserved set. If not, it converts c to its UTF-8 byte sequence and emits each byte as a percent-encoded triplet.

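\[
\operatorname{encode}(c) =
\begin{cases}
c & \text{if } c \in U \\
\texttt{\%} \cdot \operatorname{hex}(b_i) \ \text{for each byte } b_i \in \operatorname{UTF8}(c) & \text{otherwise}
\end{cases}
\]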

Where U is the unreserved character set defined as:

U = { A - Z , a - z , 0 - 9 , - , _ , . , ~ }
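
The transformation above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not the tool's actual implementation; the names UNRESERVED and percent_encode are hypothetical.

```python
import string

# The 66 unreserved characters from RFC 3986 Section 2.3.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 byte, in uppercase hex.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("a b&c=é"))  # a%20b%26c%3D%C3%A9
```

The character é (U+00E9) is two bytes in UTF-8, so it yields two triplets, while the unreserved letters pass through unchanged.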

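For multi-byte characters (code point > 127), the character is first encoded into its UTF-8 byte representation. A character with code point n produces 1 to 4 bytes depending on the range:

\[
\begin{cases}
1 \text{ byte} & \text{if } n \le \texttt{0x7F} \\
2 \text{ bytes} & \text{if } \texttt{0x80} \le n \le \texttt{0x7FF} \\
3 \text{ bytes} & \text{if } \texttt{0x800} \le n \le \texttt{0xFFFF} \\
4 \text{ bytes} & \text{if } \texttt{0x10000} \le n \le \texttt{0x10FFFF}
\end{cases}
\]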

Where c = input character, U = unreserved set, b_i = i-th byte of UTF-8 encoding, hex = uppercase hexadecimal conversion function.
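
The byte-count ranges can be checked against a real UTF-8 encoder. The sketch below assumes a hypothetical helper named utf8_byte_count and compares it with Python's built-in encoder for one sample character per range.

```python
def utf8_byte_count(code_point: int) -> int:
    """Return how many UTF-8 bytes a Unicode code point occupies."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


# Cross-check against Python's own UTF-8 encoder.
for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    assert utf8_byte_count(ord(ch)) == len(encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), so {len(encoded)} %HH triplet(s)")
```

Because each byte becomes one triplet, the byte count also gives the length of the encoded output: a 4-byte character expands to 12 characters.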

Reference Data

Dec	Hex	Char	Encoded	Category	Description
0	00	NUL	%00	Control	Null character
9	09	TAB	%09	Control	Horizontal tab
10	0A	LF	%0A	Control	Line feed (newline)
13	0D	CR	%0D	Control	Carriage return
32	20	SP	%20	Unsafe	Space (also + in forms)
33	21	!	%21	Sub-delimiter	Exclamation mark
34	22	"	%22	Unsafe	Double quote
35	23	#	%23	Reserved (gen-delim)	Fragment identifier
36	24	$	%24	Sub-delimiter	Dollar sign
37	25	%	%25	Reserved (encoding)	Percent sign (escape char itself)
38	26	&	%26	Sub-delimiter	Ampersand (query separator)
39	27	'	%27	Sub-delimiter	Single quote / apostrophe
40	28	(	%28	Sub-delimiter	Opening parenthesis
41	29	)	%29	Sub-delimiter	Closing parenthesis
42	2A	*	%2A	Sub-delimiter	Asterisk
43	2B	+	%2B	Sub-delimiter	Plus sign (space in forms)
44	2C	,	%2C	Sub-delimiter	Comma
47	2F	/	%2F	Reserved (gen-delim)	Forward slash (path separator)
58	3A	:	%3A	Reserved (gen-delim)	Colon (scheme separator)
59	3B	;	%3B	Sub-delimiter	Semicolon
60	3C	<	%3C	Unsafe	Less-than sign
61	3D	=	%3D	Sub-delimiter	Equals sign (key-value separator)
62	3E	>	%3E	Unsafe	Greater-than sign
63	3F	?	%3F	Reserved (gen-delim)	Question mark (query start)
64	40	@	%40	Reserved (gen-delim)	At sign (userinfo separator)
91	5B	[	%5B	Reserved (gen-delim)	Opening bracket (IPv6)
93	5D	]	%5D	Reserved (gen-delim)	Closing bracket (IPv6)
123	7B	{	%7B	Unsafe	Opening brace
124	7C	|	%7C	Unsafe	Pipe / vertical bar
125	7D	}	%7D	Unsafe	Closing brace
126	7E	~	~	Unreserved	Tilde (not encoded)
127	7F	DEL	%7F	Control	Delete character

Frequently Asked Questions

RFC 3986 specifies %20 as the correct percent-encoding for a space character (ASCII 32). The + convention comes from the older application/x-www-form-urlencoded format used for HTML form submissions, originally defined in the HTML specification and now maintained in the WHATWG URL Standard. In query strings submitted by HTML forms, spaces become +. In all other URL components (path, fragment, non-form queries), spaces must be %20. This tool uses %20 by default since it is universally valid.
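
The two conventions can be compared side by side with Python's standard library, where quote follows the %20 convention and quote_plus follows the form convention. This is only an illustration of the difference, not how the tool itself is built.

```python
from urllib.parse import quote, quote_plus

value = "rock & roll"

# RFC 3986 style: a space becomes %20.
print(quote(value, safe=""))  # rock%20%26%20roll

# application/x-www-form-urlencoded style: a space becomes +.
print(quote_plus(value))  # rock+%26+roll

# A literal plus sign must itself be encoded in form data,
# otherwise the receiver would decode it as a space.
print(quote_plus("1+1"))  # 1%2B1
```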

Most emoji have Unicode code points above U+FFFF, placing them in the Supplementary Multilingual Plane. UTF-8 encodes these as 4 bytes. For example, the character U+1F600 (😀) becomes the byte sequence F0 9F 98 80, which percent-encodes as %F0%9F%98%80. Each %HH triplet represents one byte, not one character. A few older emoji, such as U+2764 (❤), sit below U+FFFF and take 3 bytes instead.

Double-encoding occurs when you encode an already-encoded string. The % character (ASCII 37) in %20 gets re-encoded to %25, producing %2520. The server then decodes it once, yielding the literal string %20 instead of a space. This is a common bug in URL construction. Always encode raw values before assembling them into a URL, and never encode the final assembled URL.
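
The failure is easy to reproduce. The sketch below uses Python's urllib.parse to encode a value twice and decode it once; the host example.com is a placeholder.

```python
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw, safe="")    # "hello%20world"
twice = quote(once, safe="")  # "hello%2520world": the % became %25

# A single decode of the double-encoded value leaves the escape as literal text.
print(unquote(twice))  # hello%20world
print(unquote(once))   # hello world

# Encode the raw value first, then assemble the URL around it.
url = "https://example.com/search?q=" + quote(raw, safe="")
print(url)  # https://example.com/search?q=hello%20world
```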

RFC 3986 Section 2.3 defines exactly 66 unreserved characters: uppercase A - Z (26), lowercase a - z (26), digits 0 - 9 (10), and four symbols: hyphen -, underscore _, period ., tilde ~. Every other character, including commonly assumed safe ones like !, *, and ', should be percent-encoded for maximum interoperability.

Component mode encodes everything except the 66 unreserved characters. It is similar to JavaScript's encodeURIComponent but stricter, since encodeURIComponent also leaves ! ' ( ) * unencoded. It is designed for encoding individual query parameter values or path segments. Full URL mode preserves the structural delimiters that define URL syntax: : / ? # & = @. Use Full URL mode when encoding an entire URL that already has correct structure. Use Component mode when encoding a value that will be inserted into a URL template.
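
The difference between the two modes can be approximated with the safe parameter of Python's urllib.parse.quote. The function names component_mode and full_url_mode are hypothetical, and the sketch only mirrors the behavior described here, not the tool's own code.

```python
from urllib.parse import quote

# Delimiters the text lists as preserved in Full URL mode.
FULL_URL_SAFE = ":/?#&=@"


def component_mode(value: str) -> str:
    """Encode everything except the 66 unreserved characters."""
    return quote(value, safe="")


def full_url_mode(url: str) -> str:
    """Encode everything except unreserved characters and structural delimiters."""
    return quote(url, safe=FULL_URL_SAFE)


url = "https://example.com/a b?q=x y&lang=fr"  # placeholder URL
print(full_url_mode(url))
# https://example.com/a%20b?q=x%20y&lang=fr
print(component_mode(url))
# https%3A%2F%2Fexample.com%2Fa%20b%3Fq%3Dx%20y%26lang%3Dfr

# Sub-delimiters such as ! ' ( ) * are outside the unreserved set, so they are encoded.
print(component_mode("!'()*"))  # %21%27%28%29%2A
```

Applying the component-style function to a whole URL destroys its structure, which is why it is meant for single values only.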

Yes. Any Unicode scalar value from U+0000 to U+10FFFF can be percent-encoded; the only excluded code points are the surrogates U+D800 to U+DFFF, which have no UTF-8 encoding. The character is first converted to its UTF-8 byte sequence (1 to 4 bytes), and each byte is individually percent-encoded. JavaScript strings store characters above U+FFFF as UTF-16 surrogate pairs, and the TextEncoder API combines each pair into a single code point automatically, so characters like musical symbols (U+1D11E, 𝄞) or rare CJK ideographs encode correctly as 4-byte sequences.
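
To make the surrogate-pair step concrete, the sketch below reproduces it by hand in Python. Python strings hold code points rather than UTF-16 units, so the UTF-16 view is obtained by encoding explicitly; combine_surrogates is a hypothetical helper that applies the standard UTF-16 pairing arithmetic.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it represents."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


clef = "\U0001D11E"  # musical symbol G clef

# The UTF-16 view, as a JavaScript string would store it: two 16-bit units.
utf16 = clef.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")  # 0xD834
low = int.from_bytes(utf16[2:], "big")   # 0xDD1E
assert combine_surrogates(high, low) == ord(clef) == 0x1D11E

# The UTF-8 view: four bytes, each emitted as one %HH triplet.
encoded = "".join(f"%{b:02X}" for b in clef.encode("utf-8"))
print(encoded)  # %F0%9D%84%9E
```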