About

Regular expressions compress powerful pattern logic into terse syntax. A production regex like ^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$ is functionally correct but visually opaque. Misreading a single quantifier or group boundary can cause silent match failures in validation pipelines, data extraction, and routing logic. This tool parses any regex string into an Abstract Syntax Tree and renders it as an ASCII railroad diagram using box-drawing characters (┌─┐│└─┘). The output is plain text, ready to paste into code comments, README files, terminal output, or any monospace environment where images are impractical.

The parser handles capturing and non-capturing groups, character classes with ranges, nested quantifiers, alternation branches, escape sequences, and anchors. Note that the diagram is a readability-oriented approximation: lookaheads, lookbehinds, and Unicode property escapes (\p{}) are displayed as labeled nodes but not expanded internally. The diagram flows left to right, following the usual convention of railroad (syntax) diagrams, which are commonly used to visualize grammars written in EBNF (ISO/IEC 14977).

Formulas

The regex-to-ASCII conversion follows a two-phase pipeline: parsing and rendering.

parse(regex) → AST → render(AST) → ASCII

The parser implements a recursive descent strategy with operator precedence. At the top level, an expression is a sequence of alternatives separated by |. Each alternative is a sequence of terms. Each term is an atom optionally followed by a quantifier.
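The grammar described above can be sketched as a small recursive descent parser. This is a minimal illustration, not the tool's actual implementation: the class and node names (Parser, Literal, Group, Quantified, Sequence, Alternation) are hypothetical, and only single-character literals, parenthesized groups, and the quantifiers *, + and ? are handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Literal:
    char: str


@dataclass
class Group:
    body: "Node"


@dataclass
class Quantified:
    atom: "Node"
    quantifier: str  # "*", "+", or "?"


@dataclass
class Sequence:
    terms: List["Node"]


@dataclass
class Alternation:
    branches: List["Node"]


Node = Union[Literal, Group, Quantified, Sequence, Alternation]


class Parser:
    """Hypothetical parser: one method per grammar rule."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def parse(self) -> Node:
        node = self.expression()
        if self.pos != len(self.pattern):
            raise ValueError(f"unexpected {self.peek()!r} at index {self.pos}")
        return node

    def expression(self) -> Node:
        # expression := alternative ('|' alternative)*
        branches = [self.alternative()]
        while self.peek() == "|":
            self.pos += 1
            branches.append(self.alternative())
        return branches[0] if len(branches) == 1 else Alternation(branches)

    def alternative(self) -> Node:
        # alternative := term*
        terms: List[Node] = []
        while self.peek() is not None and self.peek() not in "|)":
            terms.append(self.term())
        return terms[0] if len(terms) == 1 else Sequence(terms)

    def term(self) -> Node:
        # term := atom quantifier?
        atom = self.atom()
        nxt = self.peek()
        if nxt is not None and nxt in "*+?":
            self.pos += 1
            return Quantified(atom, nxt)
        return atom

    def atom(self) -> Node:
        # atom := '(' expression ')' | single ordinary character
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            body = self.expression()
            if self.peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return Group(body)
        if ch is None or ch in "*+?":
            raise ValueError(f"nothing to repeat at index {self.pos}")
        self.pos += 1
        return Literal(ch)


print(Parser("ab|c*").parse())
```

Because each rule calls the next tighter-binding rule, alternation ends up at the root of the tree and quantifiers bind to the single atom before them, which is the precedence order the paragraph describes.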

The rendering phase walks the AST and assigns each node a bounding box measured in character cells. Width w of a sequence node equals the sum of child widths plus connector characters. For alternation, height h equals the sum of branch heights plus separator lines.

$$w_{seq} = \sum_{i=1}^{n} w_i + (n - 1) \cdot 1$$

Here w_i is the rendered width of the i-th child node, n is the number of children, and the additional 1 accounts for the connecting ─ character between adjacent boxes. Quantifier suffixes add 2 to 6 characters depending on notation length (e.g., {2,5} adds 5 characters).
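As a worked instance of the width formula, the following sketch computes the width of a sequence node from its child widths. The function names are hypothetical, and treating the quantifier suffix width as the length of its notation is an assumption based on the {2,5} example.

```python
from typing import Sequence


def sequence_width(child_widths: Sequence[int]) -> int:
    """Sum of child widths plus one connector cell between adjacent children."""
    n = len(child_widths)
    if n == 0:
        return 0
    return sum(child_widths) + (n - 1) * 1


def quantifier_suffix_width(notation: str) -> int:
    """Assumption: the suffix occupies one cell per character of its notation."""
    return len(notation)


# Three boxes such as "[ a ]" are 5 cells wide each, joined by 2 connectors.
print(sequence_width([5, 5, 5]))         # 17
print(quantifier_suffix_width("{2,5}"))  # 5
```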

Reference Data

Regex Token	Symbol	ASCII Representation	Description
Literal	a	`─[ a ]─`	Matches exact character
Dot	.	`─[ . ANY ]─`	Matches any character except newline
Character Class	[a-z]	`─[ a-z ]─`	Matches one character from set
Negated Class	[^0-9]	`─[ ^0-9 ]─`	Matches any character NOT in set
Capturing Group	(abc)	`─┤ Group #1 ├─`	Captures matched substring
Non-capturing Group	(?:abc)	`─┤ Group ├─`	Groups without capturing
Alternation	a|b	`┬─[ a ]─┬` `└─[ b ]─┘`	Matches either branch
Zero or More	a*	`─[ a ]─⟲*`	Matches 0 or more times
One or More	a+	`─[ a ]─⟲+`	Matches 1 or more times
Optional	a?	`─[ a ]─?`	Matches 0 or 1 time
Exact Count	a{3}	`─[ a ]─{3}`	Matches exactly n times
Range Count	a{2,5}	`─[ a ]─{2,5}`	Matches n to m times
Start Anchor	^	`─[ ^ START ]─`	Asserts start of string/line
End Anchor	$	`─[ $ END ]─`	Asserts end of string/line
Word Boundary	\b	`─[ \b BOUNDARY ]─`	Asserts word boundary position
Digit	\d	`─[ \d 0-9 ]─`	Shorthand for [0-9]
Word Char	\w	`─[ \w a-zA-Z0-9_ ]─`	Shorthand for [a-zA-Z0-9_]
Whitespace	\s	`─[ \s SPACE ]─`	Matches whitespace characters
Lookahead	(?=abc)	`─┤ ?= LOOK ├─`	Positive lookahead assertion
Neg. Lookahead	(?!abc)	`─┤ ?! LOOK ├─`	Negative lookahead assertion
Lookbehind	(?<=abc)	`─┤ ?<= LOOK ├─`	Positive lookbehind assertion
Lazy Quantifier	a*?	`─[ a ]─⟲*?`	Matches as few times as possible

Frequently Asked Questions

The recursive descent parser treats each opening parenthesis as a new sub-expression scope. Inside that scope, alternation is parsed at the outermost level (it has the lowest precedence), creating branch nodes. When the group closes, the entire sub-AST becomes a single Group node. If a quantifier follows the closing parenthesis, it wraps the Group node. The ASCII renderer draws the group boundary first (using ┤ ├ delimiters), renders the internal branches, then appends the quantifier annotation to the right edge. This means (a|b)+ renders as a group box containing two branches, with a ⟲+ loop marker on the group's output connector.
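To make the nesting order concrete, the sketch below builds the AST for (a|b)+ by hand and prints it as an indented outline. The nested-tuple node shape and the outline function are hypothetical, chosen only for illustration, and the tool's internal representation may differ.

```python
def outline(node, depth=0):
    """Return indented lines describing a nested-tuple AST, outermost node first."""
    kind, *rest = node
    indent = "  " * depth
    if kind == "literal":
        return [f"{indent}Literal {rest[0]!r}"]
    if kind == "quantified":
        child, marker = rest
        return [f"{indent}Quantified {marker}"] + outline(child, depth + 1)
    # "group" and "alternation" nodes carry a list of children
    lines = [f"{indent}{kind.capitalize()}"]
    for child in rest[0]:
        lines += outline(child, depth + 1)
    return lines


# Hand-built AST for (a|b)+ : the quantifier wraps the group,
# and the group wraps the alternation with its two branches.
ast = (
    "quantified",
    ("group", [("alternation", [("literal", "a"), ("literal", "b")])]),
    "+",
)
print("\n".join(outline(ast)))
# Quantified +
#   Group
#     Alternation
#       Literal 'a'
#       Literal 'b'
```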

The test-match execution runs inside a timeout guard. If matching a test string takes longer than 2000 ms, execution is terminated and a warning toast is displayed. The ASCII diagram itself is generated from the parsed structure, not from execution, so it renders instantly regardless of backtracking risk. The diagram will correctly show the nested quantifier structure, which can help you visually identify the exponential branching that causes backtracking.

Named capture groups (?<name>...) are parsed and displayed with their name label in the group box (e.g., ─┤ Group "name" ├─). Unicode property escapes (\p{Script=Latin}) are treated as opaque labeled nodes showing the raw escape text, since expanding all Unicode property categories into character ranges would produce diagrams thousands of characters wide. The u flag must be enabled for the regex engine to accept these patterns during test matching.

The renderer targets a maximum width of 120 characters per line by default. For complex patterns exceeding this, alternation branches and deeply nested groups are wrapped vertically rather than extended horizontally. You can copy the output directly into block comments (/* */ or # prefixed lines). The export function also offers a version with configurable prefix characters for different comment styles.
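Prefixing a diagram for a line-comment style could look roughly like the sketch below. The function names as_comment and fits_width are hypothetical and do not describe the tool's export API; the 120-character default mirrors the limit mentioned above.

```python
def as_comment(diagram: str, prefix: str = "# ") -> str:
    """Prefix every diagram line so the result can be pasted as line comments."""
    # rstrip avoids trailing whitespace on lines that were empty in the diagram
    return "\n".join((prefix + line).rstrip() for line in diagram.splitlines())


def fits_width(diagram: str, limit: int = 120) -> bool:
    """Check that no line exceeds the width limit, counting one cell per character."""
    return all(len(line) <= limit for line in diagram.splitlines())


diagram = "─[ a ]─\n─[ b ]─"
print(as_comment(diagram, prefix="// "))
# // ─[ a ]─
# // ─[ b ]─
```

Note that the chosen prefix adds to the line width, so a diagram rendered at exactly 120 characters would exceed that limit once commented.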

The character class parser follows the ECMA-262 specification: a hyphen is treated as a range operator only when it appears between two literal characters (e.g., a-z). A hyphen at the start or end of the class, or immediately after another range, is treated as a literal hyphen character. The ASCII box displays the condensed class content exactly as written. Negated classes (starting with ^) are shown with the ^ prefix inside the box label.
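The hyphen rule can be sketched as a single left-to-right scan over the text between the brackets. This is a simplified illustration with a hypothetical function name: it ignores escape sequences such as \d or \-, which a full ECMA-262 parser must also handle.

```python
from typing import List, Tuple, Union

# A class item is either a single literal character or a (low, high) range.
ClassItem = Union[str, Tuple[str, str]]


def parse_class_body(body: str) -> Tuple[bool, List[ClassItem]]:
    """Split the text between [ and ] into literals and ranges."""
    negated = body.startswith("^")
    chars = body[1:] if negated else body
    items: List[ClassItem] = []
    i = 0
    while i < len(chars):
        # A hyphen forms a range only with a character on both sides: X-Y
        if i + 2 < len(chars) and chars[i + 1] == "-":
            low, high = chars[i], chars[i + 2]
            if low > high:
                raise ValueError(f"range out of order: {low}-{high}")
            items.append((low, high))
            i += 3
        else:
            # Leading, trailing, or post-range hyphens land here as literals
            items.append(chars[i])
            i += 1
    return negated, items


print(parse_class_body("a-z"))    # (False, [('a', 'z')])
print(parse_class_body("^0-9"))   # (True, [('0', '9')])
print(parse_class_body("a-z-9"))  # (False, [('a', 'z'), '-', '9'])
print(parse_class_body("-a"))     # (False, ['-', 'a'])
```

Consuming three characters per range is what makes a hyphen directly after a range fall through to the literal branch.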

A railroad diagram (also called syntax diagram) shows the grammatical structure of the pattern: which tokens appear in sequence, which are alternatives, and which repeat. It does NOT show states or transitions of the equivalent NFA/DFA. A finite automaton diagram would show every possible state and transition edge, which for even simple patterns with quantifiers produces dozens of states. The railroad approach is more compact and readable for human comprehension of pattern intent, while automaton diagrams are better for analyzing engine execution behavior.