About

Binary corruption propagates silently. A single flipped bit in firmware, a corrupted byte in a cryptographic key, or a transmission error in a serialized protocol buffer can cause catastrophic downstream failures without any human-readable symptom. This tool performs a strict byte-level comparison of two binary streams, reporting every offset where stream A diverges from stream B. It computes the Hamming distance d_H and the percentage similarity, and renders a visual heatmap so that spatial patterns in corruption become immediately obvious. Input can be a raw file (up to 50 MB), a hex-encoded string, or plain text, which is encoded to bytes as UTF-8. The comparison is fixed-offset: it assumes both streams are aligned at offset 0 and does not perform sequence alignment or fuzzy matching. For streams of unequal length, trailing bytes in the longer stream are flagged as overflow differences.

Formulas

The primary metric is the Hamming distance computed over the overlapping region of both streams:

$$d_H = \sum_{i=0}^{\min(n_A, n_B) - 1} \delta(A[i] \neq B[i])$$

where δ is an indicator function returning 1 on mismatch and 0 otherwise. The total difference count includes overflow bytes:

D = d_H + |n_A − n_B|

Similarity as a percentage over the maximum stream length:

$$S = \frac{\max(n_A, n_B) - D}{\max(n_A, n_B)} \times 100\,\%$$

where A[i] is the byte at offset i in stream A, n_A is byte length of stream A, n_B is byte length of stream B, D is total differences including overflow, and S is similarity percentage.
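
As an illustration of the three formulas above, here is a minimal Python sketch. It is not the tool's own code; the function name `compare_streams` and the choice to treat two empty streams as 100 % similar are assumptions made for the example.

```python
def compare_streams(a: bytes, b: bytes) -> dict:
    """Fixed-offset comparison of two byte strings (illustrative sketch)."""
    overlap = min(len(a), len(b))
    longest = max(len(a), len(b))
    # d_H: mismatching byte positions within the overlapping region only.
    d_h = sum(1 for i in range(overlap) if a[i] != b[i])
    # D: overflow bytes in the longer stream also count as differences.
    total = d_h + abs(len(a) - len(b))
    # S: assumption, two empty streams are reported as identical to avoid 0/0.
    similarity = 100.0 if longest == 0 else (longest - total) / longest * 100
    return {"d_H": d_h, "D": total, "S": similarity}


result = compare_streams(b"\x01\x02\x03\x04", b"\x01\xff\x03\x04\x05\x06")
print(result)  # {'d_H': 1, 'D': 3, 'S': 50.0}
```

Here one byte differs in the 4-byte overlap and stream B has 2 overflow bytes, so D = 3 and S = (6 − 3) / 6 × 100 % = 50 %.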

Reference Data

Metric	Symbol	Description	Range
Hamming Distance	d_H	Count of byte positions where streams differ	0 - min(n_A, n_B)
Similarity	S	Percentage of matching bytes over max length	0 - 100 %
First Diff Offset	f	Byte offset of the earliest mismatch	0 - n−1
Last Diff Offset	l	Byte offset of the latest mismatch	0 - n−1
Stream A Size	n_A	Total bytes in stream A	0 - 52428800 B
Stream B Size	n_B	Total bytes in stream B	0 - 52428800 B
Overflow Bytes	Δn	Absolute size difference between streams	0 - 52428800 B
Diff Density	ρ	Ratio of differing bytes to compared range	0.0 - 1.0
Match Count	m	Number of identical byte positions	0 - n
Compared Length	min(n_A, n_B)	Number of byte positions actually compared	0 - 52428800 B
Hex Byte	-	Two-character representation, base-16	00 - FF
ASCII Printable Range	-	Characters rendered in hex dump sidebar	0x20 - 0x7E
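
The derived metrics in the table can be sketched in a few lines of Python. This is an illustrative reading of the table rather than the tool's implementation: the function name `diff_metrics` is hypothetical, and it assumes that the first and last diff offsets, match count, and density are all taken over the overlapping region only.

```python
def diff_metrics(a: bytes, b: bytes) -> dict:
    """Derived metrics over the overlapping region (illustrative sketch)."""
    overlap = min(len(a), len(b))
    mismatches = [i for i in range(overlap) if a[i] != b[i]]
    return {
        "first": mismatches[0] if mismatches else None,   # f
        "last": mismatches[-1] if mismatches else None,   # l
        "match_count": overlap - len(mismatches),         # m
        "overflow": abs(len(a) - len(b)),                 # Δn
        # ρ: assumption, an empty compared range yields a density of 0.0.
        "density": len(mismatches) / overlap if overlap else 0.0,
    }


metrics = diff_metrics(b"ABCDEFGH", b"ABxDEFyH12")
print(metrics)
```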

Frequently Asked Questions

The comparison runs over the overlapping region from offset 0 to min(n_A, n_B) − 1. Bytes beyond the shorter stream's length are counted as overflow differences and shown in gray on the visual map. The similarity formula uses max(n_A, n_B) as the denominator, so unequal lengths always reduce similarity.

Comparisons exceeding 1 MB are offloaded to a Web Worker to prevent UI freezing. A progress indicator shows completion percentage. The maximum accepted size is 50 MB per stream. Beyond that threshold the tool rejects the input to prevent browser memory exhaustion. The hex dump view is paginated at 4096 bytes per page regardless of file size.
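
The size rules above can be summarized in a short Python sketch. The names `plan_comparison` and `page_range` are hypothetical, and two details are assumptions: that "1 MB" means 2**20 bytes, and that the worker threshold is checked against the larger of the two streams.

```python
MAX_BYTES = 50 * 1024 * 1024     # 52428800 B, the limit listed in the reference table
WORKER_THRESHOLD = 1024 * 1024   # assumption: "1 MB" means 2**20 bytes
PAGE_SIZE = 4096                 # hex dump bytes per page


def plan_comparison(n_a: int, n_b: int) -> dict:
    """Decide how a comparison of the given stream sizes would be handled."""
    largest = max(n_a, n_b)
    if largest > MAX_BYTES:
        raise ValueError("stream exceeds the 50 MB limit")
    return {
        "use_worker": largest > WORKER_THRESHOLD,
        # Ceiling division: a partially filled last page still counts.
        "pages": (largest + PAGE_SIZE - 1) // PAGE_SIZE,
    }


def page_range(page: int, length: int) -> range:
    """Byte offsets shown on a zero-based hex dump page."""
    start = page * PAGE_SIZE
    return range(start, min(start + PAGE_SIZE, length))


plan = plan_comparison(5_000, 2_000_000)
print(plan)  # {'use_worker': True, 'pages': 489}
```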

Yes. The parser strips all whitespace, commas, and optional 0x prefixes. It then validates that only characters 0 - 9 and A - F (case-insensitive) remain. If the resulting string has odd length, a leading zero is prepended. Invalid characters trigger an error toast identifying the first illegal character and its position.
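
The parsing steps described above could look roughly like this in Python. It is a sketch, not the tool's parser: the function name `parse_hex` is hypothetical, a raised `ValueError` stands in for the error toast, and the reported position refers to the cleaned string, which may differ from how the tool counts.

```python
import re


def parse_hex(text: str) -> bytes:
    """Parse a loosely formatted hex string into bytes (illustrative sketch)."""
    # Drop 0x / 0X prefixes that start a token, then whitespace and commas.
    cleaned = re.sub(r"(?<![0-9A-Fa-f])0[xX]", "", text)
    cleaned = re.sub(r"[\s,]", "", cleaned)
    # Report the first character that is not a hex digit.
    bad = re.search(r"[^0-9A-Fa-f]", cleaned)
    if bad:
        raise ValueError(
            f"illegal character {bad.group()!r} at position {bad.start()}"
        )
    # Odd digit count: prepend a leading zero, as described above.
    if len(cleaned) % 2:
        cleaned = "0" + cleaned
    return bytes.fromhex(cleaned)


print(parse_hex("0xDE, 0xAD be ef"))  # b'\xde\xad\xbe\xef'
print(parse_hex("abc"))               # b'\n\xbc' (0x0A, 0xBC)
```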

The map is drawn on an HTML Canvas element. Each pixel corresponds to one byte. Green (#6BCB77) indicates a matching byte. Red (#F87C8C) indicates a mismatch within the overlapping region. Gray (#C4C7D4) marks overflow bytes present in only one stream. The canvas width is fixed at 256 pixels, so rows wrap every 256 bytes. This layout reveals spatial corruption patterns such as block-aligned errors or periodic bit flips.
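
The offset-to-pixel mapping described above can be sketched as follows. The tool draws on a Canvas; this Python version only computes the coordinates and color for a given byte offset, and the function name `pixel_for_offset` is hypothetical.

```python
WIDTH = 256  # canvas width in pixels; one pixel per byte
GREEN, RED, GRAY = "#6BCB77", "#F87C8C", "#C4C7D4"


def pixel_for_offset(offset: int, a: bytes, b: bytes) -> tuple:
    """Return (x, y, color) for one byte offset (illustrative sketch)."""
    if not 0 <= offset < max(len(a), len(b)):
        raise IndexError("offset lies outside both streams")
    # Rows wrap every WIDTH bytes.
    x, y = offset % WIDTH, offset // WIDTH
    if offset >= min(len(a), len(b)):
        color = GRAY   # overflow: byte exists in only one stream
    elif a[offset] == b[offset]:
        color = GREEN  # match
    else:
        color = RED    # mismatch inside the overlapping region
    return x, y, color


a = bytes(300)                          # 300 zero bytes
b = bytes(256) + b"\xff" + bytes(50)    # 307 bytes, one mismatch at offset 256
print(pixel_for_offset(0, a, b))    # (0, 0, '#6BCB77')
print(pixel_for_offset(256, a, b))  # (0, 1, '#F87C8C')
print(pixel_for_offset(300, a, b))  # (44, 1, '#C4C7D4')
```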

No. This tool performs fixed-offset comparison only. It does not implement sequence alignment algorithms like Needleman-Wunsch or Myers diff. A single inserted byte at offset 0 will cause every subsequent byte to appear different. For insertion/deletion detection, a dedicated binary diff tool with alignment capability is required.
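
A small Python sketch makes the effect of a single inserted byte concrete. The sample data is made up for the illustration; with repetitive data (for example, long runs of the same byte) some shifted positions would still happen to match.

```python
original = bytes(range(16))     # 00 01 02 ... 0F
shifted = b"\xaa" + original    # one byte inserted at offset 0

overlap = min(len(original), len(shifted))
# Fixed-offset comparison: position i is only ever compared with position i.
mismatches = sum(1 for i in range(overlap) if original[i] != shifted[i])
overflow = abs(len(original) - len(shifted))

print(mismatches, "of", overlap, "compared bytes differ")  # 16 of 16 compared bytes differ
print(overflow, "overflow byte")                           # 1 overflow byte
```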

Text input is encoded to bytes using the TextEncoder API, which produces UTF-8. A single ASCII character maps to 1 byte. Multi-byte characters (e.g., emoji, CJK) expand to 2 - 4 bytes. This means string length and byte length can differ significantly for non-ASCII text.
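
The difference between string length and byte length can be checked with a short Python sketch. Python's `str.encode("utf-8")` stands in here for the browser's TextEncoder; note that Python's `len` counts code points, whereas a JavaScript string's `length` counts UTF-16 code units, so the character counts can differ between the two languages even though the UTF-8 byte counts are the same.

```python
# Sample characters written as escapes so the code points are unambiguous.
samples = {
    "A": "A",                  # ASCII
    "e-acute": "\u00e9",       # Latin-1 Supplement
    "CJK": "\u4e2d",           # CJK Unified Ideograph
    "emoji": "\U0001F600",     # outside the Basic Multilingual Plane
}

byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_lengths)  # {'A': 1, 'e-acute': 2, 'CJK': 3, 'emoji': 4}
```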