About

Binary analysis quantifies the statistical and structural properties of raw byte sequences. Misidentifying a file format, overlooking embedded payloads, or misjudging entropy can lead to failed forensic investigations, missed malware signatures, or corrupted data recovery. This tool computes Shannon entropy H across the full byte space (0x00 - 0xFF), maps frequency distributions, detects file signatures via magic byte matching against 80+ known headers, extracts printable ASCII strings, and renders a standard hex dump. All computation runs locally in the browser with no server transmission.

Entropy values near 0 indicate constant or highly repetitive data. Values approaching 8.0 bits/byte suggest compression or encryption. The byte frequency histogram reveals structural patterns invisible in raw hex: null-byte padding, ASCII text regions, or high-entropy encrypted blocks. Note: magic byte detection relies on header matching only and does not validate full file structure. Files with stripped or spoofed headers will not be identified correctly. Pro tip: compare entropy across file sections to locate embedded archives or appended payloads within a single binary.
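The section-by-section comparison suggested in the tip can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's own code: the function names and the 4096-byte window size are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    # Only byte values that actually occur are counted, so log2(0) never arises.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def section_entropy(data: bytes, window: int = 4096) -> list[tuple[int, float]]:
    """Return (offset, entropy) for each consecutive `window`-byte section.

    The window size is an assumption; smaller windows localize boundaries
    better but give noisier entropy estimates.
    """
    return [
        (offset, shannon_entropy(data[offset:offset + window]))
        for offset in range(0, len(data), window)
    ]
```

A sharp jump between adjacent sections, for example from about 5 to nearly 8 bits/byte, is the kind of boundary that may mark an appended archive or encrypted payload.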

Formulas

Shannon entropy measures the average information content per byte. For a data stream of N total bytes where byte value i occurs n_i times:

$$H = -\sum_{i=0}^{255} p(i) \cdot \log_2 p(i)$$

where the probability of each byte value is:

$$p(i) = \frac{n_i}{N}$$

where H = Shannon entropy in bits/byte, p(i) = probability of byte value i occurring, n_i = count of byte value i, N = total number of bytes. Byte values that never occur (p(i) = 0) contribute zero to the sum. The result ranges from 0 (all bytes identical) to 8.0 (perfectly uniform distribution across all 256 possible byte values).

The chi-squared (χ²) goodness-of-fit statistic tests the byte distribution against a uniform distribution. The expected frequency for each byte value in a truly random stream is:

$$E = \frac{N}{256}$$

$$\chi^2 = \sum_{i=0}^{255} \frac{(n_i - E)^2}{E}$$

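where χ² ≈ 255 (the number of degrees of freedom) indicates a near-random distribution. For uniformly random bytes the statistic has a mean of 255 and a standard deviation of about 22.6 (√510), so values significantly above 255 suggest non-random structure.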
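The two formulas above translate fairly directly into code. The following is a minimal Python sketch under the definitions given in this section; the function names are hypothetical and this is not the tool's actual implementation.

```python
import math


def byte_counts(data: bytes) -> list[int]:
    """Return n_i, the number of occurrences of each byte value 0..255."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def shannon_entropy(data: bytes) -> float:
    """H = -sum(p(i) * log2 p(i)) in bits/byte; 0.0 for empty input."""
    if not data:
        return 0.0
    n_total = len(data)
    h = 0.0
    for n_i in byte_counts(data):
        if n_i:  # a byte value that never occurs contributes zero
            p = n_i / n_total
            h -= p * math.log2(p)
    return h


def chi_squared(data: bytes) -> float:
    """chi^2 = sum((n_i - E)^2 / E) with E = N / 256; 0.0 for empty input."""
    if not data:
        return 0.0
    expected = len(data) / 256
    return sum((n_i - expected) ** 2 / expected for n_i in byte_counts(data))
```

For instance, a buffer containing every byte value equally often gives an entropy of 8.0 and a χ² of 0, while a buffer of identical bytes gives an entropy of 0.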

Reference Data

File Type	Magic Bytes (Hex)	ASCII	Typical Entropy	Extension
PNG Image	89 50 4E 47 0D 0A 1A 0A	.PNG....	7.5 - 7.9 bits/byte	.png
JPEG Image	FF D8 FF	ÿØÿ	7.4 - 7.9 bits/byte	.jpg/.jpeg
GIF Image	47 49 46 38	GIF8	6.0 - 7.5 bits/byte	.gif
PDF Document	25 50 44 46 2D	%PDF-	6.5 - 7.8 bits/byte	.pdf
ZIP Archive	50 4B 03 04	PK..	7.6 - 8.0 bits/byte	.zip
GZIP Archive	1F 8B	..	7.8 - 8.0 bits/byte	.gz
RAR Archive	52 61 72 21 1A 07	Rar!..	7.8 - 8.0 bits/byte	.rar
7-Zip Archive	37 7A BC AF 27 1C	7z....	7.9 - 8.0 bits/byte	.7z
ELF Executable	7F 45 4C 46	.ELF	5.0 - 7.0 bits/byte	(none)
PE/EXE (MZ)	4D 5A	MZ	5.5 - 7.5 bits/byte	.exe/.dll
Mach-O (64-bit)	CF FA ED FE	....	5.0 - 7.0 bits/byte	(none)
BMP Image	42 4D	BM	3.0 - 6.0 bits/byte	.bmp
WAV Audio	52 49 46 46 .. .. .. .. 57 41 56 45	RIFF....WAVE	6.0 - 7.5 bits/byte	.wav
MP3 Audio	FF FB or 49 44 33	.. or ID3	7.5 - 7.9 bits/byte	.mp3
MP4 Video	00 00 00 .. 66 74 79 70	....ftyp	7.5 - 8.0 bits/byte	.mp4
WebP Image	52 49 46 46 .. .. .. .. 57 45 42 50	RIFF....WEBP	7.0 - 7.8 bits/byte	.webp
SQLite Database	53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00	SQLite format 3.	4.0 - 6.5 bits/byte	.db/.sqlite
TIFF Image (LE)	49 49 2A 00	II*.	6.0 - 7.5 bits/byte	.tiff
OGG Audio	4F 67 67 53	OggS	7.5 - 7.9 bits/byte	.ogg
FLAC Audio	66 4C 61 43	fLaC	7.0 - 7.8 bits/byte	.flac
Class (Java)	CA FE BA BE	....	5.5 - 7.0 bits/byte	.class
WASM Binary	00 61 73 6D	.asm	5.0 - 7.0 bits/byte	.wasm
TAR Archive	75 73 74 61 72 (at offset 257)	ustar	4.0 - 6.0 bits/byte	.tar
XML Document	3C 3F 78 6D 6C	<?xml	4.0 - 5.5 bits/byte	.xml
Plain Text (ASCII)	(no magic bytes)	-	3.5 - 5.0 bits/byte	.txt
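Matching against a table like the one above can be sketched as follows. This Python example is illustrative only: it includes a handful of the listed signatures, and the `SIGNATURES` layout and the `detect_signature` name are assumptions rather than the tool's real data structures. `None` stands for a wildcard byte (shown as `..` in the table).

```python
from typing import Optional

# (name, offset, pattern); None in a pattern matches any byte.
SIGNATURES: list[tuple[str, int, list[Optional[int]]]] = [
    ("PNG Image", 0, [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
    ("WebP Image", 0, [0x52, 0x49, 0x46, 0x46, None, None, None, None,
                       0x57, 0x45, 0x42, 0x50]),
    ("ZIP Archive", 0, [0x50, 0x4B, 0x03, 0x04]),
    ("GZIP Archive", 0, [0x1F, 0x8B]),
    ("ELF Executable", 0, [0x7F, 0x45, 0x4C, 0x46]),
    ("TAR Archive", 257, [0x75, 0x73, 0x74, 0x61, 0x72]),
]


def detect_signature(data: bytes) -> Optional[str]:
    """Return the name of the first signature that matches, or None."""
    for name, offset, pattern in SIGNATURES:
        window = data[offset:offset + len(pattern)]
        # A truncated file cannot match a pattern longer than what remains.
        if len(window) != len(pattern):
            continue
        if all(want is None or got == want for got, want in zip(window, pattern)):
            return name
    return None
```

Because the first match wins, longer and more specific patterns are best listed before shorter ones that share a prefix.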

Frequently Asked Questions

An entropy of 7.99 bits/byte is extremely close to the theoretical maximum of 8.0, meaning the byte distribution is nearly uniform. This is characteristic of encrypted data, compressed archives (ZIP, GZIP), or cryptographically secure random output. Uncompressed text typically falls between 3.5 and 5.0 bits/byte. If you see near-maximum entropy in a file that should not be encrypted or compressed, it may indicate embedded payloads or obfuscated content.

Magic byte detection matches only the first 16 bytes (or specific offsets like 257 for TAR) against known signatures. Files with stripped headers or custom formats may not match, and container-based formats may be reported only as their outer container (e.g., a DOCX file begins with the ZIP signature). Some formats like plain text or CSV have no magic bytes at all. Polyglot files - valid as multiple formats simultaneously - will match only the first signature found. For definitive identification, combine magic bytes with entropy analysis and string extraction.

Both compressed and encrypted data exhibit high entropy (above 7.5 bits/byte), but their byte frequency histograms differ subtly. Truly encrypted data (e.g., AES-256-CBC output) produces an almost perfectly flat histogram with chi-squared values near 255. Compressed data (e.g., DEFLATE, which combines LZ77 with Huffman coding) tends to show slight peaks around certain byte values due to Huffman coding artifacts. The chi-squared statistic in this tool quantifies that difference. As a rough guide, values below 300 lean toward encrypted, while values above 350 suggest compression with residual structure.

The default threshold is 4 consecutive printable ASCII characters (bytes 0x20 - 0x7E). This matches the default minimum length of the Unix strings utility and balances signal against noise. A threshold of 3 produces excessive false positives from random byte coincidences. A threshold of 6 or higher is useful for executables where you want only meaningful identifiers, function names, or embedded URLs. For firmware analysis, 8+ reduces noise from lookup tables.
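String extraction with a minimum-length threshold can be sketched with a regular expression over the raw bytes. This is a hedged Python illustration of the idea; `extract_strings` is a hypothetical name, and the tool's own implementation may differ.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII (0x20-0x7E)
    that is at least `min_length` bytes long."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # Bytes pattern: min_length or more consecutive bytes in 0x20..0x7E.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

Raising `min_length` to 6 or 8 drops short coincidental runs, at the cost of missing short but genuine identifiers.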

The byte heatmap renders each byte as a colored pixel in a grid, mapping value 0x00 to dark and 0xFF to bright. Structural boundaries become visible as sharp color transitions. Header regions (often low-entropy metadata) appear as ordered patterns, while compressed or encrypted payloads appear as noise. Padding regions (runs of 0x00 or 0xFF) show as solid bands. This visual approach lets you identify appended data, embedded files, or section boundaries faster than scrolling through a hex dump.
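A simplified version of this mapping can be sketched by writing the bytes out as a grayscale image, one pixel per byte. The Python sketch below uses the binary PGM (P5) format because it needs no third-party libraries. The 256-pixel row width and the function name are assumptions, and the tool itself may use a color palette rather than plain grayscale.

```python
def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Encode `data` as a binary PGM (P5) image: 0x00 is black, 0xFF is white."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Ceiling division; keep at least one row so the image is valid.
    height = max(1, -(-len(data) // width))
    # Pad the last row with 0x00 (this padding is not part of the file).
    padded = data.ljust(width * height, b"\x00")
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + padded
```

Writing the result to a `.pgm` file and opening it in an image viewer should show padding as solid bands and compressed or encrypted regions as noise.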

The tool caps input at 100 MB to prevent browser tab crashes from memory pressure. For files above 1 MB, it automatically offloads entropy calculation, frequency analysis, and string extraction to a Web Worker to keep the UI responsive. For the hex dump view, only 64 KB sections are rendered at a time with pagination to avoid DOM bloat. If you need to analyze larger binaries, split them with a tool like dd or split and analyze sections independently.
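The paginated hex dump can be illustrated with a small formatter that renders only one section of the input at a time. This Python sketch assumes the common offset / hex / ASCII layout with 16 bytes per row; the function name and parameters are hypothetical.

```python
def hex_dump(data: bytes, start: int = 0, length: int = 64 * 1024,
             width: int = 16) -> str:
    """Render bytes [start, start + length) as offset / hex / ASCII rows."""
    end = min(start + length, len(data))
    lines = []
    for offset in range(start, end, width):
        chunk = data[offset:min(offset + width, end)]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so a short final row stays aligned.
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  |{ascii_part}|")
    return "\n".join(lines)
```

Calling `hex_dump(data, start=page * 65536)` for successive values of `page` walks through the file one 64 KB section at a time.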