Analyze Binary Data
Analyze binary files and hex data: entropy calculation, byte frequency, magic bytes detection, hex dump, string extraction. All client-side.
About
Binary analysis quantifies the statistical and structural properties of raw byte sequences. Misidentifying a file format, overlooking embedded payloads, or misjudging entropy can lead to failed forensic investigations, missed malware signatures, or corrupted data recovery. This tool computes Shannon entropy H across the full byte space (0x00 - 0xFF), maps frequency distributions, detects file signatures via magic byte matching against 80+ known headers, extracts printable ASCII strings, and renders a standard hex dump. All computation runs locally in the browser with no server transmission.
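The hex dump layout mentioned above can be sketched in a few lines. This is an illustrative Python sketch, not the tool's own browser code; the function name `hex_dump` and the 16-bytes-per-row width are assumptions chosen to match common hex viewers.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as an offset column, a hex column, and a printable-ASCII column."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Bytes outside the printable ASCII range are shown as '.'.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08X}  {hex_part:<{hex_width}}  |{ascii_part}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # PNG signature followed by some null padding (sample data, made up for the demo).
    print(hex_dump(b"\x89PNG\r\n\x1a\n" + b"\x00" * 12))
```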
Entropy values near 0 indicate constant or highly repetitive data, where a few byte values dominate. Values approaching 8.0 bits/byte suggest compression or encryption. The byte frequency histogram reveals structural patterns invisible in raw hex: null-byte padding, ASCII text regions, or high-entropy encrypted blocks. Note: magic byte detection relies on header matching only and does not validate full file structure. Files with stripped or spoofed headers will not be identified correctly. Pro tip: compare entropy across file sections to locate embedded archives or appended payloads within a single binary.
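One way to compare entropy across file sections is to compute it over fixed-size windows and look for abrupt jumps. The sketch below is a minimal Python illustration under stated assumptions: the names `shannon_entropy` and `windowed_entropy` and the 256-byte window size are hypothetical choices, not the tool's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits/byte (0.0 for empty input)."""
    n = len(data)
    if n == 0:
        return 0.0
    # p * log2(1/p) summed over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256):
    """Return (offset, entropy) pairs for consecutive non-overlapping windows."""
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, len(data), window)
    ]


if __name__ == "__main__":
    # Synthetic sample: a repetitive region followed by a region that uses
    # every byte value equally often (standing in for an appended payload).
    sample = b"A" * 1024 + bytes(range(256)) * 4
    for offset, h in windowed_entropy(sample):
        print(f"0x{offset:06X}  {h:.2f} bits/byte")
```

On this synthetic sample the first four windows score 0 and the last four score 8.0, so the boundary at offset 0x400 stands out. Real files give noisier curves, and small windows underestimate entropy because a 256-byte window cannot show a smooth distribution over 256 values.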
Formulas
Shannon entropy measures the average information content per byte. For a data stream of $N$ total bytes where byte value $i$ occurs $n_i$ times:
$$H = -\sum_{i=0}^{255} p(i) \log_2 p(i)$$
where the probability of each byte value is:
$$p(i) = \frac{n_i}{N}$$
where $H$ = Shannon entropy in bits/byte, $p(i)$ = probability of byte value $i$ occurring, $n_i$ = count of byte value $i$, $N$ = total number of bytes. Byte values that never occur ($p(i) = 0$) contribute nothing to the sum. The result ranges from 0 (all bytes identical) to 8.0 (perfectly uniform distribution across all 256 possible byte values).
Chi-squared ($\chi^2$) goodness-of-fit tests randomness against a uniform distribution. The expected frequency for each byte in a truly random stream is:
$$E = \frac{N}{256}, \qquad \chi^2 = \sum_{i=0}^{255} \frac{(n_i - E)^2}{E}$$
where $\chi^2 \approx 255$ indicates a near-random distribution. Values significantly above 255 suggest non-random structure.
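The entropy and chi-squared definitions above can be checked with a short script. This is a sketch in Python rather than the tool's own code; the function names `byte_histogram`, `entropy_bits_per_byte`, and `chi_squared` are assumptions introduced for illustration.

```python
import math


def byte_histogram(data: bytes) -> list:
    """Count occurrences of each byte value 0x00..0xFF."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def entropy_bits_per_byte(counts) -> float:
    """H = sum over i of p(i) * log2(1 / p(i)), skipping byte values with zero count."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in counts if c)


def chi_squared(counts) -> float:
    """Chi-squared statistic against a uniform expectation of N / 256 per byte value."""
    n = sum(counts)
    if n == 0:
        return 0.0
    expected = n / 256
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    uniform = byte_histogram(bytes(range(256)) * 4)  # every value exactly 4 times
    constant = byte_histogram(b"\x00" * 1024)        # a single repeated value
    print(entropy_bits_per_byte(uniform), chi_squared(uniform))    # 8.0 0.0
    print(entropy_bits_per_byte(constant), chi_squared(constant))  # 0.0 261120.0
```

Note that a perfectly even histogram yields a statistic of 0, not 255. The value 255 is the mean of the chi-squared distribution with 255 degrees of freedom, which is what genuinely random data tends toward.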
Reference Data
| File Type | Magic Bytes (Hex) | ASCII | Typical Entropy | Extension |
|---|---|---|---|---|
| PNG Image | 89 50 4E 47 0D 0A 1A 0A | .PNG.... | 7.5 - 7.9 bits/byte | .png |
| JPEG Image | FF D8 FF | ÿØÿ | 7.4 - 7.9 bits/byte | .jpg/.jpeg |
| GIF Image | 47 49 46 38 | GIF8 | 6.0 - 7.5 bits/byte | .gif |
| PDF Document | 25 50 44 46 2D | %PDF- | 6.5 - 7.8 bits/byte | .pdf |
| ZIP Archive | 50 4B 03 04 | PK.. | 7.6 - 8.0 bits/byte | .zip |
| GZIP Archive | 1F 8B | .. | 7.8 - 8.0 bits/byte | .gz |
| RAR Archive | 52 61 72 21 1A 07 | Rar!.. | 7.8 - 8.0 bits/byte | .rar |
| 7-Zip Archive | 37 7A BC AF 27 1C | 7z.... | 7.9 - 8.0 bits/byte | .7z |
| ELF Executable | 7F 45 4C 46 | .ELF | 5.0 - 7.0 bits/byte | (none) |
| PE/EXE (MZ) | 4D 5A | MZ | 5.5 - 7.5 bits/byte | .exe/.dll |
| Mach-O (64-bit) | CF FA ED FE | .... | 5.0 - 7.0 bits/byte | (none) |
| BMP Image | 42 4D | BM | 3.0 - 6.0 bits/byte | .bmp |
| WAV Audio | 52 49 46 46 | RIFF | 6.0 - 7.5 bits/byte | .wav |
| MP3 Audio | FF FB or 49 44 33 | .. or ID3 | 7.5 - 7.9 bits/byte | .mp3 |
| MP4 Video | 00 00 00 .. 66 74 79 70 | ....ftyp | 7.5 - 8.0 bits/byte | .mp4 |
| WebP Image | 52 49 46 46 .. .. .. .. 57 45 42 50 | RIFF....WEBP | 7.0 - 7.8 bits/byte | .webp |
| SQLite Database | 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 | SQLite format 3. | 4.0 - 6.5 bits/byte | .db/.sqlite |
| TIFF Image (LE) | 49 49 2A 00 | II*. | 6.0 - 7.5 bits/byte | .tiff |
| OGG Audio | 4F 67 67 53 | OggS | 7.5 - 7.9 bits/byte | .ogg |
| FLAC Audio | 66 4C 61 43 | fLaC | 7.0 - 7.8 bits/byte | .flac |
| Class (Java) | CA FE BA BE | .... | 5.5 - 7.0 bits/byte | .class |
| WASM Binary | 00 61 73 6D | .asm | 5.0 - 7.0 bits/byte | .wasm |
| TAR Archive | 75 73 74 61 72 (at offset 257) | ustar | 4.0 - 6.0 bits/byte | .tar |
| XML Document | 3C 3F 78 6D 6C | <?xml | 4.0 - 5.5 bits/byte | .xml |
| Plain Text (ASCII) | (no magic bytes) | - | 3.5 - 5.0 bits/byte | .txt |
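
Header matching against a table like the one above can be sketched as a list of (offset, bytes) checks. The Python below is illustrative only: it covers a handful of rows from the table, and the names `SIGNATURES` and `detect_file_type` are hypothetical. It matches headers and nothing else, so it shares the limitation that spoofed or stripped headers defeat it.

```python
from typing import Optional

# Each entry: (label, [(offset, expected bytes), ...]). All parts must match.
# Longer, more specific signatures are listed before short ones such as "MZ".
SIGNATURES = [
    ("PNG Image", [(0, bytes.fromhex("89504E470D0A1A0A"))]),
    ("SQLite Database", [(0, b"SQLite format 3\x00")]),
    ("WebP Image", [(0, b"RIFF"), (8, b"WEBP")]),
    ("PDF Document", [(0, b"%PDF-")]),
    ("TAR Archive", [(257, b"ustar")]),
    ("ZIP Archive", [(0, b"PK\x03\x04")]),
    ("GIF Image", [(0, b"GIF8")]),
    ("ELF Executable", [(0, b"\x7fELF")]),
    ("JPEG Image", [(0, b"\xff\xd8\xff")]),
    ("GZIP Archive", [(0, b"\x1f\x8b")]),
    ("PE/EXE (MZ)", [(0, b"MZ")]),
]


def detect_file_type(data: bytes) -> Optional[str]:
    """Return the first label whose signature parts all match, else None."""
    for label, parts in SIGNATURES:
        # Slicing past the end yields a shorter bytes object, so short inputs
        # simply fail the comparison instead of raising.
        if all(data[off:off + len(sig)] == sig for off, sig in parts):
            return label
    return None


if __name__ == "__main__":
    print(detect_file_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
    print(detect_file_type(b"plain text has no magic bytes"))
```

Signatures that share a prefix, such as RIFF for both WAV and WebP, need a second check at a later offset, which is why each entry holds a list of parts.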
Frequently Asked Questions
[…] strings utility and balances signal-to-noise. A threshold of 3 produces excessive false positives from random byte coincidences. A threshold of 6 or higher is useful for executables where you want only meaningful identifiers, function names, or embedded URLs. For firmware analysis, 8+ reduces noise from lookup tables.
[…] dd or split and analyze sections independently.
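The effect of the minimum-length threshold on string extraction can be seen with a small sketch. This Python example is an assumption-laden illustration, not the tool's code: the name `extract_strings` is hypothetical, and the default of 4 is chosen here because it is the default of the GNU `strings` utility.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    # Printable ASCII is 0x20 (space) through 0x7E (~).
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up sample: a short coincidental run, a URL, and an identifier.
    sample = b"\x00\x01abc\x00http://example.com\x00\xffhello!\x02"
    for threshold in (3, 4, 8):
        print(threshold, extract_strings(sample, threshold))
```

With this sample, a threshold of 3 also reports the three-byte run `abc`, a threshold of 4 drops it, and a threshold of 8 keeps only the URL.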