
About

Testing observability pipelines with production data is a security risk. Sharing real logs exposes IPs, usernames, session tokens, and internal infrastructure topology. This generator produces structurally valid log entries across multiple formats - Apache Combined, Syslog RFC 5424, JSON (Elastic Common Schema), NGINX access, and NDJSON - with realistic field distributions. Severity levels follow configurable weighted random selection, so your ERROR-to-INFO ratio mirrors actual systems (the defaults yield roughly 1:10). Timestamps distribute uniformly across a user-defined window and sort chronologically. HTTP status codes follow a weighted distribution: ~78% are 2xx, ~12% are 3xx, ~7% are 4xx, and ~3% are 5xx, matching real-world traffic patterns.

The tool also generates OpenTelemetry-compatible trace and span IDs (32- and 16-character hex strings, respectively) and time-series metric data using a random walk algorithm with configurable drift d and volatility σ. All generation runs client-side; no data leaves your browser. Output is limited to 100,000 lines per batch due to browser memory constraints. For load testing at scale, generate multiple batches and concatenate them. Note: generated IPs use the full 0.0.0.0 - 255.255.255.255 range without filtering reserved blocks - filter post-generation if realism in that dimension matters.


Formulas

Timestamps are generated using uniform random distribution within the specified window, then sorted ascending to simulate chronological log flow:

t_i = t_start + rand() × (t_end − t_start)

Where t_start and t_end are Unix epoch milliseconds of the user-defined range, and rand() produces a value in [0, 1).
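A minimal sketch of this step (function name hypothetical), assuming Math.random() as the uniform source:

```javascript
// Generate `count` timestamps uniformly distributed in [startMs, endMs),
// then sort ascending to mimic chronological log flow.
function generateTimestamps(count, startMs, endMs) {
  const stamps = [];
  for (let i = 0; i < count; i++) {
    // rand() in [0, 1) scales the window width
    stamps.push(startMs + Math.random() * (endMs - startMs));
  }
  return stamps.sort((a, b) => a - b); // numeric ascending sort
}
```

Sorting after generation is cheaper than generating monotonically increasing gaps, and it preserves the uniform marginal distribution.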

HTTP status code selection uses weighted random sampling. Given weights w_k for each status category, the cumulative distribution function determines selection:

P(status = k) = w_k / Σ_{i=1}^{n} w_i
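The same CDF walk works for any weighted category (status classes, severity levels). A sketch, with hypothetical names:

```javascript
// Weighted random selection: subtract weights from a uniform draw scaled
// by the weight sum; the category that drives it below zero is chosen.
function weightedPick(items, weights) {
  const total = weights.reduce((s, w) => s + w, 0);
  let r = Math.random() * total;
  for (let i = 0; i < items.length; i++) {
    r -= weights[i];
    if (r < 0) return items[i];
  }
  return items[items.length - 1]; // guard against float rounding
}

// Status categories with the distribution described above
const status = weightedPick(["2xx", "3xx", "4xx", "5xx"], [78, 12, 7, 3]);
```

Weights need not sum to 100; normalization happens implicitly via the weight sum.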

Metric data uses a bounded random walk to produce realistic time-series:

v_{t+1} = clamp(v_t + d + σ · N(0,1), min, max)

Where d is the drift (trend bias), σ the volatility, and N(0,1) a standard normal deviate produced via the Box-Muller transform: z = √(−2 ln u₁) · cos(2π u₂).
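The walk and the Box-Muller step can be sketched as follows (parameter defaults are illustrative, not the tool's actual values):

```javascript
// Standard normal deviate via the Box-Muller transform
function boxMuller() {
  const u1 = 1 - Math.random(); // shift to (0, 1] so ln(u1) is defined
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Bounded random walk: v_{t+1} = clamp(v_t + d + sigma * N(0,1), min, max)
function randomWalk(steps, { start = 50, drift = 0.1, sigma = 2, min = 0, max = 100 } = {}) {
  const series = [start];
  for (let t = 0; t < steps; t++) {
    const next = series[t] + drift + sigma * boxMuller();
    series.push(Math.min(max, Math.max(min, next))); // clamp to bounds
  }
  return series;
}
```

Clamping keeps metrics like CPU percentage inside a plausible range while the drift term imposes a long-run trend.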

UUID v4 generation follows RFC 4122: 122 random bits with version nibble set to 0100 and variant bits to 10.
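A compact sketch of the v4 layout; Math.random() stands in for a CSPRNG here since these are synthetic test IDs, not security tokens:

```javascript
// RFC 4122 v4 UUID: 16 random bytes with the version nibble forced to 0100
// and the variant bits to 10, leaving 122 bits of randomness.
function uuidv4() {
  const bytes = Array.from({ length: 16 }, () => Math.floor(Math.random() * 256));
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version nibble = 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10xxxxxx
  const hex = bytes.map(b => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-` +
         `${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```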

Reference Data

Format | Standard | Fields Generated | Use Case | Example System
Apache Combined | CLF + Referer/UA | IP, identity, user, timestamp, method, path, protocol, status, bytes, referer, user-agent | Web server access log testing | Apache HTTPD, HAProxy
NGINX Access | Custom (default format) | IP, timestamp, method, path, protocol, status, bytes, referer, user-agent, request_time | Reverse proxy log analysis | NGINX, OpenResty
Syslog RFC 5424 | RFC 5424 | Priority, version, timestamp, hostname, app-name, procid, msgid, structured-data, message | System/daemon log ingestion | rsyslog, syslog-ng
JSON (ECS) | Elastic Common Schema 1.x | @timestamp, log.level, message, host.name, service.name, trace.id, span.id, event.dataset | Structured log pipelines | Elasticsearch, Datadog
NDJSON | Newline Delimited JSON | timestamp, level, msg, pid, hostname, req_id, duration_ms, caller | Streaming JSON ingestion | Bunyan, Pino, Loki
Metric (Prometheus) | Prometheus Exposition | metric_name, labels, value, timestamp_ms | Time-series metric testing | Prometheus, Grafana
CSV | RFC 4180 | Configurable columns matching selected format | Spreadsheet/DB import testing | Excel, PostgreSQL COPY
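As an illustration of one row of this table, a single NDJSON record with the listed field set could be emitted like this (all values and names here are illustrative, not the tool's actual output):

```javascript
// One NDJSON record: a complete JSON object on a single line
function ndjsonLine(ts) {
  return JSON.stringify({
    timestamp: new Date(ts).toISOString(),
    level: "info",
    msg: "request completed",
    pid: 1234,
    hostname: "web-01",
    req_id: "a1b2c3",
    duration_ms: 42,
    caller: "handler.js:17",
  });
}
```

NDJSON's only structural rule is one valid JSON object per line, which is what makes it trivial to stream and concatenate.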

Frequently Asked Questions

Are the generated IP addresses and user agents realistic?

IP addresses are randomly generated across the full IPv4 range (0.0.0.0 - 255.255.255.255) without filtering RFC 1918 private ranges or IANA reserved blocks. User agents are sampled from a curated pool of 20+ real-world browser strings (Chrome, Firefox, Safari, Edge, bots) with a weighted distribution favoring Chrome (~65%). Because the IPs are synthetic, they carry no PII risk for GDPR-sensitive testing.
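If you do need public-looking addresses, the post-generation filtering mentioned above could look like this (a sketch; only the RFC 1918 blocks are checked, not every IANA reservation):

```javascript
// Random IPv4 across the full range
function randomIPv4() {
  return Array.from({ length: 4 }, () => Math.floor(Math.random() * 256)).join(".");
}

// RFC 1918 private blocks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
function isPrivate(ip) {
  const [a, b] = ip.split(".").map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Rejection sampling: redraw until the address falls outside private space
function randomPublicIPv4() {
  let ip = randomIPv4();
  while (isPrivate(ip)) ip = randomIPv4();
  return ip;
}
```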
Will the output pass schema validation?

JSON format follows Elastic Common Schema 1.x field naming (@timestamp, log.level, host.name, trace.id). Syslog output conforms to RFC 5424 structure, including the PRI calculation: PRI = facility × 8 + severity. Both will pass basic structural validation. However, semantic consistency (e.g., trace IDs correlating across spans) is not enforced - each line is generated independently.
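The PRI calculation is a one-liner; for example, facility 16 (local0) at severity 3 (error) yields PRI 131:

```javascript
// RFC 5424 PRI value: facility * 8 + severity
function syslogPri(facility, severity) {
  return facility * 8 + severity;
}

// RFC 5424 message header starts with <PRI> followed by the version (1)
function priPrefix(facility, severity) {
  return `<${syslogPri(facility, severity)}>1`;
}
```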
How many lines can I generate at once?

The tool caps at 100,000 lines per batch. Generation runs in a Web Worker to prevent UI freezing. At 100K lines of Apache Combined format, output is approximately 25 - 35 MB; JSON format is larger (40 - 60 MB). If your browser tab crashes, reduce the count or use the download option, which streams to a Blob rather than rendering to the DOM.
Can I control the severity-level distribution?

You can adjust relative weights for each severity level (DEBUG, INFO, WARN, ERROR, FATAL). The default distribution mirrors production systems: INFO 60%, DEBUG 20%, WARN 12%, ERROR 6%, FATAL 2%. Weights are normalized to probabilities, so setting all weights equal produces a uniform distribution, while setting ERROR to 100 and the others to 0 generates error-only logs for fault-injection testing.
Are trace and span IDs compatible with distributed tracing tools?

Trace IDs are 32-character lowercase hexadecimal strings and span IDs are 16-character lowercase hex, matching the W3C Trace Context format. Each log line receives a unique trace/span pair. The tool does not generate correlated spans (parent-child relationships) - every line represents an independent trace root. For distributed tracing pipeline testing, you would need to post-process and assign shared trace IDs across related entries.
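The post-processing step mentioned above could be sketched like this (function names hypothetical): generate W3C-shaped IDs, then stamp a batch of lines with one shared trace ID.

```javascript
// W3C Trace Context IDs: 32 hex chars for a trace, 16 for a span.
// Math.random() suffices for synthetic test data.
function randomHex(chars) {
  let out = "";
  for (let i = 0; i < chars; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}
const traceId = () => randomHex(32);
const spanId = () => randomHex(16);

// Give a batch of independent log lines a shared trace ID so they look
// like spans of a single distributed trace.
function correlate(lines) {
  const shared = traceId();
  return lines.map(line => ({ ...line, "trace.id": shared, "span.id": spanId() }));
}
```

Real parent-child span trees would additionally need a parent span ID per entry, which this sketch does not attempt.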
What timestamp formats and timezones are used?

Apache/NGINX use the CLF format: [dd/Mon/yyyy:HH:mm:ss +0000]. Syslog uses ISO 8601 with millisecond precision. The JSON @timestamp field uses ISO 8601 with the UTC designator (Z). All timestamps are generated in UTC. The date range picker operates in your local timezone but converts to UTC for generation. If you need a specific timezone offset, the Syslog format includes the offset field in structured data.
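Formatting a UTC epoch value into the CLF shape is straightforward; a sketch:

```javascript
// Format epoch milliseconds as Apache CLF: [dd/Mon/yyyy:HH:mm:ss +0000]
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
function clfTimestamp(ms) {
  const d = new Date(ms);
  const pad = n => String(n).padStart(2, "0");
  return `[${pad(d.getUTCDate())}/${MONTHS[d.getUTCMonth()]}/${d.getUTCFullYear()}:` +
         `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())} +0000]`;
}
```

Using the getUTC* accessors keeps the output independent of the machine's local timezone, matching the all-UTC behavior described above.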