Number of Events: Total events (NEW + UPDATED), not patients. Range: 1–100,000

Start Date: Earliest timestamp for NEW events

End Date: Latest timestamp for NEW events

Output Format: NDJSON for streaming/piping, JSON Array for file import

Positivity Rate (%): Percentage of tests returning confirmed positive

Case Fatality Rate (%): Percentage of confirmed cases resulting in death (age-adjusted internally)

About

Testing pipelines, streaming demos, and healthcare dashboards need high-volume event data that respects epidemiological state transitions; a random JSON blob does not. Each patient record must follow a lifecycle: a NEW event (test administered, result pending) must precede any UPDATED event (confirmed, negative, probable, recovered, dead, hospitalized). Confirmed cases must have a non-zero probability of generating a subsequent recovery or death event within a realistic time window of 5 - 30 days. This generator implements a weighted Markov state machine whose transition probabilities approximate CDC aggregate ratios: roughly 40% positivity and 2% case fatality among confirmed cases. Dates, names, locations, and demographics are synthesized from embedded dictionaries without external API calls. Output is valid NDJSON or a JSON array, suitable for piping into Kafka, curl, or jq.

Limitations: demographic distributions are US-centric. Age-weighted severity is simplified to two tiers (<60 and ≥60). Geographic data is not correlated to real outbreak hotspots. The tool approximates aggregate statistics and does not model transmission dynamics, R-number, or hospital capacity. For load-testing a streaming consumer or populating a dashboard prototype, this level of fidelity is sufficient. For epidemiological research, it is not.

Formulas

Each patient follows a Markov chain with weighted transitions. The probability of terminal state S given a confirmed case depends on age bracket:

$$
P(S = \text{dead} \mid \text{confirmed}) =
\begin{cases}
0.02 & \text{if age} < 60 \\
0.08 & \text{if age} \geq 60
\end{cases}
$$

Age generation uses the Box-Muller transform to approximate a normal distribution:

$$\text{age} = \operatorname{round}\left(\mu + \sigma \sqrt{-2 \ln u_1} \, \cos(2\pi u_2)\right)$$

Where μ = 45 (mean age), σ = 18 (standard deviation), and u₁, u₂ are uniform random values in (0, 1). Result is clamped to [1, 99].
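
The age formula above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the function name `sampleAge` and the injectable `rng` parameter are assumptions added so the sampler can be exercised deterministically.

```javascript
// Sketch of age sampling via the Box-Muller transform.
// `rng` is a hypothetical injectable uniform source; it defaults to Math.random.
function sampleAge(rng = Math.random, mu = 45, sigma = 18) {
  // Math.random() can return exactly 0 and ln(0) is -Infinity,
  // so u1 is mapped from [0, 1) to (0, 1].
  const u1 = 1 - rng();
  const u2 = rng();
  // Standard normal deviate from two uniform values.
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  const age = Math.round(mu + sigma * z);
  // Clamp to [1, 99] as described above.
  return Math.min(99, Math.max(1, age));
}

console.log(sampleAge());
```

The clamp matters in practice: with σ = 18, a draw more than about 2.4 standard deviations below the mean would otherwise produce an age below 1.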

UUID v4 generation uses crypto.getRandomValues to fill 16 random bytes, then sets the version bits (0100) in the high nibble of byte 6 and the variant bits (10) in the top two bits of byte 8, per RFC 4122.
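
A minimal sketch of that bit manipulation follows. The helper names `uuidFromBytes` and `uuidV4` are illustrative, not taken from the tool; splitting the formatting from the random fill is a choice made here so the bit masks can be checked with fixed input.

```javascript
// Format 16 bytes as a version-4 UUID string. Works on a copy of the input.
function uuidFromBytes(input) {
  const b = Uint8Array.from(input);
  b[6] = (b[6] & 0x0f) | 0x40; // high nibble of byte 6 becomes 0100 (version 4)
  b[8] = (b[8] & 0x3f) | 0x80; // top two bits of byte 8 become 10 (variant)
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Fill 16 bytes from the Web Crypto API, then apply the version/variant bits.
function uuidV4() {
  const bytes = new Uint8Array(16);
  globalThis.crypto.getRandomValues(bytes);
  return uuidFromBytes(bytes);
}
```

Whatever the random bytes are, the third group always starts with `4` and the fourth group always starts with `8`, `9`, `a`, or `b`.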

The state machine transition diagram:

NEW(pending) → UPDATED(confirmed | negative | probable) → UPDATED(recovered | dead | hospitalized)

Total event count per patient ranges from 2 (NEW + single UPDATED) to 3 (NEW + confirmed + outcome). The user-specified generation amount N refers to total events, not patients. Approximate patient count: P ≈ N / 2.2, where 2.2 is the mean number of events per patient lifecycle.
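
The first hop of the state machine, from `pending` to a test result, can be sketched as a cumulative-weight lookup. The weights are the ones listed under Reference Data; the names `FIRST_TIER` and `pickWeighted`, and the choice to pass the random roll in as an argument, are assumptions made for illustration.

```javascript
// First-tier weights in percent, as listed in the reference table.
const FIRST_TIER = [
  ["confirmed", 40],
  ["negative", 55],
  ["probable", 5],
];

// Pick a status by walking the cumulative weights.
// `roll` is a uniform value in [0, 1), e.g. Math.random().
function pickWeighted(table, roll) {
  const total = table.reduce((sum, [, weight]) => sum + weight, 0);
  let remaining = roll * total;
  for (const [status, weight] of table) {
    if (remaining < weight) return status;
    remaining -= weight;
  }
  // Only reachable through floating-point rounding at the top edge.
  return table[table.length - 1][0];
}

console.log(pickWeighted(FIRST_TIER, Math.random()));
```

Because the weights are normalized by their sum, the same function still works if a configured positivity rate replaces the default 40 and the table no longer adds up to exactly 100.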

Reference Data

Event Type	Status Value	Probability Weight	Follows	Delay Range	Description
NEW	`pending`	100% (initial)	-	0 days	Patient symptomatic, test administered, awaiting result
UPDATED	`confirmed`	40%	`pending`	1 - 14 days	Positive test result returned
UPDATED	`negative`	55%	`pending`	1 - 14 days	Negative test result returned
UPDATED	`probable`	5%	`pending`	1 - 14 days	Probable case, clinical diagnosis without conclusive test
UPDATED	`recovered`	85% of confirmed	`confirmed`	7 - 30 days	Patient recovered and released
UPDATED	`dead`	2% (age < 60), 8% (age ≥ 60)	`confirmed`	5 - 25 days	Patient deceased
UPDATED	`hospitalized`	13% of confirmed (remainder)	`confirmed`	2 - 10 days	Patient hospitalized, outcome pending further events

Output Field Reference

Field	Description
`event_id`	UUID v4 - unique per event record
`patient_id`	UUID v4 - consistent across all events for same patient
`event_type`	`NEW` or `UPDATED`
`status`	One of: `pending`, `confirmed`, `negative`, `probable`, `recovered`, `dead`, `hospitalized`
`patient.first_name`	Synthesized from embedded dictionary (~200 entries)
`patient.last_name`	Synthesized from embedded dictionary (~200 entries)
`patient.age`	Weighted normal distribution, μ = 45, σ = 18, clamped 1 - 99
`patient.gender`	`M`, `F`, or `X` (weighted 48/48/4)
`patient.phone`	US format: `(XXX) XXX-XXXX`
`location.city`	From embedded US city list (~120 entries)
`location.state`	US state abbreviation
`location.zip`	5-digit code matching state range
`timestamp`	ISO 8601 within configured date range
`symptomatic`	Boolean - 70% true for confirmed, 30% for negative
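
To make the field list concrete, here is an illustrative record and a small consistency check that a consumer might run. Every value in `sampleEvent` is made up, the nested layout (`patient`, `location` as sub-objects) is an assumption inferred from the dotted field names, and `validateEvent` is a hypothetical helper, not part of the tool.

```javascript
// Illustrative record only; all values are invented for this example.
const sampleEvent = {
  event_id: "3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b",
  patient_id: "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  event_type: "NEW",
  status: "pending",
  patient: { first_name: "Dana", last_name: "Reyes", age: 52, gender: "F", phone: "(212) 867-0143" },
  location: { city: "New York", state: "NY", zip: "10027" },
  timestamp: "2021-03-14T09:26:53Z",
  symptomatic: true,
};

const STATUSES = ["pending", "confirmed", "negative", "probable", "recovered", "dead", "hospitalized"];
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Return a list of problems; an empty list means the record looks consistent.
function validateEvent(e) {
  const problems = [];
  if (!UUID_V4.test(e.event_id)) problems.push("event_id is not a v4 UUID");
  if (!UUID_V4.test(e.patient_id)) problems.push("patient_id is not a v4 UUID");
  if (e.event_type !== "NEW" && e.event_type !== "UPDATED") problems.push("unknown event_type");
  if (!STATUSES.includes(e.status)) problems.push("unknown status");
  // NEW events carry status pending; UPDATED events carry any other status.
  if ((e.event_type === "NEW") !== (e.status === "pending")) problems.push("event_type and status disagree");
  if (Number.isNaN(Date.parse(e.timestamp))) problems.push("timestamp is not parseable");
  return problems;
}

console.log(validateEvent(sampleEvent));
```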

Frequently Asked Questions

Every patient receives a stable patient_id (UUID v4) at creation. The NEW event is always generated first with status pending. Subsequent UPDATED events reference the same patient_id and carry monotonically increasing timestamps. The generator processes all lifecycle stages for a patient before moving to the next, so piping the output preserves causal ordering. If you shuffle the output, you can re-sort by patient_id then timestamp to restore the correct event sequence.
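
The re-sort described above can be sketched as a two-key comparison. The field names `patient_id` and `timestamp` come from the output field reference; the function name `restoreOrder` is illustrative. Note that this restores the order of events within each patient, not the order in which patients were originally generated.

```javascript
// Sort by patient_id first, then by timestamp within each patient.
// Returns a new array and leaves the input untouched.
function restoreOrder(events) {
  return [...events].sort((a, b) => {
    if (a.patient_id !== b.patient_id) {
      return a.patient_id < b.patient_id ? -1 : 1;
    }
    return Date.parse(a.timestamp) - Date.parse(b.timestamp);
  });
}

const shuffled = [
  { patient_id: "b", status: "confirmed", timestamp: "2021-01-05T00:00:00Z" },
  { patient_id: "a", status: "negative", timestamp: "2021-01-03T00:00:00Z" },
  { patient_id: "b", status: "pending", timestamp: "2021-01-01T00:00:00Z" },
  { patient_id: "a", status: "pending", timestamp: "2021-01-02T00:00:00Z" },
];
console.log(restoreOrder(shuffled).map((e) => `${e.patient_id}:${e.status}`));
```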

The second-tier transitions (recovered, dead, hospitalized) apply only to patients whose first UPDATED status is confirmed (approximately 40% of all patients). Among those, the fatality rate is age-dependent: 2% for age < 60, 8% for age ≥ 60. The hospitalized category captures the remainder after subtracting dead and recovered percentages. This means roughly 13% of confirmed cases appear as hospitalized with no final resolution, simulating an open-ended data stream where not all outcomes are reported.
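
A sketch of the second-tier choice, under one stated assumption: the recovered share stays fixed at 85% and hospitalized takes whatever remains after the age-dependent fatality rate. Under that assumption the hospitalized share comes out near 13% for patients under 60 and near 7% for patients 60 and over. The function names are hypothetical.

```javascript
// Outcome probabilities for a confirmed case, by age.
// Assumption: recovered is fixed at 0.85; hospitalized is the remainder.
function outcomeWeights(age) {
  const dead = age >= 60 ? 0.08 : 0.02;
  const recovered = 0.85;
  const hospitalized = 1 - recovered - dead;
  return { dead, recovered, hospitalized };
}

// Map a uniform roll in [0, 1) to an outcome using cumulative thresholds.
function pickOutcome(age, roll) {
  const w = outcomeWeights(age);
  if (roll < w.dead) return "dead";
  if (roll < w.dead + w.recovered) return "recovered";
  return "hospitalized";
}

console.log(pickOutcome(72, Math.random()));
```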

The output can be piped directly into Kafka and similar tools. The NDJSON (Newline Delimited JSON) format produces one valid JSON object per line with no wrapping array brackets. This is the line-oriented input expected by kafka-console-producer, jq (for example, jq -c "."), and curl POST loops. The JSON Array format is better suited for file-based import into databases or REST API bulk endpoints. Copy the output or download the file and pipe it directly.
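
The difference between the two output formats can be sketched as follows. The function names are illustrative, and the trailing newline after each NDJSON record is an assumption rather than a documented property of the tool.

```javascript
// NDJSON: one compact JSON object per line, no enclosing brackets.
function toNdjson(events) {
  return events.map((e) => JSON.stringify(e) + "\n").join("");
}

// JSON Array: a single document holding every event.
function toJsonArray(events) {
  return JSON.stringify(events, null, 2);
}

// Reading NDJSON back: parse each non-empty line independently.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const demo = [
  { event_type: "NEW", status: "pending" },
  { event_type: "UPDATED", status: "negative" },
];
console.log(toNdjson(demo));
```

Because JSON.stringify escapes newline characters inside string values, a record never spans more than one line, which is what lets a consumer process the stream line by line without buffering the whole payload.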

Phone numbers follow the US format (XXX) XXX-XXXX with area codes drawn from a realistic range (200-999, excluding reserved prefixes like 555). ZIP codes are generated within plausible ranges for each state (e.g., New York ZIPs start with 100xx-149xx). They are structurally valid but not guaranteed to correspond to real addresses. For privacy reasons, no real PII is embedded in the dictionaries.

Generation above 500 events is offloaded to a Web Worker to prevent UI freezing, and the progress bar updates in real time. The practical upper limit is around 100,000 events (approximately 80 MB of JSON), beyond which browser memory constraints on the resulting string become a concern. For very large datasets, download the file rather than previewing it in-browser: the bottleneck is rendering large text blocks into the DOM, not generation itself.

The start and end dates define the window for NEW event timestamps. Each NEW event receives a random timestamp within this range. Subsequent UPDATED events are offset forward by their delay range (1 - 14 days for test results, 5 - 30 days for recovery or death outcomes). This means UPDATED events can have timestamps that exceed the configured end date, which mirrors real-world data where a test administered on the last day of a reporting period returns results days later.
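
The timestamp behavior can be sketched with two small helpers. The names `newEventTime` and `offsetByDays`, the uniform sampling of the delay, and the injectable `rng` parameter are all assumptions made for illustration; the date window in the usage lines is an arbitrary example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Uniform timestamp in [start, end) for a NEW event.
function newEventTime(start, end, rng = Math.random) {
  const span = end.getTime() - start.getTime();
  return new Date(start.getTime() + rng() * span);
}

// Shift a timestamp forward by a uniform delay in [minDays, maxDays).
function offsetByDays(base, minDays, maxDays, rng = Math.random) {
  const delayDays = minDays + rng() * (maxDays - minDays);
  return new Date(base.getTime() + delayDays * DAY_MS);
}

const start = new Date("2021-01-01T00:00:00Z");
const end = new Date("2021-01-31T00:00:00Z");
const tested = newEventTime(start, end);
const resulted = offsetByDays(tested, 1, 14); // test result, 1 to 14 days later
console.log(tested.toISOString(), "->", resulted.toISOString());
```

Since the delay is always at least one day, every UPDATED timestamp is strictly later than its NEW event, and a NEW event drawn near the end of the window produces a result dated after the configured end date.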