About

Character naming in fiction, database seeding for QA, and anonymization of PII datasets all share one constraint: the surnames must be statistically plausible. A purely invented string like "Zxqvt" fails that test. This generator draws from a curated pool of 800+ verified surnames spanning Anglo-Saxon, Germanic, Celtic, Slavic, Romance, Nordic, East Asian, South Asian, Middle Eastern, African, and Latin American origins. Selection uses a Fisher-Yates shuffle, so every surname in the filtered pool is equally likely to be chosen, with no index bias. Duplicates are suppressed within each batch: requesting n names yields n distinct results, capped at the pool size. The tool does not fabricate phoneme combinations; every surname exists in real census or genealogical records.

Limitations: origin categories are broad groupings, not precise ethnolinguistic classifications. A surname tagged "East Asian" may be Chinese, Korean, or Japanese. Regional frequency weighting is not applied: within its origin group, a rare surname has the same selection probability as a common one. For demographically accurate frequency distributions, consult national census microdata (e.g., the U.S. Census Bureau surname list ranked by frequency).

Formulas

Surname selection uses the Fisher-Yates (Knuth) shuffle algorithm to produce an unbiased random permutation of the filtered pool, then slices the first n elements:

for i = m − 1 down to 1
    j = floor(random() × (i + 1))
    swap(pool[i], pool[j])

The probability of any single surname appearing in position k is uniform:

P(name_k) = 1/m

Where m = size of the filtered surname pool, n = requested quantity (1 ≤ n ≤ min(100, m)), j = random index in the range [0, i]. Provided random() is uniform on [0, 1), each of the m! permutations is equally likely. Drawing j by scaling random(), rather than reducing an integer with the modulo operator, avoids the modulo bias present in naive implementations.
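The pseudocode above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the tool's actual source: the function name pickSurnames and the injectable random parameter are assumptions made for the example, and the sketch shuffles a copy of the pool rather than the pool itself so the caller's array is left untouched.

```javascript
// Hypothetical helper: `pickSurnames` and the `random` parameter are
// illustrative names, not the tool's documented API.
function pickSurnames(pool, n, random = Math.random) {
  const copy = pool.slice();                    // work on a copy of the filtered pool
  const m = copy.length;
  for (let i = m - 1; i >= 1; i--) {
    const j = Math.floor(random() * (i + 1));   // j is an index in 0..i
    [copy[i], copy[j]] = [copy[j], copy[i]];    // swap positions i and j
  }
  const count = Math.max(0, Math.min(n, 100, m)); // enforce n <= min(100, m)
  return copy.slice(0, count);                  // first `count` entries of the permutation
}

const sample = pickSurnames(
  ["Smith", "Clarke", "Fletcher", "Thatcher", "Hayward"],
  3
);
console.log(sample); // three distinct names in random order
```

Passing the random source as a parameter is a design choice for the sketch only: it allows a deterministic source to be substituted when testing.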

Reference Data

Origin	Example Surnames	Pool Size	Typical Regions
Anglo-Saxon	Smith, Clarke, Fletcher, Thatcher, Hayward	100	England, Australia, Canada
Germanic	Müller, Schneider, Fischer, Braun, Hoffmann	80	Germany, Austria, Switzerland
Celtic	O'Brien, Sullivan, MacLeod, Brennan, Gallagher	70	Ireland, Scotland, Wales
Slavic	Novak, Petrov, Kowalski, Horvat, Volkov	80	Poland, Russia, Czech Republic, Croatia
Romance	Rossi, Moreau, García, Ferreira, Dumont	90	Italy, France, Spain, Portugal
Nordic	Lindgren, Johansson, Haugen, Virtanen, Andersen	70	Sweden, Norway, Denmark, Finland
East Asian	Tanaka, Kim, Wang, Chen, Nakamura	80	Japan, Korea, China, Taiwan
South Asian	Sharma, Patel, Das, Perera, Khan	70	India, Pakistan, Sri Lanka, Bangladesh
Middle Eastern	Al-Rashid, Hashemi, Yilmaz, Khoury, Sadiq	70	Turkey, Iran, Lebanon, Saudi Arabia
African	Okafor, Mensah, Diallo, Mbeki, Nkomo	60	Nigeria, Ghana, Senegal, South Africa
Latin American	Hernández, Castillo, Vargas, Mendoza, Ríos	70	Mexico, Colombia, Argentina, Peru

Frequently Asked Questions

The generator shuffles the entire filtered pool in place using Fisher-Yates, then slices the first n entries. Since a permutation never repeats an element, every name in a batch is guaranteed unique, provided n does not exceed the pool size m. If n > m, the tool automatically clamps to m and notifies you.
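As a worked consequence of the shuffle-then-slice approach, and assuming an ideal uniform random source, the chance that one particular surname appears anywhere in a batch of n follows from summing over the n mutually exclusive positions it could occupy:

```latex
P(\text{name in batch})
  = \sum_{k=1}^{n} P(\text{name at position } k)
  = \sum_{k=1}^{n} \frac{1}{m}
  = \frac{n}{m}
```

For example, with m = 70 and n = 10, a given surname has a 10/70 = 1/7 chance (about 14.3%) of being included.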

No, selection is not weighted by real-world frequency. Each surname in a given origin pool has an equal selection probability of 1/m. This is intentional for fiction and testing use cases where rare names are as valuable as common ones. For frequency-weighted selection, you would need census rank data (e.g., U.S. Census top 1000 surnames by count).
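If frequency weighting were needed, one possible approach is a cumulative-weight pick, sketched below. Nothing here is part of the tool: the function name weightedPick is hypothetical, and the counts are made-up placeholders rather than census figures. Real weights would have to come from a source such as the census data mentioned above.

```javascript
// Hypothetical sketch: `weightedPick` and the counts below are placeholders,
// not real census figures and not part of the tool.
function weightedPick(entries, random = Math.random) {
  const total = entries.reduce((sum, e) => sum + e.count, 0);
  let threshold = random() * total;          // a point in [0, total)
  for (const e of entries) {
    threshold -= e.count;                    // walk through the cumulative weights
    if (threshold < 0) return e.name;
  }
  return entries[entries.length - 1].name;   // guard against float rounding
}

const placeholderCounts = [
  { name: "Smith", count: 5 },
  { name: "Fletcher", count: 3 },
  { name: "Thatcher", count: 1 },
];
console.log(weightedPick(placeholderCounts)); // "Smith" is the most likely result
```

This picks a single name with probability proportional to its count. Drawing a whole batch without duplicates under weighting would need an extra step, such as removing each picked entry before the next draw.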

The surnames are real and appear in public records, so they satisfy plausibility requirements. However, under GDPR Recital 26, data counts as anonymous only if the individual cannot be identified by any means "reasonably likely" to be used. Replacing the surname alone is insufficient: you must also randomize first names and dates, and ensure that no combination of fields maps to a real individual in your dataset. This tool provides one component of a broader anonymization pipeline.

Origin tags reflect the primary linguistic and geographic tradition of each surname. "Müller" is tagged Germanic (German for "miller") even though bearers exist worldwide. Cross-cultural surnames (e.g., "Lee" appears in English and Chinese contexts) are assigned to the origin where the spelling variant is most prevalent. These are broad groupings, not precise ethnolinguistic classifications.

If you request n = 100 names but the combined pool of selected origins contains only m = 70 unique entries, the tool clamps output to 70 and displays a notification. This prevents duplicate injection, which would violate the uniqueness guarantee.
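The clamping behavior described above can be sketched as follows. The names buildPool and clampRequest and the shape of the pools object are assumptions made for illustration, since the tool's internal data layout is not documented here. The 100-name cap is taken from the constraint stated in the Formulas section.

```javascript
// Hypothetical sketch: `buildPool`, `clampRequest`, and the `pools` object
// are illustrative; the real tool's data layout is not documented here.
function buildPool(pools, selectedOrigins) {
  const merged = new Set();                  // a Set drops cross-origin duplicates
  for (const origin of selectedOrigins) {
    for (const name of pools[origin] ?? []) merged.add(name);
  }
  return [...merged];
}

function clampRequest(n, poolSize, maxBatch = 100) {
  const count = Math.max(0, Math.min(n, maxBatch, poolSize));
  return { count, clamped: count < n };      // `clamped` would trigger the notice
}

const pools = {
  Nordic: ["Lindgren", "Johansson", "Haugen"],
  Celtic: ["Sullivan", "Brennan"],
};
const pool = buildPool(pools, ["Nordic", "Celtic"]);
console.log(clampRequest(100, pool.length)); // { count: 5, clamped: true }
```

Returning a clamped flag alongside the count keeps the decision to notify the user separate from the arithmetic.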

JavaScript's Math.random() uses a PRNG (typically xorshift128+ in V8). While not cryptographically secure, it provides sufficient uniformity for name selection. The shuffle picks each index with Math.floor(Math.random() * (i + 1)), which scales a float into the range [0, i] instead of reducing an integer with the modulo operator, the usual source of modulo bias.
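For context on what modulo bias means: it arises when an integer from a fixed range (say 32 bits) is reduced with % to a range that does not divide it evenly. Where an integer source is used instead of Math.random(), rejection sampling is one common remedy. The sketch below is not something the tool is stated to do. The names unbiasedIndex and cryptoUint32 are hypothetical, and the second function assumes a runtime that exposes the Web Crypto API as globalThis.crypto.

```javascript
// Hypothetical sketch: `unbiasedIndex` and `cryptoUint32` are illustrative names.
// `bound` must be an integer with 1 <= bound <= 2^32.
function unbiasedIndex(bound, nextUint32) {
  const range = 0x100000000;                 // 2^32 possible 32-bit values
  const limit = range - (range % bound);     // largest multiple of bound <= 2^32
  let x;
  do {
    x = nextUint32();                        // redraw values from the uneven tail
  } while (x >= limit);
  return x % bound;                          // now every residue is equally likely
}

// Assumes the Web Crypto API is available as globalThis.crypto.
const cryptoUint32 = () =>
  globalThis.crypto.getRandomValues(new Uint32Array(1))[0];

// Inside a Fisher-Yates loop this could replace the Math.random() line:
//   const j = unbiasedIndex(i + 1, cryptoUint32);
```

For pool sizes in the tens or hundreds, the rejected tail is a tiny fraction of the 32-bit range, so redraws are rare.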