About

Mixing numeral systems inside a single document causes rendering inconsistencies, breaks sorting, and produces search-index mismatches. Persian text uses Eastern Arabic-Indic digits ۰ - ۹ (Unicode U+06F0 - U+06F9), while standard Arabic text uses a distinct set ٠ - ٩ (U+0660 - U+0669). Western digits 0 - 9 (U+0030 - U+0039) occupy a third range, and no digit in one range compares equal to its counterpart in another. This converter remaps codepoints directly between these systems. It does not transliterate or approximate. Each character matched by a regex digit class is replaced by adding a fixed codepoint offset, and all non-digit content is left unchanged.

The tool supports conversion in both directions: English → Persian, English → Arabic, Persian → English, and Arabic → English. Note: Persian and Arabic digit glyphs may appear identical in some fonts, but they occupy different codepoint ranges (both inside the Unicode Arabic block, U+0600 - U+06FF) and are not interchangeable in database queries or programmatic comparisons. Always verify the target system's expected codepoint range before deployment.
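The point about programmatic comparisons can be checked directly. The snippet below is a small illustration only; the helper name toCodepointLabel is an assumption made for this example and is not part of the tool.

```javascript
// The digit "five" in each of the three ranges, written as escapes so the
// exact codepoint is unambiguous regardless of font or editor.
const western = "\u0035"; // 5
const persian = "\u06F5"; // Persian five
const arabic = "\u0665";  // Arabic-Indic five

// Hypothetical helper: format the first codepoint of a string as U+XXXX.
function toCodepointLabel(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(toCodepointLabel(western)); // U+0035
console.log(toCodepointLabel(persian)); // U+06F5
console.log(toCodepointLabel(arabic));  // U+0665

// Strict equality compares code units, so look-alike digits never match.
console.log(persian === arabic); // false
```

A search for the Arabic-Indic form will therefore miss text stored with the Persian form unless one side is converted first.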

Formulas

The conversion uses a direct Unicode codepoint offset. For English → Persian digits:

C_persian = char(code(C_english) − 0x0030 + 0x06F0)

For English → Arabic-Indic digits:

C_arabic = char(code(C_english) − 0x0030 + 0x0660)

For reverse conversion (Persian → English):

C_english = char(code(C_persian) − 0x06F0 + 0x0030)

Where C is the character, code returns the Unicode codepoint (via charCodeAt), and char produces the character from a codepoint (via String.fromCharCode). The regex /[0-9]/g matches only Western digits, so non-digit characters pass through unmodified. The offset between Western and Persian digits is 1728 (0x06F0 − 0x0030 = 0x06C0). The offset between Western and Arabic-Indic digits is 1584 (0x0660 − 0x0030 = 0x0630).
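The formulas above can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's actual source: the function and constant names are hypothetical, and arabicToWestern is added by symmetry with the Persian → English formula.

```javascript
// Codepoint of the digit zero in each range (from the formulas above).
const WESTERN_ZERO = 0x0030;
const PERSIAN_ZERO = 0x06f0;
const ARABIC_ZERO = 0x0660;

// Shift every character matched by `pattern` from one digit range to another.
function shiftDigits(text, pattern, fromZero, toZero) {
  return text.replace(pattern, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - fromZero + toZero)
  );
}

function westernToPersian(text) {
  return shiftDigits(text, /[0-9]/g, WESTERN_ZERO, PERSIAN_ZERO);
}

function westernToArabic(text) {
  return shiftDigits(text, /[0-9]/g, WESTERN_ZERO, ARABIC_ZERO);
}

function persianToWestern(text) {
  return shiftDigits(text, /[\u06F0-\u06F9]/g, PERSIAN_ZERO, WESTERN_ZERO);
}

function arabicToWestern(text) {
  return shiftDigits(text, /[\u0660-\u0669]/g, ARABIC_ZERO, WESTERN_ZERO);
}

console.log(westernToPersian("Page 2024"));
console.log(persianToWestern(westernToPersian("2024"))); // 2024
```

Because charCodeAt and String.fromCharCode work on single UTF-16 code units, this approach is sufficient here: all three digit ranges lie in the Basic Multilingual Plane.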

Reference Data

Digit	English (Western)	Unicode (Western)	Persian (Eastern Arabic-Indic)	Unicode (Persian)	Arabic-Indic	Unicode (Arabic)
0	0	U+0030	۰	U+06F0	٠	U+0660
1	1	U+0031	۱	U+06F1	١	U+0661
2	2	U+0032	۲	U+06F2	٢	U+0662
3	3	U+0033	۳	U+06F3	٣	U+0663
4	4	U+0034	۴	U+06F4	٤	U+0664
5	5	U+0035	۵	U+06F5	٥	U+0665
6	6	U+0036	۶	U+06F6	٦	U+0666
7	7	U+0037	۷	U+06F7	٧	U+0667
8	8	U+0038	۸	U+06F8	٨	U+0668
9	9	U+0039	۹	U+06F9	٩	U+0669
Extended: Common Symbols Preserved Across Conversions
Decimal	.	U+002E	٫	U+066B	٫	U+066B
Thousands	,	U+002C	٬	U+066C	٬	U+066C
Percent	%	U+0025	٪	U+066A	٪	U+066A

Frequently Asked Questions

Persian digits (۰-۹, U+06F0 - U+06F9) and Arabic-Indic digits (٠-٩, U+0660 - U+0669) occupy distinct codepoint ranges within the Unicode Arabic block. While visually similar in many fonts, their glyphs differ at digits 4 (۴ vs ٤), 5 (۵ vs ٥), and 6 (۶ vs ٦). Database queries, string comparisons, and search engines typically treat them as separate characters. Using the wrong set can cause lookup failures in Persian-locale applications.

No. The regex targets only digit-class characters ([0-9], [۰-۹], or [٠-٩]). All other characters - letters, punctuation, whitespace, emoji, and RTL markers - pass through without modification. The replacement function operates character-by-character using a codepoint offset.
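A quick way to see the pass-through behaviour is to convert a string that mixes digits with letters, an emoji, and a right-to-left mark, then compare what remains once the digits are removed. The sketch below assumes an English → Persian conversion; toPersianDigits and stripDigits are hypothetical names used only for this example.

```javascript
// English → Persian conversion using the codepoint offset.
function toPersianDigits(text) {
  return text.replace(/[0-9]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x0030 + 0x06f0)
  );
}

const RLM = "\u200F"; // RIGHT-TO-LEFT MARK
const sample = "Room 7 \u{1F600}" + RLM + " (v2.0)";
const converted = toPersianDigits(sample);

// Remove Western and Persian digits so only the non-digit content remains.
const stripDigits = (s) => s.replace(/[0-9\u06F0-\u06F9]/g, "");

console.log(stripDigits(sample) === stripDigits(converted)); // true
console.log(sample.length === converted.length);             // true
```

The lengths match because each digit is a single UTF-16 code unit in both ranges, and the emoji's surrogate pair is never matched by the digit class.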

Yes. The "Any → English" mode matches both Persian (U+06F0 - U+06F9) and Arabic-Indic (U+0660 - U+0669) digit ranges simultaneously and maps each back to the Western equivalent (U+0030 - U+0039). A string like "۱۲٣٤" would produce "1234".
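One way such a combined mode could work is a single regex covering both ranges, with the base codepoint chosen per character. This is a sketch under that assumption, not the tool's confirmed implementation; anyToWestern is a hypothetical name.

```javascript
// Map Persian (U+06F0 - U+06F9) and Arabic-Indic (U+0660 - U+0669) digits
// to Western digits (U+0030 - U+0039) in one pass.
function anyToWestern(text) {
  return text.replace(/[\u06F0-\u06F9\u0660-\u0669]/g, (ch) => {
    const code = ch.charCodeAt(0);
    // Pick the zero of whichever range this digit belongs to.
    const zero = code >= 0x06f0 ? 0x06f0 : 0x0660;
    return String.fromCharCode(code - zero + 0x0030);
  });
}

// Two Persian digits followed by two Arabic-Indic digits.
console.log(anyToWestern("\u06F1\u06F2\u0663\u0664")); // 1234
```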

Font rendering determines glyph appearance. Many system fonts use the same glyph for both ranges at digits 0-3 and 7-9. Only digits 4, 5, and 6 have visually distinct forms in most traditional Arabic vs Persian typography. Use a font like "Scheherazade" or "Vazirmatn" to see the differences. The underlying codepoints always differ regardless of visual appearance.

The core digit conversion does not alter separators by default, since they are not digit characters. The extended mode optionally converts the Western decimal point (U+002E) to the Arabic decimal separator (U+066B) and the Western comma (U+002C) to the Arabic thousands separator (U+066C). Enable "Convert separators" to activate this feature.
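The optional separator step might look like the sketch below. Two things here are assumptions rather than documented behaviour: the function name toPersianWithSeparators, and the choice to convert a period or comma only when it sits between two digits, so that ordinary sentence punctuation is left alone. The option name convertSeparators simply mirrors the "Convert separators" setting.

```javascript
// Western separator → Arabic separator (codepoints from the reference table).
const SEPARATORS = {
  ".": "\u066B", // ARABIC DECIMAL SEPARATOR
  ",": "\u066C", // ARABIC THOUSANDS SEPARATOR
};

function toPersianWithSeparators(text, { convertSeparators = false } = {}) {
  let result = text;
  if (convertSeparators) {
    // Assumption: only separators flanked by Western digits are converted.
    result = result.replace(/(?<=[0-9])[.,](?=[0-9])/g, (ch) => SEPARATORS[ch]);
  }
  // Core digit conversion by codepoint offset.
  return result.replace(/[0-9]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x0030 + 0x06f0)
  );
}

console.log(toPersianWithSeparators("1,234.5"));
console.log(toPersianWithSeparators("1,234.5", { convertSeparators: true }));
```

The separators are replaced before the digits because the lookbehind and lookahead test for Western digits, which no longer exist after the digit pass.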

Yes. The converter does not insert or remove Unicode bidirectional control characters (LRM, RLM, LRE, RLE). It only replaces digit codepoints, so existing directional formatting in your text is preserved. However, switching digit systems may change how the browser's BiDi algorithm reorders number sequences within RTL paragraphs: Western and Persian digits carry the bidirectional class EN (European Number), whereas Arabic-Indic digits carry AN (Arabic Number).