About

Extracting and counting numeric values from unstructured text is a prerequisite for data validation, audit trails, and statistical pre-processing. A miscount in financial reconciliation or inventory data can cascade into material discrepancies. This tool parses arbitrary text using pattern matching to isolate every numeric token (integers, decimals, and negatives), then aggregates occurrences into a frequency distribution. It handles edge cases such as leading zeros (007 vs 7), negative values (-3.5), and decimal precision. The extraction regex matches tokens of the form /-?\d+(\.\d+)?/g, so standalone numbers within sentences, CSV rows, or log files are captured while avoiding most false positives from version strings and dates.
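
The extraction step described above can be sketched in a few lines of JavaScript. This is an illustration of the stated regex, not the tool's actual source; extractNumbers is a hypothetical helper name.

```javascript
// Extract standalone numeric tokens: integers, decimals, and negatives.
// An optional minus sign, a digit run, and an optional fractional part.
function extractNumbers(text) {
  return text.match(/-?\d+(\.\d+)?/g) || [];
}

extractNumbers("Score: 42, 42, 85");   // ["42", "42", "85"]
extractNumbers("Temperature: -5.3°C"); // ["-5.3"]
extractNumbers("No numbers here");     // []
```

Note that match() with a global regex returns only the full matches, so the optional capture group around the fractional part does not affect the result.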

Accuracy depends on input formatting. Numbers embedded in words (e.g., "abc123def") will be extracted as 123. Phone numbers and zip codes will be tokenized as individual digit groups. This tool approximates general-purpose numeric extraction; domain-specific parsing (timestamps, IP addresses) requires dedicated validators. For datasets exceeding 100,000 characters, processing is chunked to prevent browser thread blocking.
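
The embedded-number and digit-group behavior can be demonstrated directly. One caveat worth noting, consistent with the sign-aware regex above: a dash between digit groups (as in a phone number) parses as a minus sign.

```javascript
const NUM = /-?\d+(\.\d+)?/g;

"abc123def".match(NUM);    // ["123"] - letters act as delimiters
"555-867-5309".match(NUM); // ["555", "-867", "-5309"] - dashes read as minus signs
```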

Formulas

The occurrence count for each distinct number n_i in the extracted set S is computed as:

count(n_i) = Σ_{j=1}^{|S|} [ 1 if normalize(s_j) = n_i, 0 otherwise ]

The normalization function strips leading zeros and trailing decimal zeros:

normalize(s) = parseFloat(s).toString()
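
This normalization is directly expressible in JavaScript, since parseFloat() already discards leading zeros and toString() drops trailing decimal zeros:

```javascript
// Collapse equivalent spellings of a number to one canonical string.
function normalize(s) {
  return parseFloat(s).toString();
}

normalize("007");   // "7"
normalize("3.00");  // "3"
normalize("-0.50"); // "-0.5"
```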

Relative frequency (percentage) for each number:

f_rel(n_i) = count(n_i) / |S| × 100%

Where S is the complete set of extracted numeric tokens, |S| is the total count of all extracted numbers, and n_i represents each unique normalized number.
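
Both formulas combine into a short counting pass. The sketch below (frequencies is an illustrative name, not the tool's API) normalizes each token, tallies counts in a Map, and derives the relative frequency:

```javascript
// Count occurrences of each normalized number and compute its
// relative frequency as a percentage of all extracted tokens.
function frequencies(tokens) {
  const counts = new Map();
  for (const t of tokens) {
    const n = parseFloat(t).toString(); // normalize(s)
    counts.set(n, (counts.get(n) || 0) + 1);
  }
  const total = tokens.length; // |S|
  return [...counts].map(([value, count]) => ({
    value,
    count,
    percent: (count / total) * 100, // f_rel(n_i)
  }));
}

frequencies(["10", "20", "10", "30", "10", "20"]);
// 10 → count 3 (50%), 20 → count 2, 30 → count 1
```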

Reference Data

Input Pattern                    | Extracted Numbers   | Count | Notes
"Score: 42, 42, 85"              | 42, 42, 85          | 3     | Duplicates counted separately
"Temperature: -5.3°C"            | -5.3                | 1     | Negative decimals supported
"Version 3.14.159"               | 3.14, 159           | 2     | Dots split tokens by context
"ID: 007, 7"                     | 7, 7                | 2     | Leading zeros normalized
"No numbers here"                | (none)              | 0     | Empty result set
"10 20 10 30 10 20"              | 10×3, 20×2, 30×1    | 6     | Frequency ranking applied
"Price: $1,234.56"               | 1, 234.56           | 2     | Commas break tokens
"Coordinates: 51.5074, -0.1278"  | 51.5074, -0.1278    | 2     | Full decimal precision
"abc123def456ghi"                | 123, 456            | 2     | Embedded numbers extracted
"0 0 0 1 1 2"                    | 0×3, 1×2, 2×1       | 6     | Zeros are valid numbers
"-1 -1 -2 +3"                    | -1×2, -2×1, 3×1     | 4     | Sign-aware extraction
"3.0 vs 3.00 vs 3"               | 3×3                 | 3     | Trailing zeros normalized
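
Several of these rows can be reproduced end to end by combining extraction and normalization. This is a sketch under the assumptions above; countNumbers is an illustrative name.

```javascript
// End-to-end: extract tokens, normalize each, tally counts.
function countNumbers(text) {
  const tokens = text.match(/-?\d+(\.\d+)?/g) || [];
  const counts = {};
  for (const t of tokens) {
    const n = parseFloat(t).toString();
    counts[n] = (counts[n] || 0) + 1;
  }
  return counts;
}

countNumbers("ID: 007, 7");       // { "7": 2 }
countNumbers("3.0 vs 3.00 vs 3"); // { "3": 3 }
countNumbers("Price: $1,234.56"); // { "1": 1, "234.56": 1 }
```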

Frequently Asked Questions

How are leading zeros like 007 handled?

All extracted numbers are normalized through parseFloat() conversion. This means 007 becomes 7 and 0042 becomes 42. Both are then counted as occurrences of the same canonical value. If you need to preserve leading-zero formatting (e.g., postal codes), this tool is not the right fit - use a string-matching tool instead.

Are 3, 3.0, and 3.00 counted as different numbers?

No. The normalization step converts all three to the canonical string representation 3. Trailing decimal zeros carry no mathematical significance, so 3.0 and 3.00 are counted as identical to 3. This prevents false frequency splits in datasets where decimal formatting is inconsistent.

How are negative numbers and plus signs handled?

The extraction regex recognizes a minus sign directly preceding digits as part of the number, so -5.3 is extracted as a single token with value -5.3. Plus signs (+) are not included in the regex match, so +3 extracts as 3. If your data uses + as a meaningful sign, note that 3 and +3 will be counted as the same value.
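
A quick demonstration of the sign handling described here, using the regex stated in the About section:

```javascript
const NUM = /-?\d+(\.\d+)?/g;

"drop of -5.3 degrees".match(NUM); // ["-5.3"] - minus kept with the token
"-1 -1 -2 +3".match(NUM);          // ["-1", "-1", "-2", "3"] - "+" is dropped
```
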

Why does 1,234,567 count as three separate numbers?

Commas are treated as delimiters, not thousand separators. The text 1,234,567 yields three separate numbers: 1, 234, and 567. This is a deliberate design choice because the tool cannot distinguish between thousand-separator commas and list-separator commas without locale context. Pre-process your text to remove thousand separators if needed.
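
One way to do that pre-processing is a lookahead regex that removes a comma only when exactly three digits follow it. This is a sketch of one possible approach, not part of the tool; it assumes your data never uses a list comma directly between a digit and a three-digit group.

```javascript
// Strip thousand-separator commas: a comma between a digit and a
// complete three-digit group. List commas like "1,2,3" are kept.
function stripThousandSeparators(text) {
  return text.replace(/(\d),(?=\d{3}\b)/g, "$1");
}

stripThousandSeparators("Price: $1,234.56"); // "Price: $1234.56"
stripThousandSeparators("1,2,3");            // "1,2,3" (unchanged)
```
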

How much text can I paste?

The tool accepts up to 1,000,000 characters. For inputs exceeding 100,000 characters, processing is chunked using asynchronous scheduling to prevent the browser from becoming unresponsive. On a typical modern device, 500,000 characters process in under 2 seconds. If you hit memory limits, split your dataset into smaller segments.
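
The subtle part of chunked processing is that a naive split can cut a number in half at a chunk boundary. The sketch below (illustrative only; the real tool additionally yields to the event loop between chunks) backs each boundary up past any digit run before counting:

```javascript
// Split text into chunks of roughly `size` characters, moving each
// boundary left so it never lands inside a digit run.
function* chunks(text, size) {
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    // Back up while the boundary sits on a digit or decimal point.
    // (A token longer than a whole chunk would still split; fine for a sketch.)
    while (end < text.length && /[\d.]/.test(text[end]) && end > start + 1) {
      end--;
    }
    yield text.slice(start, end);
    start = end;
  }
}

const counts = {};
for (const chunk of chunks("10 20 10 30 10 20", 4)) {
  for (const t of chunk.match(/-?\d+(\.\d+)?/g) || []) {
    const n = parseFloat(t).toString();
    counts[n] = (counts[n] || 0) + 1;
  }
}
// counts is the same as for an unchunked pass: 10×3, 20×2, 30×1
```
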

Can I track a single specific number?

Yes. Use the optional filter field to specify a target number. When set, the results highlight that number's occurrence count, frequency percentage, and positions. The full frequency table is still generated, but the target number is visually emphasized and pinned to the top of the results. Leave the filter empty to analyze all numbers.

How are results sorted?

Results can be sorted by three criteria: frequency (descending, default), numeric value (ascending), or first appearance order. Click the column headers to toggle sort direction. Frequency sorting surfaces the most common numbers first, which is useful for outlier detection. Value sorting is better for identifying gaps in sequential data.
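
The two main sort modes reduce to simple comparators over the frequency table. A minimal sketch, assuming table entries shaped like the frequency results above:

```javascript
const table = [
  { value: "30", count: 1 },
  { value: "10", count: 3 },
  { value: "20", count: 2 },
];

// Frequency sort: most common first (the default view).
const byFrequency = [...table].sort((a, b) => b.count - a.count);

// Value sort: ascending numeric order, so gaps in sequences stand out.
const byValue = [...table].sort((a, b) => Number(a.value) - Number(b.value));

byFrequency.map(e => e.value); // ["10", "20", "30"]
```
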