About

Web server access logs in Common Log Format (CLF) are dense, single-line records defined by the NCSA. Each line encodes the remote host (h), identity (l), authenticated user (u), timestamp (t), HTTP request (r), status code (s), and response size in bytes (b). The Combined variant appends the Referer and User-Agent headers. Parsing these with naive string splitting fails on edge cases: quoted strings containing spaces, missing byte counts represented as -, and date formats that vary between servers. Malformed entries corrupt downstream analytics pipelines silently. This tool applies a strict regular expression against the NCSA specification, flags unparseable lines with their line numbers, and outputs validated JSON with ISO 8601 dates.

The converter handles both CLF and Combined formats automatically. Byte values of - become null per convention. Status codes are cast to integers. Timestamps are normalized to UTC ISO 8601 regardless of the source timezone offset. Note: this tool approximates the CLF spec as implemented by Apache mod_log_config and Nginx log_format. Custom log formats with non-standard field ordering will not parse correctly.

Formulas

The Common Log Format line is matched against the following regular expression pattern. Each capture group maps to a JSON field.

Pattern: ^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]+)" (\d{3}) (\S+)

For Combined Log Format, two additional quoted fields are appended:

Extended: ... "([^"]*)" "([^"]*)"
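A minimal sketch of how the two patterns above might be applied to a single line. The names CLF_PATTERN, COMBINED_PATTERN, and matchLogLine are illustrative assumptions, not the converter's actual identifiers; the JSON keys follow the Reference Data table.

```javascript
// Sketch only: identifiers here are hypothetical, not taken from the tool's source.

// Common Log Format: host, logname, user, [timestamp], "request", status, bytes.
const CLF_PATTERN = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]+)" (\d{3}) (\S+)/;

// Combined Log Format: the CLF pattern followed by "referer" and "user-agent".
const COMBINED_PATTERN = new RegExp(CLF_PATTERN.source + ' "([^"]*)" "([^"]*)"');

function matchLogLine(line) {
  // Try the longer Combined pattern first, then fall back to plain CLF.
  const m = COMBINED_PATTERN.exec(line) || CLF_PATTERN.exec(line);
  if (m === null) return null; // line does not match either pattern

  const record = {
    remoteHost: m[1],
    remoteLogName: m[2],
    authUser: m[3],
    date: m[4],    // raw CLF timestamp, not yet converted
    request: m[5],
    status: m[6],  // still a string at this stage
    bytes: m[7],   // still a string at this stage
  };
  // A Combined match has two extra capture groups (indices 8 and 9).
  if (m.length > 8) {
    record.referer = m[8];
    record.userAgent = m[9];
  }
  return record;
}

const sample =
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
console.log(matchLogLine(sample));
```

Trying the Combined pattern first is one simple way to auto-detect the format: a plain CLF line fails it and falls through to the shorter pattern.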

The date field requires secondary parsing. The standard CLF timestamp format is:

dd/Mon/yyyy:HH:mm:ss ±hhmm

This is decomposed and reconstructed as an ISO 8601 string:

yyyy-MM-ddTHH:mm:ss.SSSZ

Where Mon is a three-letter English month abbreviation mapped to its zero-padded numeric index (01-12). The timezone offset ±hhmm is applied to convert to UTC. The byte field transformation follows:

$$
\text{bytes} =
\begin{cases}
\text{parseInt}(b, 10) & \text{if } b \neq \text{"-"} \\
\text{null} & \text{otherwise}
\end{cases}
$$

Where b = raw byte string from the log entry. Status code s is always cast via parseInt(s, 10) to ensure numeric type in JSON output.
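The secondary transformations in this section can be sketched as follows. The function names (clfDateToIso, castBytes, castStatus) are hypothetical, and the sketch assumes the standard dd/Mon/yyyy:HH:mm:ss ±hhmm shape only; it does not attempt the RFC 2822 variant.

```javascript
// Sketch only: identifiers here are hypothetical, not taken from the tool's source.
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

function clfDateToIso(clfDate) {
  // Expected shape: dd/Mon/yyyy:HH:mm:ss ±hhmm
  const m = /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/
    .exec(clfDate);
  if (m === null) return null;
  const [, dd, mon, yyyy, HH, mm, ss, sign, offH, offM] = m;

  // Date.UTC takes a zero-based month, so the array index is used directly.
  const monthIndex = MONTHS.indexOf(mon);
  if (monthIndex === -1) return null;

  // Treat the wall-clock fields as if they were UTC...
  const localAsUtc = Date.UTC(Number(yyyy), monthIndex, Number(dd),
                              Number(HH), Number(mm), Number(ss));
  // ...then subtract the signed offset to obtain the real UTC instant.
  const offsetMinutes =
    (sign === '-' ? -1 : 1) * (parseInt(offH, 10) * 60 + parseInt(offM, 10));
  return new Date(localAsUtc - offsetMinutes * 60000).toISOString();
}

// A dash means "no bytes sent" and maps to null; anything else is parsed as base 10.
function castBytes(b) {
  return b === '-' ? null : parseInt(b, 10);
}

// Status codes are always cast to a number.
function castStatus(s) {
  return parseInt(s, 10);
}

console.log(clfDateToIso('10/Oct/2000:13:55:36 -0700')); // "2000-10-10T20:55:36.000Z"
console.log(castBytes('-'), castBytes('2326'), castStatus('200'));
```

Subtracting the offset after building a UTC timestamp avoids any dependence on the timezone of the machine running the code.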

Reference Data

| CLF Field | Directive | JSON Key | Type | Example Value | Notes |
|---|---|---|---|---|---|
| Remote Host | %h | remoteHost | string | 192.168.1.1 | IPv4, IPv6, or hostname |
| Remote Logname | %l | remoteLogName | string | - | Almost always - (identd disabled) |
| Auth User | %u | authUser | string | - | - if no auth |
| Timestamp | %t | date | string (ISO 8601) | 2024-01-15T08:23:45.000Z | Converted from CLF bracket format |
| Request Line | "%r" | request | string | GET /index.html HTTP/1.1 | Full method + path + protocol |
| Status Code | %>s | status | number | 200 | Final status after internal redirects |
| Response Bytes | %b | bytes | number or null | 10305 | - → null |
| Referer | "%{Referer}i" | referer | string | http://example.com/ | Combined format only |
| User-Agent | "%{User-agent}i" | userAgent | string | Mozilla/5.0 ... | Combined format only |
| HTTP Method | - | method | string | GET | Extracted from request (optional) |
| Request Path | - | path | string | /index.html | Extracted from request (optional) |
| Protocol | - | protocol | string | HTTP/1.1 | Extracted from request (optional) |
| Status Class | - | - | - | 2xx | 1xx=Info, 2xx=Success, 3xx=Redirect, 4xx=Client Error, 5xx=Server Error |
| CLF Date Format | - | - | - | [10/Oct/2000:13:55:36 -0700] | dd/Mon/yyyy:HH:mm:ss ±hhmm |
| RFC Date Variant | - | - | - | [Wed, 11 Jun 2014 16:24:02 GMT] | Some servers use RFC 2822 style |
| Apache Directive | - | - | - | LogFormat "%h %l %u %t \"%r\" %>s %b" | Common format config |
| Nginx Directive | - | - | - | log_format combined ... | Default is combined |
| Max Line Length | - | - | - | ~8192 bytes typical | Server-dependent buffer size |
| IPv6 Example | %h | remoteHost | string | ::1 | Localhost in IPv6 |
| Null Bytes (%B) | %B | bytes | number | 0 | %B uses 0 instead of - |

Frequently Asked Questions

Common Log Format (CLF) contains 7 fields: remote host, remote logname, auth user, timestamp, request, status, and bytes. Combined Log Format appends two additional quoted fields: the Referer header and the User-Agent header. This converter auto-detects both formats. If the line has trailing quoted strings after the byte count, they are captured as referer and userAgent in the JSON output.
CLF timestamps include a timezone offset like -0700 or +0530. The converter parses the offset hours and minutes, applies the difference to produce a UTC timestamp, and outputs the result in ISO 8601 format with the Z suffix. For example, 10/Oct/2000:13:55:36 -0700 becomes 2000-10-10T20:55:36.000Z. If the date is already in RFC 2822 style with GMT, no offset adjustment is needed.
Non-matching lines are omitted from the JSON output, and the converter reports the total count of skipped lines and their line numbers in a warning notification. Common causes include custom log formats with reordered fields, malformed entries from crashed servers, and blank lines. Check that your server uses the standard LogFormat directive.
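The skip-and-report behavior might look roughly like the sketch below. The function parseLog and its return shape are assumptions made for illustration; the pattern is the CLF expression from the Formulas section. One further assumption: a single trailing newline at the end of the input is dropped so that it is not counted as a blank line.

```javascript
// Sketch only: parseLog and its return shape are hypothetical.
const LINE_PATTERN = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]+)" (\d{3}) (\S+)/;

function parseLog(text) {
  const records = [];
  const skipped = []; // 1-based line numbers of lines that did not match

  // Drop one trailing newline (assumption), then split on LF or CRLF.
  const lines = text.replace(/\r?\n$/, '').split(/\r?\n/);

  lines.forEach((line, index) => {
    const m = LINE_PATTERN.exec(line);
    if (m === null) {
      skipped.push(index + 1);
      return;
    }
    records.push({
      remoteHost: m[1],
      request: m[5],
      status: parseInt(m[6], 10),
    });
  });

  return { records, skipped };
}

const good = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326';
const result = parseLog(good + '\n\nnot a log line\n' + good + '\n');
console.log(`${result.skipped.length} line(s) skipped: ${result.skipped.join(', ')}`);
```

Keeping the skipped line numbers alongside the records lets the caller show a warning without losing the successfully parsed entries.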
The CLF specification uses %b which outputs a literal dash (-) when zero bytes are sent. This is semantically distinct from %B, which outputs the number 0. The converter preserves this distinction: a dash becomes null in JSON, while a numeric 0 remains 0. If your logs use %B, all values will be numeric.
IPv6 addresses are supported. The remote host field regex (\S+) matches any non-whitespace sequence, which correctly captures IPv6 addresses like ::1 or 2001:db8::1. IPv6 addresses in CLF are not bracketed at the log level, so no special handling is needed beyond the greedy non-whitespace match.
The converter processes files entirely in the browser. For files under approximately 50,000 lines (roughly 10-15 MB), parsing occurs synchronously with near-instant results. For larger files, a Web Worker handles the parsing to prevent UI freezing. Practical limits depend on the device's available RAM. A 100 MB log file with ~500,000 entries will work on most modern devices with 4 GB+ RAM. For files exceeding 200 MB, consider splitting with a command-line tool like split first.
By default, the request field contains the full request line (e.g., GET /index.html HTTP/1.1). Enable the "Split Request Fields" option to additionally output method, path, and protocol as separate JSON keys. This is useful for filtering by HTTP method or aggregating by endpoint path in downstream analytics.
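A rough sketch of what splitting the request line could involve. The function name splitRequest is hypothetical, and returning null for a request that is not exactly three space-separated tokens is an assumption; the tool's actual handling of malformed request lines is not documented here.

```javascript
// Sketch only: splitRequest is a hypothetical helper, not the tool's API.
function splitRequest(request) {
  // Expect exactly: METHOD SP PATH SP PROTOCOL, with no spaces inside any token.
  const m = /^(\S+) (\S+) (\S+)$/.exec(request);
  if (m === null) return null; // assumption: leave the split keys off when malformed
  return { method: m[1], path: m[2], protocol: m[3] };
}

// Example: count requests per HTTP method.
const requests = [
  'GET /index.html HTTP/1.1',
  'POST /api/login HTTP/1.1',
  'GET /about.html HTTP/1.1',
  '-', // malformed request line
];

const countsByMethod = {};
for (const request of requests) {
  const parts = splitRequest(request);
  if (parts === null) continue;
  countsByMethod[parts.method] = (countsByMethod[parts.method] || 0) + 1;
}
console.log(countsByMethod);
```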