CLOC CSV to JSON Converter
Convert CLOC CSV output to a hierarchical JSON tree or a flat JSON array. Supports drag-and-drop, paste, and file upload for cloc reports.
About
The cloc utility produces CSV reports listing language, filename, blank, comment, and code counts per source file. That flat tabular format cannot be fed directly to tree-based visualizations such as CodeFlower, D3 sunbursts, or treemaps, which require nested JSON with name and children properties. This converter parses each file path, splits it on directory separators, and inserts nodes level by level into a hierarchical tree in which every leaf carries a size equal to its total lines (blank + comment + code). Directories that contain only a single child can optionally be collapsed to reduce visual noise. The tool handles both Windows (\) and POSIX (/) path separators. Limitation: CLOC's summary rows (those without a file path) are silently skipped. If your CSV uses a non-standard delimiter or was generated by a cloc fork, verify that the header row matches the canonical format before conversion.
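The optional single-child collapse can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names collapseChain and collapseTree are hypothetical, and joining the merged directory names with "/" is an assumption about how a collapsed node is labelled.

```javascript
// Hypothetical helper: fold a directory into its only child while that
// child is itself a directory. Leaves (no children array) are returned as-is.
function collapseChain(node) {
  if (!Array.isArray(node.children)) return node;
  let current = node;
  while (
    current.children.length === 1 &&
    Array.isArray(current.children[0].children)
  ) {
    const only = current.children[0];
    // Assumption: merged names are joined with "/".
    current = { name: current.name + "/" + only.name, children: only.children };
  }
  return { name: current.name, children: current.children.map(collapseChain) };
}

// Keep the root node itself and collapse only the chains below it.
function collapseTree(root) {
  return { name: root.name, children: (root.children || []).map(collapseChain) };
}

const sampleTree = {
  name: "root",
  children: [
    {
      name: "src",
      children: [
        {
          name: "utils",
          children: [
            { name: "helpers.js", size: 12 },
            { name: "format.js", size: 30 },
          ],
        },
      ],
    },
  ],
};

const collapsedTree = collapseTree(sampleTree);
console.log(collapsedTree.children[0].name); // src/utils
```

Here src has utils as its only child, so the two are merged into one node, while utils keeps both of its files.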
Formulas
Each row in the CLOC CSV is parsed and the file path is decomposed into a tree. The total size for each leaf node is computed as:
$$\text{size} = \text{blank} + \text{comment} + \text{code}$$
where blank is the count of blank lines, comment is the count of comment lines, and code is the count of code lines reported by cloc for that file.
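As a small worked example of the size computation, the sketch below turns one already-parsed row into a flat record. The function name toRecord and the shape of the row object are assumptions made for illustration; the total field name follows the flat mapping in the reference table.

```javascript
// Hypothetical helper: build a flat record from one parsed cloc row.
// Returns null when any count is not a non-negative integer, so the
// caller can skip (and count) the row.
function toRecord(row) {
  const counts = ["blank", "comment", "code"].map((key) =>
    String(row[key]).trim()
  );
  if (!counts.every((text) => /^\d+$/.test(text))) return null;
  const [blank, comment, code] = counts.map(Number);
  return {
    language: row.language,
    filename: row.filename,
    blank,
    comment,
    code,
    total: blank + comment + code, // size = blank + comment + code
  };
}

const sampleRecord = toRecord({
  language: "JavaScript",
  filename: "src/app.js",
  blank: "10",
  comment: "5",
  code: "85",
});
console.log(sampleRecord.total); // 100

const badRecord = toRecord({
  language: "JavaScript",
  filename: "src/broken.js",
  blank: "n/a",
  comment: "5",
  code: "85",
});
console.log(badRecord); // null
```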
The tree insertion algorithm operates as follows for each file path p:
$$p = d_1 / d_2 / \dots / d_k / f$$
where d_1 through d_k are the directory segments and f is the file name. For each directory segment d_i, the algorithm searches the current node's children array for a node with a matching name. If one is found, it descends into that node. If not, it creates a new branch node {name: d_i, children: []}, appends it, and descends into it. The final segment f becomes a leaf node with size, language, blank, comment, and code properties. Time complexity per file: O(k ⋅ c), where k is the path depth and c is the maximum number of children at any level, because each level is searched linearly.
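The insertion steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the converter's source: the name insertFile is hypothetical, the path is assumed to be already split into segments, and the check that a matching child is a branch (has a children array) is an added guard.

```javascript
// Hypothetical sketch: insert one file into the tree.
// segments = [d_1, ..., d_k, f]; leafProps carries size, language, etc.
function insertFile(root, segments, leafProps) {
  const dirs = segments.slice(0, -1);
  const fileName = segments[segments.length - 1];
  let node = root;
  for (const dir of dirs) {
    // Linear search of the current children: this is the O(c) step per level.
    let next = node.children.find(
      (child) => child.name === dir && Array.isArray(child.children)
    );
    if (!next) {
      next = { name: dir, children: [] };
      node.children.push(next);
    }
    node = next;
  }
  // The final segment becomes a leaf; leaves have no children array.
  node.children.push({ name: fileName, ...leafProps });
}

const demoRoot = { name: "root", children: [] };
insertFile(demoRoot, ["src", "utils", "helpers.js"], {
  size: 20, language: "JavaScript", blank: 3, comment: 2, code: 15,
});
insertFile(demoRoot, ["src", "index.js"], {
  size: 50, language: "JavaScript", blank: 5, comment: 5, code: 40,
});
console.log(JSON.stringify(demoRoot, null, 2));
```

Both files share the same src node, which shows how directories that appear in several paths are merged rather than duplicated.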
Reference Data
| CLOC CSV Column | Type | JSON Mapping (Tree) | JSON Mapping (Flat) | Notes |
|---|---|---|---|---|
| language | String | Stored on leaf node as language | language field | e.g. JavaScript, Python |
| filename | String (path) | Split into nested children hierarchy | filename field | Supports / and \ separators |
| blank | Integer | Stored on leaf node | blank field | Blank line count |
| comment | Integer | Stored on leaf node | comment field | Comment line count |
| code | Integer | Stored on leaf node | code field | Code line count |
| (computed) | Integer | size = blank + comment + code | total field | Used for visualization sizing |
| Root name | String | Top-level name defaults to "root" | N/A | Configurable in the tool |
| Header row | N/A | Skipped during parsing | Skipped | Must match canonical cloc format |
| Summary rows | N/A | Skipped (no filename) | Skipped | Rows with SUM or empty filename |
| Path depth | Integer | Determines nesting level in tree | N/A | Deeper paths → more nesting |
| Duplicate dirs | N/A | Merged into single node | N/A | Files in same dir share parent |
| Output format | JSON | Hierarchical: {name, children} | Array: [{...}, ...] | Two output modes available |
| File size limit | N/A | Max 50 MB input | Same as tree | Browser memory constraint |
| Encoding | UTF-8 | Assumed UTF-8 | Same as tree | Other encodings may cause errors |
| CLOC version | Any | Compatible with cloc 1.x - 2.x | Same as tree | Canonical 5-column CSV format |
Frequently Asked Questions
Run cloc --csv --by-file. The --by-file flag is critical; without it, CLOC only reports per-language summaries, which lack individual file paths and cannot be converted to a tree. Header rows and summary rows (containing "SUM" or lacking a filename) are automatically filtered out.

Without the --by-file flag, every row is a language summary with no file path, and the converter will produce an empty tree. Additionally, rows where blank, comment, or code cannot be parsed as integers are skipped. Check the toast notifications for a count of skipped rows.

Both path separators are supported: src\utils\helpers.js is treated identically to src/utils/helpers.js. Leading drive letters (e.g., C:\) or leading slashes are stripped to avoid creating empty root nodes.

To leave out unwanted directories, run cloc --exclude-dir=vendor,node_modules before conversion.

The output follows the structure expected by d3.hierarchy(): a root object with name (string) and children (array). Leaf nodes carry a size property instead of children. You can feed the output JSON directly into d3.hierarchy(data).sum(d => d.size) for rendering.
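The path handling described in the answers above (both separators accepted, drive letters and leading slashes stripped) can be sketched as below. The function name splitPath is hypothetical, and the regular expressions are one plausible way to get that behaviour rather than the tool's actual implementation.

```javascript
// Hypothetical helper: split a cloc filename into path segments.
function splitPath(filename) {
  return filename
    .replace(/^[A-Za-z]:/, "") // drop a leading drive letter such as "C:"
    .split(/[\\\/]+/) // split on "\" or "/", treating repeats as one separator
    .filter((segment) => segment.length > 0); // drop empties left by leading separators
}

console.log(splitPath("C:\\src\\utils\\helpers.js")); // [ 'src', 'utils', 'helpers.js' ]
console.log(splitPath("/src/utils/helpers.js")); // [ 'src', 'utils', 'helpers.js' ]
```

Filtering out empty segments is what prevents a leading slash from producing an unnamed root-level node.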