About

The cloc utility produces CSV reports listing language, filename, blank, comment, and code counts per source file. That flat tabular format is useless for tree-based visualizations such as CodeFlower, D3 sunbursts, or treemaps, which require nested JSON with name and children properties. This converter parses each file path, splits on directory separators, and recursively inserts nodes into a hierarchical tree where every leaf carries a size equal to total lines (blank + comment + code). Directories that contain only a single child are optionally collapsed to reduce visual noise. The tool handles both Windows (\) and POSIX (/) path separators. Limitation: CLOC's summary rows (those without a file path) are silently skipped. If your CSV uses a non-standard delimiter or was generated by a cloc fork, verify the header row matches the canonical format before conversion.

Formulas

Each row in the CLOC CSV is parsed and the file path is decomposed into a tree. The total size for each leaf node is computed as:

size = blank + comment + code

Where blank is the count of blank lines, comment is the count of comment lines, and code is the count of code lines reported by cloc for that file.

The tree insertion algorithm operates as follows for each file path p:

split(p, /) → [d₁, d₂, …, f]

For each directory segment d_i, the algorithm searches the current node's children array for a matching name. If found, it descends into that node. If not found, it creates a new branch node {name: d_i, children: []} and appends it. The final segment f becomes a leaf node with size, language, blank, comment, and code properties. Time complexity per file: O(k ⋅ c) where k is path depth and c is max children at each level.

Reference Data

CLOC CSV Column	Type	JSON Mapping (Tree)	JSON Mapping (Flat)	Notes
language	String	Stored on leaf node as language	language field	e.g. JavaScript, Python
filename	String (path)	Split into nested children hierarchy	filename field	Supports / and \ separators
blank	Integer	Stored on leaf node	blank field	Blank line count
comment	Integer	Stored on leaf node	comment field	Comment line count
code	Integer	Stored on leaf node	code field	Code line count
(computed)	Integer	size = blank + comment + code	total field	Used for visualization sizing
Root name	String	Top-level name defaults to "root"	N/A	Configurable in the tool
Header row	N/A	Skipped during parsing	Skipped	Must match canonical cloc format
Summary rows	N/A	Skipped (no filename)	Skipped	Rows with SUM or empty filename
Path depth	Integer	Determines nesting level in tree	N/A	Deeper paths → more nesting
Duplicate dirs	N/A	Merged into single node	N/A	Files in same dir share parent
Output format	JSON	Hierarchical: {name, children}	Array: [{...}, ...]	Two output modes available
File size limit	N/A	Max 50 MB input		Browser memory constraint
Encoding	UTF-8	Assumed UTF-8		Other encodings may cause errors
CLOC version	Any	Compatible with cloc 1.x - 2.x		Canonical 5-column CSV format

Frequently Asked Questions

The converter expects the canonical CLOC CSV output with exactly five columns: language, filename, blank, comment, code. This is the default output when running cloc --csv --by-file. The --by-file flag is critical; without it, CLOC only reports per-language summaries which lack individual file paths and cannot be converted to a tree. Header rows and summary rows (containing "SUM" or lacking a filename) are automatically filtered out.

Rows without a valid filename are treated as summary/aggregate rows and are skipped. If your CLOC report was generated without the --by-file flag, every row is a language summary with no file path, and the converter will produce an empty tree. Additionally, rows where blank, comment, or code cannot be parsed as integers are skipped. Check the toast notifications for a count of skipped rows.

The Tree format produces a hierarchical JSON object mirroring the directory structure: each directory becomes a node with a children array, and each file becomes a leaf node with size (total lines). This is compatible with D3.js hierarchical layouts, CodeFlower, sunburst diagrams, and treemaps. The Flat format produces a simple JSON array where each element is an object with language, filename, blank, comment, code, and total. Flat format is suitable for table-based analysis or importing into spreadsheets.

Yes. The parser normalizes all path separators by splitting on both / and \ characters. A path like src\utils\helpers.js is treated identically to src/utils/helpers.js. Leading drive letters (e.g., C:\) or leading slashes are stripped to avoid creating empty root nodes.

The tool enforces a 50 MB limit on uploaded CSV files. This is a practical browser memory constraint. A typical CLOC report for a project with 100,000 files produces a CSV of approximately 5 - 15 MB. If your CSV exceeds the limit, consider filtering it with cloc --exclude-dir=vendor,node_modules before conversion.

The Tree output format produces exactly the structure expected by CodeFlower and D3's d3.hierarchy(): a root object with name (string) and children (array). Leaf nodes carry a size property instead of children. You can directly feed the output JSON into d3.hierarchy(data).sum(d => d.size) for rendering.