Group By

Aggregation

Group rows by one or more key columns and compute aggregates over the remaining columns. Each unique combination of key values produces one output row carrying the keys plus the requested aggregations. Use it to collapse line items into per-customer totals, count records per category, or compute min/max bounds per bucket.

How it works

Group By is a full-dataset batch transform. It buffers all input rows, partitions them by the configured groupKeys (using a JSON-stringified key tuple so multi-column groups work), then emits one output row per group with the key columns and every named aggregation.

When groupKeys is empty but aggregations is configured, the entire input collapses to a single output row holding the global aggregates. When both lists are empty, rows pass through unchanged.

Output row order follows the order each unique key combination first appeared in the input. Aggregate columns appear after the group-key columns in the order they were declared.

Input: One tabular data connection. Output: One row per unique group-key combination, with key columns first and aggregations second.

Options

Option	Type	Default	Description
`groupKeys`	`string[]`	`[]`	Columns whose values together identify a group. Empty list collapses everything into a single global row.
`aggregations`	`Array<{ inputColumn, function, outputColumn }>`	`[]`	Each entry computes one aggregate. `function` is one of `"sum"`, `"count"`, `"avg"`, `"min"`, `"max"`. `outputColumn` names the result column (required).

Examples

Total sales per customer

You have order line items and want per-customer totals plus order counts.

Before:

order_id	customer	amount
1001	Acme Corp	250
1002	Beta Inc	750
1003	Acme Corp	1100
1004	Beta Inc	500
1005	Acme Corp	300

Configuration: group keys: ["customer"], aggregations: sum(amount) → total, count(amount) → orders.

After:

customer	total	orders
Acme Corp	1650	3
Beta Inc	1250	2

Multi-column group with min/max/avg

You want bounds and average price per category and region.

Before:

product	category	region	price
Widget	tool	EU	25
Widget	tool	EU	27
Widget	tool	US	30
Gadget	toy	EU	12
Gadget	toy	EU	14
Gadget	toy	US	18

Configuration: group keys: ["category", "region"], aggregations: min(price) → price_min, max(price) → price_max, avg(price) → price_avg.

After:

category	region	price_min	price_max	price_avg
tool	EU	25	27	26
tool	US	30	30	30
toy	EU	12	14	13
toy	US	18	18	18

Global aggregate (no group keys)

You want a single summary row for the whole dataset.

Before:

order_id	amount
1001	250
1002	750
1003	1100
1004	500

Configuration: group keys: [], aggregations: sum(amount) → total, count(amount) → orders, avg(amount) → average.

After:

total	orders	average
2600	4	650

Tips and Edge Cases

count counts rows in the group, not non-null cells. The inputColumn is required by the schema but ignored by the executor — every row in the group counts. To count only rows where a column has a value, filter upstream first. See apps/web/src/transforms/group-by/logic.ts:42-43.
sum and avg skip non-numeric cells silently. A cell that fails parseFloat (e.g. "n/a", empty string) is excluded from the running total and from the divisor in avg. The result is the average over numeric cells only — not zero-filled. If every cell is non-numeric, avg returns null and sum returns 0. See apps/web/src/transforms/group-by/logic.ts:44-58.
min and max use a hybrid numeric/string comparison. When both candidate values parse as numbers, comparison is numeric (9 < 10); otherwise it falls back to localeCompare. This means min over ["9", "10", "2"] returns "2" numerically, but min over mixed ["banana", "10"] falls back to string order. Coerce the column with Change Type upstream to force one regime. See apps/web/src/transforms/group-by/logic.ts:31-38.

Pivot — produce wide cross-tabs (one column per pivot value) instead of long groups.
Filter Rows — drop rows before grouping to exclude unwanted records from aggregates.
Sort Rows — order the grouped output by an aggregation column (e.g. top customers by total).

Group By

How it works

Options

Examples

Total sales per customer

Multi-column group with min/max/avg

Global aggregate (no group keys)

Tips and Edge Cases

Related Transforms