Skip to content

Group By

Aggregation

Group rows by one or more key columns and compute aggregates over the remaining columns. Each unique combination of key values produces one output row carrying the keys plus the requested aggregations. Use it to collapse line items into per-customer totals, count records per category, or compute min/max bounds per bucket.

Group By is a full-dataset batch transform. It buffers all input rows, partitions them by the configured groupKeys (using a JSON-stringified key tuple so multi-column groups work), then emits one output row per group with the key columns and every named aggregation.

When groupKeys is empty but aggregations is configured, the entire input collapses to a single output row holding the global aggregates. When both lists are empty, rows pass through unchanged.

Output row order follows the order each unique key combination first appeared in the input. Aggregate columns appear after the group-key columns in the order they were declared.

Input: One tabular data connection. Output: One row per unique group-key combination, with key columns first and aggregations second.

OptionTypeDefaultDescription
groupKeysstring[][]Columns whose values together identify a group. Empty list collapses everything into a single global row.
aggregationsArray<{ inputColumn, function, outputColumn }>[]Each entry computes one aggregate. function is one of "sum", "count", "avg", "min", "max". outputColumn names the result column (required).

You have order line items and want per-customer totals plus order counts.

Before:

order_idcustomeramount
1001Acme Corp250
1002Beta Inc750
1003Acme Corp1100
1004Beta Inc500
1005Acme Corp300

Configuration: group keys: ["customer"], aggregations: sum(amount) → total, count(amount) → orders.

After:

customertotalorders
Acme Corp16503
Beta Inc12502

You want bounds and average price per category and region.

Before:

productcategoryregionprice
WidgettoolEU25
WidgettoolEU27
WidgettoolUS30
GadgettoyEU12
GadgettoyEU14
GadgettoyUS18

Configuration: group keys: ["category", "region"], aggregations: min(price) → price_min, max(price) → price_max, avg(price) → price_avg.

After:

categoryregionprice_minprice_maxprice_avg
toolEU252726
toolUS303030
toyEU121413
toyUS181818

You want a single summary row for the whole dataset.

Before:

order_idamount
1001250
1002750
10031100
1004500

Configuration: group keys: [], aggregations: sum(amount) → total, count(amount) → orders, avg(amount) → average.

After:

totalordersaverage
26004650
  • count counts rows in the group, not non-null cells. The inputColumn is required by the schema but ignored by the executor — every row in the group counts. To count only rows where a column has a value, filter upstream first. See apps/web/src/transforms/group-by/logic.ts:42-43.
  • sum and avg skip non-numeric cells silently. A cell that fails parseFloat (e.g. "n/a", empty string) is excluded from the running total and from the divisor in avg. The result is the average over numeric cells only — not zero-filled. If every cell is non-numeric, avg returns null and sum returns 0. See apps/web/src/transforms/group-by/logic.ts:44-58.
  • min and max use a hybrid numeric/string comparison. When both candidate values parse as numbers, comparison is numeric (9 < 10); otherwise it falls back to localeCompare. This means min over ["9", "10", "2"] returns "2" numerically, but min over mixed ["banana", "10"] falls back to string order. Coerce the column with Change Type upstream to force one regime. See apps/web/src/transforms/group-by/logic.ts:31-38.
  • Pivot — produce wide cross-tabs (one column per pivot value) instead of long groups.
  • Filter Rows — drop rows before grouping to exclude unwanted records from aggregates.
  • Sort Rows — order the grouped output by an aggregation column (e.g. top customers by total).