Working with Large Files

FileBender runs in your browser. It handles large files by streaming where possible and falls back on tier limits where streaming isn’t enough. This guide covers the limits, the performance behavior to expect, and a few patterns that keep large flows fast.

There are two thresholds, and they’re independent:

  1. The tier row limit. Each tier caps how many rows a single Input file can have. A file over the cap is rejected at the Input node.
  2. The browser memory ceiling. Even when a file fits the tier cap, full-dataset transforms (Sort, Group By, Pivot, Deduplicate) buffer rows in memory. The browser tab eventually runs out of headroom on a sufficiently large dataset.

For most files the row limit is the one you hit first; once it no longer applies, the memory ceiling is what matters.

| Tier    | Rows per input file | Output formats   |
| ------- | ------------------- | ---------------- |
| Free    | 5,000               | CSV, JSON        |
| Starter | 100,000             | CSV, XLSX, JSON  |
| Pro     | Unlimited           | CSV, XLSX, JSON  |
| Team    | Unlimited           | CSV, XLSX, JSON  |

Numbers come from packages/domain/src/pricing/tiers.ts and are checked at runtime when an Input node receives a file. The full reference, including flow count limits and shared-flow execution caps, lives at Row Limits and Tiers.
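For orientation, here is a minimal sketch of the shape that tier table might take. The field names and the enforcement helper are assumptions for illustration; only the numbers and formats come from the table above.

```ts
// Hypothetical shape of the tier table; the real
// packages/domain/src/pricing/tiers.ts may be structured differently.
type OutputFormat = "csv" | "xlsx" | "json";

interface Tier {
  name: string;
  maxRowsPerInput: number | "unlimited";
  outputFormats: OutputFormat[];
}

const TIERS: Tier[] = [
  { name: "Free", maxRowsPerInput: 5_000, outputFormats: ["csv", "json"] },
  { name: "Starter", maxRowsPerInput: 100_000, outputFormats: ["csv", "xlsx", "json"] },
  { name: "Pro", maxRowsPerInput: "unlimited", outputFormats: ["csv", "xlsx", "json"] },
  { name: "Team", maxRowsPerInput: "unlimited", outputFormats: ["csv", "xlsx", "json"] },
];

// The runtime check at the Input node would look something like this:
function checkRowLimit(tier: Tier, rowCount: number): void {
  if (tier.maxRowsPerInput !== "unlimited" && rowCount > tier.maxRowsPerInput) {
    throw new Error(
      `File has ${rowCount} rows; the ${tier.name} tier allows ` +
        `${tier.maxRowsPerInput} rows per input file.`
    );
  }
}
```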

What happens when you exceed the limit

The Input node refuses files over your tier cap and surfaces the limit in an inline error. The flow doesn’t run; nothing is silently truncated. To process the file you can:

  • Upgrade. Pricing is at filebender.com/pricing.
  • Pre-split the file outside FileBender into chunks under the cap, run each through the same flow, then concatenate the outputs. Stack Rows handles the concatenation if you wire the chunks back into one flow. A minimal splitting sketch follows this list.
  • Sample first. If you only need to validate the flow logic, take a small slice of the file and run that through the pipeline; the configuration is identical.
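If you go the pre-splitting route, any scripting environment will do. Here is a minimal Node/TypeScript sketch that splits a CSV into chunks under a given row cap, repeating the header in each chunk. It assumes a simple CSV with no newlines embedded in quoted fields; the file names are placeholders.

```ts
import { createReadStream, createWriteStream } from "node:fs";
import { createInterface } from "node:readline";

// Split a CSV into chunk-0.csv, chunk-1.csv, ... with at most `cap` data rows each.
async function splitCsv(path: string, cap: number): Promise<void> {
  const lines = createInterface({ input: createReadStream(path) });
  let header: string | undefined;
  let out: ReturnType<typeof createWriteStream> | undefined;
  let chunk = 0;
  let rowsInChunk = 0;

  for await (const line of lines) {
    if (header === undefined) {
      header = line; // first line is the header; repeat it in every chunk
      continue;
    }
    if (out === undefined || rowsInChunk >= cap) {
      out?.end();
      out = createWriteStream(`chunk-${chunk++}.csv`);
      out.write(header + "\n");
      rowsInChunk = 0;
    }
    out.write(line + "\n");
    rowsInChunk++;
  }
  out?.end();
}

splitCsv("input.csv", 5_000); // e.g. stay under the Free tier cap
```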

Streaming vs. full-dataset transforms

The execution model splits transforms into two camps:

Streaming transforms process one row at a time and don’t hold the dataset in memory:

  • Filter Rows, Add Column, Formula, Rename Columns, Reorder Columns, Select Columns, Replace Values, Format Dates, Change Type, Split Column, Stack Rows, Currency Convert.

Full-dataset transforms buffer all rows before they emit anything:

  • Sort Rows, Group By, Pivot, Unpivot, Deduplicate, Lookup.

Streaming transforms scale linearly with row count and have negligible memory overhead. Full-dataset transforms scale with the size of whatever they need to keep in RAM — for Sort that’s every row; for Group By that’s every distinct group; for Lookup that’s every row of the lookup table.
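The distinction is easy to see in code. This is a sketch of the two shapes, not FileBender’s actual implementation: a streaming transform can be a generator that never retains rows, while a full-dataset transform has to materialize its input first.

```ts
type Row = Record<string, unknown>;

// Streaming: one row in, zero or one rows out; nothing is retained.
function* filterRows(rows: Iterable<Row>, keep: (r: Row) => boolean): Generator<Row> {
  for (const row of rows) {
    if (keep(row)) yield row;
  }
}

// Full-dataset: every row is buffered before the first row can be emitted.
function sortRows(rows: Iterable<Row>, key: string): Row[] {
  const buffer = [...rows]; // memory scales with the whole dataset
  buffer.sort((a, b) => String(a[key]).localeCompare(String(b[key])));
  return buffer;
}
```

This is also why the ordering patterns below work: any streaming transform that runs before a full-dataset transform shrinks the buffer that transform has to hold.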

Patterns that keep large flows fast

1. Select Columns early

Put Select Columns as close to the Input as possible. Every downstream transform touches narrower rows; full-dataset transforms hold less data; the final output is smaller.

If your input has 50 columns and the output needs 6, doing Input → Select Columns → ... instead of ... → Select Columns → Output cuts memory pressure on Pivot or Group By by roughly an order of magnitude: each buffered row carries 6 fields instead of 50, about an eighth of the data.

2. Filter before aggregating

Filter Rows is streaming and cheap. Aggregations are expensive. Filter first.

Input → Filter Rows → Group By runs faster on a large file than Input → Group By → Filter Rows, even when the final output is the same.

3. Avoid Pivot on high-cardinality keys

Pivot creates one output column per distinct value of the pivot key. On a column with thousands of distinct values you’ll get output rows thousands of columns wide, which the browser handles poorly. Use Group By for high-cardinality keys; reserve Pivot for keys with a bounded set of distinct values (e.g. month names, status codes, regions).
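If you’re not sure whether a key is safe to pivot on, a quick distinct-value count settles it. A sketch, assuming rows already parsed into objects:

```ts
type Row = Record<string, unknown>;

// Every distinct value of the pivot key becomes one output column after Pivot.
function distinctCount(rows: Iterable<Row>, key: string): number {
  const seen = new Set<unknown>();
  for (const row of rows) {
    seen.add(row[key]);
  }
  return seen.size;
}

// A dozen distinct values pivots fine; tens of thousands will not.
```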

4. Lookup with the smaller table on the lookup handle

Lookup loads the entire lookup table into a hash map. Put the smaller of the two inputs on the lookup handle — the primary stream stays streaming, the lookup table is what costs memory.
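The asymmetry follows from how a hash-map lookup works. A sketch of the shape (not FileBender’s actual code): only the lookup table is materialized, while the primary input stays a stream.

```ts
type Row = Record<string, unknown>;

function* lookup(
  primary: Iterable<Row>,
  lookupTable: Iterable<Row>,
  key: string
): Generator<Row> {
  // Memory cost: one Map entry per lookup-table row.
  const byKey = new Map<unknown, Row>();
  for (const row of lookupTable) {
    byKey.set(row[key], row);
  }
  // Primary rows flow through one at a time.
  for (const row of primary) {
    yield { ...row, ...(byKey.get(row[key]) ?? {}) };
  }
}
```

Wire the inputs the other way around and the Map holds the big table instead, which is exactly the memory spike you were trying to avoid.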

5. Sort late

Sort holds every row in memory before it emits anything. If you can defer it until after Filter Rows, Select Columns, and any aggregations, you sort fewer rows over fewer columns.

Hitting the browser memory ceiling

Even within tier limits, the browser will choke on enough rows. Symptoms: the tab gets sluggish, the Run button stays in the running state for a long time, the system memory pressure indicator climbs.

If this happens:

  • Check the data preview underneath each node to find which transform is the bottleneck — execution stops at the slow one.
  • Restructure the flow following the patterns above (drop columns earlier, filter before aggregating, etc.).
  • If the bottleneck is genuinely a full-dataset transform on a dataset that won’t fit, split the file as described in “What happens when you exceed the limit” — even on Pro/Team where the row limit is unlimited, browser memory still has a ceiling.
Further reading

  • The Row Limits and Tiers reference has the exact numbers per tier.
  • Core Concepts covers the streaming vs. full-dataset distinction in more detail.
  • The per-transform reference pages call out streaming behavior in their “How it works” sections.