
CSV Input

Input

Read a CSV file and emit one row per data line. Configure the column delimiter, choose whether the first row is a header, and skip blank lines. CSV Input streams rows as it parses, so it handles large files without loading everything into memory.

CSV Input is a source node — it produces rows from an uploaded file but takes no input handle. Parsing uses PapaParse line-by-line, which means rows are emitted as the file is read; downstream streaming transforms can begin work before the upload finishes.
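
The streaming behavior can be sketched as a generator that yields one row per physical line — a simplified, hypothetical stand-in for the node's internals (the real code delegates parsing to PapaParse):

```typescript
// Simplified sketch of line-by-line row emission. Function name and
// shape are illustrative, not the actual API in logic.ts.
function* streamRows(
  fileText: string,
  delimiter = ","
): Generator<string[]> {
  for (const line of fileText.split("\n")) {
    if (line.trim() === "") continue; // skipEmptyLines behavior
    yield line.split(delimiter);      // real code uses PapaParse per line
  }
}

const rows = [...streamRows("a,b\n1,2\n\n3,4")];
// rows[0] → ["a", "b"]; the blank line is skipped, so rows.length is 3
```

Because each yield happens as soon as a line is read, a downstream consumer can start iterating before the file finishes uploading.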

The parser samples the first five data rows to infer column types. If every non-empty value in a column parses as a number, the column is coerced to number; otherwise values stay as strings. Columns whose first five rows are all empty fall back to string. Type detection happens once per column on the sample window — values past row five are coerced using the type chosen from the sample.
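
The sampling rule above can be sketched as follows — a hypothetical helper, not the actual code in `logic.ts`:

```typescript
// Sample-based inference: inspect up to the first five non-empty values.
// SAMPLE_SIZE and inferColumnType are invented names for illustration.
const SAMPLE_SIZE = 5;

function inferColumnType(values: string[]): "number" | "string" {
  const sample = values.slice(0, SAMPLE_SIZE).filter((v) => v.trim() !== "");
  if (sample.length === 0) return "string"; // all-empty sample falls back to string
  return sample.every((v) => !Number.isNaN(Number(v))) ? "number" : "string";
}

inferColumnType(["1", "2", "3"]);                  // "number"
inferColumnType(["1", "x", "3"]);                  // "string"
inferColumnType(["", "", "", "", ""]);             // "string" (empty sample)
inferColumnType(["1", "2", "3", "4", "5", "N/A"]); // "number" — row 6 is never inspected
```

The last call shows why the sample window matters: values past row five never influence the chosen type, they are only coerced by it.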

When Has Header Row is unchecked, the parser generates synthetic column names col_0, col_1, … and treats every line as data.

Input: a .csv file uploaded by the user.

Output: a row stream where each row is an object keyed by header (or col_N).

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| delimiter | string | `,` | Single character that separates columns. Common values: `,`, `;`, `\t`, `\|`. |
| hasHeader | boolean | true | Treat the first non-empty row as column names. When false, columns are named col_0, col_1, … |
| skipEmptyLines | boolean | true | Skip lines that are empty after trimming. When false, blank lines parse as zero-column rows. |
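
As a concrete example using the option names from the table above, a configuration for a semicolon-separated file without a header might look like this (the interface shape is illustrative; consult the node's actual schema for the full contract):

```typescript
// Illustrative config shape built from the documented option names.
interface CsvInputConfig {
  delimiter: string;       // must be exactly one character
  hasHeader: boolean;
  skipEmptyLines: boolean;
}

const config: CsvInputConfig = {
  delimiter: ";",
  hasHeader: false,
  skipEmptyLines: true,
};

// Minimal stand-in for the schema's single-character delimiter check.
function isValidDelimiter(d: string): boolean {
  return d.length === 1;
}

isValidDelimiter(config.delimiter); // true
isValidDelimiter("||");             // false — rejected at config-validation time
```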

A typical sales export — comma delimiter, first row contains column names.

Before (raw file content):

```csv
order_id,customer,amount
1001,Acme Corp,2400
1002,Beta Inc,750
1003,Gamma LLC,1200
```

Configuration: delimiter: ",", hasHeader: true, skipEmptyLines: true.

After:

| order_id | customer | amount |
| --- | --- | --- |
| 1001 | Acme Corp | 2400 |
| 1002 | Beta Inc | 750 |
| 1003 | Gamma LLC | 1200 |

order_id and amount are inferred as number; customer stays a string.
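
The coercion step on this export can be illustrated with a simplified stand-in for the node's internals (names here are invented for the example):

```typescript
// Apply the inferred column types to one raw row of the sales export.
const header = ["order_id", "customer", "amount"];
const types: Record<string, "number" | "string"> = {
  order_id: "number",
  customer: "string",
  amount: "number",
};

function coerceRow(cells: string[]): Record<string, string | number> {
  const row: Record<string, string | number> = {};
  header.forEach((name, i) => {
    row[name] = types[name] === "number" ? Number(cells[i]) : cells[i];
  });
  return row;
}

coerceRow(["1001", "Acme Corp", "2400"]);
// → { order_id: 1001, customer: "Acme Corp", amount: 2400 }
```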

A European-locale export with no header row. Synthetic column names are generated.

Before (raw file content):

```csv
2024-06-01;Alice;125.50
2024-06-02;Bob;88.00
2024-06-03;Carol;212.75
```

Configuration: delimiter: ";", hasHeader: false.

After:

| col_0 | col_1 | col_2 |
| --- | --- | --- |
| 2024-06-01 | Alice | 125.50 |
| 2024-06-02 | Bob | 88.00 |
| 2024-06-03 | Carol | 212.75 |
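
The synthetic col_N naming used above can be sketched as follows (an illustrative helper; the real node builds names during the PapaParse pass):

```typescript
// Generate col_0, col_1, … when hasHeader is false, then key one row by them.
function syntheticHeader(columnCount: number): string[] {
  return Array.from({ length: columnCount }, (_, i) => `col_${i}`);
}

const cells = "2024-06-01;Alice;125.50".split(";");
const names = syntheticHeader(cells.length); // ["col_0", "col_1", "col_2"]
const row = Object.fromEntries(names.map((n, i) => [n, cells[i]]));
// → { col_0: "2024-06-01", col_1: "Alice", col_2: "125.50" }
```
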
  • Type inference is sample-based, not full-scan. Numeric columns are detected from the first five non-empty rows. If row 7 of a column inferred as number contains "N/A", it will be coerced via Number("N/A") and become NaN rather than reverting to string. Pre-clean numeric columns or coerce types downstream if your file has sparse non-numeric rows past the first five. See apps/web/src/transforms/input-csv/logic.ts:45-70.
  • delimiter must be exactly one character. The Zod schema rejects multi-character delimiters at config-validation time. To parse files with multi-character separators, replace the separator upstream or pre-process the file. See apps/web/src/transforms/input-csv/logic.ts:24-29.
  • PapaParse runs per line, not over the whole file. Quoted values that span multiple lines are not preserved — each physical line is parsed independently. CSVs with embedded newlines inside quoted fields will produce extra rows. See apps/web/src/transforms/input-csv/logic.ts:101-109.
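
The first limitation can be demonstrated directly — once a column's sample reads as numeric, a later non-numeric value is coerced to NaN instead of reverting the column to string (a minimal reproduction, not the node's code):

```typescript
// A column whose first five values are numeric is typed as number;
// row 7's "N/A" is then coerced via Number("N/A") and becomes NaN.
const column = ["10", "20", "30", "40", "50", "60", "N/A"];

const sampledAsNumber = column
  .slice(0, 5)
  .every((v) => !Number.isNaN(Number(v))); // true — sample sees only numbers

const coerced = sampledAsNumber ? column.map(Number) : column;
Number.isNaN(coerced[6] as number); // true — "N/A" became NaN, not a string
```
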
  • Excel Input — read .xlsx instead of CSV; the rest of the pipeline is identical.
  • JSON Input — read structured JSON or NDJSON when the source is hierarchical.
  • Change Type — fix up columns whose inferred type does not match the data.