Excel Input

Input

Read an Excel workbook and emit one row per data row from a chosen sheet. Pick the sheet by name or zero-based index, treat the first row as a header, and optionally drop additional leading rows. Excel Input loads the whole workbook into memory because the underlying SheetJS parser is not streaming.

Excel Input is a source node that reads a .xlsx (or .xls) workbook via SheetJS. Unlike CSV Input, it requires the entire file in memory before any rows are emitted, so memory use grows linearly with cell count. Once parsed, rows are yielded one at a time, so downstream streaming transforms still see a row stream.

Sheet selection follows a fixed precedence: sheetName, then sheetIndex, then the first sheet. If sheetName is set, it must match exactly; a missing name throws with a list of available sheets. If sheetIndex is out of range, it throws with the workbook's sheet count. Native Excel types are preserved: numbers stay numbers, and dates come through as Excel serial numbers unless the cell is formatted as text.
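The precedence described above can be sketched as a small pure function. This is an illustration only, not the code in logic.ts: the names SheetSelection and resolveSheetName and the exact error wording are assumptions, and the function works on a plain list of sheet names rather than a SheetJS workbook.

```typescript
// Hypothetical option shape; only sheetName and sheetIndex come from the docs.
interface SheetSelection {
  sheetName?: string;
  sheetIndex?: number;
}

// Resolve which sheet to read: sheetName first, then sheetIndex, then sheet 0.
function resolveSheetName(available: string[], sel: SheetSelection): string {
  if (sel.sheetName !== undefined) {
    // Exact match only; a miss reports every sheet that does exist.
    if (!available.includes(sel.sheetName)) {
      throw new Error(
        `Sheet "${sel.sheetName}" not found. Available sheets: ${available.join(", ")}`
      );
    }
    return sel.sheetName;
  }

  // No name given: fall back to the index, defaulting to the first sheet.
  const index = sel.sheetIndex ?? 0;
  const name = Number.isInteger(index) ? available[index] : undefined;
  if (name === undefined) {
    throw new Error(
      `Sheet index ${index} is out of range; workbook has ${available.length} sheet(s)`
    );
  }
  return name;
}
```

Passing both sheetName and sheetIndex returns the named sheet, which mirrors the rule that sheetName takes priority.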

When hasHeader is true (the default), sheet_to_json uses the first row as keys. When false, columns are emitted as col_0, col_1, and so on. skipRows drops that many rows from the top of the JSON output, so it skips data rows after the header (or all leading rows when there is no header).
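To make the interaction between hasHeader and skipRows concrete, here is a minimal sketch that follows the description above on a plain array of rows. It does not call SheetJS; toRecords, Cell, and Row are hypothetical names, and filling missing cells with "" is borrowed from the defval behaviour this page documents.

```typescript
type Cell = string | number | boolean;
type Row = Record<string, Cell>;

// Key each row by header (or col_N), then drop `skipRows` records from the top.
function toRecords(rows: Cell[][], hasHeader = true, skipRows = 0): Row[] {
  const header: Cell[] = hasHeader ? (rows[0] ?? []) : [];
  const body = hasHeader ? rows.slice(1) : rows;

  // With a header, its length fixes the column count; otherwise use the widest row.
  const width = hasHeader
    ? header.length
    : Math.max(0, ...body.map((cells) => cells.length));

  const records = body.map((cells) => {
    const record: Row = {};
    for (let i = 0; i < width; i++) {
      const key = hasHeader ? String(header[i]) : `col_${i}`;
      record[key] = cells[i] ?? ""; // missing cells become empty strings
    }
    return record;
  });

  // skipRows applies to the keyed output, i.e. after the header row is consumed.
  return records.slice(skipRows);
}
```

Under this reading, skipRows: 1 with a header keeps the header keys and discards only the first data row.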

Input: A .xlsx or .xls file uploaded by the user.

Output: A row stream of objects keyed by header (or col_N).

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| sheetName | string? | undefined | Name of the sheet to import. Takes priority over sheetIndex. Leave unset to use the first sheet. |
| sheetIndex | number? | undefined | Zero-based sheet index. Used only when sheetName is unset. |
| hasHeader | boolean | true | Treat the first row as column names. When false, columns are named col_0, col_1, … |
| skipRows | number | 0 | Number of rows to drop from the top of the data (after the header row, if any). |
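The option table can be read as a type plus a defaults step. The sketch below is one way to express that in TypeScript; ExcelInputOptions and withDefaults are illustrative names, not identifiers confirmed to exist in the codebase.

```typescript
// Hypothetical typing of the options listed in the table above.
interface ExcelInputOptions {
  sheetName?: string;  // takes priority over sheetIndex
  sheetIndex?: number; // zero-based; used only when sheetName is unset
  hasHeader?: boolean; // default: true
  skipRows?: number;   // default: 0
}

type ResolvedExcelInputOptions = ExcelInputOptions & {
  hasHeader: boolean;
  skipRows: number;
};

// Fill in the documented defaults, leaving the sheet selectors untouched.
function withDefaults(options: ExcelInputOptions = {}): ResolvedExcelInputOptions {
  return {
    ...options,
    hasHeader: options.hasHeader ?? true,
    skipRows: options.skipRows ?? 0,
  };
}
```

Calling withDefaults({}) yields hasHeader: true and skipRows: 0 with both sheet selectors unset, which corresponds to reading the first sheet.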

An invoice export — first sheet, first row is the header.

Before (sheet contents):

| invoice_id | customer | total | issued_at |
| --- | --- | --- | --- |
| INV-1001 | Acme Corp | 4250.00 | 2025-03-12 |
| INV-1002 | Beta Inc | 1875.50 | 2025-03-13 |
| INV-1003 | Gamma LLC | 990.00 | 2025-03-14 |

Configuration: defaults — sheet 0, hasHeader: true, skipRows: 0.

After:

| invoice_id | customer | total | issued_at |
| --- | --- | --- | --- |
| INV-1001 | Acme Corp | 4250 | 2025-03-12 |
| INV-1002 | Beta Inc | 1875.5 | 2025-03-13 |
| INV-1003 | Gamma LLC | 990 | 2025-03-14 |

A workbook with title and timestamp rows above the actual data on a sheet called Q2_Sales.

Before (sheet contents):

| A | B | C |
| --- | --- | --- |
| Q2 Sales Report | | |
| Generated 2025-07-01 | | |
| region | sales | reps |
| EMEA | 412000 | 14 |
| AMER | 588000 | 22 |
| APAC | 195000 | 6 |

Configuration: sheetName: "Q2_Sales", hasHeader: true, skipRows: 2.

After:

| region | sales | reps |
| --- | --- | --- |
| EMEA | 412000 | 14 |
| AMER | 588000 | 22 |
| APAC | 195000 | 6 |

skipRows: 2 drops the title and timestamp rows. The next row after that becomes the header row, since hasHeader is true.

  • sheetName and sheetIndex are mutually exclusive in the UI. Setting one clears the other. The runtime checks sheetName first, then sheetIndex, then defaults to sheet 0. A sheetName mismatch throws with the list of available sheets — useful for debugging mistyped names. See apps/web/src/transforms/input-xlsx/logic.ts:77-101.
  • Dates come through as Excel serial numbers, not ISO strings. If a cell is formatted as a date, SheetJS returns the underlying serial (a day count in which 1 is 1900-01-01) unless the cell is explicitly typed as text. Use Format Dates downstream with the excel-serial source format to convert. See apps/web/src/transforms/input-xlsx/logic.ts:108-113.
  • Empty cells become "", not null. The parser uses defval: "", so missing cells are empty strings rather than nullish. Tests that expect null for blanks will need to coerce. See apps/web/src/transforms/input-xlsx/logic.ts:108-113.
  • CSV Input — read .csv files when the source is plain text.
  • JSON Input — read structured JSON or NDJSON when the source is hierarchical.
  • Format Dates — convert Excel serial dates to ISO or other formats.
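For the serial-date and empty-cell notes above, the following sketch shows the arithmetic involved when coercing values by hand. It is not the Format Dates implementation. It assumes the workbook uses the 1900 date system (not the 1904 one) and that serials are 61 or greater, since Excel counts a nonexistent 1900-02-29 and earlier serials are off by one against this epoch. The function names are hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// 1899-12-30 UTC: the effective day zero for 1900-system serials >= 61.
const EXCEL_EPOCH_UTC = Date.UTC(1899, 11, 30);

// Convert an Excel serial to an ISO date string, ignoring any time-of-day fraction.
function excelSerialToIsoDate(serial: number): string {
  const wholeDays = Math.floor(serial);
  const date = new Date(EXCEL_EPOCH_UTC + wholeDays * MS_PER_DAY);
  return date.toISOString().slice(0, 10);
}

// Map the parser's "" placeholder for empty cells to null; leave other values alone.
function blankToNull(
  value: string | number | boolean
): string | number | boolean | null {
  return value === "" ? null : value;
}
```

As a check, excelSerialToIsoDate(45728) returns "2025-03-12". Note that blankToNull only converts the empty string, so a numeric 0 or a boolean false passes through unchanged.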