How It Works
Excel processing follows the same two-layer architecture as PDF processing:
- Native layer: The original .xlsx file is rendered directly in the workbook viewer, an OOXML-based canvas viewer that draws the native spreadsheet bytes for visual fidelity
- Content layer: The parsed KDDB document contains a content node tree (workbook → worksheet → row → cell) that holds tags, data objects, and extraction results

Cell references (A1, B2, etc.) bridge the two layers, similar to how bounding boxes bridge PDF rendering and spatial content nodes.
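
Because cell references are the link between the rendered grid and the content nodes, it can help to see how an A1-style reference maps to grid coordinates. The following is a minimal sketch, not the viewer's actual code: the function names `parse_cell_ref` and `format_cell_ref` are hypothetical, and the choice of zero-based indexes is an assumption.

```python
import re

# A1-style reference: one or more column letters followed by a 1-based row number.
_CELL_REF = re.compile(r"^([A-Za-z]+)([1-9][0-9]*)$")


def parse_cell_ref(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference into zero-based (row, column) indexes."""
    match = _CELL_REF.match(ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number (A=1 ... Z=26, AA=27).
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, column - 1


def format_cell_ref(row: int, column: int) -> str:
    """Convert zero-based (row, column) indexes back into an A1-style reference."""
    letters = ""
    n = column + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row + 1}"
```

With this mapping, `parse_cell_ref("B2")` gives row index 1 and column index 1, and `format_cell_ref` inverts it.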
The Workbook Viewer
Excel files open in the workbook viewer, a canvas-based OOXML viewer that renders the native .xlsx bytes directly, so the spreadsheet you see matches the source file’s fonts, column widths, and cell formatting. Tags and extraction results are drawn as overlays on top of the rendered grid.
The viewer is read-only for cell values — you view the spreadsheet and tag cells for extraction, but you don’t edit the underlying data.
Navigating the workbook
- Sheet tabs — When a workbook has more than one worksheet, each renders as a tab in source order; click a tab to switch sheets. Single-sheet workbooks show no tab bar.
- Zoom — Zoom in and out from the toolbar. Zoom steps by a factor of 1.25 each click and is clamped between 25% and 400%; the toolbar shows the current percentage. A reset control returns the view to 100%.
- Column and row resizing — Drag a column or row header boundary to resize it. Double-click a boundary to auto-fit: columns size to the widest cell value on that sheet (capped at 800px) and rows size to a single line of the default font. Resizes are session-only — reloading the document re-parses the original widths from the file.
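
The zoom behavior described above (a multiplicative step of 1.25, clamped to the 25% to 400% range) can be sketched as follows. This is an illustration only; the constant and function names are hypothetical, and representing zoom as a fraction (1.0 for 100%) is an assumption.

```python
ZOOM_STEP = 1.25  # each click multiplies or divides the zoom by this factor
ZOOM_MIN = 0.25   # 25%
ZOOM_MAX = 4.0    # 400%
ZOOM_RESET = 1.0  # 100%


def clamp_zoom(zoom: float) -> float:
    """Keep the zoom level inside the allowed range."""
    return max(ZOOM_MIN, min(ZOOM_MAX, zoom))


def zoom_in(zoom: float) -> float:
    """One zoom-in click."""
    return clamp_zoom(zoom * ZOOM_STEP)


def zoom_out(zoom: float) -> float:
    """One zoom-out click."""
    return clamp_zoom(zoom / ZOOM_STEP)


def zoom_label(zoom: float) -> str:
    """Percentage text of the kind a toolbar might display."""
    return f"{round(zoom * 100)}%"
```

Because the step is multiplicative, each click changes the apparent size by the same proportion at any zoom level, and the clamp means further clicks at 25% or 400% have no effect.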
Finding and copying cell content
- Find — The search box matches a case-insensitive substring against every cell value across all sheets. Press Enter (or use next/previous) to jump between matches, switching sheets automatically as needed. The current match is highlighted in amber with an outline; other matches use a lighter amber tint.
- Copy — Select a cell or drag-select a range and press Ctrl/Cmd+C to copy the selection as tab-separated values (tabs between columns, newlines between rows). Off-screen cells inside the selected range are included in the copy.
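
As a rough illustration of the find and copy semantics above, the sketch below models a workbook as a dictionary of sheet names to rows of cell display values. That data model and the names `find_matches` and `copy_range` are assumptions made for the example, not the viewer's internals.

```python
from typing import Iterator

Sheet = list[list[str]]  # rows of cell display values (assumed model)


def find_matches(workbook: dict[str, Sheet], query: str) -> Iterator[tuple[str, int, int]]:
    """Yield (sheet name, row index, column index) for each cell containing the query.

    Matching is a case-insensitive substring test across every sheet.
    """
    needle = query.casefold()
    if not needle:
        return
    for sheet_name, rows in workbook.items():
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                if needle in value.casefold():
                    yield sheet_name, r, c


def copy_range(rows: Sheet, top: int, left: int, bottom: int, right: int) -> str:
    """Serialize an inclusive cell range as tab-separated values.

    Tabs separate columns and newlines separate rows.
    """
    lines = []
    for r in range(top, bottom + 1):
        lines.append("\t".join(rows[r][left:right + 1]))
    return "\n".join(lines)
```

The range is serialized from the underlying rows rather than from what is currently drawn, which is why off-screen cells inside the selection end up in the copied text.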
Tag highlights and linking
- Tag overlays — Tagged cells are highlighted with their taxon’s color, resolved from the taxonomy’s tag metadata. The focused tag is drawn with a stronger fill and a 2px outline; other tagged cells use a lighter tint of the same color. Toggle overlays on and off with the highlights button in the toolbar (on by default).
- Selection — The active cell or range is drawn as a translucent blue rectangle.
- Linking cells — In a data-form linking workflow, left-click a cell to focus and link it to the focused attribute; drag to link a range. Right-click a cell to open the tag popup for tagging and other cell actions.
Uploading Excel Files
Upload Excel files to a document store just like any other document. Supported formats:
- .xlsx: Native support
- .xlsm: Native support (macro-enabled workbooks)
- .xls, .ods, and other formats: Automatically converted via LibreOffice
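
To make the native-versus-converted split concrete, here is a small sketch that classifies a spreadsheet upload by extension and builds a generic LibreOffice command line for the conversion case. The function names are hypothetical, and how the platform actually invokes LibreOffice is not described here; the command shown is the general-purpose headless `soffice` conversion, used as an assumption.

```python
from pathlib import Path

# Extensions handled natively, per the list above.
NATIVE_EXTENSIONS = {".xlsx", ".xlsm"}


def needs_conversion(filename: str) -> bool:
    """Return True when a spreadsheet upload is not natively supported.

    Assumes the file is already known to be a spreadsheet of some kind.
    """
    return Path(filename).suffix.lower() not in NATIVE_EXTENSIONS


def libreoffice_convert_command(source: str, out_dir: str) -> list[str]:
    """Build (but do not run) a headless LibreOffice conversion to .xlsx."""
    return ["soffice", "--headless", "--convert-to", "xlsx", "--outdir", out_dir, source]
```

For example, `needs_conversion("legacy.xls")` is true, while `needs_conversion("Q1.XLSX")` is false because the extension check ignores case.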
The Workbook Content Structure
After parsing, the KDDB contains a content node tree that mirrors the workbook (workbook → worksheet → row → cell). Nodes carry features such as:

| Feature | Example | Description |
|---|---|---|
| workbook:ref | B2 | Cell reference (column + row) |
| workbook:sheet | Income Statement | Parent worksheet name |
| workbook:formula | =SUM(B2:B10) | Formula (if present) |
| workbook:merge | A1:D1 | Merged range (top-left cell only) |
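
Since the merged range is recorded on the top-left cell only, a consumer that needs every covered cell has to expand the range itself. The following is a hedged sketch of that expansion; `cells_in_range` and its helpers are hypothetical names, and the row-major output order is an assumption.

```python
import re

# Range in A1 notation, for example "A1:D1".
_RANGE = re.compile(r"^([A-Z]+)([1-9][0-9]*):([A-Z]+)([1-9][0-9]*)$")


def _col_to_number(letters: str) -> int:
    """Column letters to a 1-based column number (A=1, Z=26, AA=27)."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def _number_to_col(number: int) -> str:
    """1-based column number back to column letters."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cells_in_range(merge: str) -> list[str]:
    """List every cell reference covered by a range, row by row."""
    match = _RANGE.match(merge.upper())
    if match is None:
        raise ValueError(f"not an A1-style range: {merge!r}")
    first_col, first_row, last_col, last_row = match.groups()
    columns = range(_col_to_number(first_col), _col_to_number(last_col) + 1)
    rows = range(int(first_row), int(last_row) + 1)
    return [f"{_number_to_col(c)}{r}" for r in rows for c in columns]
```

The first element of the result is the top-left cell, which is the one that carries the `workbook:merge` feature.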
Tagging and Extraction
Data-form-driven workflow
The recommended workflow for Excel extraction is data-form-driven:
- Define a data definition (taxonomy) for the data you want to extract
- Open the Excel file in the workspace; it renders in the workbook viewer
- The data form on the left shows the data objects and attributes from your taxonomy
- Focus an attribute in the data form (e.g., “Revenue Q1”)
- Click or drag-select cells in the spreadsheet to link them to that attribute
- Tagged cells highlight with the taxon’s color
- Repeat for all attributes
AI-assisted extraction
The LLM extraction engine works with workbook content the same way it works with spatial content. The AI reads the cell content and structure, then automatically tags cells based on your data definition. You review and correct the results in the same data-form-driven workflow.
Selectors for Workbook Content
Query workbook content using the standard selector language.
What’s Next
Content Structures
Learn about the workbook mixin and other content structures.
Data Definitions
Define taxonomies to extract structured data from spreadsheets.
