How It Works
Excel processing follows the same two-layer architecture as PDF processing:
- Native layer: The original .xlsx file is rendered in a spreadsheet viewer for visual fidelity
- Content layer: The parsed KDDB document contains a content node tree (workbook → worksheet → row → cell) that holds tags, data objects, and extraction results

Cell references (A1, B2, etc.) bridge the two layers, similar to how bounding boxes bridge PDF rendering and spatial content nodes.
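
To make the bridge between the two layers concrete, here is a minimal sketch of how a selection in the viewer (sheet name plus cell reference) could be resolved to a node in the content tree. The `ContentNode` class, the `find_cell` helper, and the flat feature dictionary are hypothetical stand-ins for illustration, not the actual KDDB API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class ContentNode:
    """Hypothetical stand-in for a content node (not the real KDDB API)."""
    node_type: str  # "workbook", "worksheet", "row" or "cell"
    content: Optional[str] = None
    features: Dict[str, str] = field(default_factory=dict)
    children: List["ContentNode"] = field(default_factory=list)

    def walk(self) -> Iterator["ContentNode"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_cell(workbook: ContentNode, sheet: str, ref: str) -> Optional[ContentNode]:
    """Resolve a viewer selection (sheet name + cell reference) to a cell node."""
    for node in workbook.walk():
        if (node.node_type == "cell"
                and node.features.get("workbook:sheet") == sheet
                and node.features.get("workbook:ref") == ref):
            return node
    return None


# Build a tiny workbook -> worksheet -> row -> cell tree.
cell = ContentNode(
    "cell", "1200",
    {"workbook:ref": "B2", "workbook:sheet": "Income Statement"},
)
workbook = ContentNode("workbook", children=[
    ContentNode("worksheet", children=[ContentNode("row", children=[cell])]),
])

match = find_cell(workbook, "Income Statement", "B2")
print(match.content if match else "no such cell")
```

The lookup plays the same role for workbooks that a bounding-box hit test plays for PDFs: a position in the native rendering is translated into a content node that can carry tags.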
Uploading Excel Files
Upload Excel files to a document store just like any other document. Supported formats:
- .xlsx: Native support
- .xlsm: Native support (macro-enabled workbooks)
- .xls, .ods, and other formats: Automatically converted via LibreOffice
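
The split between native and converted formats can be pictured with a short sketch. The conversion happens automatically on the platform, so nothing like this needs to be run by hand. The helper names are hypothetical, and the `soffice` invocation is only an assumption about what a headless LibreOffice conversion typically looks like, not a description of the platform's internals.

```python
from pathlib import Path
from typing import List

# Extensions the section lists as natively supported.
NATIVE_EXTENSIONS = {".xlsx", ".xlsm"}


def needs_conversion(filename: str) -> bool:
    """Return True when the file is not in a natively supported format."""
    return Path(filename).suffix.lower() not in NATIVE_EXTENSIONS


def conversion_command(filename: str, out_dir: str) -> List[str]:
    """Build (but do not run) a headless LibreOffice conversion to .xlsx.

    Assumption: this mirrors a typical manual conversion; the platform's
    actual conversion step is not documented here.
    """
    return ["soffice", "--headless", "--convert-to", "xlsx",
            "--outdir", out_dir, filename]


for name in ["budget.xlsx", "legacy.xls", "survey.ods"]:
    route = "convert first" if needs_conversion(name) else "native"
    print(f"{name}: {route}")
```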
The Workbook Content Structure
After parsing, the KDDB contains a content node tree that mirrors the workbook:

| Feature | Example | Description |
|---|---|---|
| workbook:ref | B2 | Cell reference (column + row) |
| workbook:sheet | Income Statement | Parent worksheet name |
| workbook:formula | =SUM(B2:B10) | Formula (if present) |
| workbook:merge | A1:D1 | Merged range (top-left cell only) |
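
As an illustration of what the reference and merge values encode, the sketch below splits a cell reference into 1-based column and row indexes and computes the size of a merged range. The function names `parse_ref` and `merge_size` are hypothetical helpers written for this example; they are not part of the product.

```python
import re
from typing import Tuple

# One or more column letters followed by a row number without leading zeros.
_REF = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def parse_ref(ref: str) -> Tuple[int, int]:
    """Split a reference such as 'B2' into 1-based (column, row) indexes."""
    m = _REF.fullmatch(ref.upper())
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, row = m.groups()
    column = 0
    for ch in letters:
        # Column letters are a base-26 number where A=1 ... Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(row)


def merge_size(merge: str) -> Tuple[int, int]:
    """Return (columns, rows) covered by a merged range such as 'A1:D1'."""
    start, end = merge.split(":")
    c1, r1 = parse_ref(start)
    c2, r2 = parse_ref(end)
    return c2 - c1 + 1, r2 - r1 + 1


print(parse_ref("B2"))      # (2, 2)
print(merge_size("A1:D1"))  # (4, 1)
```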
Tagging and Extraction
Data-form-driven workflow
The recommended workflow for Excel extraction is data-form-driven:
- Define a data definition (taxonomy) for the data you want to extract
- Open the Excel file in the workspace — it renders in the spreadsheet viewer
- The data form on the left shows the data objects and attributes from your taxonomy
- Focus an attribute in the data form (e.g., “Revenue Q1”)
- Click or drag-select cells in the spreadsheet to link them to that attribute
- Tagged cells highlight with the taxon’s color
- Repeat for all attributes
AI-assisted extraction
The LLM extraction engine works with workbook content the same way it works with spatial content. The AI reads the cell content and structure, then automatically tags cells based on your data definition. You review and correct the results in the same data-form-driven workflow.
Selectors for Workbook Content
Query workbook content using the standard selector language.
What’s Next
Content Structures
Learn about the workbook mixin and other content structures.
Data Definitions
Define taxonomies to extract structured data from spreadsheets.
