# Data Definitions Guide

> Define the structure, validation rules, and extraction logic for document data in Kodexa using Data Definitions and configurable data elements.

## Overview

Data definitions in Kodexa provide the structure and rules for extracting, validating, and processing information from documents. They define what data to extract, how to validate it, and how to present it to users.

## What Are Data Definitions?

**Data definitions** are the blueprints for your document processing workflows. They specify:

* **Structure**: What data elements exist and how they relate
* **Types**: What kind of data each field contains (text, numbers, dates, etc.)
* **Sources**: Where data comes from (document content, metadata, calculations, review)
* **Validation**: Business rules and data quality checks
* **Behavior**: Formulas, selection options, event scripts, and validation cascades that run when data changes

<CardGroup cols={2}>
  <Card title="Data Definition Structure" icon="sitemap" href="/guides/data-definitions/taxonomy-guide">
    Data elements, groups, sources, and extraction behavior
  </Card>

  <Card title="Event-Based Scripting" icon="bolt" href="/guides/data-definitions/event-subscriptions">
    Reactive JavaScript scripts that run when modeled data changes
  </Card>

  <Card title="Formulas" icon="function" href="/guides/formulas/index">
    Calculation logic for derived and computed fields
  </Card>

  <Card title="Validation and Formatting" icon="shield-check" href="/guides/data-definitions/validation-and-conditional-formatting">
    Business rules, exceptions, and reviewer-facing visual cues
  </Card>
</CardGroup>

***

## Core Concepts

### Data Structure

**Data definitions** are hierarchical structures of data elements that define what to extract from documents. In configuration and API payloads, those elements are still stored under the `taxons` field.

**Example use cases**:

* Invoice data extraction (vendor, line items, totals)
* Contract metadata (parties, dates, terms)
* Form processing (applicant info, answers, signatures)

<Card title="Data Definition Structure" icon="book" href="/guides/data-definitions/taxonomy-guide">
  Learn how data elements, groups, value sources, and extraction behavior fit together
</Card>

### Data Types

Kodexa supports rich data types for accurate extraction and validation:

<Tabs>
  <Tab title="Basic Types">
    * **STRING** - Text of any length
    * **NUMBER** - Numeric values
    * **BOOLEAN** - True/false values
    * **DATE** - Calendar dates
    * **DATE\_TIME** - Dates with timestamps
  </Tab>

  <Tab title="Specialized Types">
    * **CURRENCY** - Monetary amounts with precision handling
    * **PERCENTAGE** - Percentage values
    * **EMAIL\_ADDRESS** - Email validation
    * **PHONE\_NUMBER** - Phone number formats
    * **URL** - Web addresses
  </Tab>

  <Tab title="Complex Types">
    * **SELECTION** - Dropdown/categorical values with options
    * **SECTION** - Visual grouping without data storage
    * **Groups** - Containers for related fields (can repeat)
  </Tab>
</Tabs>
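
As a sketch of how these types are assigned, the fragment below sets `taxonType` on a few elements, following the field layout used in the Common Patterns section of this guide. The element names are hypothetical and chosen only for illustration.

```yaml theme={null}
taxons:
  # Element names below are illustrative, not part of any shipped definition.
  - name: customer_email
    label: Customer Email
    taxonType: EMAIL_ADDRESS

  - name: discount_rate
    label: Discount Rate
    taxonType: PERCENTAGE

  - name: received_at
    label: Received At
    taxonType: DATE_TIME

  - name: is_rush_order
    label: Rush Order
    taxonType: BOOLEAN
```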

### Data Sources

Define where each data element gets its value:

<AccordionGroup>
  <Accordion title="Document Extraction" icon="file-lines">
    Extract directly from document content using AI/ML models

    ```yaml theme={null}
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "Extract the invoice total amount"
    ```
  </Accordion>

  <Accordion title="Metadata" icon="database">
    Pull from document properties and system fields

    ```yaml theme={null}
    valuePath: METADATA
    metadataValue: FILENAME
    ```
  </Accordion>

  <Accordion title="Formulas" icon="calculator">
    Calculate from other fields

    ```yaml theme={null}
    valuePath: FORMULA
    semanticDefinition: "quantity * unit_price"
    ```
  </Accordion>

  <Accordion title="Review" icon="eye">
    Fields populated during human review

    ```yaml theme={null}
    valuePath: REVIEW
    userEditable: true
    ```
  </Accordion>
</AccordionGroup>
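
Taken together, a single definition can mix these sources. The fragment below is a minimal sketch that reuses only the fields shown in the accordions above. The element names are hypothetical, and placing `valuePath` next to `name` and `taxonType` follows the `total` element in the invoice pattern in this guide.

```yaml theme={null}
taxons:
  - name: source_file              # hypothetical: filled from document metadata
    label: Source File
    taxonType: STRING
    valuePath: METADATA
    metadataValue: FILENAME

  - name: invoice_total            # extracted from document content
    label: Invoice Total
    taxonType: CURRENCY
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "Extract the invoice total amount"

  - name: reviewer_notes           # hypothetical: entered during human review
    label: Reviewer Notes
    taxonType: STRING
    valuePath: REVIEW
    userEditable: true
```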

***

## Common Patterns

### Invoice Processing

Extract structured data from invoices:

```yaml theme={null}
taxons:
  - name: invoice_number
    label: Invoice Number
    taxonType: STRING

  - name: invoice_date
    label: Invoice Date
    taxonType: DATE

  - name: vendor
    label: Vendor
    group: true
    children:
      - name: name
      - name: address
      - name: tax_id

  - name: line_items
    label: Line Items
    group: true
    children:
      - name: description
      - name: quantity
      - name: unit_price
      - name: total
        valuePath: FORMULA
        semanticDefinition: "quantity * unit_price"

  - name: total_amount
    label: Total Amount
    taxonType: CURRENCY
```

### Contract Metadata

Capture key contract information:

```yaml theme={null}
taxons:
  - name: contract_type
    label: Contract Type
    taxonType: SELECTION
    selectionOptions:
      - label: "Service Agreement"
      - label: "Purchase Order"
      - label: "NDA"

  - name: parties
    group: true
    children:
      - name: party_a
      - name: party_b

  - name: key_terms
    group: true
    children:
      - name: effective_date
        taxonType: DATE
      - name: term_length
      - name: termination_notice
```

### Form Data

Process form submissions:

```yaml theme={null}
taxons:
  - name: applicant
    group: true
    children:
      - name: full_name
      - name: email
        taxonType: EMAIL_ADDRESS
      - name: phone
        taxonType: PHONE_NUMBER

  - name: responses
    group: true
    children:
      - name: question_1
      - name: question_2
      - name: agree_to_terms
        taxonType: BOOLEAN
```

***

## Validation and Quality

### Validation Rules

Define business rules to ensure data quality:

```yaml theme={null}
validationRules:
  - name: "Required field check"
    ruleFormula: "!isblank({invoice_number})"
    messageFormula: '"Invoice number is required"'
    overridable: false

  - name: "Date logic check"
    conditional: true
    conditionalFormula: "!isblank({due_date}) && !isblank({invoice_date})"
    ruleFormula: "isafterdate({due_date}, {invoice_date}) || {due_date} = {invoice_date}"
    messageFormula: '"Due date must be on or after invoice date"'
    overridable: false

  - name: "Total verification"
    ruleFormula: "abs({total_amount} - sum({line_items/total})) < 0.01"
    messageFormula: '"Total mismatch detected"'
    overridable: true
```

<Card title="Validation and Conditional Formatting" icon="shield-check" href="/guides/data-definitions/validation-and-conditional-formatting">
  Learn the exact `validationRules` schema, conditional formatting schema, formula language, and runtime behavior.
</Card>

### Conditional Formatting

Apply visual cues based on data values:

```yaml theme={null}
conditionalFormats:
  - type: backgroundColor
    condition: "isbeforedate({due_date}, datemath('today')) && {status} != 'PAID'"
    properties:
      color: "#FEE2E2"

  - type: icon
    condition: "{total_amount} > 10000"
    properties:
      icon: alert-circle-outline
      color: "#92400E"
```

***

## Best Practices

### Design Principles

<AccordionGroup>
  <Accordion title="Start Simple, Iterate" icon="seedling">
    Begin with core fields and add complexity as needed. Don't over-engineer initial data definitions.

    **Start with**:

    * Essential fields only
    * Basic data types
    * Simple validation

    **Add later**:

    * Computed fields
    * Complex validations
    * Conditional formatting
  </Accordion>

  <Accordion title="Use Semantic Definitions Well" icon="comment-dots">
    Write clear, specific extraction prompts:

    **Good**:

    ```yaml theme={null}
    semanticDefinition: |
      The vendor's legal business name as it appears at the top of the invoice.
      Look near 'From', 'Vendor', or 'Remit To' labels rather than 'Bill To', which names the customer.
    ```

    **Avoid**:

    ```yaml theme={null}
    semanticDefinition: "vendor name"  # Too vague
    ```
  </Accordion>

  <Accordion title="Organize with Groups" icon="folder-tree">
    Use groups to:

    * Organize related fields logically
    * Handle repeating structures (line items, signatories)
    * Improve UI presentation

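    **Single-instance groups** act as organizational containers for fields that appear once per document:

    ```yaml theme={null}
    - name: vendor
      group: true
      children:
        - name: name
        - name: address
        - name: tax_id
    ```

    **Repeating groups** hold collections, with one set of child values per occurrence:

    ```yaml theme={null}
    - name: line_items
      group: true
      children:
        - name: description
        - name: quantity
        - name: unit_price
    ```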
  </Accordion>

  <Accordion title="Validate Strategically" icon="shield-halved">
    **Critical validations** (non-overridable):

    * Required fields
    * Data type constraints
    * Business logic rules

    **Quality checks** (overridable):

    * Unusual values
    * Formatting issues
    * Threshold warnings
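
    A sketch of how the two tiers might look, using only the `validationRules` fields and formula functions shown in the Validation Rules section of this guide. The `approver` element and the threshold value are hypothetical.

    ```yaml theme={null}
    validationRules:
      # Critical: reviewers cannot override a missing approver.
      - name: "Approver required"
        ruleFormula: "!isblank({approver})"
        messageFormula: '"Approver is required"'
        overridable: false

      # Quality check: flags large totals but allows a reviewer override.
      - name: "Large total warning"
        conditional: true
        conditionalFormula: "!isblank({total_amount})"
        ruleFormula: "{total_amount} < 10000"
        messageFormula: '"Total is unusually large, please confirm"'
        overridable: true
    ```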
  </Accordion>
</AccordionGroup>

### Naming Conventions

Use consistent naming across your data definitions:

<CodeGroup>
  ```yaml Field Names (Internal) theme={null}
  name: vendor_name         # Snake case
  name: invoice_date        # Descriptive, unambiguous
  name: line_items          # Plural for groups
  ```

  ```yaml Display Labels theme={null}
  label: Vendor Name        # Title case
  label: Invoice Date       # Human-readable
  label: Line Items         # Matches business terminology
  ```

  ```yaml External Names (APIs) theme={null}
  externalName: vendorName  # Camel case
  externalName: invoiceDate # Consistent with API conventions
  ```
</CodeGroup>
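
Applied to one element, the three conventions might combine as in the sketch below. It assumes `externalName` is set on the same element as `name` and `label`, which this guide does not state explicitly.

```yaml theme={null}
taxons:
  - name: vendor_name          # internal name, snake case
    label: Vendor Name         # reviewer-facing label, title case
    externalName: vendorName   # assumed placement; camel case for API consumers
    taxonType: STRING
```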

***

## Getting Started

<Steps>
  <Step title="Understand Your Documents">
    Analyze the documents you'll process:

    * What data needs to be extracted?
    * What's the document structure?
    * What validations are needed?
  </Step>

  <Step title="Design Your Data Definition">
    Sketch out the data structure:

    * List all required fields
    * Group related fields
    * Identify repeating sections
  </Step>

  <Step title="Configure Data Elements">
    For each field, define:

    * Data type
    * Value source
    * Semantic definition
    * Validation rules
  </Step>

  <Step title="Test and Iterate">
    Process sample documents:

    * Verify extraction accuracy
    * Refine semantic definitions
    * Adjust validation rules
  </Step>
</Steps>
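
As an illustration of the first pass these steps might produce, the sketch below defines a hypothetical receipt with a few essential fields and one group. All element names and semantic definitions are invented for the example. Validation rules are left out so they can be added once extraction has been checked against sample documents.

```yaml theme={null}
taxons:
  - name: merchant_name
    label: Merchant Name
    taxonType: STRING
    semanticDefinition: |
      The merchant's trading name as printed at the top of the receipt.

  - name: purchase_date
    label: Purchase Date
    taxonType: DATE
    semanticDefinition: "The date the purchase was made"

  - name: items
    label: Items
    group: true
    children:
      - name: description
      - name: amount
        taxonType: CURRENCY

  - name: receipt_total
    label: Receipt Total
    taxonType: CURRENCY
    semanticDefinition: "The final amount charged, including tax"
```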

***

## Learn More

<CardGroup cols={2}>
  <Card title="Data Definition Structure" icon="book-open" href="/guides/data-definitions/taxonomy-guide">
    Data elements, groups, value sources, and configuration options
  </Card>

  <Card title="Formula Reference" icon="function" href="/guides/formulas/index">
    Built-in functions for calculations and validations
  </Card>

  <Card title="Validation and Formatting" icon="shield-check" href="/guides/data-definitions/validation-and-conditional-formatting">
    Complete guide to validation rules, conditional formats, and formula behavior
  </Card>

  <Card title="Event-Based Scripting" icon="bolt" href="/guides/data-definitions/event-subscriptions">
    Add reactive JavaScript behavior to the data model
  </Card>

  <Card title="API Documentation" icon="code" href="/api-reference/introduction">
    Programmatic access to data definition management
  </Card>
</CardGroup>

***

## Examples

<CardGroup cols={2}>
  <Card title="Invoice Data Definition" icon="file-invoice-dollar" href="/guides/data-definitions/examples/invoice">
    Complete invoice extraction example
  </Card>

  <Card title="Contract Data Definition" icon="file-contract" href="/guides/data-definitions/examples/contract">
    Contract metadata extraction example
  </Card>

  <Card title="Form Data Definition" icon="clipboard-list" href="/guides/data-definitions/examples/form">
    Form data processing example
  </Card>

  <Card title="Purchase Order Data Definition" icon="shopping-cart" href="/guides/data-definitions/examples/purchase-order">
    Purchase order extraction example
  </Card>
</CardGroup>
