# Data Definition Structure

> Define and configure Kodexa Data Definitions for document extraction, including data elements, types, validation, formatting, and extraction logic.

## Overview

Data definitions are the foundation of data extraction in Kodexa. They define the hierarchical structure of data elements you want to extract from documents, along with their types, validation rules, and extraction logic.

### What is a Data Definition?

A **Data Definition** is a hierarchical structure of data elements. In configuration and API payloads, those elements are represented as `taxons`. A Data Definition defines:

* **What data** to extract from documents
* **Where** the data comes from (document content, metadata, formulas)
* **How** to validate and format the data
* **What type** of data it is (string, date, currency, etc.)

### Key Concepts

<CardGroup cols={2}>
  <Card title="Data Definition" icon="sitemap">
    Top-level container defining the complete data structure for extraction
  </Card>

  <Card title="Data Element" icon="tag">
    Individual field or group within a Data Definition
  </Card>

  <Card title="Data Group" icon="folder-tree">
    Organizational container that groups related data elements without storing data itself
  </Card>

  <Card title="Value Path" icon="route">
    Defines where the data element gets its value from (document, metadata, formula, etc.)
  </Card>
</CardGroup>

***

## Data Definition Structure

### Top-Level Configuration

Every data definition has these core properties:

```yaml theme={null}
slug: invoice-data
name: Invoice Data Extraction
description: Extract structured data from invoices
taxonomyType: CONTENT
enabled: true
taxons:
  - name: vendor_information
    label: Vendor Information
    # ... data element configuration
```

#### Data Definition Properties

| Property                   | Type      | Default   | Description                                  |
| -------------------------- | --------- | --------- | -------------------------------------------- |
| `slug`                     | string    | -         | Unique identifier for the data definition    |
| `name`                     | string    | -         | Display name                                 |
| `description`              | string    | -         | Description of the data definition's purpose |
| `taxonomyType`             | enum      | `CONTENT` | Type of data definition (typically CONTENT)  |
| `enabled`                  | boolean   | `true`    | Whether the data definition is active        |
| `externalDataTaxonomyRefs` | string\[] | `[]`      | References to external data definitions      |
| `taxons`                   | Taxon\[]  | `[]`      | Array of root-level data elements            |

***

## Data Element Configuration

Data elements are the individual fields and groups within a Data Definition. In configuration, each one is a `taxon` with extensive options organized into several categories.

### Basic Properties

Every data element requires these fundamental properties:

```yaml theme={null}
id: "auto-generated-uuid"
name: vendor_name
label: Vendor Name
description: The name of the vendor or supplier
enabled: true
color: "#4F46E5"
```

<ParamField path="name" type="string" required>
  Internal identifier (alphanumeric, hyphens, underscores only)
</ParamField>

<ParamField path="label" type="string" required>
  Human-readable display name
</ParamField>

<ParamField path="description" type="string">
  Detailed explanation of what this data element represents
</ParamField>

<ParamField path="enabled" type="boolean" default="true">
  Whether this data element is active. Disabling an element also disables its children.
</ParamField>

<ParamField path="color" type="string">
  Hex color code for UI display (auto-generated if not specified)
</ParamField>

<ParamField path="generateName" type="boolean" default="true">
  Auto-generate the internal `name` from the `label`
</ParamField>

<ParamField path="externalName" type="string">
  Name used when publishing to external systems (auto-generated from label if not specified)
</ParamField>
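
As a minimal sketch of how the naming properties might combine, the fragment below turns off name generation and sets both the internal and external names by hand. The element itself is hypothetical, and the assumption that `generateName: false` leaves a hand-written `name` untouched is inferred from the descriptions above rather than confirmed behavior.

```yaml theme={null}
# Hypothetical element: names are set explicitly instead of derived from the label
name: purchase_order_number
label: PO Number
description: The purchase order number referenced on the invoice
generateName: false                 # Assumption: keeps the explicit `name` above
externalName: purchaseOrderNumber   # Name used when publishing to external systems
enabled: true
```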

***

### Data Source (Value Path)

The `valuePath` determines where the data element gets its value from:

<AccordionGroup>
  <Accordion title="Document (VALUE_OR_ALL_CONTENT)" icon="file-lines">
    Extracts data directly from document content using AI/ML models or pattern matching.

    **When to use**: Standard document extraction (invoices, contracts, forms)

    **Configuration**:

    ```yaml theme={null}
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "Extract the vendor's business name as it appears on the invoice"
    ```

    **Features**:

    * Uses semantic definition as extraction prompt
    * Can leverage document structure and layout
    * Supports AI-assisted extraction
  </Accordion>

  <Accordion title="Metadata (METADATA)" icon="database">
    Pulls data from document metadata (filename, creation date, owner, etc.).

    **When to use**: Document properties, system fields, audit trail

    **Configuration**:

    ```yaml theme={null}
    valuePath: METADATA
    metadataValue: FILENAME  # or CREATED_DATETIME, OWNER_NAME, etc.
    ```

    **Available metadata values**:

    * `FILENAME` - Document filename
    * `TRANSACTION_UUID` - Unique transaction identifier
    * `CREATED_DATETIME` - Document creation timestamp
    * `DOCUMENT_LABELS` - Applied labels
    * `OWNER_NAME` - Document owner
    * `DOCUMENT_STATUS` - Processing status
    * `PAGE_NUMBER` - Current page number
  </Accordion>

  <Accordion title="Formula (FORMULA)" icon="function">
    Calculates values using formulas that reference other data elements.

    **When to use**: Computed fields, calculations, aggregations

    **Configuration**:

    ```yaml theme={null}
    valuePath: FORMULA
    semanticDefinition: |
      sum({line_items/amount})
    ```

    **Features**:

    * Reference other data elements with `{field_name}` or `{group/field_name}`
    * Built-in functions such as `sum`, `average`, `if`, `isblank`, and `datemath`
    * Conditional logic support
  </Accordion>

  <Accordion title="Review (REVIEW)" icon="eye">
    Generates review templates using Jinja2 templating.

    **When to use**: Human review interfaces, validation checklists

    **Configuration**:

    ```yaml theme={null}
    valuePath: REVIEW
    semanticDefinition: |
      ## Review Checklist

      - [ ] Vendor name matches PO: {{ vendor_name }}
      - [ ] Total amount is correct: {{ total_amount }}
      - [ ] All line items present: {{ line_items|length }} items
    ```
  </Accordion>

  <Accordion title="Derived (DERIVED)" icon="diagram-project">
    Placeholder for derived values. This path is less common; use `FORMULA` for computed values instead.
  </Accordion>
</AccordionGroup>
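
Different value paths can sit side by side in one definition. The sketch below is illustrative only: the element names are hypothetical, and it reuses the `valuePath`, `metadataValue`, and formula reference syntax shown in the accordions above.

```yaml theme={null}
taxons:
  # Filled from document metadata, not from page content
  - name: source_file
    label: Source File
    taxonType: STRING
    valuePath: METADATA
    metadataValue: FILENAME

  # Extracted from the document content
  - name: line_items
    label: Line Items
    group: true
    children:
      - name: amount
        label: Amount
        taxonType: CURRENCY
        valuePath: VALUE_OR_ALL_CONTENT
        semanticDefinition: "The amount billed for this line item"

  # Computed from the extracted line item amounts
  - name: items_total
    label: Items Total
    taxonType: CURRENCY
    valuePath: FORMULA
    semanticDefinition: "sum({line_items/amount})"
```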

***

### Data Types

The `taxonType` defines how the data should be treated and validated:

<Tabs>
  <Tab title="String">
    ```yaml theme={null}
    taxonType: STRING
    typeFeatures:
      longText: true           # Multi-line text field
      maxTextRows: 10          # Maximum rows for display
      markdown: true           # Enable markdown formatting
      expected: true           # Field is expected to be present
      stringExtract: '\d'      # Keep only matching characters (regex)
      stringReplace: '[-\s]'   # Remove matching characters (regex)
    ```

    **Use for**: Names, addresses, descriptions, any text content

    <Tip>Use `stringExtract` and `stringReplace` to automatically clean extracted values. See [String Filters](#string-filters) below.</Tip>
  </Tab>

  <Tab title="Number">
    ```yaml theme={null}
    taxonType: NUMBER
    typeFeatures:
      truncateDecimal: true    # Round to fixed decimal places
      decimalPlaces: 2         # Number of decimal places
    ```

    **Use for**: Quantities, counts, measurements
  </Tab>

  <Tab title="Currency">
    ```yaml theme={null}
    taxonType: CURRENCY
    typeFeatures:
      preferTwoDecimalPlaces: true  # Assume last 2 digits are decimal (1234 → 12.34)
    ```

    **Use for**: Prices, totals, monetary amounts
  </Tab>

  <Tab title="Date">
    ```yaml theme={null}
    taxonType: DATE
    typeFeatures:
      normalizeDate: true              # Normalize for display
      normalizeDateInExport: true      # Normalize in exports
      dateFormat: "yyyy-MM-dd"         # Target format
    ```

    **Use for**: Invoice dates, due dates, any date without time
  </Tab>

  <Tab title="Date Time">
    ```yaml theme={null}
    taxonType: DATE_TIME
    typeFeatures:
      normalizeDate: true
      dateFormat: "yyyy-MM-dd HH:mm:ss"
    ```

    **Use for**: Timestamps, creation dates with time
  </Tab>

  <Tab title="Selection">
    SELECTION data elements present users with a dropdown of predefined options and guide AI extraction toward valid categorical values.

    #### Basic Example

    ```yaml theme={null}
    taxonType: SELECTION
    selectionOptions:
      - label: "Net 30"
        id: "net_30"
        description: "Payment due in 30 days"
      - label: "Net 60"
        id: "net_60"
        description: "Payment due in 60 days"
    ```

    **Use for**: Dropdown selections, categorical data, classification

    #### Selection Option Properties

    Each item in `selectionOptions` supports these properties:

    | Property             | Type    | Description                                                                                                                                                                                                   |
    | -------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `label`              | string  | **Required.** Display text shown to the user in the dropdown                                                                                                                                                  |
    | `id`                 | string  | Unique identifier for the option (auto-generated if omitted)                                                                                                                                                  |
    | `value`              | string  | The value stored when this option is selected. Defaults to `label` if empty. Use this to separate display text from stored codes (e.g., label "Net 30" with value "NET\_30")                                  |
    | `description`        | string  | Description text shown alongside the option                                                                                                                                                                   |
    | `hint`               | string  | Additional help text displayed with the option                                                                                                                                                                |
    | `hintMarkdown`       | boolean | When `true`, renders the `hint` as Markdown instead of plain text                                                                                                                                             |
    | `disabled`           | string  | Set to `"true"` to disable the option. Disabled options are excluded from AI extraction requests but remain visible (struck through) in the UI. Useful for deprecating options without breaking existing data |
    | `isConditional`      | boolean | Enables conditional visibility for this option                                                                                                                                                                |
    | `conditionalFormula` | string  | Formula evaluated per data object to determine if this option appears. Only used when `isConditional` is `true`                                                                                               |
    | `lexicalRelations`   | array   | Semantic relationships that help AI/ML models understand option equivalences. See [Lexical Relations](#lexical-relations-on-selection-options) below                                                          |

    #### Value vs Label

    When `value` is set, the UI displays the `label` but stores the `value`. This is useful when you need human-readable display text but machine-friendly stored values:

    ```yaml theme={null}
    selectionOptions:
      - label: "United States Dollar"
        value: "USD"
        id: "currency_usd"
      - label: "Euro"
        value: "EUR"
        id: "currency_eur"
    ```

    #### Hints

    Hints provide contextual help for individual options. Enable Markdown rendering for rich formatting:

    ```yaml theme={null}
    selectionOptions:
      - label: "Net 30"
        id: "net_30"
        hint: "Standard payment terms — **30 calendar days** from invoice date"
        hintMarkdown: true
      - label: "Due on Receipt"
        id: "due_receipt"
        hint: "Payment expected immediately upon receipt of invoice"
    ```

    #### Disabled Options

    Disabled options are excluded from AI extraction prompts (so the model won't extract them from new documents) but remain visible in the UI for historical data:

    ```yaml theme={null}
    selectionOptions:
      - label: "Net 30"
        id: "net_30"
        disabled: ""          # Active — included in AI requests
      - label: "Net 15"
        id: "net_15"
        disabled: "true"      # Deprecated — excluded from AI requests, shown struck-through
    ```

    <Note>The `disabled` field is a string, not a boolean. Use `"true"` to disable and `""` (empty string) or omit for enabled.</Note>

    #### Conditional Options

    Show or hide individual options based on the current data object's context using `isConditional` and `conditionalFormula`:

    ```yaml theme={null}
    selectionOptions:
      - label: "Standard"
        id: "priority_standard"
      - label: "Rush"
        id: "priority_rush"
        isConditional: true
        conditionalFormula: "{total_amount} > 10000"
      - label: "Emergency"
        id: "priority_emergency"
        isConditional: true
        conditionalFormula: "{total_amount} > 50000 && {is_critical} = true"
    ```

    The `conditionalFormula` is evaluated against the current data object at runtime. Options where the formula evaluates to `false` are hidden from the dropdown. Options without `isConditional` always appear.

    #### Lexical Relations on Selection Options

    Lexical relations help AI models understand synonyms and related terms for each option, improving extraction accuracy when documents use varied terminology:

    ```yaml theme={null}
    selectionOptions:
      - label: "Net 30"
        id: "net_30"
        description: "Payment due in 30 days"
        lexicalRelations:
          - type: SYNONYM
            value: "30 days, thirty days, N30"
          - type: SIMILAR_TO
            value: "monthly payment"
            weight: 0.7
      - label: "Due on Receipt"
        id: "due_receipt"
        lexicalRelations:
          - type: SYNONYM
            value: "immediate payment, payable on receipt, COD"
          - type: ANTONYM
            value: "deferred payment"
    ```

    **Supported relation types**: `SYNONYM`, `ANTONYM`, `HYPERNYM` (more general), `HYPONYM` (more specific), `MERONYM` (part-of), `HOLONYM` (whole-of), `ENTAILMENT`, `SIMILAR_TO`, `OTHER`

    Each relation has:

    * `type` — The relationship type from the list above
    * `value` — The related term(s). Can be comma-separated for multiple terms
    * `weight` — Optional confidence/importance weight (0.0–1.0)

    #### Dynamic Options with Formulas

    Instead of a static list, you can compute selection options dynamically using a JavaScript formula. Enable this on the data element:

    ```yaml theme={null}
    taxonType: SELECTION
    useSelectionOptionFormula: true
    selectionOptionFormula: |
      serviceBridgeCall("myorg/reference-data", "get-currencies", {
        country: getAttribute("country_code")
      })
    ```

    The formula is evaluated by the [Goja scripting runtime](/guides/scripting) and must return an array of `{label, value}` objects or plain strings. Options re-evaluate automatically when referenced attributes change, and the results are persisted on the data object so they survive page reloads.

    <Tip>
      For the full guide on dynamic selection options — including service bridge integration, grid child formulas, dependency tracking, and troubleshooting — see [Selection Option Formulas](/guides/data-definitions/selection-option-formulas).
    </Tip>

    #### Complete Example

    ```yaml theme={null}
    - name: payment_terms
      label: Payment Terms
      taxonType: SELECTION
      semanticDefinition: "The payment terms specified on the invoice"
      selectionOptions:
        - label: "Net 10"
          value: "NET_10"
          id: "terms_net10"
          description: "Payment due in 10 days"
          lexicalRelations:
            - type: SYNONYM
              value: "10 days, N10"
        - label: "Net 30"
          value: "NET_30"
          id: "terms_net30"
          description: "Payment due in 30 days"
          hint: "Most common payment term for B2B invoices"
          lexicalRelations:
            - type: SYNONYM
              value: "30 days, N30, thirty days"
        - label: "Net 60"
          value: "NET_60"
          id: "terms_net60"
          description: "Payment due in 60 days"
          isConditional: true
          conditionalFormula: "{total_amount} > 10000"
          lexicalRelations:
            - type: SYNONYM
              value: "60 days, N60"
        - label: "Due on Receipt"
          value: "DUE_ON_RECEIPT"
          id: "terms_receipt"
          description: "Payment expected immediately"
          lexicalRelations:
            - type: SYNONYM
              value: "immediate, COD, cash on delivery, payable on receipt"
        - label: "2/10 Net 30"
          value: "2_10_NET_30"
          id: "terms_2_10_net30"
          description: "2% discount if paid within 10 days, otherwise net 30"
          hint: "Early payment discount — **2% off** if paid within 10 days"
          hintMarkdown: true
          disabled: "true"
    ```
  </Tab>

  <Tab title="Boolean">
    ```yaml theme={null}
    taxonType: BOOLEAN
    ```

    **Use for**: Yes/no fields, flags, checkboxes
  </Tab>

  <Tab title="Other Types">
    Additional specialized types:

    * `URL` - Website addresses
    * `EMAIL_ADDRESS` - Email addresses
    * `PHONE_NUMBER` - Phone numbers
    * `PERCENTAGE` - Percentage values
    * `SECTION` - Visual grouping (no data storage)
  </Tab>
</Tabs>
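
The specialized types listed under Other Types have no example of their own. A minimal sketch, assuming they are declared like any other `taxonType` and need no `typeFeatures`; the element names and semantic definitions are hypothetical:

```yaml theme={null}
- name: contact_email
  label: Contact Email
  taxonType: EMAIL_ADDRESS
  semanticDefinition: "The vendor's billing contact email address"

- name: contact_phone
  label: Contact Phone
  taxonType: PHONE_NUMBER

- name: vendor_website
  label: Vendor Website
  taxonType: URL

- name: discount_rate
  label: Discount Rate
  taxonType: PERCENTAGE
  semanticDefinition: "The early payment discount rate stated on the invoice"
```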

***

### Data Groups and Hierarchies

Groups organize related data elements and can represent repeating structures:

```yaml theme={null}
name: line_items
label: Line Items
group: true                    # This is a group, not a value
children:
  - name: description
    label: Description
    taxonType: STRING

  - name: quantity
    label: Quantity
    taxonType: NUMBER

  - name: unit_price
    label: Unit Price
    taxonType: CURRENCY

  - name: total
    label: Total
    taxonType: CURRENCY
    valuePath: FORMULA
    semanticDefinition: "{quantity} * {unit_price}"
```

#### Group Configuration

<ParamField path="group" type="boolean" default="false">
  Mark as a group (container for other data elements)
</ParamField>

<ParamField path="children" type="Taxon[]">
  Array of child data elements nested under this group
</ParamField>

<ParamField path="cardinality" type="object">
  Define how many instances of this group can exist:

  ```yaml theme={null}
  cardinality:
    min: 1      # Minimum required instances
    max: 100    # Maximum allowed instances
  ```
</ParamField>

<ParamField path="naturalKeys" type="object[]">
  Define unique identifiers for group instances:

  ```yaml theme={null}
  naturalKeys:
    - taxonRef: "invoice_number"
    - taxonRef: "line_number"
  ```
</ParamField>

<ParamField path="eventSubscriptions" type="TaxonEventSubscription[]">
  Attach reactive JavaScript scripts to a group data element. Event subscriptions can derive values, enforce business rules, call Service Bridges, create data exceptions, or emit follow-up events when modeled data changes.

  ```yaml theme={null}
  eventSubscriptions:
    - name: derive-total
      on: "changed:dataAttribute:(quantity|unit_price)"
      script: |
        // Nothing to do when no data object is in scope
        if (!currentObject) return;
        var qty = currentObject.getFirstAttributeValue("quantity");
        var price = currentObject.getFirstAttributeValue("unit_price");
        // Derive the total only when both inputs have truthy values
        if (qty && price) {
          currentObject.setAttribute("line_total", qty * price);
        }
  ```

  For the full runtime guide, including the JavaScript objects available to scripts, see [Event-Based Scripting](/guides/data-definitions/event-subscriptions).
</ParamField>
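
The group options above can be combined on a single repeating group. This is a sketch under stated assumptions: the `line_number` child is hypothetical, and `taxonRef` is assumed to take the child's `name`, following the `naturalKeys` example above.

```yaml theme={null}
name: line_items
label: Line Items
group: true
cardinality:
  min: 1                       # At least one line item is required
  max: 100                     # No more than 100 line items are allowed
naturalKeys:
  - taxonRef: "line_number"    # Assumption: refers to the child element by name
children:
  - name: line_number
    label: Line Number
    taxonType: NUMBER

  - name: description
    label: Description
    taxonType: STRING
```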

***

### Validation Rules

Define business rules and data quality checks on the data element they apply to:

```yaml theme={null}
validationRules:
  - name: "Total matches sum of line items"
    description: "Ensure calculated total matches the invoice total"
    disabled: false
    conditional: false
    ruleFormula: |
      abs({total_amount} - sum({line_items/total})) < 0.01
    messageFormula: |
      concat(
        "Total mismatch: Invoice shows ",
        {total_amount},
        " but line items sum to ",
        sum({line_items/total})
      )
    detailFormula: |
      "Check line item totals and invoice-level charges."
    overridable: true
    exceptionId: TOTAL_MISMATCH
    supportArticleId: "9117988"

  - name: "Due date on or after invoice date"
    conditional: true
    conditionalFormula: "!isblank({due_date}) && !isblank({invoice_date})"
    ruleFormula: |
      isafterdate({due_date}, {invoice_date}) || {due_date} = {invoice_date}
    messageFormula: |
      "Due date must be on or after invoice date"
    overridable: false
```

#### Validation Rule Properties

| Property             | Type    | Description                                            |
| -------------------- | ------- | ------------------------------------------------------ |
| `name`               | string  | Rule name                                              |
| `description`        | string  | Detailed explanation                                   |
| `disabled`           | boolean | Temporarily disable this rule                          |
| `conditional`        | boolean | Apply the rule only when `conditionalFormula` is true  |
| `conditionalFormula` | string  | Formula determining if rule applies                    |
| `ruleFormula`        | string  | Formula that must be true (false = validation failure) |
| `messageFormula`     | string  | Formula generating the error message                   |
| `detailFormula`      | string  | Formula generating additional details                  |
| `overridable`        | boolean | Can users override this validation?                    |
| `exceptionId`        | string  | Unique exception identifier                            |
| `supportArticleId`   | string  | ID of the related support article                      |

<Card title="Validation and Conditional Formatting" icon="shield-check" href="/guides/data-definitions/validation-and-conditional-formatting">
  Read the complete guide for rule placement, exception lifecycle, conditional formatting schema, and the formula language.
</Card>
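
For a smaller case, the sketch below attaches one conditional rule directly to a single element. It uses only operators and functions that appear in the examples above (`isblank`, `>`, `&&`, `!`); the element and the `exceptionId` value are hypothetical.

```yaml theme={null}
name: quantity
label: Quantity
taxonType: NUMBER
validationRules:
  - name: "Quantity is positive"
    conditional: true
    # Skip the check while the field is still empty
    conditionalFormula: "!isblank({quantity})"
    ruleFormula: "{quantity} > 0"
    messageFormula: '"Quantity must be greater than zero"'
    overridable: true
    exceptionId: QUANTITY_NOT_POSITIVE   # Hypothetical identifier
```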

***

### Conditional Formatting

Apply visual formatting based on data values:

```yaml theme={null}
conditionalFormats:
  - type: backgroundColor
    condition: "isbeforedate({due_date}, datemath('today')) && {status} != 'PAID'"
    properties:
      color: "#FEE2E2"

  - type: textColor
    condition: "isbeforedate({due_date}, datemath('today')) && {status} != 'PAID'"
    properties:
      color: "#991B1B"

  - type: icon
    condition: "{total_amount} > 10000"
    properties:
      icon: alert-circle-outline
      color: "#92400E"
```
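
The snippet above shows `conditionalFormats` in isolation. A minimal sketch of where it might sit, assuming it is declared alongside an element's other properties in the same way as `validationRules` (this placement is an assumption, not confirmed by this page):

```yaml theme={null}
name: total_amount
label: Total Amount
taxonType: CURRENCY
valuePath: VALUE_OR_ALL_CONTENT
conditionalFormats:
  # Flag large invoices with a warning icon
  - type: icon
    condition: "{total_amount} > 10000"
    properties:
      icon: alert-circle-outline
      color: "#92400E"
```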

***

### Classification Features

Help AI/ML models understand and classify content:

<AccordionGroup>
  <Accordion title="Semantic Definition" icon="brain">
    Provides guidance for AI extraction:

    ```yaml theme={null}
    semanticDefinition: |
      The vendor's legal business name as registered with tax authorities.
      Look for names near "Bill To", "Vendor", or "From" sections.
      Should be a proper business name, not an individual's name.
    ```

    **Best practices**:

    * Be specific about what to look for
    * Describe location hints
    * Clarify edge cases
    * Provide examples if helpful
  </Accordion>

  <Accordion title="Additional Context" icon="list-check">
    Helps with record-based chunking and classification:

    ```yaml theme={null}
    additionContexts:
      - type: RECORD_DEFINITION
        context: |
          Each line item represents a product or service being billed.
          Line items typically appear in a table format.

      - type: RECORD_START_MARKER
        context: "Item #"

      - type: RECORD_END_MARKER
        context: "Subtotal"
    ```

    **Context types**:

    * `RECORD_DEFINITION` - Describes the record structure
    * `RECORD_START_MARKER` - Text indicating record start
    * `RECORD_END_MARKER` - Text indicating record end
    * `RECORD_SECTION_STARTER_MARKER` - Section start marker
    * `RECORD_SECTION_END_MARKER` - Section end marker
  </Accordion>

  <Accordion title="Lexical Relations" icon="link">
    Synonyms and antonyms for embedding-based classification:

    ```yaml theme={null}
    lexicalRelations:
      - type: SYNONYM
        value: "Supplier, Provider, Seller, Merchant"

      - type: ANTONYM
        value: "Customer, Buyer, Client"
    ```

    **Use for**:

    * Improving classification accuracy
    * Handling terminology variations
    * Training embedding models
  </Accordion>
</AccordionGroup>
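
These classification features could plausibly be combined on one element. The sketch below assumes all three properties may be set together on a repeating group; the wording of the definition and the synonyms are illustrative only.

```yaml theme={null}
name: line_items
label: Line Items
group: true
semanticDefinition: |
  The table of products or services billed on the invoice.
  Each row is one line item.
additionContexts:
  - type: RECORD_DEFINITION
    context: "Each line item represents a product or service being billed."
  - type: RECORD_START_MARKER
    context: "Item #"
  - type: RECORD_END_MARKER
    context: "Subtotal"
lexicalRelations:
  - type: SYNONYM
    value: "Items, Charges, Billed Services"
```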

***

### Advanced Options

#### Display Configuration

Control how fields appear in the UI:

```yaml theme={null}
typeFeatures:
  overrideWidth: true
  displayWidth: 300            # Width in pixels
  expected: true               # Field is expected to be present
```

#### String Filters

Automatically clean extracted values using regex patterns. **Extract** keeps only the characters that match the pattern; **Replace** removes the characters that match. If both are set, extract runs first and replace is applied to its result. The original value is preserved separately.

```yaml theme={null}
typeFeatures:
  stringExtract: '\d'            # Keep only digits
  stringReplace: '[^a-zA-Z0-9]'  # Remove non-alphanumeric characters
```

<Tabs>
  <Tab title="Common Patterns">
    | Pattern         | Effect                                                              |
    | --------------- | ------------------------------------------------------------------- |
    | `\d`            | Keep digits only (use with `stringExtract`)                         |
    | `[a-zA-Z]`      | Keep letters only (use with `stringExtract`)                        |
    | `[-\s]`         | Strip dashes and spaces (use with `stringReplace`)                  |
    | `[^a-zA-Z0-9]`  | Strip all non-alphanumeric (use with `stringReplace`)               |
    | `[^a-zA-Z0-9 ]` | Strip special characters but keep spaces (use with `stringReplace`) |
  </Tab>

  <Tab title="Examples">
    **Pro number cleanup** — remove special characters:

    ```yaml theme={null}
    name: pronumber
    taxonType: STRING
    typeFeatures:
      stringReplace: '[^a-zA-Z0-9 ]'
    ```

    **Phone number — digits only:**

    ```yaml theme={null}
    name: phone
    taxonType: STRING
    typeFeatures:
      stringExtract: '\d'
    ```

    **ID field — strip dashes and spaces:**

    ```yaml theme={null}
    name: tracking_id
    taxonType: STRING
    typeFeatures:
      stringReplace: '[-\s.]'   # Removes dashes, whitespace, and periods
    ```

    **Combined — extract digits then strip leading zeros:**

    ```yaml theme={null}
    name: account_number
    taxonType: STRING
    typeFeatures:
      stringExtract: '\d'
      stringReplace: '^0+'
    ```
  </Tab>
</Tabs>
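
To make the order of operations concrete, here is a hand-traced sketch of the combined case. The sample input is hypothetical, and the intermediate values follow from the rules as described in this section (extract keeps matching characters, then replace removes matches from that result), not from running the platform.

```yaml theme={null}
name: account_number
taxonType: STRING
typeFeatures:
  # Step 1: "AC-00123-045" becomes "00123045" (everything except digits is dropped)
  stringExtract: '\d'
  # Step 2: "00123045" becomes "123045" (leading zeros are removed)
  stringReplace: '^0+'
```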

#### User Interaction

```yaml theme={null}
multiValue: true               # Allow multiple values
userEditable: true             # User can edit in forms
notUserLabelled: false         # Show in labeling interface
nullable: true                 # Allow null values
nullValue: "N/A"               # Display text for null
```
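
The flags above are shown without a surrounding element. A minimal sketch, assuming they sit at the same level as `name` and `taxonType`; the element is hypothetical:

```yaml theme={null}
name: po_reference
label: PO References
taxonType: STRING
multiValue: true        # An invoice may cite several purchase orders
userEditable: true      # Reviewers can correct the values in forms
nullable: true          # The field may be left empty
nullValue: "N/A"        # Shown when no purchase order is referenced
```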

***

## Common Patterns

### Invoice Extraction

Complete example of a typical invoice data definition:

```yaml theme={null}
slug: invoice-extraction
name: Invoice Data Extraction
taxonomyType: CONTENT
enabled: true
taxons:
  # Header Information
  - name: invoice_number
    label: Invoice Number
    taxonType: STRING
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "The unique invoice number, typically at the top right"
    validationRules:
      - name: "Invoice number required"
        ruleFormula: "!isblank({invoice_number})"
        messageFormula: '"Invoice number is required"'
        overridable: false

  - name: invoice_date
    label: Invoice Date
    taxonType: DATE
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "The date the invoice was issued"
    typeFeatures:
      normalizeDate: true
      dateFormat: "yyyy-MM-dd"

  - name: due_date
    label: Due Date
    taxonType: DATE
    valuePath: VALUE_OR_ALL_CONTENT
    semanticDefinition: "The payment due date"

  # Vendor Information Group
  - name: vendor
    label: Vendor
    group: true
    children:
      - name: name
        label: Vendor Name
        taxonType: STRING
        valuePath: VALUE_OR_ALL_CONTENT
        semanticDefinition: "The vendor's business name"

      - name: address
        label: Address
        taxonType: STRING
        valuePath: VALUE_OR_ALL_CONTENT
        typeFeatures:
          longText: true

      - name: tax_id
        label: Tax ID
        taxonType: STRING
        valuePath: VALUE_OR_ALL_CONTENT

  # Line Items (Repeating Group)
  - name: line_items
    label: Line Items
    group: true
    children:
      - name: description
        label: Description
        taxonType: STRING

      - name: quantity
        label: Quantity
        taxonType: NUMBER

      - name: unit_price
        label: Unit Price
        taxonType: CURRENCY

      - name: line_total
        label: Line Total
        taxonType: CURRENCY
        valuePath: FORMULA
        semanticDefinition: "{quantity} * {unit_price}"

  # Totals
  - name: subtotal
    label: Subtotal
    taxonType: CURRENCY
    valuePath: FORMULA
    semanticDefinition: "sum({line_items/line_total})"

  - name: tax_amount
    label: Tax Amount
    taxonType: CURRENCY
    valuePath: VALUE_OR_ALL_CONTENT

  - name: total_amount
    label: Total Amount
    taxonType: CURRENCY
    valuePath: VALUE_OR_ALL_CONTENT
    validationRules:
      - name: "Total calculation check"
        ruleFormula: "abs({total_amount} - ({subtotal} + ifnull({tax_amount}, 0))) < 0.01"
        messageFormula: '"Total amount mismatch"'
        overridable: true
```

### Contract Data Extraction

```yaml theme={null}
slug: contract-extraction
name: Contract Data Extraction
taxons:
  - name: contract_metadata
    label: Contract Metadata
    group: true
    children:
      - name: contract_number
        label: Contract Number
        taxonType: STRING

      - name: contract_type
        label: Contract Type
        taxonType: SELECTION
        selectionOptions:
          - label: "Service Agreement"
            id: "service"
          - label: "Purchase Agreement"
            id: "purchase"
          - label: "NDA"
            id: "nda"
          - label: "License Agreement"
            id: "license"

  - name: parties
    label: Parties
    group: true
    children:
      - name: party_a
        label: Party A
        taxonType: STRING

      - name: party_b
        label: Party B
        taxonType: STRING

  - name: key_terms
    label: Key Terms
    group: true
    children:
      - name: effective_date
        label: Effective Date
        taxonType: DATE

      - name: term_length
        label: Term Length
        taxonType: STRING
        semanticDefinition: "Duration of the contract (e.g., '12 months', '2 years')"

      - name: auto_renewal
        label: Auto Renewal
        taxonType: BOOLEAN
        semanticDefinition: "Does the contract automatically renew?"

      - name: termination_notice
        label: Termination Notice Period
        taxonType: STRING
        semanticDefinition: "Required notice period for termination (e.g., '30 days')"

  - name: financial_terms
    label: Financial Terms
    group: true
    children:
      - name: total_value
        label: Total Contract Value
        taxonType: CURRENCY

      - name: payment_terms
        label: Payment Terms
        taxonType: STRING
        semanticDefinition: "Payment schedule and terms (e.g., 'Net 30', 'Monthly in advance')"
```

***

## Best Practices

### Naming Conventions

<CodeGroup>
  ```yaml Good theme={null}
  name: vendor_name           # Snake case for internal names
  label: Vendor Name          # Title case for display
  externalName: vendorName    # Camel case for APIs
  ```

  ```yaml Avoid theme={null}
  name: VendorName            # Don't use capitals in internal names
  label: vendor_name          # Don't use technical names for display
  ```
</CodeGroup>
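
As a minimal sketch of these conventions applied across a small hierarchy, the fragment below names a group and two children consistently. The `externalName` values are illustrative, and using `externalName` on a group (not only on leaf elements) is an assumption here.

```yaml theme={null}
taxons:
  - name: vendor_information          # snake case internal name
    label: Vendor Information         # title case display label
    externalName: vendorInformation   # camel case (assumed to be allowed on groups)
    group: true
    children:
      - name: vendor_name
        label: Vendor Name
        externalName: vendorName
        taxonType: STRING

      - name: tax_id
        label: Tax ID
        externalName: taxId
        taxonType: STRING
```

Keeping the three names derived from the same words makes it easier to trace a field from the UI label back to its internal name and API key.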

### Semantic Definitions

<Tip>
  Write semantic definitions as if explaining to a human what to look for. Be specific about:

  * What the field represents
  * Where it typically appears
  * How to identify it
  * Edge cases to consider
</Tip>

**Good example**:

```yaml theme={null}
semanticDefinition: |
  The total amount due on the invoice, including all taxes and fees.
  Look for labels like "Total", "Amount Due", "Balance Due", or "Total Amount".
  This should be the final bottom-line number, not a subtotal.
  If multiple totals exist (e.g., by currency), extract the primary total.
```

**Avoid**:

```yaml theme={null}
semanticDefinition: "The total"  # Too vague
```

### Group Structures

<CodeGroup>
  ```yaml Repeating Data (Use Groups) theme={null}
  # Good: Line items are a repeating group
  - name: line_items
    label: Line Items
    group: true
    children:
      - name: description
      - name: quantity
      - name: price
  ```

  ```yaml Single Instances (Use Groups for Organization) theme={null}
  # Good: Vendor info as an organizational group
  - name: vendor
    label: Vendor Information
    group: true
    children:
      - name: name
      - name: address
      - name: tax_id
  ```
</CodeGroup>

### Validation Strategy

1. **Start Simple**: Begin with basic "not empty" validations
2. **Add Business Rules**: Implement domain-specific validations
3. **Make Critical Rules Non-Overridable**: Block processing if essential data is wrong
4. **Allow Overrides for Quality Checks**: Let users override formatting or minor issues

```yaml theme={null}
validationRules:
  # Critical: Don't allow override
  - name: "Invoice number required"
    ruleFormula: "!isblank({invoice_number})"
    overridable: false

  # Quality check: Allow override
  - name: "Total seems high"
    ruleFormula: "{total_amount} < 100000"
    messageFormula: '"Invoice total exceeds $100,000 - please verify"'
    overridable: true
```

### Formula Usage

<AccordionGroup>
  <Accordion title="Simple Calculations" icon="calculator">
    ```yaml theme={null}
    semanticDefinition: "{quantity} * {unit_price}"
    ```
  </Accordion>

  <Accordion title="Aggregations" icon="function">
    ```yaml theme={null}
    semanticDefinition: "sum({line_items/line_total})"
    ```
  </Accordion>

  <Accordion title="Conditional Logic" icon="code-branch">
    ```yaml theme={null}
    semanticDefinition: |
      if({total_amount} > 10000, "Requires Approval", "Auto-Approve")
    ```
  </Accordion>

  <Accordion title="Date Calculations" icon="calendar">
    ```yaml theme={null}
    semanticDefinition: "datemath({invoice_date}, 'days', 30)"
    ```
  </Accordion>
</AccordionGroup>
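
The snippets above show only the formula itself. As a hedged sketch of how they might sit inside complete data elements, the fragment below pairs each formula with `valuePath: FORMULA`, following the pattern used for `line_total` in the invoice example. The element names `due_date` and `approval_status` are hypothetical, and the formulas assume `invoice_date` and `total_amount` are defined elsewhere in the same data definition.

```yaml theme={null}
- name: due_date                 # hypothetical element
  label: Due Date
  taxonType: DATE
  valuePath: FORMULA             # value is computed, not extracted
  semanticDefinition: "datemath({invoice_date}, 'days', 30)"

- name: approval_status          # hypothetical element
  label: Approval Status
  taxonType: STRING
  valuePath: FORMULA
  semanticDefinition: |
    if({total_amount} > 10000, "Requires Approval", "Auto-Approve")
```

Choosing a `taxonType` that matches the formula's result (a date for `datemath`, a string for the conditional) keeps the computed value consistent with how it is validated and displayed.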

***

## Troubleshooting

### Common Issues

<AccordionGroup>
  <Accordion title="Data element not appearing in UI">
    **Possible causes**:

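    * `enabled: false` is set on the data element
    * A parent data element is disabled (disabling cascades to children)
    * `notUserLabelled: true` is set (applies to labeling interfaces)

    **Solution**: Check the `enabled` status of the element and of each parent up the hierarchy, then check `notUserLabelled`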
  </Accordion>

  <Accordion title="Extraction not working">
    **Check**:

    1. Is `valuePath` correct for your use case?
    2. Is `semanticDefinition` clear and specific?
    3. Are you using the right `taxonType`?
    4. Is the model trained for this document type?
  </Accordion>

  <Accordion title="Formula errors">
    **Common mistakes**:

    * Referencing data elements that don't exist
    * Syntax errors in the formula
    * Circular references between formula elements

    **Test**: Use the formula builder to validate syntax
  </Accordion>

  <Accordion title="Validation not triggering">
    **Check**:

    * Is the validation rule enabled (`disabled: false`)?
    * Does `conditionalFormula` evaluate to true?
    * Is `ruleFormula` returning the expected boolean?
  </Accordion>
</AccordionGroup>
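
To make these checks concrete, here is a minimal sketch of where the flags mentioned above might appear in a definition. The placement of `enabled` and `notUserLabelled` on individual data elements and of `disabled` and `conditionalFormula` on a validation rule is inferred from the checklists above, and the rule itself is hypothetical.

```yaml theme={null}
- name: vendor_information
  label: Vendor Information
  group: true
  enabled: true                  # if false, every child below is disabled too
  children:
    - name: tax_id
      label: Tax ID
      taxonType: STRING
      valuePath: VALUE_OR_ALL_CONTENT
      enabled: true
      notUserLabelled: false     # true would affect labeling interfaces

- name: tax_amount
  label: Tax Amount
  taxonType: CURRENCY
  valuePath: VALUE_OR_ALL_CONTENT
  validationRules:
    - name: "Tax below total"    # hypothetical rule
      disabled: false            # rule is active
      conditionalFormula: "!isblank({tax_amount})"   # only evaluate when a value exists
      ruleFormula: "{tax_amount} < {total_amount}"   # must return a boolean
      messageFormula: '"Tax amount should be less than the total amount"'
      overridable: true
```

When a rule never fires, walking these fields in order (is it disabled, does the condition hold, does the rule formula return the expected boolean) usually narrows down the cause.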

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Data Types Reference" icon="play" href="/guides/data-definitions/data-types">
    Reference guide for all supported data types
  </Card>

  <Card title="Formula Reference" icon="function" href="/guides/formulas/index">
    Complete formula function reference
  </Card>

  <Card title="Selection Option Formulas" icon="list-dropdown" href="/guides/data-definitions/selection-option-formulas">
    Compute dropdown options dynamically using JavaScript and service bridges
  </Card>

  <Card title="Event-Based Scripting" icon="bolt" href="/guides/data-definitions/event-subscriptions">
    Attach reactive JavaScript behavior to group data elements
  </Card>

  <Card title="Scripting Reference" icon="code" href="/guides/scripting/index">
    Complete API reference for Kodexa JavaScript scripting
  </Card>

  <Card title="Validation and Formatting" icon="shield-check" href="/guides/data-definitions/validation-and-conditional-formatting">
    Common validation rule and conditional formatting patterns
  </Card>
</CardGroup>
