# Data Definition Concepts

> Understand Data Definitions in Kodexa and how they model the structured data, groups, validation rules, and extraction targets for document-heavy workflows.

Data Definitions describe the structured information Kodexa should extract, validate, review, and use in downstream Activity Plans. They turn a business document into a clear data model: the fields you care about, the groups those fields belong to, the types those fields should normalize into, and the rules that determine whether the extracted data is ready to use.

<Note>
  In some APIs, SDKs, and configuration files, Data Definitions still appear under the historical terms `taxonomy` and `taxon`. Read `taxonomy` as **Data Definition** (the overall model) and `taxon` as **Data Element** (each field or group inside it).
</Note>

## What Is a Data Definition?

A Data Definition is a hierarchy of data elements. Each element represents one of the following:

* A piece of data to extract, validate, calculate, or review
* A group that organizes related data elements
* A repeating group, such as invoice line items or contract parties

This hierarchy becomes the shared model used by extraction, validation, review forms, Activity steps, and downstream systems.

### Example Structure

```text
Invoice (Data Definition)
├── Invoice Number (data element)
├── Invoice Date (data element)
├── Vendor (group)
│   ├── Name (data element)
│   ├── Address (data element)
│   └── Tax ID (data element)
└── Line Items (repeating group)
    ├── Description (data element)
    ├── Quantity (data element)
    ├── Unit Price (data element)
    └── Total (calculated element)
```

***

## Data Definition Roles

Most business users think about Data Definitions as the final data they want from a document. Kodexa also uses Data Definition structures during processing so modules, Activity steps, and model outputs can share the same vocabulary.

<CardGroup cols={3}>
  <Card title="Content Data Definition" icon="file-lines">
    Defines the business-level data extracted from documents. This is the main model used for final output, review, validation, and downstream integrations.

    **Use for**: Business data extraction, final output structure
  </Card>

  <Card title="Processing Data Definition" icon="gears">
    Supports intermediate processing work. These structures can be provided by modules or Activity Plan steps and become available when those resources are bound into a project.

    **Use for**: Intermediate labels, routing signals, AI model support
  </Card>

  <Card title="Module Data Definition" icon="brain">
    Comes from modules used for training or inference. These structures become available when you add a module to a project or reference it from an Activity Plan.

    **Use for**: ML module training, module-specific labels
  </Card>
</CardGroup>

***

## Key Concepts

### Data Elements

In configuration, data elements are written under the API field `taxons`. Each element can be a simple field, a group, or a repeating group.

**Simple Data Element**:

```yaml
taxons:
  - name: invoice_number
    label: Invoice Number
    taxonType: STRING
```

**Group**:

```yaml
taxons:
  - name: vendor
    label: Vendor Information
    group: true
    children:
      - name: name
      - name: address
```

**Repeating Group**:

```yaml
taxons:
  - name: line_items
    label: Line Items
    group: true
    allowsMultipleEntries: true
    children:
      - name: description
      - name: quantity
```
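
To make the distinction between the three element kinds concrete, here is a minimal Python sketch that mirrors the YAML snippets above as plain dictionaries and classifies each element from its `group` and `allowsMultipleEntries` flags. The `element_kind` helper is hypothetical and written for illustration only; it is not part of any Kodexa SDK.

```python
from typing import Any

# Plain-dict mirror of the YAML snippets above (illustrative only).
taxons: list[dict[str, Any]] = [
    {"name": "invoice_number", "label": "Invoice Number", "taxonType": "STRING"},
    {
        "name": "vendor",
        "label": "Vendor Information",
        "group": True,
        "children": [{"name": "name"}, {"name": "address"}],
    },
    {
        "name": "line_items",
        "label": "Line Items",
        "group": True,
        "allowsMultipleEntries": True,
        "children": [{"name": "description"}, {"name": "quantity"}],
    },
]


def element_kind(element: dict[str, Any]) -> str:
    """Classify an element as 'field', 'group', or 'repeating group'.

    Hypothetical helper: it only inspects the two flags shown in the YAML.
    """
    if not element.get("group", False):
        return "field"
    if element.get("allowsMultipleEntries", False):
        return "repeating group"
    return "group"


for element in taxons:
    print(f"{element['name']}: {element_kind(element)}")
```

Treating a missing flag as `False` keeps the sketch tolerant of abbreviated definitions, such as child elements that declare only a `name`.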

### Hierarchy and Relationships

Data Definitions use parent-child relationships to organize data:

* **Root elements**: Top-level fields or groups
* **Child elements**: Fields nested inside a parent group
* **Sibling elements**: Fields at the same level in the model

This structure helps Kodexa:

* Organize related data logically
* Mirror the way information appears in documents
* Improve extraction and review accuracy
* Produce output that downstream systems can understand
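
The parent-child relationships above can be sketched as a simple tree walk. The following Python example assumes a definition held as nested dictionaries with `name` and `children` keys, and builds a slash-separated path for every element. The path format and the `walk` function are assumptions made for illustration, not a documented Kodexa convention.

```python
from typing import Any, Iterator

# Small illustrative definition: one root field and one root group.
definition: list[dict[str, Any]] = [
    {"name": "invoice_number"},
    {
        "name": "vendor",
        "group": True,
        "children": [{"name": "name"}, {"name": "address"}],
    },
]


def walk(
    elements: list[dict[str, Any]], parent_path: str = ""
) -> Iterator[tuple[str, int]]:
    """Yield (path, depth) for every element, parents before children."""
    for element in elements:
        name = element["name"]
        path = f"{parent_path}/{name}" if parent_path else name
        # Depth 0 is a root element; each "/" adds one level of nesting.
        yield path, path.count("/")
        yield from walk(element.get("children", []), path)


for path, depth in walk(definition):
    print("  " * depth + path)
```

Because each path includes its parent, two child elements called `name` under different groups stay distinguishable, which is one reason a hierarchy is easier for downstream systems to consume than a flat list of fields.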

***

## Data Definition Lifecycle

### 1. Design Phase

Define the model from the business problem:

* What documents are involved?
* What data must be extracted?
* Which fields repeat?
* Which values need review or validation?
* Which downstream systems will consume the output?

### 2. Configuration Phase

Set properties for each data element:

* Data type
* Value source
* Semantic definition
* Validation rules
* Conditional formatting
* Event-based scripts

### 3. Training and Testing Phase

Use the Data Definition to:

* Label representative documents
* Train or evaluate extraction models
* Refine semantic definitions
* Test validation and review behavior

### 4. Production Phase

Use the Data Definition in live workflows to:

* Extract structured data from new documents
* Validate extracted values
* Present data to reviewers through Data Forms
* Feed Activity Plan steps and downstream integrations

***

## Common Use Cases

### Invoice Processing

```yaml
taxons:
  - name: header
    group: true
    children:
      - name: invoice_number
      - name: invoice_date
      - name: due_date

  - name: vendor
    group: true
    children:
      - name: name
      - name: address

  - name: line_items
    group: true
    allowsMultipleEntries: true
    children:
      - name: description
      - name: quantity
      - name: unit_price
```
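
As a rough illustration of how a definition like this shapes the extracted output, the Python sketch below checks a sample record against the invoice structure: groups are expected to be objects, repeating groups lists of objects, and keys outside the definition are flagged. It assumes `line_items` is marked as repeating with `allowsMultipleEntries`. The record layout, the sample values, and the `shape_problems` helper are all hypothetical; Kodexa's actual output format may differ.

```python
from typing import Any

# Dict mirror of the invoice definition, with line_items marked as repeating.
definition: list[dict[str, Any]] = [
    {"name": "header", "group": True, "children": [
        {"name": "invoice_number"}, {"name": "invoice_date"}, {"name": "due_date"}]},
    {"name": "vendor", "group": True, "children": [
        {"name": "name"}, {"name": "address"}]},
    {"name": "line_items", "group": True, "allowsMultipleEntries": True, "children": [
        {"name": "description"}, {"name": "quantity"}, {"name": "unit_price"}]},
]


def shape_problems(
    elements: list[dict[str, Any]], record: dict[str, Any], path: str = ""
) -> list[str]:
    """Return shape mismatches between a record and a definition."""
    problems: list[str] = []
    known = {element["name"] for element in elements}
    for key in record:
        if key not in known:
            problems.append(f"{path}{key}: not in the definition")
    for element in elements:
        name = element["name"]
        # Absent values are skipped; simple fields need no shape check.
        if name not in record or not element.get("group", False):
            continue
        value, here = record[name], f"{path}{name}"
        children = element.get("children", [])
        if element.get("allowsMultipleEntries", False):
            if not isinstance(value, list):
                problems.append(f"{here}: expected a list of entries")
                continue
            for index, entry in enumerate(value):
                if isinstance(entry, dict):
                    problems.extend(
                        shape_problems(children, entry, f"{here}[{index}]/"))
                else:
                    problems.append(f"{here}[{index}]: expected an object")
        elif isinstance(value, dict):
            problems.extend(shape_problems(children, value, f"{here}/"))
        else:
            problems.append(f"{here}: expected an object")
    return problems


# Made-up sample record, for illustration only.
record = {
    "header": {"invoice_number": "INV-001", "invoice_date": "2024-01-15"},
    "vendor": {"name": "Acme Ltd", "address": "1 Main St", "fax": "n/a"},
    "line_items": [{"description": "Widget", "quantity": "2", "unit_price": "9.50"}],
}
print(shape_problems(definition, record))
```

The sketch deliberately ignores missing values such as `due_date`; whether a field is required is a matter for validation rules rather than for the structure itself.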

### Contract Metadata

```yaml
taxons:
  - name: contract_type
    taxonType: SELECTION

  - name: parties
    group: true
    children:
      - name: party_a
      - name: party_b

  - name: key_dates
    group: true
    children:
      - name: effective_date
      - name: expiration_date
```

### Form Data

```yaml
taxons:
  - name: applicant
    group: true
    children:
      - name: full_name
      - name: email
      - name: phone

  - name: application_details
    group: true
    children:
      - name: application_type
      - name: submission_date
```

***

## Best Practices

<AccordionGroup>
  <Accordion title="Design for Reuse" icon="recycle">
    Create Data Definitions that can be reused across similar document types:

    * Use business names that make sense across teams
    * Factor out common structures
    * Keep repeated document patterns consistent
  </Accordion>

  <Accordion title="Keep It Simple" icon="wand-magic-sparkles">
    Start with the fields that drive the workflow:

    * Begin with core data elements
    * Add groups only where they improve clarity
    * Introduce validation incrementally
  </Accordion>

  <Accordion title="Mirror the Business Document" icon="sitemap">
    Align the hierarchy with how users understand the document:

    * Match visual organization where it helps
    * Follow natural reading order
    * Group related information
  </Accordion>

  <Accordion title="Use Meaningful Names" icon="signature">
    Choose clear, stable names:

    * Use business terminology
    * Be specific and unambiguous
    * Follow consistent naming conventions
  </Accordion>
</AccordionGroup>
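
The naming guidance above can be partly automated. The following Python sketch assumes a snake_case convention, inferred from names such as `invoice_number` used in the examples on this page, and flags names that break it as well as duplicate names among siblings. The `naming_problems` helper and the convention itself are assumptions for illustration, not rules enforced by Kodexa.

```python
import re
from typing import Any

# Assumed convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def naming_problems(elements: list[dict[str, Any]], path: str = "") -> list[str]:
    """Report names that are not snake_case or that repeat among siblings."""
    problems: list[str] = []
    seen: set[str] = set()
    for element in elements:
        name = element["name"]
        here = f"{path}{name}"
        if not SNAKE_CASE.match(name):
            problems.append(f"{here}: not snake_case")
        if name in seen:
            problems.append(f"{here}: duplicate sibling name")
        seen.add(name)
        # Children are checked in their own scope, so reuse across groups is fine.
        problems.extend(naming_problems(element.get("children", []), f"{here}/"))
    return problems


# Illustrative definition containing two deliberate mistakes.
draft = [
    {"name": "invoice_number"},
    {"name": "InvoiceDate"},
    {"name": "vendor", "group": True, "children": [{"name": "name"}, {"name": "name"}]},
]
for problem in naming_problems(draft):
    print(problem)
```

A check like this could run whenever a definition changes, so that naming drift is caught before labeled documents and downstream integrations come to depend on the names.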

***

## Learn More

<CardGroup cols={2}>
  <Card title="Data Definition Structure" icon="book-open" href="/guides/data-definitions/taxonomy-guide">
    Configure data elements, groups, value sources, and extraction behavior
  </Card>

  <Card title="Data Types Reference" icon="list" href="/guides/data-definitions/data-types">
    Detailed information about available data types and normalization
  </Card>

  <Card title="Building Data Classes" icon="code" href="/guides/data-definitions/building-data-classes">
    Generate Python data classes from Data Definitions for programmatic access
  </Card>

  <Card title="Data Definitions Overview" icon="diagram-project" href="/guides/data-definitions">
    Overall Data Definition concepts and patterns
  </Card>
</CardGroup>

***

## Next Steps

<Steps>
  <Step title="Review the structure guide">
    Read [Data Definition Structure](/guides/data-definitions/taxonomy-guide) for detailed configuration instructions.
  </Step>

  <Step title="Explore examples">
    Check out [Data Definition examples](/guides/data-definitions/examples/invoice) for common document types.
  </Step>

  <Step title="Try it in a project">
    Create your first Data Definition in a Kodexa project and test it with sample documents.
  </Step>

  <Step title="Iterate and refine">
    Use extraction results to improve semantic definitions, validation rules, and review behavior.
  </Step>
</Steps>
