Overview
Data definitions in Kodexa provide the structure and rules for extracting, validating, and processing information from documents. They define what data to extract, how to validate it, and how to present it to users.
What Are Data Definitions?
Data definitions are the blueprints for your document processing workflows. They specify:
- Structure: What data elements exist and how they relate
- Types: What kind of data each field contains (text, numbers, dates, etc.)
- Sources: Where data comes from (document content, metadata, calculations, external systems)
- Validation: Business rules and data quality checks
- Presentation: How data appears in forms and exports
Data Definitions
Hierarchical structures defining data elements for extraction
Data Forms
UI configurations for displaying and editing extracted data
Formulas
Calculation logic for derived and computed fields
Validation Rules
Business rules and data quality constraints
Core Concepts
Data Structure
Data definitions are hierarchical structures of data elements (taxons) that define what to extract from documents. Example use cases:
- Invoice data extraction (vendor, line items, totals)
- Contract metadata (parties, dates, terms)
- Form processing (applicant info, answers, signatures)
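To make the hierarchy concrete, here is a minimal Python sketch of an invoice definition as a tree of data elements. The `DataElement` class, its attributes, and the field names are hypothetical stand-ins chosen for illustration; they are not the Kodexa SDK or its configuration format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """Hypothetical stand-in for a data element (taxon); not a Kodexa class."""
    name: str
    data_type: Optional[str] = None  # None marks a group rather than a field
    repeating: bool = False          # True for collections such as line items
    children: List["DataElement"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> List[str]:
        """Return the slash-separated path of every leaf field."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            return [path]
        found: List[str] = []
        for child in self.children:
            found.extend(child.paths(path))
        return found


# Assumed invoice layout: header fields plus a repeating line-item group.
invoice = DataElement("invoice", children=[
    DataElement("vendor_name", "STRING"),
    DataElement("invoice_date", "DATE"),
    DataElement("line_items", repeating=True, children=[
        DataElement("description", "STRING"),
        DataElement("amount", "NUMBER"),
    ]),
    DataElement("total", "NUMBER"),
])

for leaf_path in invoice.paths():
    print(leaf_path)
```

Walking the tree yields one path per leaf field, which is a convenient way to check that a definition covers everything you expect before configuring it in the platform.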
Complete Definition Guide
Comprehensive guide to configuring data definitions
Data Types
Kodexa supports rich data types for accurate extraction and validation, grouped into Basic Types, Specialized Types, and Complex Types.
Basic Types
- STRING - Text of any length
- NUMBER - Numeric values
- BOOLEAN - True/false values
- DATE - Calendar dates
- DATE_TIME - Dates with timestamps
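As an illustration of what typing a field buys you, the sketch below converts raw extracted text into a Python value for each basic type. The `DataType` enum mirrors the names listed above, but the `coerce` function and its parsing rules (ISO 8601 dates, comma thousands separators, a small set of boolean spellings) are assumptions made for the example, not Kodexa's actual conversion logic.

```python
from datetime import date, datetime
from enum import Enum


class DataType(Enum):
    STRING = "STRING"
    NUMBER = "NUMBER"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    DATE_TIME = "DATE_TIME"


def coerce(raw: str, data_type: DataType):
    """Convert raw text to a typed value; raises ValueError on bad input."""
    text = raw.strip()
    if data_type is DataType.STRING:
        return text
    if data_type is DataType.NUMBER:
        # Assumption: commas are thousands separators.
        return float(text.replace(",", ""))
    if data_type is DataType.BOOLEAN:
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if data_type is DataType.DATE:
        return date.fromisoformat(text)       # assumes YYYY-MM-DD
    if data_type is DataType.DATE_TIME:
        return datetime.fromisoformat(text)   # assumes ISO 8601
    raise ValueError(f"unsupported type: {data_type}")


print(coerce("1,250.50", DataType.NUMBER))
print(coerce("2024-03-15", DataType.DATE))
```

A value that fails to convert raises `ValueError`, which is the kind of data type constraint a validation step can surface to reviewers.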
Data Sources
Define where each data element gets its value:
Document Extraction
Extract directly from document content using AI/ML models
Metadata
Pull from document properties and system fields
Formulas
Calculate from other fields
External Data
Fetch from APIs or databases
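The four sources can be pictured as a simple dispatch on each element's source. In this sketch the source names (`DOCUMENT`, `METADATA`, `FORMULA`, `EXTERNAL`), the dictionary layout, and the in-memory `VENDOR_IDS` lookup that stands in for an external system are all hypothetical; they illustrate the idea rather than Kodexa's configuration.

```python
from typing import Any, Dict

# Stand-in for an external API or database (hypothetical data).
VENDOR_IDS = {"Acme": "V-001"}


def resolve_value(element: Dict[str, Any], context: Dict[str, Any]) -> Any:
    """Pick a value for one data element based on its declared source."""
    source = element["source"]
    if source == "DOCUMENT":
        # Value produced by the extraction model for this field.
        return context["extracted"].get(element["name"])
    if source == "METADATA":
        # Value taken from document properties or system fields.
        return context["metadata"].get(element["key"])
    if source == "FORMULA":
        # Value calculated from other extracted fields.
        return element["formula"](context["extracted"])
    if source == "EXTERNAL":
        # Value fetched from another system, keyed on extracted fields.
        return element["lookup"](context["extracted"])
    raise ValueError(f"unknown source: {source}")


context = {
    "extracted": {"vendor_name": "Acme", "subtotal": 100.0, "tax": 8.0},
    "metadata": {"filename": "inv-001.pdf"},
}
elements = [
    {"name": "vendor_name", "source": "DOCUMENT"},
    {"name": "file_name", "source": "METADATA", "key": "filename"},
    {"name": "total", "source": "FORMULA",
     "formula": lambda f: f["subtotal"] + f["tax"]},
    {"name": "vendor_id", "source": "EXTERNAL",
     "lookup": lambda f: VENDOR_IDS.get(f["vendor_name"])},
]

record = {e["name"]: resolve_value(e, context) for e in elements}
print(record)
```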
Common Patterns
Invoice Processing
Extract structured data from invoices.
Contract Metadata
Capture key contract information.
Form Data
Process form submissions.
Validation and Quality
Validation Rules
Define business rules to ensure data quality.
Conditional Formatting
Apply visual cues based on data values.
Best Practices
Design Principles
Start Simple, Iterate
Begin with core fields and add complexity as needed. Don’t over-engineer initial data definitions.
Start with:
- Essential fields only
- Basic data types
- Simple validation
Add as needed:
- Computed fields
- Complex validations
- Conditional formatting
Use Semantic Definitions Well
Write clear, specific extraction prompts.
Organize with Groups
Use groups to:
- Organize related fields logically
- Handle repeating structures (line items, signatories)
- Improve UI presentation
Repeating groups: Collections
Validate Strategically
Critical validations (non-overridable):
- Required fields
- Data type constraints
- Business logic rules
Warnings (overridable):
- Unusual values
- Formatting issues
- Threshold warnings
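One way to picture the split between critical validations and softer checks is to tag each rule as overridable or not and report failures with a matching severity. The `Rule` class, the `validate` function, the rule names, and the 10,000 threshold below are hypothetical and written only for illustration; they do not reflect Kodexa's rule syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Rule:
    name: str
    check: Callable[[Dict[str, Any]], bool]  # returns True when the record passes
    message: str
    overridable: bool                        # False for critical validations


def validate(record: Dict[str, Any], rules: List[Rule]) -> List[Dict[str, str]]:
    """Return one issue per failed rule, with severity from overridability."""
    issues = []
    for rule in rules:
        if not rule.check(record):
            issues.append({
                "rule": rule.name,
                "message": rule.message,
                "severity": "warning" if rule.overridable else "error",
            })
    return issues


rules = [
    # Critical: required field.
    Rule("total_required", lambda r: r.get("total") is not None,
         "Total is required", overridable=False),
    # Critical: business logic, line items must sum to the total.
    Rule("totals_match",
         lambda r: r.get("total") is None
         or abs(sum(i["amount"] for i in r.get("line_items", [])) - r["total"]) < 0.01,
         "Line items do not sum to total", overridable=False),
    # Warning: unusually large value (assumed threshold).
    Rule("large_total",
         lambda r: r.get("total") is None or r["total"] <= 10000,
         "Total exceeds 10,000", overridable=True),
]

record = {"total": 12000.0,
          "line_items": [{"amount": 7000.0}, {"amount": 5000.0}]}
issues = validate(record, rules)
blocking = [i for i in issues if i["severity"] == "error"]
print(issues)
```

Here the record passes both critical rules and trips only the threshold warning, so a reviewer could accept it after a look instead of being blocked.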
Naming Conventions
Use consistent naming across your data definitions.
Getting Started
1. Understand Your Documents
Analyze the documents you’ll process:
- What data needs to be extracted?
- What’s the document structure?
- What validations are needed?
2. Design Your Data Definition
Sketch out the data structure:
- List all required fields
- Group related fields
- Identify repeating sections
3. Configure Data Elements
For each field, define:
- Data type
- Value source
- Semantic definition
- Validation rules
4. Test and Iterate
Process sample documents:
- Verify extraction accuracy
- Refine semantic definitions
- Adjust validation rules
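For the testing step, a simple per-field accuracy figure over a handful of hand-labelled samples shows which semantic definitions need refining. This sketch assumes you have the expected and extracted values available as plain dictionaries; the `field_accuracy` helper and the sample data are hypothetical and not part of Kodexa.

```python
from typing import Any, Dict, List


def field_accuracy(expected: List[Dict[str, Any]],
                   extracted: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of sample documents where each field matched exactly."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for truth, result in zip(expected, extracted):
        for name, value in truth.items():
            totals[name] = totals.get(name, 0) + 1
            if result.get(name) == value:
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / totals[name] for name in totals}


# Hand-labelled values for two sample documents (made-up data).
expected = [
    {"vendor_name": "Acme", "total": 108.0},
    {"vendor_name": "Globex", "total": 250.0},
]
# What the extraction returned for the same documents (made-up data).
extracted = [
    {"vendor_name": "Acme", "total": 108.0},
    {"vendor_name": "Globex Inc", "total": 250.0},
]

accuracy = field_accuracy(expected, extracted)
print(accuracy)
```

A field with low accuracy, such as `vendor_name` in this made-up sample, is a candidate for a sharper semantic definition before you add more validation rules.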
Learn More
Complete Definition Guide
Comprehensive reference for all data definition configuration options
Formula Reference
Built-in functions for calculations and validations
Data Forms
Configure UI for reviewing and editing extracted data
API Documentation
Programmatic access to data definition management
