Introduction

Kodexa is an AI-powered document processing platform that transforms unstructured documents into structured data. It combines AI capabilities with document processing infrastructure so that organizations can extract, process, and manage the information in their documents.

Core Platform Concepts

Document Processing & AI Integration

  • Documents are parsed into standardized structures that can be processed by various AI models
  • Generative AI capabilities are deeply integrated to handle unstructured data, tabular content, and even low-quality scans
  • The platform is model vendor-agnostic, allowing you to use the best AI models for specific use cases
  • Built-in guardrails ensure data lineage and quality

Task-Based Architecture

The platform is built around the concept of Tasks, which provide:

  • A framework for AI/human collaboration
  • Context for processing multiple related documents
  • Integration points for workflow management
  • Support for comments, assignments, and progress tracking

Infrastructure Components

Core Services

  • Operational Data Store for managing metadata, lineage, and orchestration
  • Event-bus infrastructure for scalable processing
  • S3-based Storage Layer for Data Lake capabilities
  • OpenSearch Index Services for monitoring and reporting

Developer Tools

  • RESTful API supporting all platform capabilities
  • Python SDK for easy integration
  • Studio interface for designing, testing, and debugging implementations
  • Workflow interface for managing human-in-the-loop tasks

Platform Features

  • Rich Human-in-the-Loop tools with feedback mechanisms
  • Powerful validation and rules engine
  • Comprehensive logging and analytics
  • Blue/Green deployment support for AI models
  • Event-based processing architecture
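As a rough illustration of what event-based processing means in general, the sketch below shows a minimal in-process publish/subscribe bus. It is not Kodexa's event bus or its API; the `EventBus` class, the event names such as `document.uploaded`, and the handler are all hypothetical and exist only to show the pattern of decoupling producers from consumers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Hypothetical types for illustration only; not part of the Kodexa SDK.
Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """A minimal in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Register a handler to be called for every event of this type.
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> int:
        # Deliver the event to each subscriber; return how many were called.
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


processed: List[str] = []
bus = EventBus()
bus.subscribe("document.uploaded", lambda e: processed.append(str(e["name"])))

delivered = bus.publish("document.uploaded", {"name": "invoice.pdf"})
ignored = bus.publish("document.deleted", {"name": "old.pdf"})  # no subscribers
```

The publisher never references the handler directly, which is the property that lets processing steps be added or scaled independently in an event-driven design.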

Resource-Driven Design

The Kodexa platform lets you define shareable “resources”. These resources are the building blocks of AI-driven document automation.

Metadata Classes

The Kodexa platform uses a hierarchy of metadata classes to represent various components and configurations:

Action

Represents a specific action in the system. Actions are discrete operations that can be performed within the Kodexa platform, such as processing documents, triggering workflows, or executing custom logic.

AssistantDefinition

Defines an AI assistant's capabilities. This class encapsulates the configuration, behavior, and functionality of AI assistants used in the platform for various tasks such as document analysis, question answering, or task automation.

CredentialDefinition

Defines credential types and their properties. This class is used to specify different types of authentication and authorization credentials used across the platform, ensuring secure access to various resources and services.

Dashboard

Represents a dashboard configuration. Dashboards provide a visual interface for users to monitor, analyze, and interact with data and processes within the Kodexa platform.

DataForm

Defines structure for data input forms. This class is used to create and manage forms for data entry, ensuring consistent and structured data collection across the platform.

ExtensionPack

Represents a package of platform extensions. Extension packs allow for the addition of new functionality, integrations, or customizations to the Kodexa platform, enhancing its capabilities and adaptability.

GuidanceSet

Defines a set of guidance rules or instructions. Guidance sets provide structured information to guide users or automated processes through complex tasks or decision-making scenarios.

ModelRuntime

Represents a runtime environment for models. This class defines the configuration and requirements for executing machine learning or AI models within the Kodexa platform, ensuring proper resource allocation and execution.

Pipeline

Defines a sequence of processing steps. Pipelines orchestrate the flow of data and operations, allowing for complex, multi-stage processing of documents or data within the platform.
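The idea of a pipeline as an ordered sequence of steps can be sketched in plain Python. This is a conceptual illustration only, not the Kodexa `Pipeline` class; `run_pipeline`, the step functions, and the dictionary-based document are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of the Kodexa SDK.
Document = Dict[str, object]
Step = Callable[[Document], Document]


def run_pipeline(document: Document, steps: List[Step]) -> Document:
    """Pass the document through each step in order."""
    for step in steps:
        document = step(document)
    return document


def normalize_text(document: Document) -> Document:
    # Collapse runs of whitespace in the raw text.
    return {**document, "text": " ".join(str(document["text"]).split())}


def count_words(document: Document) -> Document:
    # Add a derived field computed from the (already normalized) text.
    return {**document, "word_count": len(str(document["text"]).split())}


result = run_pipeline(
    {"text": "  Invoice   total:  42 USD "},
    [normalize_text, count_words],
)
```

Each step receives the output of the previous one, so the order of the list determines the processing order.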

ProjectTemplate

Represents a template for creating projects. Project templates provide predefined structures, configurations, and resources to streamline the creation of new projects within the Kodexa platform.

Prompt

Defines a prompt template for AI interactions. This class is used to create structured prompts for AI models, ensuring consistent and effective communication between users and AI assistants.
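To make the notion of a prompt template concrete, here is a minimal sketch using Python's standard `string.Template`. It does not reflect how Kodexa stores or renders `Prompt` resources; the template text, the placeholder names (`field`, `doc_type`, `content`), and `render_prompt` are hypothetical.

```python
from string import Template

# Hypothetical template for illustration; placeholder names are assumptions.
EXTRACTION_PROMPT = Template(
    "Extract the field '$field' from the following $doc_type.\n"
    "Return only the value.\n\n"
    "$content"
)


def render_prompt(field: str, doc_type: str, content: str) -> str:
    """Fill the template; raises KeyError if a placeholder is missing."""
    return EXTRACTION_PROMPT.substitute(
        field=field, doc_type=doc_type, content=content
    )


prompt = render_prompt("invoice number", "invoice", "Invoice No. INV-001")
```

Keeping the wording in one template and varying only the placeholders is what gives consistent prompts across documents.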

RuleSet

Represents a set of business or processing rules. Rule sets define logical conditions and actions to be applied to data or processes, enabling dynamic and configurable behavior within the platform.
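The pairing of conditions with actions can be sketched as follows. This is a generic illustration under stated assumptions, not the Kodexa rules engine or its `RuleSet` schema; the `Rule` dataclass, `apply_rules`, and the example rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of the Kodexa SDK.
Record = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]  # when the rule applies
    action: Callable[[Record], None]     # what to do when it applies


def apply_rules(record: Record, rules: List[Rule]) -> List[str]:
    """Run each rule's action if its condition holds; return fired rule names."""
    fired = []
    for rule in rules:
        if rule.condition(record):
            rule.action(record)
            fired.append(rule.name)
    return fired


def flag_for_review(record: Record) -> None:
    record["needs_review"] = True


rules = [
    Rule("missing-total", lambda r: r.get("total") is None, flag_for_review),
    Rule("large-amount", lambda r: (r.get("total") or 0) > 10_000, flag_for_review),
]

record = {"vendor": "Acme", "total": 12_500}
fired = apply_rules(record, rules)
```

Because the rules are data rather than hard-coded branches, the set can be changed without touching the code that applies it.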

Store

Represents a data store configuration. This class defines the properties and settings for various data storage solutions used within the Kodexa platform, ensuring proper data management and access.

Taxonomy

Defines a hierarchical classification system. Taxonomies provide a structured way to categorize and organize information within the platform, facilitating efficient data retrieval and analysis.
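A hierarchical classification can be pictured as a tree of named nodes. The sketch below is illustrative only and does not mirror the Kodexa `Taxonomy` schema; the `Taxon` dataclass, `iter_paths`, and the invoice labels are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical node type for illustration only; not part of the Kodexa SDK.
@dataclass
class Taxon:
    name: str
    children: List["Taxon"] = field(default_factory=list)


def iter_paths(taxon: Taxon, prefix: str = "") -> Iterator[str]:
    """Yield the slash-separated path of every node, depth first."""
    path = f"{prefix}/{taxon.name}" if prefix else taxon.name
    yield path
    for child in taxon.children:
        yield from iter_paths(child, path)


invoice = Taxon(
    "invoice",
    [
        Taxon("vendor", [Taxon("name"), Taxon("address")]),
        Taxon("total"),
    ],
)

paths = list(iter_paths(invoice))
```

Each path identifies a category by its position in the hierarchy, which is what makes grouped retrieval (for example, everything under `invoice/vendor`) straightforward.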