Python SDK

The Kodexa Document Python SDK (`kodexa-document`) is a library for working with KDDB documents and the Kodexa platform. Its core is implemented in Go and exposed to Python through CFFI bindings, so performance-critical work runs in native code behind a Pythonic API.

Architecture

The SDK has a layered architecture:

Go Core: Document storage (KDDB/SQLite), content tree operations, the extraction engine, and delta tracking are implemented in Go for performance.
CFFI Bridge: Python calls into the Go shared library via CFFI, with automatic handle management using `weakref.finalize`.
Pydantic Models: 959 platform models are auto-generated from the OpenAPI spec using datamodel-codegen, giving typed fields and runtime validation.
Platform Client: A REST client for interacting with the Kodexa API, built on the generated models.

Requirements

Python 3.12 or higher
pip (Python package manager)
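To confirm the interpreter requirement before installing, a short check along these lines may help. The helper name `meets_minimum` is made up for this sketch and is not part of the SDK.

```python
import sys

MINIMUM_PYTHON = (3, 12)  # taken from the Requirements list


def meets_minimum(version: tuple, minimum: tuple = MINIMUM_PYTHON) -> bool:
    """Return True when `version` is at least `minimum` (major, minor)."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:2])
    if meets_minimum(sys.version_info):
        print(f"Python {found} is supported")
    else:
        print(f"Python 3.12 or higher is required, found {found}")
```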

Installation

Install the SDK from PyPI:

pip install kodexa-document

The package includes pre-built native libraries for Linux, macOS, and Windows. No compilation or Go installation is required.

Optional Dependencies

Install additional features as needed:

# Development tools (pytest, black, mypy, etc.)
pip install "kodexa-document[dev]"

# Testing only
pip install "kodexa-document[test]"

# Platform integrations (requests, numpy, pydantic-yaml)
pip install "kodexa-document[platform]"

Verify Installation

Test your installation:

from kodexa_document import Document

# Create a new document
with Document() as doc:
    print(f"Document UUID: {doc.uuid}")
    print(f"Version: {doc.version}")
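If the import above fails, it can be useful to check whether the distribution is visible to the current interpreter at all. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("kodexa-document")
    if found is None:
        print("kodexa-document is not installed in this environment")
    else:
        print(f"kodexa-document {found} is installed")
```

A `None` result usually means `pip` installed the package into a different environment than the one running the script.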

Module Organization

The SDK is organized into several subsystems:

Module	Import Path	Description
Core Document	`kodexa_document`	`Document`, `ContentNode`, `ContentFeature`, `Tag`
Extraction	`kodexa_document`	`ExtractionEngine`, `Taxonomy`, `DataObject`, `DataAttribute`
Accessors	`kodexa_document`	`DataObjectAccessor`, `DataAttributeAccessor`, `AuditAccessor`, `DeltaAccessor`, `NoteAccessor`, `NativeDocumentAccessor`
Transactions	`kodexa_document`	`TransactionContext`, `BatchTransactionContextManager`
Processing	`kodexa_document`	`ProcessingStep`, `KnowledgeItem`, `KnowledgeFeature`
Platform Models	`kodexa_document.model._generated`	959 auto-generated Pydantic models (`Organization`, `Project`, `Task`, etc.)
Platform Client	`kodexa_document.platform`	`KodexaClient`, `KodexaPlatform` (lazy import)
LLM	`kodexa_document.llm`	`ModelManager`, `ChatMessage`, `LLMUsageMetrics` (lazy import)

Platform, LLM, and assistant modules use lazy imports to avoid pulling in heavy dependencies. Import them explicitly when needed:

from kodexa_document.platform import KodexaClient, KodexaPlatform
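For readers unfamiliar with lazy imports, one common way to implement them in Python is a module-level `__getattr__` (PEP 562). Whether the SDK uses this exact mechanism is an assumption; the sketch below builds a throwaway module named `lazy_demo` in memory and uses standard-library modules as stand-ins for heavy dependencies.

```python
import importlib
import sys
import types

# Public attribute name -> module that really defines it.
# Standard-library modules stand in for heavy optional dependencies.
_LAZY_ATTRIBUTES = {
    "JSONDecoder": "json",
    "Fraction": "fractions",
}


def _lazy_getattr(name: str):
    """Module-level __getattr__: import the target on first access."""
    try:
        module_name = _LAZY_ATTRIBUTES[name]
    except KeyError:
        raise AttributeError(
            f"module 'lazy_demo' has no attribute {name!r}"
        ) from None
    value = getattr(importlib.import_module(module_name), name)
    # Cache on the module so __getattr__ is not consulted again.
    setattr(lazy_demo, name, value)
    return value


# Build the demo module in memory so the sketch runs as a single file.
lazy_demo = types.ModuleType("lazy_demo")
lazy_demo.__getattr__ = _lazy_getattr
sys.modules["lazy_demo"] = lazy_demo

if __name__ == "__main__":
    print("Fraction" in vars(lazy_demo))   # not resolved yet
    from lazy_demo import Fraction         # triggers the real import
    print(Fraction(1, 2) + Fraction(1, 4))
    print("Fraction" in vars(lazy_demo))   # cached after first access
```

With this pattern, importing the top-level package stays cheap, and the cost of a heavy dependency is paid only by code that actually asks for it.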

Key Imports

The most commonly used imports from the top-level package:

from kodexa_document import (
    # Core
    Document, ContentNode, ContentFeature, Tag,

    # Extraction
    ExtractionEngine, Taxonomy, DataObject, DataAttribute, DataException,

    # Accessors
    DataObjectAccessor, DataAttributeAccessor, AuditAccessor,
    DeltaAccessor, NativeDocumentAccessor, NoteAccessor,

    # Transactions
    TransactionContext, BatchTransactionContextManager,

    # Processing
    ProcessingStep,

    # Errors
    DocumentError, DocumentNotFoundError, ExtractionError,
)

What’s Next?

Getting Started: Learn the basics of creating and manipulating documents
Platform Models: Auto-generated Pydantic models from the OpenAPI spec
Platform Client: REST API client for the Kodexa platform
Extraction: Extract structured data from documents using taxonomies
Processing: Track processing steps and knowledge items

Package Information

Property	Value
Package Name	`kodexa-document`
PyPI	pypi.org/project/kodexa-document
License	Apache-2.0
Repository	github.com/kodexa-ai/kodexa-document
