The Kodexa Document Python SDK (kodexa-document) is a library for working with KDDB documents and the Kodexa platform. It is built on a Go backend with Python bindings via CFFI, so performance-critical work runs in native code while the API stays Pythonic.

Architecture

The SDK has a layered architecture:
  • Go Core — Document storage (KDDB/SQLite), content tree operations, extraction engine, and delta tracking are implemented in Go for performance
  • CFFI Bridge — Python calls into the Go shared library via CFFI, with automatic handle management using weakref.finalize
  • Pydantic Models — 959 platform models are auto-generated from the OpenAPI spec using datamodel-codegen, providing full type safety
  • Platform Client — A REST client for interacting with the Kodexa API, built on the generated models
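The handle management described for the CFFI bridge can be illustrated with a minimal sketch. It does not use the SDK's internals: `FakeNativeLibrary` and `HandleOwner` are hypothetical stand-ins invented here, and only `weakref.finalize` is the mechanism named above. The idea is that each Python wrapper registers a finalizer that releases its native handle exactly once, whether the wrapper is closed explicitly or garbage collected.

```python
import weakref


class FakeNativeLibrary:
    """Hypothetical stand-in for a native library that hands out integer handles."""

    def __init__(self):
        self._next_handle = 1
        self.open_handles = set()

    def create(self):
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles.add(handle)
        return handle

    def release(self, handle):
        self.open_handles.discard(handle)


class HandleOwner:
    """Hypothetical wrapper that owns one native handle."""

    def __init__(self, lib):
        self._handle = lib.create()
        # The callback receives the handle, not self, so the finalizer
        # does not keep this object alive.
        self._finalizer = weakref.finalize(self, lib.release, self._handle)

    @property
    def handle(self):
        return self._handle

    @property
    def closed(self):
        return not self._finalizer.alive

    def close(self):
        # Calling a finalizer runs its callback at most once.
        self._finalizer()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Used as a context manager, the handle is released when the block exits; if `close()` is never called, the same callback runs when the wrapper is collected.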

Requirements

  • Python 3.12 or higher
  • pip (Python package manager)
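One way to confirm the interpreter requirement before installing is a small standard-library check. The helper name `meets_minimum` is made up for this sketch; only the 3.12 minimum comes from the requirements above.

```python
import sys

MINIMUM_PYTHON = (3, 12)  # minimum version stated in the requirements above


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True when the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info)[:2] >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```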

Installation

Install the SDK from PyPI:
pip install kodexa-document
The package includes pre-built native libraries for Linux, macOS, and Windows, so no compilation or Go installation is required.

Optional Dependencies

Install additional features as needed:
# Quote the requirement so shells such as zsh do not treat the brackets as a glob pattern.

# Development tools (pytest, black, mypy, etc.)
pip install "kodexa-document[dev]"

# Testing only
pip install "kodexa-document[test]"

# Platform integrations (requests, numpy, pydantic-yaml)
pip install "kodexa-document[platform]"
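To see whether the packages behind an extra are present in the current environment, a sketch like the following may help. The import names in `PLATFORM_MODULES` are assumptions derived from the package names listed for the platform extra, and `missing_modules` is a hypothetical helper, not part of the SDK.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed under the [platform] extra.
PLATFORM_MODULES = ("requests", "numpy", "pydantic_yaml")


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PLATFORM_MODULES)
    if missing:
        print("Missing platform dependencies:", ", ".join(missing))
    else:
        print("All platform dependencies are importable.")
```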

Verify Installation

Test your installation:
from kodexa_document import Document

# Create a new document
with Document() as doc:
    print(f"Document UUID: {doc.uuid}")
    print(f"Version: {doc.version}")
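As a complementary check that does not load the native library, the installed distribution version can be read from package metadata using only the standard library. The helper name `installed_version` is invented for this sketch; the distribution name comes from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="kodexa-document"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "kodexa-document is not installed")
```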

Module Organization

The SDK is organized into several subsystems:
| Module | Import Path | Description |
|---|---|---|
| Core Document | kodexa_document | Document, ContentNode, ContentFeature, Tag |
| Extraction | kodexa_document | ExtractionEngine, Taxonomy, DataObject, DataAttribute |
| Accessors | kodexa_document | DataObjectAccessor, DataAttributeAccessor, AuditAccessor, DeltaAccessor, NoteAccessor, NativeDocumentAccessor |
| Transactions | kodexa_document | TransactionContext, BatchTransactionContextManager |
| Processing | kodexa_document | ProcessingStep, KnowledgeItem, KnowledgeFeature |
| Platform Models | kodexa_document.model._generated | 959 auto-generated Pydantic models (Organization, Project, Task, etc.) |
| Platform Client | kodexa_document.platform | KodexaClient, KodexaPlatform (lazy import) |
| LLM | kodexa_document.llm | ModelManager, ChatMessage, LLMUsageMetrics (lazy import) |
Platform, LLM, and assistant modules use lazy imports to avoid pulling in heavy dependencies. Import them explicitly when needed:
from kodexa_document.platform import KodexaClient, KodexaPlatform
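For readers unfamiliar with the pattern, the general idea of a lazy import can be sketched as a proxy that defers the real import until an attribute is first used. This is an illustration of the technique only, not the SDK's actual mechanism; `LazyModule` is a hypothetical class, and the standard-library `json` module stands in for a heavy dependency.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports the real module on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, attribute):
        # __getattr__ runs only for names not found on the proxy itself,
        # so _module_name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attribute)


# "json" stands in for a heavy dependency; nothing is imported at this point.
lazy_json = LazyModule("json")

if __name__ == "__main__":
    print(lazy_json.loaded)
    print(lazy_json.dumps({"ready": True}))
    print(lazy_json.loaded)
```

The cost of the import is paid only by code paths that touch the module, which is why explicit imports of the platform and LLM modules are needed before use.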

Key Imports

The most commonly used imports from the top-level package:
from kodexa_document import (
    # Core
    Document, ContentNode, ContentFeature, Tag,

    # Extraction
    ExtractionEngine, Taxonomy, DataObject, DataAttribute, DataException,

    # Accessors
    DataObjectAccessor, DataAttributeAccessor, AuditAccessor,
    DeltaAccessor, NativeDocumentAccessor, NoteAccessor,

    # Transactions
    TransactionContext, BatchTransactionContextManager,

    # Processing
    ProcessingStep,

    # Errors
    DocumentError, DocumentNotFoundError, ExtractionError,
)

What’s Next?

Getting Started

Learn the basics of creating and manipulating documents

Platform Models

Auto-generated Pydantic models from the OpenAPI spec

Platform Client

REST API client for the Kodexa platform

Extraction

Extract structured data from documents using taxonomies

Processing

Track processing steps and knowledge items

LLM & Model Manager

Access large language models through the AI Gateway

Package Information

| Property | Value |
|---|---|
| Package Name | kodexa-document |
| PyPI | pypi.org/project/kodexa-document |
| License | Apache-2.0 |
| Repository | github.com/kodexa-ai/kodexa-document |