The Kodexa Document Python SDK (kodexa-document) is a high-performance library for working with KDDB documents and the Kodexa platform. Its core is written in Go and exposed to Python through CFFI bindings. Document storage, content tree operations, and extraction therefore run as native code behind a Pythonic API.

Architecture

The SDK has a layered architecture:
  • Go Core — Document storage (KDDB/SQLite), content tree operations, extraction engine, and delta tracking are implemented in Go for performance
  • CFFI Bridge — Python calls into the Go shared library via CFFI, with automatic handle management using weakref.finalize
  • Pydantic Models — 959 platform models are auto-generated from the OpenAPI spec using datamodel-codegen, providing full type safety
  • Platform Client — A REST client for interacting with the Kodexa API, built on the generated models
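The CFFI Bridge bullet says native handles are managed automatically with weakref.finalize. The sketch below illustrates that pattern without the real Go library. Everything except `weakref.finalize` is hypothetical: `FakeNativeLib`, `create`, `free`, and `NativeHandle` are stand-ins invented for illustration, not the SDK's internals. The wrapper frees its handle exactly once, whether it is closed explicitly, used in a `with` block (as `Document` is in the installation check below), or garbage-collected.

```python
import weakref


class FakeNativeLib:
    """Hypothetical stand-in for a CFFI-loaded Go shared library."""

    def __init__(self):
        self._next = 1
        self.live = set()  # handles created but not yet freed

    def create(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


class NativeHandle:
    """Hypothetical wrapper that owns exactly one native handle."""

    def __init__(self, lib):
        self._handle = lib.create()
        # The finalizer stores lib.free and the integer handle, never self,
        # so it does not keep the wrapper alive.
        self._finalizer = weakref.finalize(self, lib.free, self._handle)

    def close(self):
        # A finalize object runs its callback at most once, so close() is idempotent.
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


lib = FakeNativeLib()
with NativeHandle(lib) as h:
    print(len(lib.live))      # 1
print(h.closed, lib.live)     # True set()
```

The design point is that the finalizer never references the wrapper itself. If it did, the wrapper could never be collected and the handle would leak.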

Requirements

  • Python 3.12 or higher
  • pip (Python package manager)
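Before installing, you can confirm that the interpreter meets the 3.12 minimum. This is a generic standard-library check, not part of the SDK, and the helper name `meets_minimum` is illustrative.

```python
import sys

# Minimum interpreter version from the Requirements list.
MIN_PYTHON = (3, 12)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version OK for kodexa-document")
else:
    print(f"Python {sys.version.split()[0]} is too old; 3.12 or higher is required")
```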

Installation

Install the SDK from PyPI:
pip install kodexa-document
The package ships pre-built native libraries for Linux, macOS, and Windows, so no compilation step or Go toolchain is required.

Optional Dependencies

Install additional features as needed:
# Quote the package spec so shells such as zsh do not expand the brackets

# Development tools (pytest, black, mypy, etc.)
pip install "kodexa-document[dev]"

# Testing only
pip install "kodexa-document[test]"

# Platform integrations (requests, numpy, pydantic-yaml)
pip install "kodexa-document[platform]"
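Extras only add third-party packages, so a quick way to see whether the platform extra's dependencies are present is to ask the import system for them. The sketch below uses only the standard library. The module list mirrors the platform comment above (the pydantic-yaml distribution is imported as pydantic_yaml), and the helper name `missing_modules` is illustrative, not part of the SDK.

```python
import importlib.util

# Import names for the packages listed under the "platform" extra.
PLATFORM_MODULES = ("requests", "numpy", "pydantic_yaml")


def missing_modules(names=PLATFORM_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
    print("Install with: pip install 'kodexa-document[platform]'")
else:
    print("Platform dependencies are available")
```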

Verify Installation

Test your installation:
from kodexa_document import Document

# Create a new document
with Document() as doc:
    print(f"Document UUID: {doc.uuid}")
    print(f"Version: {doc.version}")
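To check which release of the package pip installed, independently of anything the `Document` object reports, the standard library's `importlib.metadata` is enough. This is generic Python rather than an SDK API, and `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="kodexa-document"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("kodexa-document:", installed_version() or "not installed")
```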

Module Organization

The SDK is organized into several subsystems:
| Module | Import Path | Description |
|---|---|---|
| Core Document | kodexa_document | Document, ContentNode, ContentFeature, Tag |
| Extraction | kodexa_document | ExtractionEngine, Taxonomy, DataObject, DataAttribute |
| Accessors | kodexa_document | DataObjectAccessor, DataAttributeAccessor, AuditAccessor, DeltaAccessor, NoteAccessor, NativeDocumentAccessor |
| Transactions | kodexa_document | TransactionContext, BatchTransactionContextManager |
| Processing | kodexa_document | ProcessingStep, KnowledgeItem, KnowledgeFeature |
| Platform Models | kodexa_document.model._generated | 959 auto-generated Pydantic models (Organization, Project, Task, etc.) |
| Platform Client | kodexa_document.platform | KodexaClient, KodexaPlatform (lazy import) |
| LLM | kodexa_document.llm | ModelManager, ChatMessage, LLMUsageMetrics (lazy import) |
Platform, LLM, and assistant modules use lazy imports to avoid pulling in heavy dependencies. Import them explicitly when needed:
from kodexa_document.platform import KodexaClient, KodexaPlatform
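Lazy imports like these are commonly built on a module-level `__getattr__` (PEP 562), which Python calls only when a name is not already defined in the module. The sketch below assumes that mechanism; the SDK's actual implementation may differ. To stay self-contained, it builds a throwaway module object, `demo_pkg`, and maps two standard-library names to their modules in place of heavy platform or LLM dependencies. In a real package, `_lazy_getattr` would simply be a top-level `def __getattr__(name)` in `__init__.py`.

```python
import importlib
import types

# Hypothetical table of lazily exported names -> module that defines each one.
# Standard-library modules stand in for heavy optional dependencies.
_LAZY_EXPORTS = {
    "OrderedDict": "collections",
    "dumps": "json",
}

demo_pkg = types.ModuleType("demo_pkg")


def _lazy_getattr(name):
    """Called only when normal attribute lookup on demo_pkg fails (PEP 562)."""
    module_name = _LAZY_EXPORTS.get(name)
    if module_name is None:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    value = getattr(importlib.import_module(module_name), name)
    # Cache the resolved object so later lookups bypass __getattr__ entirely.
    setattr(demo_pkg, name, value)
    return value


demo_pkg.__getattr__ = _lazy_getattr

# Nothing is resolved until the first attribute access.
print("dumps" in vars(demo_pkg))   # False
print(demo_pkg.dumps({"a": 1}))    # {"a": 1}
print("dumps" in vars(demo_pkg))   # True
```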

Key Imports

The most commonly used imports from the top-level package:
from kodexa_document import (
    # Core
    Document, ContentNode, ContentFeature, Tag,

    # Extraction
    ExtractionEngine, Taxonomy, DataObject, DataAttribute, DataException,

    # Accessors
    DataObjectAccessor, DataAttributeAccessor, AuditAccessor,
    DeltaAccessor, NativeDocumentAccessor, NoteAccessor,

    # Transactions
    TransactionContext, BatchTransactionContextManager,

    # Processing
    ProcessingStep,

    # Errors
    DocumentError, DocumentNotFoundError, ExtractionError,
)


Package Information

| Property | Value |
|---|---|
| Package Name | kodexa-document |
| PyPI | pypi.org/project/kodexa-document |
| License | Apache-2.0 |
| Repository | github.com/kodexa-ai/kodexa-document |