The Kodexa Document Python SDK (kodexa-document) is a library for working with KDDB documents. Its core is written in Go and exposed to Python through CFFI bindings, combining native-code performance with a Pythonic API.

Requirements

  • Python 3.12 or higher
  • pip (Python package manager)

Installation

Install the SDK from PyPI:
pip install kodexa-document
The package includes pre-built native libraries for Linux, macOS, and Windows. No compilation or Go installation is required.

Optional Dependencies

Install additional features as needed:
# Quote the package spec so shells such as zsh do not treat [..] as a glob pattern

# Development tools (pytest, black, mypy, etc.)
pip install "kodexa-document[dev]"

# Testing only
pip install "kodexa-document[test]"

# Platform integrations (requests, numpy, pydantic-yaml)
pip install "kodexa-document[platform]"

Verify Installation

Test your installation:
from kodexa_document import Document

# Create an in-memory document
with Document(inmemory=True) as doc:
    print(f"Document UUID: {doc.uuid}")
    print(f"Version: {doc.version}")
You should see output like:
Document UUID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Version: 6.0.0
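
If the import above fails, it can help to check the two prerequisites separately before debugging further. The sketch below uses only the Python standard library to test the interpreter version against the stated minimum and to ask the packaging metadata whether kodexa-document is installed. The helper names `meets_python_requirement` and `installed_version` are illustrative and are not part of the SDK.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 12)  # minimum version stated under Requirements


def meets_python_requirement(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= tuple(minimum)


def installed_version(distribution="kodexa-document"):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"Python >= 3.12: {meets_python_requirement()}")
    print(f"kodexa-document installed: {installed_version() or 'no'}")
```

If the first line prints False, upgrade Python first. If the second prints "no", pip most likely installed the package into a different environment than the one running the script.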

Core Concepts

Documents and Nodes

A Document is the top-level container that holds:
  • Metadata: Document-level key-value pairs
  • Labels: Categorization tags for the document
  • Content Node: The root of the hierarchical content tree

A ContentNode represents an individual piece of content within that tree. Each node has:
  • Type: Classification (paragraph, heading, table, cell, etc.)
  • Content: The actual text or data
  • Features: Key-value metadata specific to the node
  • Tags: Annotations with optional confidence and value
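
To make the shape of this model concrete, here is a minimal pure-Python sketch of the structure described above. It does not use the SDK: `SketchDocument`, `SketchNode`, `SketchTag`, and the `walk` method are hypothetical stand-ins written for illustration, and the real kodexa_document classes may expose different names and methods.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class SketchTag:
    """Illustrative annotation with an optional value and confidence."""
    name: str
    value: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class SketchNode:
    """Illustrative content node: type, content, features, tags, children."""
    node_type: str
    content: Optional[str] = None
    features: dict[str, Any] = field(default_factory=dict)
    tags: list[SketchTag] = field(default_factory=list)
    children: list["SketchNode"] = field(default_factory=list)

    def walk(self) -> Iterator["SketchNode"]:
        """Yield this node, then every descendant, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class SketchDocument:
    """Illustrative top-level container: metadata, labels, root node."""
    metadata: dict[str, Any] = field(default_factory=dict)
    labels: list[str] = field(default_factory=list)
    content_node: Optional[SketchNode] = None


# Build a tiny tree: one table holding two cells, the second one tagged.
label_cell = SketchNode("cell", "Total")
value_cell = SketchNode(
    "cell",
    "42.00",
    features={"column": 1},
    tags=[SketchTag("total", value="42.00", confidence=0.93)],
)
root = SketchNode("table", children=[label_cell, value_cell])
doc = SketchDocument(
    metadata={"source": "example.pdf"}, labels=["invoice"], content_node=root
)

node_types = [node.node_type for node in doc.content_node.walk()]
print(node_types)  # ['table', 'cell', 'cell']
```

The point of the sketch is the separation of concerns: metadata and labels describe the document as a whole, while features and tags attach to individual nodes in the tree.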

In-Memory vs File-Based

The SDK supports two storage modes:
| Mode | Creation Time | Use Case |
|---|---|---|
| In-Memory | ~1.2 ms | Processing pipelines, temporary documents |
| File-Based | ~121 ms | Persistent storage, large documents |
from kodexa_document import Document

# In-memory (fast, temporary)
doc = Document(inmemory=True)

# File-based (persistent)
doc = Document(inmemory=False)
Use inmemory=True for processing pipelines: document creation is roughly 100x faster (about 1.2 ms versus 121 ms). Use file-based documents only when you need persistence.
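
The ~100x figure follows directly from the two creation times in the table. The short calculation below works it through and scales it to a batch; the batch size of 10,000 documents is a hypothetical chosen for illustration, and the totals cover document creation only, not any processing done afterwards.

```python
IN_MEMORY_MS = 1.2     # approximate in-memory creation time from the table
FILE_BASED_MS = 121.0  # approximate file-based creation time from the table


def speedup(slow_ms: float, fast_ms: float) -> float:
    """Return how many times faster the fast mode is than the slow mode."""
    return slow_ms / fast_ms


def total_seconds(per_document_ms: float, count: int) -> float:
    """Return total creation time in seconds for `count` documents."""
    return per_document_ms * count / 1000.0


ratio = speedup(FILE_BASED_MS, IN_MEMORY_MS)
batch = 10_000  # hypothetical pipeline size

in_memory_total = total_seconds(IN_MEMORY_MS, batch)
file_based_total = total_seconds(FILE_BASED_MS, batch)

print(f"Speedup: {ratio:.0f}x")                      # Speedup: 101x
print(f"In-memory total: {in_memory_total:.1f} s")    # In-memory total: 12.0 s
print(f"File-based total: {file_based_total:.1f} s")  # File-based total: 1210.0 s
```

At that scale, creation overhead alone is about 12 seconds in memory versus roughly 20 minutes on disk, which is why the mode choice matters most for pipelines that create many short-lived documents.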

Package Information

| Property | Value |
|---|---|
| Package Name | kodexa-document |
| PyPI | pypi.org/project/kodexa-document |
| License | Apache-2.0 |
| Repository | github.com/kodexa-ai/kodexa-document |