kodexa-document provides a high-performance library for working with KDDB documents and the Kodexa platform. Built on a Go backend with Python bindings via CFFI, it keeps performance-critical work in native code while exposing a Pythonic API.
Architecture
The SDK has a layered architecture:
- Go Core: Document storage (KDDB/SQLite), content tree operations, the extraction engine, and delta tracking are implemented in Go for performance
- CFFI Bridge: Python calls into the Go shared library via CFFI, with automatic handle management using weakref.finalize
- Pydantic Models: 959 platform models are auto-generated from the OpenAPI spec using datamodel-codegen, providing full type safety
- Platform Client: A REST client for interacting with the Kodexa API, built on the generated models
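The handle management mentioned for the CFFI bridge can be illustrated with a minimal, self-contained sketch. It does not use the SDK's real bindings: `FakeNativeLib`, `create_document`, `free_document`, and `DocumentHandle` are hypothetical names standing in for the Go shared library and its Python wrapper, and only `weakref.finalize` itself is taken from the text above.

```python
import weakref


class FakeNativeLib:
    """Hypothetical stand-in for the native library; tracks open handles."""

    def __init__(self):
        self._next_handle = 1
        self.open_handles = set()

    def create_document(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles.add(handle)
        return handle

    def free_document(self, handle: int) -> None:
        # discard() keeps a repeated free harmless in this sketch
        self.open_handles.discard(handle)


class DocumentHandle:
    """Python-side wrapper that releases its native handle when collected."""

    def __init__(self, lib: FakeNativeLib):
        self._handle = lib.create_document()
        # The callback receives the handle value, not self; referencing self
        # here would keep the wrapper alive and the finalizer would never run.
        self._finalizer = weakref.finalize(self, lib.free_document, self._handle)

    @property
    def handle(self) -> int:
        return self._handle

    def close(self) -> None:
        # Explicit release; a finalizer runs at most once, so the later
        # garbage collection of this object does not free the handle again.
        self._finalizer()


lib = FakeNativeLib()
doc = DocumentHandle(lib)
print("open after create:", lib.open_handles)
doc.close()
print("open after close:", lib.open_handles)
```

The design point is that the native resource is tied to the lifetime of the Python object, so callers do not have to free handles manually, while an explicit `close()` remains available for deterministic cleanup.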
Requirements
- Python 3.12 or higher
- pip (Python package manager)
Installation
Install the SDK from PyPI with pip (pip install kodexa-document).
The package includes pre-built native libraries for Linux, macOS, and Windows. No compilation or Go installation is required.
Optional Dependencies
Install additional features as needed:
Verify Installation
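One way to check an installation is sketched below. It relies only on the standard library and on two facts from this page: the minimum Python version (3.12) and the distribution name (kodexa-document). The helper names `python_ok` and `installed_version` are illustrative and not part of the SDK.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 12)  # minimum version stated in the Requirements section


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "kodexa-document"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python meets requirement:", python_ok())
print("kodexa-document version:", installed_version())
```

If the second line prints None, the package is not installed in the interpreter that ran the script, which usually points to a different virtual environment than the one pip installed into.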
Test your installation by importing kodexa_document in a Python interpreter.
Module Organization
The SDK is organized into several subsystems:

| Module | Import Path | Description |
|---|---|---|
| Core Document | kodexa_document | Document, ContentNode, ContentFeature, Tag |
| Extraction | kodexa_document | ExtractionEngine, Taxonomy, DataObject, DataAttribute |
| Accessors | kodexa_document | DataObjectAccessor, DataAttributeAccessor, AuditAccessor, DeltaAccessor, NoteAccessor, NativeDocumentAccessor |
| Transactions | kodexa_document | TransactionContext, BatchTransactionContextManager |
| Processing | kodexa_document | ProcessingStep, KnowledgeItem, KnowledgeFeature |
| Platform Models | kodexa_document.model._generated | 959 auto-generated Pydantic models (Organization, Project, Task, etc.) |
| Platform Client | kodexa_document.platform | KodexaClient, KodexaPlatform (lazy import) |
| LLM | kodexa_document.llm | ModelManager, ChatMessage, LLMUsageMetrics (lazy import) |
Platform, LLM, and assistant modules use lazy imports to avoid pulling in heavy dependencies. Import them explicitly when needed:
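A minimal sketch of importing the lazily loaded modules explicitly, with a guard for environments where they or their dependencies are missing. The module paths and class names (kodexa_document.platform, KodexaClient, kodexa_document.llm, ModelManager) come from the table above; the `load_optional` helper is an assumption made for illustration and is not an SDK function.

```python
import importlib


def load_optional(module_name: str, attr: str):
    """Import module_name on demand and return attr, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing heavy dependency
        return None
    return getattr(module, attr, None)


# With the SDK installed, these are equivalent to:
#   from kodexa_document.platform import KodexaClient
#   from kodexa_document.llm import ModelManager
KodexaClient = load_optional("kodexa_document.platform", "KodexaClient")
ModelManager = load_optional("kodexa_document.llm", "ModelManager")

if KodexaClient is None:
    print("Platform client not available in this environment")
if ModelManager is None:
    print("LLM module not available in this environment")
```

Deferring these imports until they are needed keeps the cost of importing the core document classes independent of the platform and LLM dependencies.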
Key Imports
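As a hedged sketch, the snippet below checks which class names are exported by the top-level package. The names are taken from the module table on this page and the selection is illustrative, not a statement of which imports are most common; `missing_exports` is a hypothetical helper.

```python
import importlib

# Class names listed for the kodexa_document import path in the module table
TOP_LEVEL_NAMES = [
    "Document",
    "ContentNode",
    "ContentFeature",
    "Tag",
    "ExtractionEngine",
    "Taxonomy",
]


def missing_exports(module_name: str, names):
    """Return the names not found on the module (all of them if import fails)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# With the SDK installed, the direct form would be:
#   from kodexa_document import Document, ContentNode, ContentFeature, Tag
print("Missing top-level names:", missing_exports("kodexa_document", TOP_LEVEL_NAMES))
```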
The most commonly used classes are imported directly from the top-level kodexa_document package.
What’s Next?
Getting Started
Learn the basics of creating and manipulating documents
Platform Models
Auto-generated Pydantic models from the OpenAPI spec
Platform Client
REST API client for the Kodexa platform
Extraction
Extract structured data from documents using taxonomies
Processing
Track processing steps and knowledge items
Package Information
| Property | Value |
|---|---|
| Package Name | kodexa-document |
| PyPI | pypi.org/project/kodexa-document |
| License | Apache-2.0 |
| Repository | github.com/kodexa-ai/kodexa-document |
