The Python SDK (`kodexa-document`) is a high-performance library for working with KDDB documents. Its core is implemented in Go and exposed to Python through CFFI bindings, so the heavy lifting runs in compiled code while the API stays Pythonic.
Requirements
- Python 3.12 or higher
- pip (Python package manager)
Installation
Install the SDK from PyPI with `pip install kodexa-document`. The package includes pre-built native libraries for Linux, macOS, and Windows, so no compilation or Go installation is required.
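After installing, a quick check can confirm both requirements from Python. This is a minimal sketch using only the standard library, not part of the SDK: it looks up the `kodexa-document` distribution (the PyPI package name) with `importlib.metadata` and compares `sys.version_info` against the 3.12 minimum. The helper names `python_ok` and `installed_version` are hypothetical.

```python
import sys
from importlib import metadata

# Minimum interpreter version listed under Requirements.
MIN_PYTHON = (3, 12)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) meets the 3.12+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "kodexa-document"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.12+:", python_ok())
    print("kodexa-document version:", installed_version())
```

A `None` version means pip installed the package into a different interpreter or virtual environment than the one running the check.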
Optional Dependencies
Install additional features as needed:
Verify Installation
Test your installation:
Core Concepts
Documents and Nodes
A `Document` is the top-level container that holds:
- Metadata: Document-level key-value pairs
- Labels: Categorization tags for the document
- Content Node: The root of the hierarchical content tree
A `ContentNode` represents a single piece of content. Nodes are organized in a tree, and each node has:
- Type: Classification (paragraph, heading, table, cell, etc.)
- Content: The actual text or data
- Features: Key-value metadata specific to the node
- Tags: Annotations with optional confidence and value
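To make the shape of this tree concrete, the sketch below models it with plain dataclasses. These classes are hypothetical stand-ins written for illustration, not the `kodexa-document` classes: field names such as `node_type`, `children`, and `content_node`, and the helpers `add_child` and `walk`, are assumptions, and the real SDK's names and methods may differ.

```python
from __future__ import annotations  # allows ContentNode to reference itself in annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Tag:
    """Annotation on a node, with optional confidence and value."""
    name: str
    confidence: Optional[float] = None
    value: Optional[str] = None


@dataclass
class ContentNode:
    """One piece of content in the tree (hypothetical model)."""
    node_type: str                                  # e.g. "paragraph", "heading", "table"
    content: str = ""                               # the actual text or data
    features: dict[str, Any] = field(default_factory=dict)
    tags: list[Tag] = field(default_factory=list)
    children: list[ContentNode] = field(default_factory=list)

    def add_child(self, child: ContentNode) -> ContentNode:
        self.children.append(child)
        return child

    def walk(self):
        """Depth-first, pre-order traversal of this node and its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Document:
    """Top-level container: metadata, labels, and the root content node."""
    metadata: dict[str, Any] = field(default_factory=dict)
    labels: list[str] = field(default_factory=list)
    content_node: Optional[ContentNode] = None


def build_example() -> Document:
    doc = Document(metadata={"source": "invoice.pdf"}, labels=["invoice"])
    doc.content_node = ContentNode("document")
    heading = doc.content_node.add_child(ContentNode("heading", "Invoice #1001"))
    heading.tags.append(Tag("invoice_number", confidence=0.97, value="1001"))
    para = doc.content_node.add_child(ContentNode("paragraph", "Total due: $250"))
    para.features["page"] = 1
    return doc


if __name__ == "__main__":
    doc = build_example()
    print([n.node_type for n in doc.content_node.walk()])  # ['document', 'heading', 'paragraph']
```

The split mirrors the description above: document-level metadata and labels live on the container, while per-node data (features, tags) lives on each node in the tree.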
In-Memory vs File-Based
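The SDK supports two storage modes. In the approximate timings below, an in-memory document is created roughly 100 times faster than a file-based one:

| Mode | Creation Time | Use Case |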
|---|---|---|
| In-Memory | ~1.2ms | Processing pipelines, temporary documents |
| File-Based | ~121ms | Persistent storage, large documents |
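The creation times above are approximate, and actual numbers depend on hardware and disk. To compare the two modes on your own machine, a small timing harness like the one below can help. It is a generic sketch: `median_creation_ms` is a hypothetical helper, and the factory you pass in would wrap whatever SDK call creates an in-memory or file-based document. That call is not shown on this page, so the demo uses a stand-in factory.

```python
import statistics
import time
from typing import Callable


def median_creation_ms(factory: Callable[[], object], repeats: int = 20) -> float:
    """Call `factory` `repeats` times and return the median wall-clock time in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        factory()  # e.g. a lambda that creates one document (hypothetical)
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one-off slow runs (cold caches, GC).
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in factory; replace with a call that creates a document in each mode.
    print(f"{median_creation_ms(lambda: [0] * 10_000):.3f} ms")
```

When timing file-based creation, have the factory write to a fresh temporary path and clean it up afterwards, so repeated runs do not reuse or overwrite the same file.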
What’s Next?
- Getting Started: Learn the basics of creating and manipulating documents
- API Reference: Explore the complete API documentation
Package Information
| Property | Value |
|---|---|
| Package Name | kodexa-document |
| PyPI | pypi.org/project/kodexa-document |
| License | Apache-2.0 |
| Repository | github.com/kodexa-ai/kodexa-document |
