What is KDDB?
KDDB is a SQLite-based format for storing and manipulating structured documents. It provides:
- Hierarchical Structure: Documents are organized as trees of content nodes
- Rich Metadata: Support for features, tags, and document-level metadata
- High Performance: In-memory mode delivers ~100x faster operations
- Cross-Platform: Same document format works across Python, TypeScript, and more
Supported Languages
The SDK is available for multiple languages with a consistent API design:

| Language | Package | Runtime | Use Cases |
|---|---|---|---|
| Python | kodexa-document | Python 3.12+ | Pipelines, ML, Servers |
| TypeScript | @kodexa-ai/document-wasm-ts | Node.js 16+ / Browsers | Web Apps, APIs, Frontends |
Key Features
Hierarchical Document Model
Documents are composed of ContentNodes arranged in a tree structure. Each node can have:
- Content: Text or data stored in the node
- Type: Classification like “paragraph”, “heading”, “table”, etc.
- Features: Key-value metadata attached to nodes
- Tags: Annotations with optional confidence scores and values
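To make the node model above concrete, here is a minimal, self-contained sketch of a tree whose nodes carry content, a type, features, and tags. The `Node` and `Tag` classes and all of their fields are hypothetical stand-ins written for illustration; they are not the SDK's `ContentNode` API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Tag:
    """Hypothetical annotation with an optional confidence score and value."""
    name: str
    confidence: Optional[float] = None
    value: Optional[str] = None


@dataclass
class Node:
    """Hypothetical stand-in for a content node (not the SDK class)."""
    node_type: str                      # e.g. "paragraph", "heading", "table"
    content: Optional[str] = None       # text or data stored in the node
    features: dict[str, Any] = field(default_factory=dict)  # key-value metadata
    tags: list[Tag] = field(default_factory=list)           # annotations
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        """Attach a child node and return it for further editing."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Node]:
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


root = Node("document")
heading = root.add_child(Node("heading", content="Invoice 42"))
heading.features["level"] = 1
para = root.add_child(Node("paragraph", content="Total due: $100"))
para.tags.append(Tag("amount", confidence=0.93, value="100"))

types = [n.node_type for n in root.walk()]
```

Keeping features as a plain dictionary and tags as a list reflects the distinction drawn above: features are keyed metadata, while a node may carry several tags, each with its own confidence and value.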
XPath-like Selectors
Query documents using a familiar, XPath-like selector syntax.
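KDDB's own selector grammar is not shown on this page, so the sketch below uses an analogy instead: the limited XPath subset in Python's standard-library `xml.etree.ElementTree`, applied to a small tree of typed nodes. The element and attribute names are invented for the example, and the SDK's selector syntax may differ.

```python
import xml.etree.ElementTree as ET

# A small tree of typed nodes, expressed as XML purely for illustration.
doc = ET.fromstring(
    "<document>"
    "<heading level='1'>Invoice 42</heading>"
    "<section>"
    "<paragraph tag='amount'>Total due: $100</paragraph>"
    "<paragraph>Thank you.</paragraph>"
    "</section>"
    "</document>"
)

# All paragraph nodes at any depth below the root.
all_paragraphs = doc.findall(".//paragraph")

# Only paragraphs carrying a specific attribute value.
tagged = doc.findall(".//paragraph[@tag='amount']")

# Direct children of the root with a given type.
headings = doc.findall("./heading")
```

The three queries show the usual selector building blocks: descendant search, attribute predicates, and child-only matching.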
Performance Modes
Choose the right mode for your use case:
- In-Memory: ~1ms document creation, ideal for processing pipelines
- File-Based: Persistent storage, ideal for long-term document management
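Because KDDB is SQLite-based, the difference between the two modes can be illustrated with Python's standard `sqlite3` module. This is a sketch of the underlying SQLite behaviour only: the `nodes` table, the `create_store` helper, and the `example.kddb` file name are assumptions made for the example, not the real KDDB schema or SDK calls.

```python
import os
import sqlite3
import tempfile


def create_store(path: str) -> sqlite3.Connection:
    """Create a tiny, hypothetical node table at the given SQLite path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, content TEXT)"
    )
    conn.execute(
        "INSERT INTO nodes (type, content) VALUES (?, ?)",
        ("paragraph", "Hello"),
    )
    conn.commit()
    return conn


# In-memory: nothing touches disk, and the data is gone once the connection closes.
mem = create_store(":memory:")
mem_count = mem.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
mem.close()

# File-based: the data survives closing and reopening the database.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "example.kddb")
    disk = create_store(db_path)
    disk.close()

    reopened = sqlite3.connect(db_path)
    disk_count = reopened.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    reopened.close()
```

The same code path serves both modes; only the path argument changes, which is why an in-memory store suits short-lived pipeline steps and a file suits documents that must persist.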
Flexible I/O
Load and save documents in multiple ways:
- KDDB files (native SQLite format)
- Bytes/Blobs (for API responses)
- JSON (for debugging and interoperability)
- Text (automatic paragraph parsing)
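Since the native format is a SQLite file, moving a document as bytes amounts to moving the database file's contents. The sketch below shows that round trip with the standard library alone; the table layout and file names are hypothetical, and the SDK's own load and save functions are not shown here.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a small SQLite database to disk (hypothetical schema).
    src = os.path.join(tmp, "source.kddb")
    conn = sqlite3.connect(src)
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("INSERT INTO nodes (content) VALUES (?)", ("Hello",))
    conn.commit()
    conn.close()

    # Read it back as raw bytes, as an API response body might carry it.
    with open(src, "rb") as fh:
        blob = fh.read()

    # Every SQLite database file starts with this 16-byte header string.
    header_ok = blob.startswith(b"SQLite format 3\x00")

    # Restore the bytes to a new file and query it like any other database.
    dst = os.path.join(tmp, "copy.kddb")
    with open(dst, "wb") as fh:
        fh.write(blob)
    restored = sqlite3.connect(dst)
    content = restored.execute("SELECT content FROM nodes").fetchone()[0]
    restored.close()
```

Checking the header before opening received bytes is a cheap way to reject payloads that are not SQLite databases at all.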
Implementation Details
| Feature | Python | TypeScript |
|---|---|---|
| Backend | Go via CFFI | Go via WebAssembly |
| Performance | ~100x faster in-memory | ~5x faster than pure JS |
| Memory Management | Context managers | Manual dispose |
Next Steps
Getting Started
Learn the SDK fundamentals with side-by-side examples in Python and TypeScript
