What is KDDB?
KDDB is a SQLite-based format for storing and manipulating structured documents. It provides:
- Hierarchical Structure: Documents are organized as trees of content nodes
- Rich Metadata: Support for features, tags, and document-level metadata
- High Performance: In-memory mode delivers ~100x faster operations
- Cross-Platform: Same document format works across Python, TypeScript, and more
Choose Your Language
Python
Production-ready Python SDK with 400+ tests. Perfect for data pipelines, ML workflows, and server-side processing.
TypeScript
WebAssembly-powered SDK for Node.js and browsers. Ideal for web applications and frontend document handling.
Key Features
Hierarchical Document Model
Documents are composed of ContentNodes arranged in a tree structure. Each node can have:
- Content: Text or data stored in the node
- Type: Classification like “paragraph”, “heading”, “table”, etc.
- Features: Key-value metadata attached to nodes
- Tags: Annotations with optional confidence scores and values
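The node shape described above can be pictured with a small data model. This is only a sketch of the concept: the `ContentNode` and `Tag` dataclasses, their field names, and the `add_child` and `walk` helpers are hypothetical and are not the SDK's own classes or API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Tag:
    """Annotation with an optional confidence score and value (hypothetical)."""
    name: str
    confidence: Optional[float] = None
    value: Optional[str] = None


@dataclass
class ContentNode:
    """Illustrative node, NOT the SDK class: just the shape described above."""
    node_type: str                      # e.g. "paragraph", "heading", "table"
    content: Optional[str] = None       # text or data stored in the node
    features: dict[str, Any] = field(default_factory=dict)   # key-value metadata
    tags: list[Tag] = field(default_factory=list)            # annotations
    children: list["ContentNode"] = field(default_factory=list)

    def add_child(self, child: "ContentNode") -> "ContentNode":
        """Attach a child node and return it for further use."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


# Build a tiny tree: one document root with a heading and a paragraph.
root = ContentNode("document")
heading = root.add_child(ContentNode("heading", "Invoice 1042"))
para = root.add_child(ContentNode("paragraph", "Total due: 250.00"))
para.features["page"] = 1
para.tags.append(Tag("total_amount", confidence=0.93, value="250.00"))

node_types = [node.node_type for node in root.walk()]
```

Keeping features and tags as separate fields mirrors the distinction in the list above: features are plain key-value metadata, while tags are annotations that may carry a confidence score and a value.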
XPath-like Selectors
Query documents using a familiar, XPath-like selector syntax.
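The selector grammar itself is not shown on this page, so the following is an analogy rather than KDDB syntax. It uses the XPath subset in Python's standard `xml.etree.ElementTree` module to show what path-based queries over a node tree look like; the sample document and the `page` attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A stand-in tree: element names play the role of node types,
# and the "page" attribute plays the role of a feature.
doc = ET.fromstring(
    "<document>"
    "<heading>Invoice 1042</heading>"
    "<paragraph page='1'>Total due: 250.00</paragraph>"
    "<table><paragraph page='2'>Line items</paragraph></table>"
    "</document>"
)

# Descendant query: every paragraph at any depth, in document order.
all_paragraphs = [p.text for p in doc.findall(".//paragraph")]

# Predicate query: only paragraphs whose "page" attribute equals "1".
page_one = [p.text for p in doc.findall(".//paragraph[@page='1']")]
```

The same two ideas, selecting by node type at any depth and narrowing with a predicate, are what an XPath-like selector language typically offers; consult the SDK reference for the exact expressions KDDB accepts.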
Performance Modes
Choose the right mode for your use case:
- In-Memory: ~1ms document creation, ideal for processing pipelines
- File-Based: Persistent storage, ideal for long-term document management
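Because KDDB is SQLite-based, the trade-off between the two modes resembles the one SQLite itself offers. The sketch below uses Python's standard `sqlite3` module directly, not the KDDB SDK; the `nodes` table, the `make_store` helper, and the file name are hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile


def make_store(path: str) -> sqlite3.Connection:
    """Open a SQLite database and create a hypothetical node table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes "
        "(id INTEGER PRIMARY KEY, type TEXT, content TEXT)"
    )
    return conn


# In-memory: nothing touches disk; the data is gone once the connection closes.
mem = make_store(":memory:")
mem.execute("INSERT INTO nodes (type, content) VALUES (?, ?)", ("paragraph", "scratch"))
mem_count = mem.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
mem.close()

# File-based: the data survives closing and reopening the file.
db_path = os.path.join(tempfile.mkdtemp(), "example.kddb")  # illustrative name
disk = make_store(db_path)
disk.execute("INSERT INTO nodes (type, content) VALUES (?, ?)", ("paragraph", "kept"))
disk.commit()
disk.close()

reopened = sqlite3.connect(db_path)
disk_rows = reopened.execute("SELECT type, content FROM nodes").fetchall()
reopened.close()
```

An in-memory store suits short-lived pipeline steps where the result is handed on immediately, while a file on disk suits documents that must outlive the process.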
Flexible I/O
Load and save documents in multiple ways:
- KDDB files (native SQLite format)
- Bytes/Blobs (for API responses)
- JSON (for debugging and interoperability)
- Text (automatic paragraph parsing)
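To make the text and JSON paths concrete, here is a minimal sketch using only the standard library. It assumes paragraphs are separated by blank lines, which this page does not confirm, and the `text_to_nodes` helper and dictionary layout are hypothetical. The bytes produced here are just encoded JSON; the native KDDB bytes are a SQLite database, which this sketch does not reproduce.

```python
import json


def text_to_nodes(text: str) -> dict:
    """Split text on blank lines into paragraph nodes (assumed parsing rule)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return {
        "type": "document",
        "children": [{"type": "paragraph", "content": p} for p in paragraphs],
    }


doc = text_to_nodes("First paragraph.\n\nSecond paragraph.\n")

as_json = json.dumps(doc, indent=2)       # readable form for debugging
as_bytes = as_json.encode("utf-8")        # form suitable for an HTTP response body
restored = json.loads(as_bytes.decode("utf-8"))  # round trip back to a dict
```

A JSON form like this is convenient for inspection and for exchanging data with tools that do not read SQLite, at the cost of being larger and slower to query than the native file.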
Quick Comparison
| Feature | Python | TypeScript |
|---|---|---|
| Package | kodexa-document | @kodexa-ai/document-wasm-ts |
| Runtime | Python 3.12+ | Node.js 16+ / Modern Browsers |
| Backend | Go via CFFI | Go via WebAssembly |
| Performance | ~100x faster in-memory | ~5x faster than pure JS |
| Use Cases | Pipelines, ML, Servers | Web Apps, APIs, Frontends |
