What is KDDB?
KDDB is a SQLite-based format for storing and manipulating structured documents. It provides:
- Hierarchical Structure: Documents are organized as trees of content nodes
- Rich Metadata: Support for features, tags, and document-level metadata
- High Performance: Optimized native backend for fast operations
- Cross-Platform: Same document format works across Python, TypeScript, and more
Supported Languages
The SDK is available for multiple languages with a consistent API design:
| Language | Package | Runtime | Use Cases |
|---|---|---|---|
| Python | kodexa-document | Python 3.12+ | Pipelines, ML, Servers |
| TypeScript | @kodexa-ai/document-wasm-ts | Node.js 16+ / Browsers | Web Apps, APIs, Frontends |
Key Features
Hierarchical Document Model
Documents are composed of ContentNodes arranged in a tree structure. Each node can have:
- Content: Text or data stored in the node
- Type: Classification like “paragraph”, “heading”, “table”, etc.
- Features: Key-value metadata attached to nodes
- Tags: Annotations with optional confidence scores and values
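To make the node model above concrete, here is a minimal sketch of a tree of nodes carrying content, a type, features, and tags. The `Node` and `Tag` classes and all of their fields are hypothetical stand-ins written for this illustration; they are not the SDK's `ContentNode` API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Tag:
    """Hypothetical tag: a name plus an optional confidence and value."""
    name: str
    confidence: float | None = None
    value: Any = None


@dataclass
class Node:
    """Hypothetical stand-in for a content node (not the SDK class)."""
    node_type: str
    content: str | None = None
    features: dict[str, Any] = field(default_factory=dict)
    tags: list[Tag] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Node]:
        """Yield this node, then all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


root = Node("document")
heading = root.add_child(Node("heading", "Invoice 1001"))
para = root.add_child(Node("paragraph", "Total due: 42.00"))
para.features["page"] = 1
para.tags.append(Tag("total", confidence=0.93, value="42.00"))

types_in_order = [n.node_type for n in root.walk()]
```

Keeping features as a plain key-value mapping and tags as a separate list mirrors the distinction drawn above: features describe the node, while tags annotate it with an optional confidence and value.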
XPath-like Selectors
Query documents using a familiar, XPath-like selector syntax.
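As a rough illustration of what XPath-style selection over a node tree looks like, the sketch below uses the limited XPath support in `xml.etree.ElementTree` from the Python standard library. It does not use the Kodexa SDK, and the SDK's actual selector grammar may differ; the element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# A small tree standing in for a document; names are illustrative only.
doc = ET.fromstring(
    "<document>"
    "<heading>Invoice 1001</heading>"
    "<paragraph page='1'>Total due: 42.00</paragraph>"
    "<table><paragraph page='2'>Line items</paragraph></table>"
    "</document>"
)

# ".//paragraph" selects paragraph elements at any depth below the root.
all_paragraphs = [p.text for p in doc.findall(".//paragraph")]

# A predicate narrows the match by attribute value.
page_one = [p.text for p in doc.findall(".//paragraph[@page='1']")]
```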
Structured Data Layer
Beyond the content tree, documents include:
- Data Objects: Represent extracted entities with parent-child relationships
- Data Attributes: Typed fields with confidence scores attached to data objects
- Transactions: Atomic batch operations for high-performance bulk data creation
- Audit Trail: Complete revision history tracking all changes
- Delta Tracking: Export, preview, and apply changes between document versions
- Notes: Human-readable annotations attached to documents and data
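Because KDDB is SQLite-based, the idea of parent-child data objects, typed attributes with confidence scores, and atomic batch writes can be sketched with Python's built-in `sqlite3` module. The table and column names below are assumptions made for this illustration and are not the actual KDDB schema; in practice the SDK would manage this layer for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables, for illustration only; not the KDDB schema.
conn.executescript(
    """
    CREATE TABLE data_object (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES data_object(id),
        name TEXT NOT NULL
    );
    CREATE TABLE data_attribute (
        id INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES data_object(id),
        name TEXT NOT NULL,
        value TEXT,
        confidence REAL
    );
    """
)

# "with conn" commits if the block succeeds and rolls back if it raises.
with conn:
    invoice = conn.execute(
        "INSERT INTO data_object (parent_id, name) VALUES (NULL, 'invoice')"
    ).lastrowid
    line = conn.execute(
        "INSERT INTO data_object (parent_id, name) VALUES (?, 'line_item')",
        (invoice,),
    ).lastrowid
    conn.executemany(
        "INSERT INTO data_attribute (object_id, name, value, confidence) "
        "VALUES (?, ?, ?, ?)",
        [(line, "description", "Widget", 0.98), (line, "amount", "42.00", 0.91)],
    )

# A failing batch leaves no partial rows behind.
try:
    with conn:
        conn.execute(
            "INSERT INTO data_object (parent_id, name) VALUES (NULL, 'orphan')"
        )
        conn.execute("INSERT INTO data_object (name) VALUES (NULL)")  # NOT NULL fails
except sqlite3.IntegrityError:
    pass

object_count = conn.execute("SELECT COUNT(*) FROM data_object").fetchone()[0]
attribute_count = conn.execute("SELECT COUNT(*) FROM data_attribute").fetchone()[0]
```

Grouping many inserts into one transaction is also what tends to make bulk creation fast in SQLite, since the commit cost is paid once per batch rather than once per row.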
Flexible I/O
Load and save documents in multiple ways:
- KDDB files (native SQLite format)
- Bytes/Blobs (for API responses)
- JSON (for debugging and interoperability)
- Text (automatic paragraph parsing)
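The text and JSON options above can be pictured with a small standard-library sketch: split plain text into paragraphs on blank lines, then round-trip the result through JSON. The `text_to_paragraphs` helper and the JSON shape are hypothetical; the SDK's own paragraph parsing rules and JSON layout may differ.

```python
import json
import re


def text_to_paragraphs(text: str) -> list[str]:
    """Split text on blank lines, collapse inner whitespace, drop empty chunks."""
    chunks = re.split(r"\n\s*\n", text)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


raw = "First paragraph,\nwrapped over two lines.\n\n\nSecond paragraph."
paragraphs = text_to_paragraphs(raw)

# A hypothetical JSON shape, chosen only for this example.
as_json = json.dumps(
    {
        "type": "document",
        "children": [{"type": "paragraph", "content": p} for p in paragraphs],
    }
)
restored = [child["content"] for child in json.loads(as_json)["children"]]
```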
Implementation Details
| Feature | Python | TypeScript |
|---|---|---|
| Backend | Go via CFFI | Go via WebAssembly |
| Memory Management | Context managers | Manual dispose |
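The memory management row reflects a general pattern for resources owned by native code: Python can release them automatically through a context manager, while a manual style requires an explicit cleanup call. The sketch below shows both styles with a hypothetical `NativeHandle` class; it is not the SDK's document class, and the method names are assumptions.

```python
class NativeHandle:
    """Hypothetical wrapper around a resource owned by native code."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        # Idempotent: calling close() more than once is harmless.
        self.closed = True

    def __enter__(self) -> "NativeHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always release the resource; exceptions still propagate.
        self.close()


# Context-manager style: cleanup runs even if the body raises.
with NativeHandle("example.kddb") as doc:
    open_inside_block = not doc.closed

# Manual style: the caller is responsible for the cleanup call.
manual = NativeHandle("example.kddb")
try:
    pass  # work with the handle here
finally:
    manual.close()
```

The manual style is easy to get wrong when an early return or exception skips the cleanup call, which is why wrapping it in `try`/`finally` (in any language) is a sensible habit.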
Next Steps
Getting Started
Learn the SDK fundamentals with side-by-side examples in Python and TypeScript
