Kodexa Document SDK

The Kodexa Document SDK is a cross-platform library for working with hierarchical structured documents stored in KDDB (Kodexa Document Database) format. It is intended for document processing pipelines, extraction workflows, and content analysis tools.

What is KDDB?

KDDB is a SQLite-based format for storing and manipulating structured documents. It provides:

Hierarchical Structure: Documents are organized as trees of content nodes
Rich Metadata: Support for features, tags, and document-level metadata
High Performance: Operations run in a compiled Go backend (see Implementation Details)
Cross-Platform: Same document format works across Python, TypeScript, and more
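
Because KDDB is SQLite-based, a KDDB file can in principle be opened with any SQLite client for inspection. The sketch below uses only Python's standard `sqlite3` module and is not part of the SDK. The internal table layout of KDDB is not described here, so the example builds a stand-in file with a hypothetical `example_nodes` table and simply lists whatever tables it finds.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite file, sorted by name."""
    # Open read-only so that inspection can never modify the document.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def make_stand_in(db_path: str) -> None:
    """Create a tiny SQLite file; the table name is hypothetical, not KDDB's schema."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE example_nodes (id INTEGER PRIMARY KEY, content TEXT)"
        )
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "stand_in.kddb")
    make_stand_in(path)
    tables = list_tables(path)
    print(tables)
```

Reading a real document this way is only useful for debugging; writes should go through the SDK so that its own bookkeeping stays consistent.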

Supported Languages

The SDK is available for multiple languages with a consistent API design:

Language	Package	Runtime	Use Cases
Python	`kodexa-document`	Python 3.12+	Pipelines, ML, Servers
TypeScript	`@kodexa-ai/document-wasm-ts`	Node.js 16+ / Browsers	Web Apps, APIs, Frontends

All implementations share the same KDDB format and core concepts, making it easy to work with documents across different parts of your stack.

Key Features

Hierarchical Document Model

Documents are composed of ContentNodes arranged in a tree structure. Each node can have:

Content: Text or data stored in the node
Type: Classification like “paragraph”, “heading”, “table”, etc.
Features: Key-value metadata attached to nodes
Tags: Annotations with optional confidence scores and values
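
To make the node model concrete, here is a minimal sketch of such a tree in plain Python. The `Node` and `Tag` classes and their field names are hypothetical and exist only for illustration; they are not the SDK's classes and do not mirror its API.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Tag:
    """An annotation with an optional value and confidence score."""
    name: str
    value: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class Node:
    """One node in the content tree (illustrative model only)."""
    node_type: str
    content: Optional[str] = None
    features: dict[str, Any] = field(default_factory=dict)
    tags: list[Tag] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


root = Node("document")
root.add_child(Node("heading", content="Invoice 001"))
para = root.add_child(Node("paragraph", content="Total due: 120.50"))
para.features["page"] = 1
para.tags.append(Tag("total", value="120.50", confidence=0.91))

node_types = [node.node_type for node in root.walk()]
print(node_types)
```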

XPath-like Selectors

Query documents using a familiar selector syntax:

//paragraph                                  # All paragraphs
//paragraph[contains(@content, 'invoice')]   # Paragraphs containing "invoice"
//section/paragraph                          # Paragraphs that are direct children of a section
//*[@tag='important']                        # Any node tagged as important
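
For readers new to this style of query, the sketch below runs analogous path expressions against a small XML stand-in using Python's standard `xml.etree.ElementTree`. This is an analogy only: it does not use the SDK, the XML document is invented for the example, and ElementTree supports just a subset of XPath (it has no `contains()`, so that filter is written in Python).

```python
import xml.etree.ElementTree as ET

# Invented stand-in: element names play the role of node types.
STAND_IN = """
<document>
  <section>
    <heading>Billing</heading>
    <paragraph tag="important">Your invoice is attached.</paragraph>
    <paragraph>Thank you for your business.</paragraph>
  </section>
  <paragraph>Unrelated footer text.</paragraph>
</document>
"""

root = ET.fromstring(STAND_IN.strip())

# Analogue of //paragraph : every paragraph at any depth.
all_paragraphs = root.findall(".//paragraph")

# Analogue of //section/paragraph : paragraphs directly under a section.
section_paragraphs = root.findall(".//section/paragraph")

# Analogue of //*[@tag='important'] : any element carrying that attribute value.
important = root.findall(".//*[@tag='important']")

# Analogue of contains(@content, 'invoice'), done as a Python filter.
with_invoice = [p for p in all_paragraphs if "invoice" in (p.text or "")]

print(len(all_paragraphs), len(section_paragraphs), len(important), len(with_invoice))
```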

Structured Data Layer

Beyond the content tree, documents include:

Data Objects: Represent extracted entities with parent-child relationships
Data Attributes: Typed fields with confidence scores attached to data objects
Transactions: Atomic batch operations for high-performance bulk data creation
Audit Trail: Complete revision history tracking all changes
Delta Tracking: Export, preview, and apply changes between document versions
Notes: Human-readable annotations attached to documents and data
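
The idea behind atomic batch operations can be illustrated with plain SQLite, which KDDB builds on. The sketch below is not the SDK's transaction API: the `data_attributes` table, its columns, and the `insert_batch` helper are hypothetical. It shows that a batch either lands completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a name, a value, and a confidence score between 0 and 1.
conn.execute(
    "CREATE TABLE data_attributes ("
    " name TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1))"
)


def insert_batch(connection: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Insert all rows in one transaction; return False if the batch was rolled back."""
    try:
        # The connection context manager commits on success
        # and rolls back if the block raises.
        with connection:
            connection.executemany(
                "INSERT INTO data_attributes VALUES (?, ?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_rows(connection: sqlite3.Connection) -> int:
    return connection.execute("SELECT COUNT(*) FROM data_attributes").fetchone()[0]


good = [("invoice_number", "INV-001", 0.98), ("total", "120.50", 0.91)]
bad = [("vendor", "Acme", 0.95), ("due_date", "2024-01-31", 1.7)]  # 1.7 violates CHECK

ok_first = insert_batch(conn, good)
after_good = count_rows(conn)
ok_second = insert_batch(conn, bad)
after_bad = count_rows(conn)
print(ok_first, after_good, ok_second, after_bad)
```

In the second batch the valid `vendor` row is discarded together with the invalid one, which is the all-or-nothing behaviour that makes bulk creation safe to retry.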

Flexible I/O

Load and save documents in multiple ways:

KDDB files (native SQLite format)
Bytes/Blobs (for API responses)
JSON (for debugging and interoperability)
Text (automatic paragraph parsing)

Implementation Details

Feature	Python	TypeScript
Backend	Go via CFFI	Go via WebAssembly
Memory Management	Context managers	Manual dispose
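
The two memory-management styles in the table can be compared in a short sketch. `NativeHandle` below is a hypothetical stand-in for any object that owns backend memory; it is not an SDK class, and the `dispose` method name is an assumption borrowed from the table's wording.

```python
class NativeHandle:
    """Hypothetical object that holds a resource needing explicit release."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disposed = False

    def dispose(self) -> None:
        # Safe to call more than once; the flag simply stays set.
        self.disposed = True

    def __enter__(self) -> "NativeHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.dispose()
        return False  # do not swallow exceptions from the with-block


# Context-manager style: cleanup runs even if the body raises.
with NativeHandle("managed") as managed:
    pass  # work with the handle here

# Manual style: the caller must remember the try/finally.
manual = NativeHandle("manual")
try:
    pass  # work with the handle here
finally:
    manual.dispose()

print(managed.disposed, manual.disposed)
```

The context-manager form is harder to get wrong because the release is tied to the block's scope rather than to a call the author has to remember.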

Next Steps

Getting Started

Learn the SDK fundamentals with side-by-side examples in Python and TypeScript

