The Kodexa Document SDK is a cross-platform library for working with hierarchical structured documents stored in KDDB (Kodexa Document Database) format. It is designed for building document processing pipelines, extraction workflows, and content analysis tools.

What is KDDB?

KDDB is a SQLite-based format for storing and manipulating structured documents. It provides:
  • Hierarchical Structure: Documents are organized as trees of content nodes
  • Rich Metadata: Support for features, tags, and document-level metadata
  • High Performance: Operations run in a native backend written in Go
  • Cross-Platform: Same document format works across Python, TypeScript, and more

Supported Languages

The SDK is available for multiple languages with a consistent API design:
Language | Package | Runtime | Use Cases
Python | kodexa-document | Python 3.12+ | Pipelines, ML, Servers
TypeScript | @kodexa-ai/document-wasm-ts | Node.js 16+ / Browsers | Web Apps, APIs, Frontends
All implementations share the same KDDB format and core concepts, making it easy to work with documents across different parts of your stack.

Key Features

Documents are composed of ContentNodes arranged in a tree structure. Each node can have:
  • Content: Text or data stored in the node
  • Type: Classification like “paragraph”, “heading”, “table”, etc.
  • Features: Key-value metadata attached to nodes
  • Tags: Annotations with optional confidence scores and values
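
To make the node model concrete, here is a minimal sketch of such a tree in plain Python. It is not the SDK's `ContentNode` class: the names `Node`, `Tag`, `add_child`, and `walk` are assumptions chosen for illustration, and the real API may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical tag: a name with an optional value and confidence."""
    name: str
    value: str | None = None
    confidence: float | None = None


@dataclass
class Node:
    """Hypothetical stand-in for a content node in the document tree."""
    node_type: str
    content: str = ""
    features: dict[str, object] = field(default_factory=dict)
    tags: list[Tag] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        """Append a child node and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


root = Node("document")
heading = root.add_child(Node("heading", "Invoice 1001"))
para = root.add_child(Node("paragraph", "Total due: 42.00"))
para.features["page"] = 1
para.tags.append(Tag("total", value="42.00", confidence=0.93))

types = [node.node_type for node in root.walk()]
print(types)
```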
Query documents using a familiar selector syntax:
//paragraph                                 # All paragraphs
//paragraph[contains(@content, 'invoice')]  # Paragraphs containing "invoice"
//section/paragraph                         # Direct child paragraphs of sections
//*[@tag='important']                       # Any node tagged as important
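
The selector syntax resembles XPath. As a rough analogy only, the sketch below runs comparable queries against a small XML tree with Python's standard `xml.etree.ElementTree`, which implements a limited XPath subset. This is a stand-in, not the SDK's selector engine: here node content is XML element text and the tag is an XML attribute, and `contains()` is emulated with a Python filter because ElementTree does not support it.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in document; the structure is invented for this example.
doc = ET.fromstring(
    "<document>"
    "<section>"
    "<heading>Billing</heading>"
    "<paragraph tag='important'>Please pay this invoice promptly.</paragraph>"
    "<paragraph>Thank you.</paragraph>"
    "</section>"
    "<paragraph>Footer text.</paragraph>"
    "</document>"
)

# Analogue of //paragraph : every paragraph at any depth.
all_paragraphs = doc.findall(".//paragraph")

# Analogue of //paragraph[contains(@content, 'invoice')], filtered in Python.
with_invoice = [p for p in all_paragraphs if "invoice" in (p.text or "")]

# Analogue of //section/paragraph : paragraphs that are direct children of a section.
section_children = doc.findall(".//section/paragraph")

# Analogue of //*[@tag='important'] : any element carrying that tag attribute.
important = doc.findall(".//*[@tag='important']")

print(len(all_paragraphs), len(with_invoice), len(section_children), len(important))
```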
Beyond the content tree, documents include:
  • Data Objects: Represent extracted entities with parent-child relationships
  • Data Attributes: Typed fields with confidence scores attached to data objects
  • Transactions: Atomic batch operations for high-performance bulk data creation
  • Audit Trail: Complete revision history tracking all changes
  • Delta Tracking: Export, preview, and apply changes between document versions
  • Notes: Human-readable annotations attached to documents and data
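
The idea behind atomic batch operations can be illustrated with SQLite itself, which KDDB builds on. The sketch below uses the standard `sqlite3` module rather than the SDK's transaction API, which is not shown here. The table `data_attributes` and the helper `insert_batch` are hypothetical; the point is only that a batch either commits as a whole or leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration; this is not the KDDB schema.
conn.execute(
    "CREATE TABLE data_attributes (name TEXT NOT NULL, value TEXT, confidence REAL)"
)


def insert_batch(conn: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Insert all rows atomically. Return False if the batch was rolled back."""
    try:
        # As a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.executemany("INSERT INTO data_attributes VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_batch(conn, [("total", "42.00", 0.93), ("currency", "USD", 0.99)])
# The second row violates NOT NULL, so the whole batch is discarded,
# including the valid "vendor" row that preceded it.
bad = insert_batch(conn, [("vendor", "Acme", 0.80), (None, "x", 0.10)])

count = conn.execute("SELECT COUNT(*) FROM data_attributes").fetchone()[0]
print(ok, bad, count)
```

Grouping many inserts into one transaction is also what makes bulk writes fast in SQLite, since the database syncs to disk once per commit instead of once per row.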
Load and save documents in multiple ways:
  • KDDB files (native SQLite format)
  • Bytes/Blobs (for API responses)
  • JSON (for debugging and interoperability)
  • Text (automatic paragraph parsing)
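
As an illustration of the text and JSON routes, the sketch below splits plain text into paragraphs and serializes the result as JSON. The SDK's actual paragraph-parsing rule and JSON layout are not specified here, so both are assumptions: this version treats blank lines as paragraph breaks, and the `type`/`content`/`children` keys are invented for the example.

```python
import json
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and collapse whitespace inside each paragraph.

    Assumption: one or more blank lines separate paragraphs.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


text = "First paragraph,\nwrapped over two lines.\n\n\nSecond paragraph."
paragraphs = split_paragraphs(text)

# Hypothetical JSON shape: a document node with paragraph children.
as_json = json.dumps(
    {
        "type": "document",
        "children": [{"type": "paragraph", "content": p} for p in paragraphs],
    },
    indent=2,
)
print(as_json)
```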

Implementation Details

Feature | Python | TypeScript
Backend | Go via CFFI | Go via WebAssembly
Memory Management | Context managers | Manual dispose
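
The difference between the two memory-management styles can be sketched in Python alone. `NativeResource` below is a hypothetical stand-in for any object backed by native memory; it is not an SDK class. The `with` form mirrors the context-manager style, while the explicit `try`/`finally` form mirrors what manual disposal requires of the caller.

```python
class NativeResource:
    """Hypothetical object that owns native memory and must be released."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        # Safe to call more than once.
        self.closed = True

    def __enter__(self) -> "NativeResource":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not suppress exceptions


# Context-manager style: release happens automatically, even on error.
with NativeResource("managed") as managed:
    inside_closed = managed.closed

# Manual style: the caller is responsible for releasing the resource.
manual = NativeResource("manual")
try:
    pass  # work with the resource here
finally:
    manual.close()

print(inside_closed, managed.closed, manual.closed)
```

With manual disposal, forgetting the release call leaks the native allocation, which is why wrapping the work in `try`/`finally` (or the equivalent in TypeScript) is the safer habit.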

Next Steps

Getting Started

Learn the SDK fundamentals with side-by-side examples in Python and TypeScript