The kdx document command group provides subcommands for inspecting, searching, annotating, and extracting structured data from local Kodexa documents (KDDB files). These commands run offline and do not require a connection to the Kodexa platform.

What are KDDB Files?

KDDB (Kodexa Document Database) files are SQLite-based document containers that store:
  • Document structure - Hierarchical content nodes with types and content
  • Metadata - Document properties like UUID, version, and custom fields
  • Tags - Annotations on content nodes for extraction workflows
  • Data objects - Structured extracted data with attributes
  • Native files - Embedded binary files (PDFs, images, etc.)
  • Audit trail - Full revision history of data changes
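Because KDDB files are described as SQLite-based, a quick pre-flight check can confirm that a file at least looks like an SQLite database before it is handed to `kdx document`. The sketch below assumes KDDB uses the unmodified SQLite 3 on-disk format, whose header begins with the text `SQLite format 3`; `is_sqlite_container` is a hypothetical helper name, not part of kdx.

```shell
# Sketch: check whether a file starts with the standard SQLite 3 header.
# Assumption: KDDB files use the unmodified SQLite file format, whose first
# 16 bytes are the text "SQLite format 3" followed by a NUL byte.
# is_sqlite_container is a hypothetical helper, not a kdx command.
is_sqlite_container() {
  local file="$1"
  [ -f "$file" ] || return 1
  # Compare the first 15 bytes (the header text without the trailing NUL).
  [ "$(head -c 15 "$file")" = "SQLite format 3" ]
}
```

For example, `is_sqlite_container invoice.kddb && kdx document info invoice.kddb`. Under the same assumption the standard `sqlite3` shell can also open the file (for instance `sqlite3 -readonly invoice.kddb '.tables'`), but the table layout is internal to Kodexa, so the kdx commands are the safer way to read it.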

Reading & Analysis

Commands for understanding document content without modifying anything.

  • Info - Document summary with metadata and statistics
  • Stats - Detailed statistics and node type breakdown
  • Text - Extract readable text with page markers
  • Grep & Lines - Search content with regex and retrieve lines
  • Find - Multi-criteria search (text, type, page, region)
  • Locate - Find nodes with match positions for tagging
  • Node - Inspect a single node by ID

Structure & Metadata

Commands for inspecting document structure, annotations, spatial layout, and embedded files, and for managing metadata and external data.

  • Print & Select - Tree view and selector queries
  • Tags - List all tags with counts
  • Audit - Revision history and change tracking
  • Delta - Inspect delta binary files to see what changed
  • Spatial - Bounding box queries and region search
  • Native Files - List and extract embedded files
  • External Data - Manage the external data key-value store
  • Metadata - View and modify document metadata

Data & Tagging

Commands that modify the document by adding tags, creating data objects, and setting attributes.

  • Data - Create and inspect data objects and attributes
  • Tag - Annotate content nodes with tags

Agentic Workflows

  • Agentic CLI Use - How AI agents use kdx document commands to analyze and annotate documents programmatically, including end-to-end workflow examples and best practices.

Quick Start

Explore a Document

# Get overview
kdx document info invoice.kddb

# Check structure
kdx document print invoice.kddb --depth 3

# Read page content
kdx document text invoice.kddb --pages 1:3
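The three exploration steps above can be wrapped in one shell function so the same overview is produced for any file. This is a sketch only: `explore_document` is a hypothetical helper name, and it calls nothing beyond the `info`, `print --depth`, and `text --pages` invocations already shown, with their example values as defaults.

```shell
# Sketch: run the Quick Start exploration commands on one document.
# explore_document is a hypothetical helper; it uses only the flags shown
# above (--depth for print, --pages for text).
explore_document() {
  local doc="$1"
  local depth="${2:-3}"      # tree depth for `print` (defaults to 3, as above)
  local pages="${3:-1:3}"    # page range for `text` (defaults to 1:3, as above)

  echo "== info: $doc"
  kdx document info "$doc" || return

  echo "== structure (depth $depth)"
  kdx document print "$doc" --depth "$depth" || return

  echo "== text (pages $pages)"
  kdx document text "$doc" --pages "$pages"
}
```

For example, `explore_document invoice.kddb 2 1:1` prints the summary, a two-level tree, and the first page's text. Each step stops the function if the previous kdx call fails, so a bad path surfaces immediately instead of producing three errors.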

Search and Annotate

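# Find dollar amounts (single quotes pass the regex to kdx unchanged;
# in double quotes the shell turns \$ into $, an end-of-line anchor)
kdx document locate invoice.kddb --pattern '\$[\d,]+\.\d{2}' --type word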

# Tag a node
kdx document tag invoice.kddb --node-id 245 --name "invoice/amount"

# Create structured data
kdx document data create invoice.kddb --path "INVOICE"
kdx document data set-attribute invoice.kddb \
  --object-id 1 --tag "total" --value "1234.56" --type CURRENCY
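When several nodes need the same tag, or one data object needs several attributes, the single commands above can be looped. The sketch below assumes node IDs arrive one per line on stdin and attributes are given as `tag=value` pairs; `tag_nodes` and `set_attributes` are hypothetical helper names, and only the `tag` and `data set-attribute` flags shown above are used.

```shell
# Sketch: batch versions of the tagging and attribute commands above.
# tag_nodes and set_attributes are hypothetical helpers, not kdx commands.

# Tag every node ID read from stdin (one ID per line) with the same tag name.
tag_nodes() {
  local doc="$1" name="$2" id
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # </dev/null stops kdx from consuming the remaining IDs on stdin
    kdx document tag "$doc" --node-id "$id" --name "$name" </dev/null || return
  done
}

# Set several attributes of the same type on one data object.
# Usage: set_attributes DOC OBJECT_ID TYPE tag=value [tag=value ...]
set_attributes() {
  local doc="$1" object_id="$2" type="$3" pair
  shift 3
  for pair in "$@"; do
    # ${pair%%=*} is the text before the first '=', ${pair#*=} the text after it
    kdx document data set-attribute "$doc" \
      --object-id "$object_id" --tag "${pair%%=*}" \
      --value "${pair#*=}" --type "$type" || return
  done
}
```

For example, `printf '245\n301\n' | tag_nodes invoice.kddb invoice/amount` tags two nodes, and `set_attributes invoice.kddb 1 CURRENCY total=1234.56 tax=123.46` sets two currency attributes. The IDs could also be extracted from `locate` output with jq, but the name of the ID field in that JSON is not documented on this page, so inspect a sample line first.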

Scripting with JSON

# JSON Lines output pipes well with jq
kdx document grep "revenue" report.kddb | jq -r '.content'

# Get stats for scripting
PAGE_COUNT=$(kdx document info report.kddb -o json | jq '.statistics.pageCount')
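The page count obtained above can drive a per-page loop, for example to write each page's text to its own file. This sketch assumes that `--pages N:N` selects a single page (by analogy with the `--pages 1:3` range in the Quick Start) and that the count comes from the `.statistics.pageCount` field used above; `text_per_page` is a hypothetical helper name.

```shell
# Sketch: write each page's text to OUT_DIR/page-N.txt, one kdx call per page.
# Assumption: `--pages N:N` selects exactly page N.
# text_per_page is a hypothetical helper, not a kdx command.
text_per_page() {
  local doc="$1" page_count="$2" out_dir="$3" p=1
  # Reject empty or non-numeric counts (e.g. "null" from jq).
  case "$page_count" in
    ''|*[!0-9]*) echo "invalid page count: $page_count" >&2; return 1 ;;
  esac
  mkdir -p "$out_dir" || return
  while [ "$p" -le "$page_count" ]; do
    kdx document text "$doc" --pages "$p:$p" > "$out_dir/page-$p.txt" || return
    p=$((p + 1))
  done
}
```

Combined with the line above: `text_per_page report.kddb "$PAGE_COUNT" report-pages`. One call per page trades speed for output files that map cleanly to page numbers.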

Output Formats

All document commands support the global -o flag:
kdx document info doc.kddb -o json    # JSON output
kdx document info doc.kddb -o yaml    # YAML output
kdx document info doc.kddb -o table   # Table output (default)
Search commands (grep, find, locate) output JSON Lines (JSONL) by default for streaming. Use --pretty for readable output.
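JSON Lines suits streaming, but some downstream tools expect a single JSON array. `jq -s .` performs that conversion when jq is installed; the sketch below does the same with POSIX awk only. `jsonl_to_array` is a hypothetical helper name, and it assumes every non-blank input line is already a complete JSON value (it does not validate them).

```shell
# Sketch: collect a JSON Lines stream into one JSON array using only awk.
# Equivalent in effect to `jq -s .` for valid input; no validation is done.
# jsonl_to_array is a hypothetical helper, not a kdx command.
jsonl_to_array() {
  awk '
    BEGIN { printf "[" }
    NF    { if (n++) printf ","; printf "%s", $0 }   # skip blank lines
    END   { print "]" }
  '
}
```

For example, `kdx document grep "revenue" report.kddb | jsonl_to_array > matches.json` produces one array instead of one object per line; an empty result becomes `[]`.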