Document Commands

The kdx document command inspects, searches, annotates, and extracts structured data from local Kodexa documents (KDDB files). All of its subcommands run offline; no connection to the Kodexa platform is required.

What are KDDB Files?

KDDB (Kodexa Document Database) files are SQLite-based document containers that store:

Document structure - Hierarchical content nodes with types and content
Metadata - Document properties like UUID, version, and custom fields
Tags - Annotations on content nodes for extraction workflows
Data objects - Structured extracted data with attributes
Native files - Embedded binary files (PDFs, images, etc.)
Audit trail - Full revision history of data changes
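
Because a KDDB file is SQLite-based, one quick sanity check before running any command is to look for the standard SQLite 3 header, the text "SQLite format 3" that begins every SQLite 3 database file. The sketch below assumes KDDB files are plain, unencrypted SQLite databases on disk; the helper name is_sqlite_file is illustrative and is not part of kdx.

```shell
#!/bin/sh
# Sketch: check whether a file starts with the standard SQLite 3 header.
# Assumption: a .kddb file is an ordinary, unencrypted SQLite database.
# is_sqlite_file is a hypothetical helper, not a kdx command.
is_sqlite_file() {
  # Must be a regular file.
  [ -f "$1" ] || return 1
  # The first 15 bytes of every SQLite 3 database read "SQLite format 3".
  header=$(dd if="$1" bs=15 count=1 2>/dev/null)
  [ "$header" = "SQLite format 3" ]
}

# Usage:
#   is_sqlite_file invoice.kddb && kdx document info invoice.kddb
```

A check like this only confirms the container format. It says nothing about whether the database holds a valid Kodexa document.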

Reading & Analysis

Commands for understanding document content without modifying anything.

Info - Document summary with metadata and statistics
Stats - Detailed statistics and node type breakdown
Text - Extract readable text with page markers
Grep & Lines - Search content with regex and retrieve lines
Find - Multi-criteria search (text, type, page, region)
Locate - Find nodes with match positions for tagging
Node - Inspect a single node by ID

Structure & Metadata

Commands for inspecting document structure, annotations, and spatial layout.

Print & Select - Tree view and selector queries
Audit - Revision history and change tracking
Spatial - Bounding box queries and region search
Native Files - List and extract embedded files
External Data - Manage external data key-value store
Metadata - View and modify document metadata

Data & Tagging

Commands that modify the document by adding tags, creating data objects, and setting attributes.

Data - Create and inspect data objects and attributes
Tag - Annotate content nodes with tags
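
Since the commands in this group modify the document, a cautious workflow can copy the file first so that a mistaken tag or attribute is easy to undo. The wrapper below is a sketch that assumes changes are written in place to the .kddb file; with_backup and the .bak suffix are illustrative choices, not kdx features.

```shell
#!/bin/sh
# Sketch: copy a document before running a command that modifies it.
# Assumption: kdx writes its changes into the .kddb file in place.
# with_backup is a hypothetical helper, not part of kdx.
with_backup() {
  file=$1
  shift
  # Keep a copy next to the original; stop if the copy fails.
  cp -p "$file" "$file.bak" || return 1
  # Run the remaining arguments as the command.
  "$@"
}

# Usage, with the tag command from the Quick Start:
#   with_backup invoice.kddb kdx document tag invoice.kddb --node-id 245 --name "invoice/amount"
```

Each call overwrites the previous .bak file, so this keeps one level of undo only.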

Agentic Workflows

Agentic CLI Use - How AI agents use kdx document commands to analyze and annotate documents programmatically. Includes end-to-end workflow examples and best practices.

Quick Start

Explore a Document

# Get overview
kdx document info invoice.kddb

# Check structure
kdx document print invoice.kddb --depth 3

# Read page content
kdx document text invoice.kddb --pages 1:3

Search and Annotate

# Find dollar amounts (single quotes keep \$ intact so it matches a literal dollar sign)
kdx document locate invoice.kddb --pattern '\$[\d,]+\.\d{2}' --type word

# Tag a node
kdx document tag invoice.kddb --node-id 245 --name "invoice/amount"

# Create structured data
kdx document data create invoice.kddb --path "INVOICE"
kdx document data set-attribute invoice.kddb \
  --object-id 1 --tag "total" --value "1234.56" --type CURRENCY

Scripting with JSON

# JSON Lines output pipes well with jq
kdx document grep "revenue" report.kddb | jq -r '.content'

# Get stats for scripting
PAGE_COUNT=$(kdx document info report.kddb -o json | jq '.statistics.pageCount')

Output Formats

All document commands support the global -o flag:

kdx document info doc.kddb -o json    # JSON output
kdx document info doc.kddb -o yaml    # YAML output
kdx document info doc.kddb -o table   # Table output (default)

Search commands (grep, find, locate) output JSON Lines (JSONL) by default for streaming. Use --pretty for readable output.
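
Because JSONL puts one JSON object on each line, a script can work with search results line by line without parsing the whole output first. The sketch below counts grep matches per file by counting output lines. It assumes each match is emitted as exactly one line and that nothing else is written to standard output; count_matches is an illustrative helper name, not a kdx command.

```shell
#!/bin/sh
# Sketch: count matches per document from JSONL search output.
# Assumption: kdx document grep prints exactly one JSON object per match
# on standard output and nothing else.
# count_matches is a hypothetical helper, not part of kdx.
count_matches() {
  pattern=$1
  shift
  for file in "$@"; do
    # One JSONL line per match, so the line count is the match count.
    # tr removes the padding that some wc implementations add.
    n=$(kdx document grep "$pattern" "$file" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$file" "$n"
  done
}

# Usage:
#   count_matches "revenue" report.kddb invoice.kddb
```

Counting lines avoids a dependency on jq, but it would miscount if the output were pretty-printed, so the sketch should not be combined with --pretty.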
