The kdx document command provides tools for inspecting and manipulating local Kodexa documents (KDDB files) directly from your terminal. These commands work offline; no connection to the Kodexa platform is required.

What are KDDB Files?

KDDB (Kodexa Document Database) files are SQLite-based document containers that store:
  • Document structure - Hierarchical content nodes with types and content
  • Metadata - Document properties like UUID, version, and custom fields
  • Native files - Embedded binary files (PDFs, images, etc.)
  • External data - Custom key-value data for extensions
  • Features - Node-level attributes like bounding boxes and tags
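
Because KDDB files are SQLite-based, one quick sanity check before running any kdx command is to look for the standard SQLite header, which begins with the ASCII string "SQLite format 3". The sketch below assumes a KDDB file is an ordinary SQLite database file on disk; is_sqlite_file is a hypothetical helper name and is not part of kdx.

```shell
# Hypothetical helper: succeed if the file starts with the SQLite magic string.
# Assumes a KDDB file is an ordinary SQLite database file on disk.
is_sqlite_file() {
  # The SQLite header is "SQLite format 3" followed by a NUL byte;
  # reading only the first 15 bytes avoids the NUL.
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}
```

For example, `is_sqlite_file invoice.kddb || echo "not a KDDB file" >&2` would flag a truncated or mislabeled file before any further processing.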

Quick Start

View Document Info

Get a summary of document contents:
kdx document info invoice.kddb
┌────────────┬─────────────────────────────────────────────┐
│   FIELD    │                    VALUE                    │
├────────────┼─────────────────────────────────────────────┤
│ uuid       │ e25dab60-cbdf-499f-857e-ff9c82a19d87        │
│ version    │ 6.0.0                                       │
│ metadata   │ {"connector":"folder","source_path":"/..."}│
│ statistics │ {"nodeCount":39,"pageCount":2}              │
└────────────┴─────────────────────────────────────────────┘

Print Document Tree

Visualize the document tree:
kdx document print invoice.kddb --depth 3
root
└── page
    └── content-area
        ├── line
        │   └── word: Invoice
        ├── line
        │   ├── word: Date:
        │   └── word: 2024-01-15
        └── line
            ├── word: Total:
            └── word: $1,234.56

Query Nodes by Type

Find all nodes of a specific type:
kdx document select invoice.kddb "//line"
┌────┬──────┬─────────────────┬───────┐
│ ID │ TYPE │     CONTENT     │ INDEX │
├────┼──────┼─────────────────┼───────┤
│ 4  │ line │ Invoice         │ 0     │
│ 6  │ line │ Date: 2024-01-15│ 1     │
│ 10 │ line │ Total: $1,234.56│ 2     │
└────┴──────┴─────────────────┴───────┘

Command Reference

Info

Display document summary including UUID, version, metadata, and statistics.
kdx document info <file.kddb>
Output includes:
  • Document UUID and version
  • Source metadata (connector, path)
  • Statistics (node count, page count)
  • Labels and custom metadata

Print

Pretty print the document structure as an ASCII tree.
kdx document print <file.kddb> [flags]
Flags:
Flag          Description
--page N      Print only page N (1-indexed)
--depth N     Limit tree depth (0 = unlimited)
--features    Show node features (bounding boxes, etc.)
Examples:
# Print first page only
kdx document print invoice.kddb --page 1

# Limit to 2 levels deep
kdx document print invoice.kddb --depth 2

# Show features on nodes
kdx document print invoice.kddb --features --depth 3
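
The --page and --depth flags can be combined to walk a document one page at a time. The following is a minimal sketch using only the flags listed above; print_pages is a hypothetical helper, and the page count is passed in by hand (for example, read from the statistics shown by kdx document info).

```shell
# Hypothetical helper: print every page of a document as its own tree.
# Arguments: FILE PAGE_COUNT [DEPTH]; DEPTH defaults to 2.
print_pages() {
  file=$1
  pages=$2
  depth=${3:-2}
  page=1
  while [ "$page" -le "$pages" ]; do
    echo "=== page $page of $pages ==="
    # --page is 1-indexed, so the loop starts at 1.
    kdx document print "$file" --page "$page" --depth "$depth"
    page=$((page + 1))
  done
}
```

Calling `print_pages invoice.kddb 2 3` would print both pages of the sample invoice, each limited to three levels.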

Select

Run a selector query to find matching nodes.
kdx document select <file.kddb> <selector>
Selector syntax:
  • //type - Select all nodes of a type
Examples:
# Find all paragraphs
kdx document select doc.kddb "//paragraph"

# Find all pages
kdx document select doc.kddb "//page"

# Find all words
kdx document select doc.kddb "//word"

# Find all lines
kdx document select doc.kddb "//line"

Output Formats

All document commands support the global -o flag for output format:
# JSON output for scripting
kdx document info invoice.kddb -o json

# YAML output
kdx document select invoice.kddb "//page" -o yaml

# Default table output
kdx document info invoice.kddb -o table
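
JSON output pairs naturally with jq. The sketch below is a hypothetical wrapper, doc_summary, that prints a document's UUID and page count on one line. It assumes the JSON emitted by kdx document info is an object with top-level uuid and statistics.pageCount keys, mirroring the fields of the table output; that shape is an assumption, so check the actual output before relying on it.

```shell
# Hypothetical helper: print "<uuid> <pageCount>" for one document.
# Assumes `kdx document info -o json` emits an object with "uuid" and
# "statistics.pageCount" keys, mirroring the table output.
doc_summary() {
  kdx document info "$1" -o json |
    # "// 0" falls back to 0 when the page count is missing or null.
    jq -r '"\(.uuid) \(.statistics.pageCount // 0)"'
}
```

The -r flag makes jq print the raw string without surrounding quotes, which is usually what a shell script wants.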

Use Cases

Document Inspection

Quickly inspect a document’s structure before processing:
# Get overview
kdx document info invoice.kddb

# Check structure
kdx document print invoice.kddb --depth 2

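# Count specific node types (count only data rows, which start with a numeric ID;
# a plain wc -l would also count the table borders and header)
kdx document select invoice.kddb "//paragraph" | grep -c '^│ *[0-9]'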

Debugging Extraction Issues

When extraction isn’t working as expected:
# View full tree with features
kdx document print problematic.kddb --features

# Check document statistics (node count, page count)
kdx document info problematic.kddb
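
The info summary shown on this page reports totals rather than a per-type breakdown, so one way to see which node types are actually present is to run one selector per type and count the matching rows. The sketch below is a hypothetical helper, count_node_types; it assumes the default table output has a numeric ID in the first column of every data row, as in the sample select output above.

```shell
# Hypothetical helper: count matches for each node type via the table output.
# Assumes data rows look like "│ 4  │ line │ ...", with a numeric ID first.
count_node_types() {
  file=$1
  shift
  for type in "$@"; do
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the loop going under "set -e".
    count=$(kdx document select "$file" "//$type" | grep -c '^│ *[0-9]' || true)
    printf '%-12s %s\n' "$type" "$count"
  done
}
```

Running `count_node_types problematic.kddb page line word` prints one line per type; a count of 0 for a type the extraction expects is a useful first clue.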

Scripting and Automation

Use JSON output for automated processing:
# Get metadata for processing
kdx document metadata get invoice.kddb -o json | jq '.uuid'

# Check if document has pages
PAGE_COUNT=$(kdx document info invoice.kddb -o json | jq '.statistics.pageCount')
if [ "$PAGE_COUNT" -gt 0 ]; then
  echo "invoice.kddb has $PAGE_COUNT page(s)"
fi