The kdx document print and kdx document select commands help you understand and navigate the hierarchical structure of KDDB documents.

Print Command

Pretty-print the document structure as an ASCII tree.
kdx document print <file.kddb> [flags]

Flags

Flag        Description                    Default
--page N    Print only page N (1-indexed)  All pages
--depth N   Maximum tree depth             0 (unlimited)
--features  Show node features             false

Basic Usage

kdx document print invoice.kddb
root
└── page
    └── content-area
        ├── line
        │   └── word: Invoice
        ├── line
        │   ├── word: Date:
        │   └── word: 2024-01-15
        ├── paragraph
        │   ├── line
        │   │   └── word: Item
        │   └── line
        │       └── word: Description
        └── line
            ├── word: Total:
            └── word: $1,234.56
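
Because the tree is plain text, it can be summarized with standard tools. The sketch below tallies node types from the print output. The helper name tree_type_counts is hypothetical, and it assumes every line has the form shown above (branch characters, then the type, optionally followed by ": content"), so run it without --features.

```shell
# Hypothetical helper: tally node types from `kdx document print` output on stdin.
# ASSUMPTION: each line is "<branch characters><type>[: content]", as in the sample tree.
tree_type_counts() {
  # Strip the leading branch characters, then drop everything from the first colon on.
  sed -E 's/^[^[:alnum:]]*//; s/:.*$//' |
    awk 'NF { count[$0]++ } END { for (t in count) print t, count[t] }' |
    LC_ALL=C sort
}

# Usage (requires kdx on PATH):
#   kdx document print invoice.kddb | tree_type_counts
```

Each output row is a type name followed by its count, sorted by type name.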

Limit Depth

Control how deep into the tree to display:
# Show the root plus two levels below it
kdx document print invoice.kddb --depth 2
root
└── page
    └── content-area
Focus on a single page:
# Print page 1 only
kdx document print invoice.kddb --page 1

# Print page 3 with limited depth
kdx document print invoice.kddb --page 3 --depth 4
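
Since --page is 1-indexed, a small guard can catch an accidental page 0 before kdx runs. This is a minimal sketch: print_page is a hypothetical wrapper name, and it uses only the --page and --depth flags listed above.

```shell
# Hypothetical wrapper around `kdx document print FILE --page N [--depth D]`.
# Rejects page numbers below 1, since --page is 1-indexed.
print_page() {
  file=$1
  page=$2
  depth=${3:-}   # optional; when omitted, --depth is not passed at all
  case $page in
    ''|*[!0-9]*|0*)
      echo "page must be a positive integer (pages are 1-indexed), got '$page'" >&2
      return 2 ;;
  esac
  case $depth in
    *[!0-9]*)
      echo "depth must be a non-negative integer, got '$depth'" >&2
      return 2 ;;
  esac
  if [ -n "$depth" ]; then
    kdx document print "$file" --page "$page" --depth "$depth"
  else
    kdx document print "$file" --page "$page"
  fi
}

# Usage:
#   print_page invoice.kddb 3 4
```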

Show Features

Display node features like bounding boxes and tags:
kdx document print invoice.kddb --features --depth 2
root
└── page
    [spatial:bbox=0,0,612,792]
    └── content-area
        [spatial:bbox=50,50,562,742]
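
The feature lines can be post-processed as text. The sketch below extracts each bounding box and prints its width and height. The helper name bbox_sizes is hypothetical, and the coordinate order x1,y1,x2,y2 is an assumption; the output above does not state the order.

```shell
# Hypothetical helper: list each bbox from `--features` output with its width and height.
# ASSUMPTION: bbox values are ordered x1,y1,x2,y2.
bbox_sizes() {
  awk '
    match($0, /bbox=[0-9.,-]+/) {
      # Skip the 5 characters of "bbox=" and split the remaining coordinate list.
      n = split(substr($0, RSTART + 5, RLENGTH - 5), b, ",")
      if (n == 4) print b[1] "," b[2] "," b[3] "," b[4], b[3] - b[1], b[4] - b[2]
    }'
}

# Usage (requires kdx on PATH):
#   kdx document print invoice.kddb --features | bbox_sizes
```

Each output row is the original bbox, then width, then height, separated by spaces.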

Select Command

Query nodes by type using selector syntax.
kdx document select <file.kddb> <selector>

Selector Syntax

Currently, only type-based selectors are supported:
Selector  Description
//type    Select all nodes of the given type
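
Scripts that accept a selector from user input may want to check it against the //type form before calling kdx. The following is a sketch only: is_type_selector is a hypothetical name, and the allowed character set (lowercase letters, digits, hyphens) is an assumption inferred from type names such as content-area.

```shell
# Hypothetical pre-flight check for the documented //type selector form.
# ASSUMPTION: type names consist of lowercase letters, digits, and hyphens.
is_type_selector() {
  case $1 in
    //*) name=${1#//} ;;   # strip the leading "//"
    *) return 1 ;;
  esac
  case $name in
    ''|*[!a-z0-9-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage:
#   is_type_selector "$selector" && kdx document select doc.kddb "$selector"
```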

Examples

# Find all paragraphs
kdx document select doc.kddb "//paragraph"
┌────┬───────────┬──────────────────────────────────────┬───────┐
│ ID │   TYPE    │               CONTENT                │ INDEX │
├────┼───────────┼──────────────────────────────────────┼───────┤
│ 15 │ paragraph │ This is the first paragraph of text. │ 0     │
│ 28 │ paragraph │ Another important paragraph here.    │ 1     │
│ 45 │ paragraph │ The final paragraph with conclusion. │ 2     │
└────┴───────────┴──────────────────────────────────────┴───────┘
# Find all pages
kdx document select doc.kddb "//page"

# Find all lines
kdx document select doc.kddb "//line"

# Find all words
kdx document select doc.kddb "//word"

# Find all tables
kdx document select doc.kddb "//table"
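
To query several types in one pass, the select calls can be wrapped in a loop. This is a minimal sketch: count_types is a hypothetical helper, it relies on the -o json output format and on jq, and it prints "?" when no count could be read.

```shell
# Hypothetical helper: print "<type> <count>" for each node type given.
# Uses only the //type selector and -o json; requires kdx and jq.
count_types() {
  file=$1
  shift
  for type in "$@"; do
    count=$(kdx document select "$file" "//$type" -o json | jq 'length')
    echo "$type ${count:-?}"
  done
}

# Usage:
#   count_types doc.kddb page paragraph line word table
```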

Output Columns

Column   Description
ID       Internal node ID
TYPE     Node type name
CONTENT  Node text content (truncated to 80 chars)
INDEX    Position among siblings

Output Formats

Both commands support the global -o flag:
# JSON output
kdx document select invoice.kddb "//paragraph" -o json

# YAML output
kdx document print invoice.kddb -o yaml
JSON select output:
[
  {
    "id": 15,
    "type": "paragraph",
    "content": "This is the first paragraph of text.",
    "index": 0
  },
  {
    "id": 28,
    "type": "paragraph",
    "content": "Another important paragraph here.",
    "index": 1
  }
]
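
The JSON form is convenient to flatten for spreadsheets or further shell processing. The sketch below converts it to tab-separated rows with jq. The helper name select_json_to_tsv is hypothetical; the field names are the ones in the sample above.

```shell
# Hypothetical helper: turn `select -o json` output on stdin into tab-separated rows.
# Columns: id, type, index, content. Requires jq.
select_json_to_tsv() {
  jq -r '.[] | [.id, .type, .index, .content] | @tsv'
}

# Usage (requires kdx on PATH):
#   kdx document select invoice.kddb "//paragraph" -o json | select_json_to_tsv
```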

Common Node Types

Documents typically contain these node types:
Type          Description
root          Document root node
page          Page container
content-area  Content region on a page
paragraph     Paragraph of text
line          Line of text
word          Individual word
table         Table structure
table-row     Table row
table-cell    Table cell
image         Image placeholder

Use Cases

Document Analysis

Understand document structure before processing:
# Quick overview
kdx document print invoice.kddb --depth 3

# Count paragraphs
kdx document select invoice.kddb "//paragraph" -o json | jq 'length'

# List page indices (position among siblings; 0-based in the sample output above)
kdx document select invoice.kddb "//page" -o json | jq '.[].index'

Debugging Extraction

Investigate why extraction isn’t finding expected content:
# Check if lines exist
kdx document select problematic.kddb "//line"

# View full structure with features
kdx document print problematic.kddb --features

# Check specific page
kdx document print problematic.kddb --page 2 --features
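
When only some pages are affected, checking each page in turn narrows the search. The sketch below lists pages whose printed tree contains no line nodes. pages_without_lines is a hypothetical helper; it assumes pages are numbered 1 to N, where N is the //page count, and that line nodes appear as "── line" in the tree output.

```shell
# Hypothetical helper: list page numbers whose printed subtree has no "line" nodes.
# ASSUMPTIONS: pages are numbered 1..N (N = number of //page nodes), and line nodes
# are rendered as "── line" in the tree. Requires kdx and jq.
pages_without_lines() {
  file=$1
  total=$(kdx document select "$file" "//page" -o json | jq 'length')
  case $total in
    ''|*[!0-9]*) echo "could not count pages in $file" >&2; return 1 ;;
  esac
  page=1
  while [ "$page" -le "$total" ]; do
    if ! kdx document print "$file" --page "$page" | grep -Eq '── line(:|$)'; then
      echo "$page"
    fi
    page=$((page + 1))
  done
}

# Usage:
#   pages_without_lines problematic.kddb
```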

Content Location

Find where specific content appears:
# Find words whose content contains "Invoice" (case-sensitive)
kdx document select doc.kddb "//word" -o json | jq '.[] | select(.content | contains("Invoice"))'
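
For case-insensitive or pattern-based searches, jq's test function can replace contains. This is a minimal sketch: find_content is a hypothetical helper that reads select JSON on stdin and treats a missing content field as empty text.

```shell
# Hypothetical helper: case-insensitive regex search over `select -o json` output on stdin.
# Prints "<id><TAB><content>" for each matching node. Requires jq.
find_content() {
  jq -r --arg re "$1" \
    '.[] | select((.content // "") | test($re; "i")) | "\(.id)\t\(.content)"'
}

# Usage (requires kdx on PATH):
#   kdx document select doc.kddb "//word" -o json | find_content invoice
```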

Structure Validation

Verify document structure is as expected:
# Check page count
PAGE_COUNT=$(kdx document select doc.kddb "//page" -o json | jq 'length')
echo "Document has $PAGE_COUNT pages"

# Verify table structure exists
TABLES=$(kdx document select doc.kddb "//table" -o json | jq 'length')
if [ "${TABLES:-0}" -gt 0 ]; then
  echo "Found $TABLES tables"
fi
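
For use in CI or other scripts, it helps if a failed check returns a non-zero status and an unreadable count is reported as a failure. The sketch below shows one way to do that; require_min is a hypothetical helper and its messages are illustrative.

```shell
# Hypothetical check: succeed only if COUNT is a number and at least MIN.
# An empty or non-numeric count (for example, when kdx produced no output)
# is reported as a failure rather than passed to a numeric test.
require_min() {
  label=$1
  count=$2
  min=$3
  case $count in
    ''|*[!0-9]*)
      echo "FAIL: could not determine $label count" >&2
      return 1 ;;
  esac
  if [ "$count" -lt "$min" ]; then
    echo "FAIL: expected at least $min $label, found $count" >&2
    return 1
  fi
  echo "OK: $count $label"
}

# Usage:
#   require_min pages "$(kdx document select doc.kddb "//page" -o json | jq 'length')" 1
```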

Batch Analysis

Analyze structure across multiple documents:
# Report structure summary for all documents
for kddb in *.kddb; do
  [ -e "$kddb" ] || continue  # skip the literal pattern when no .kddb files match
  PAGES=$(kdx document select "$kddb" "//page" -o json 2>/dev/null | jq 'length')
  PARAGRAPHS=$(kdx document select "$kddb" "//paragraph" -o json 2>/dev/null | jq 'length')
  echo "$kddb: $PAGES pages, $PARAGRAPHS paragraphs"
done
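
For larger batches, a CSV report that marks unreadable files is easier to review than free-form lines. The following is a sketch: summarize_kddb is a hypothetical helper, it assumes kdx exits with a non-zero status when it cannot read a file, and it does not quote file names that contain commas.

```shell
# Hypothetical helper: emit one CSV row per file: file,pages,paragraphs.
# ASSUMPTION: kdx exits non-zero on failure; such files get "error" instead of counts.
# Requires kdx and jq.
summarize_kddb() {
  echo "file,pages,paragraphs"
  for f in "$@"; do
    [ -e "$f" ] || continue   # skip paths that do not exist, such as an unmatched glob
    row=$f
    for type in page paragraph; do
      if json=$(kdx document select "$f" "//$type" -o json 2>/dev/null); then
        row="$row,$(printf '%s' "$json" | jq 'length')"
      else
        row="$row,error"
      fi
    done
    echo "$row"
  done
}

# Usage:
#   summarize_kddb *.kddb > summary.csv
```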

Tips

Use --depth 2 or --depth 3 for a quick overview without overwhelming output on large documents.
Pipe JSON output to jq for powerful filtering and transformation of results.
Content shown in the tree view is truncated to 60 characters. Use select with JSON output for full content.