Document Family Commands

The kdx document-family command provides extended operations for working with document families, including exporting extracted data.

Available Commands

Command             Description
data                Export extracted JSON data from a document family
content list        List all content objects in a document family
content download    Download a content object (kddb file)

Export Extracted Data

Retrieve the extracted and transformed data from a processed document family.
kdx document-family data <document-family-id> [flags]

Flags

Flag                    Default   Description
-o, --output            stdout    Output file path
--include-ids           true      Include element IDs in output
--friendly-names        false     Use friendly names for fields
--inline-audits         false     Include inline audit information
--include-exceptions    false     Include exception information

Examples

# Output to terminal
kdx document-family data a700ca3e-38e7-4e63-8974-59c2e9058cf3

# Save to file
kdx document-family data a700ca3e-38e7-4e63-8974-59c2e9058cf3 -o extracted.json

# With friendly field names
kdx document-family data a700ca3e-38e7-4e63-8974-59c2e9058cf3 --friendly-names

# Include all metadata
kdx document-family data a700ca3e-38e7-4e63-8974-59c2e9058cf3 \
  --include-ids \
  --inline-audits \
  --include-exceptions \
  -o full-export.json

Output Format

The command exports data as formatted JSON:
{
  "FinancialStatement": {
    "CompanyName": "Alphabet Inc.",
    "CDAD_IS_Currency": "USD",
    "CDAD_IS_Units_of_Currency": "Millions",
    "DebtBreakdownGrouping": [
      {
        "DebtBreakdown": [
          {
            "CategoryKey": "senior_unsecured_notes",
            "Description": "2014 Notes issuance",
            "Value": 1000,
            "_id": 330
          }
        ],
        "Group": "December 31, 2023",
        "_id": 329
      }
    ]
  }
}

With Element IDs

When --include-ids is enabled (default), each data element includes an _id field that can be used for traceability and auditing:
{
  "Value": 1000,
  "_id": 330
}
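
Because each element carries its own _id, a script can use that value to pull a single element back out of a saved export. The following is a minimal sketch, assuming jq is installed and the export was written to a file with -o. The function name find_by_id is hypothetical and is not part of kdx.

```shell
# find_by_id is a hypothetical helper, not a kdx command.
# Usage: find_by_id <element-id> <export.json>
# Prints (one per line, compact JSON) every object whose _id equals <element-id>.
find_by_id() {
  local id="$1" file="$2"
  # ".." walks the whole document; "objects" keeps only JSON objects;
  # objects without an _id yield null, which never equals a number.
  jq -c --argjson id "$id" '.. | objects | select(._id == $id)' "$file"
}
```

For the sample output above, `find_by_id 330 extracted.json` would print the DebtBreakdown entry whose Value is 1000.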

With Friendly Names

When --friendly-names is enabled, field names are converted to human-readable format:
{
  "Company Name": "Alphabet Inc.",
  "Currency": "USD",
  "Units of Currency": "Millions"
}

Content Object Operations

Document families contain one or more content objects (kddb files), each representing a version of the processed document. The content subcommand provides direct access to these content objects.

List Content Objects

View all content objects within a document family:
kdx document-family content list <document-family-id>
Example Output:
Content Objects in Document Family 70b894f5-8d32-4584-b780-89f89210e078:
─────────────────────────────────────────────────────────────────
ID                                     CREATED                   LABELS
─────────────────────────────────────────────────────────────────
abc12345-1111-2222-3333-444444444444   2026-01-26T23:18:07.454Z
def67890-5555-6666-7777-888888888888   2026-01-26T23:22:18.980Z
8efba773-6cc9-4903-95ea-2405a53df853   2026-01-26T23:31:48.777Z   (latest)
The (latest) marker indicates the most recent content object.
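
When a script needs the ID of the latest content object rather than the file itself, the list output can be filtered on that marker. This is a rough sketch that assumes the table layout matches the example output above; the function name latest_content_id is hypothetical, and the text format may differ between kdx versions.

```shell
# latest_content_id is a hypothetical helper, not a kdx command.
# Reads `kdx document-family content list` output on stdin and prints the
# ID (first column) of the first row whose last column is "(latest)".
latest_content_id() {
  awk '$NF == "(latest)" { print $1; exit }'
}
```

Typical use would be `kdx document-family content list <document-family-id> | latest_content_id`.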

Download Content Object

Download a content object (kddb file) from a document family:
kdx document-family content download <document-family-id> [content-object-id] [flags]

Flags

Flag            Description
--latest        Download the most recent content object (no ID required)
-o, --output    Output file path (default: content-<id>.kddb)

Examples

# Download the latest content object
kdx document-family content download 70b894f5-8d32-4584-b780-89f89210e078 --latest

# Download with custom filename
kdx document-family content download 70b894f5-8d32-4584-b780-89f89210e078 --latest -o document.kddb

# Download a specific content object by ID
kdx document-family content download 70b894f5-8d32-4584-b780-89f89210e078 8efba773-6cc9-4903-95ea-2405a53df853 -o specific-version.kddb
Example Output (for the --latest download with -o document.kddb):
Using latest content object: 8efba773-6cc9-4903-95ea-2405a53df853
✓ Downloaded content object to document.kddb (13549568 bytes)

When to Use Content Download

The content download command is useful when:
  • Debugging: You need the raw kddb file to inspect with local tools
  • Backup: Creating snapshots of processed documents
  • Large Documents: The DFM export endpoint may time out for document families with many content objects; direct download bypasses this limitation
  • Offline Analysis: Running local analysis tools on the processed document
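
For the backup case, one possible approach is a small wrapper that saves the latest content object under a timestamped name. This is only a sketch: the function name backup_family, the default backups directory, and the file naming scheme are assumptions, while --latest and -o are the flags described above.

```shell
# backup_family is a hypothetical helper, not a kdx command.
# Usage: backup_family <document-family-id> [backup-dir]
# Saves the latest content object as <backup-dir>/<id>-<UTC timestamp>.kddb
# and prints that path on success.
backup_family() {
  local family_id="$1" dir="${2:-backups}"
  local stamp target
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  target="${dir}/${family_id}-${stamp}.kddb"
  mkdir -p "$dir" || return 1
  # Send kdx's own messages to stderr so stdout carries only the path.
  kdx document-family content download "$family_id" --latest -o "$target" >&2 || return 1
  # Treat a missing or zero-byte file as a failed backup.
  [ -s "$target" ] || return 1
  printf '%s\n' "$target"
}
```

A call such as `backup_family 70b894f5-8d32-4584-b780-89f89210e078` would leave one new .kddb file per run in ./backups.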

Content Object vs Data Export

Use Case                                   Command
Get extracted JSON data for integration    kdx document-family data
Download the raw kddb file                 kdx document-family content download
List available versions                    kdx document-family content list

How It Works

The data command:
  1. Fetches the document family to find its content objects
  2. Identifies the latest (most recent) content object
  3. Exports data from that content object via the API
  4. Formats and outputs the JSON
Because the export always reads the latest content object, the output reflects the most recent processed state of the document.

Integration Examples

Export to File for Analysis

# Export data
kdx document-family data abc123 -o data.json

# Process with jq
jq '.FinancialStatement.CompanyName' data.json
# Output: "Alphabet Inc."

Pipeline with Upload and Export

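#!/bin/bash
# Stop at the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

# Upload document
RESULT=$(kdx store upload satori/project-processing:1.0.0 ./report.pdf)
# Extract the ID that follows "created: " (grep -P requires GNU grep)
DOC_ID=$(echo "$RESULT" | grep -oP 'created: \K[a-f0-9-]+')

# Wait for processing
kdx store watch "$DOC_ID" --timeout 600

# Export data
kdx document-family data "$DOC_ID" -o "${DOC_ID}.json"

echo "Exported data to ${DOC_ID}.json"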

Bulk Export

# Export data for multiple document families
# Make sure the output directory exists before writing into it
mkdir -p exports
for id in $(kdx get document-families -o json | jq -r '.[].id'); do
  kdx document-family data "$id" -o "exports/${id}.json"
done
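
In a long bulk run, a single family that has not finished processing should probably not stop the whole loop. The sketch below keeps going after a failed export and reports the failure through its exit status. The function name bulk_export is hypothetical, and it assumes that kdx exits non-zero when an export fails.

```shell
# bulk_export is a hypothetical helper, not a kdx command.
# Usage: bulk_export <out-dir> <document-family-id>...
# Exports each family to <out-dir>/<id>.json, continues after a failure,
# and returns non-zero if any export failed.
bulk_export() {
  local out_dir="$1"
  shift
  local id failed=0
  mkdir -p "$out_dir" || return 1
  for id in "$@"; do
    # Assumption: kdx signals a failed export with a non-zero exit status.
    if kdx document-family data "$id" -o "${out_dir}/${id}.json"; then
      echo "ok: $id"
    else
      echo "failed: $id" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

It could be combined with the listing command used in the loop above, for example `bulk_export exports $(kdx get document-families -o json | jq -r '.[].id')`.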

Troubleshooting

No Content Objects Found

Error: no content objects found in document family
Cause: The document hasn’t been processed yet, or processing failed.
Solution:
  • Check processing status with kdx store watch <id>
  • Verify the document family ID is correct

Empty Output

If the command returns empty JSON {}:
Cause: The document may not have completed processing, or the content object has no exported data.
Solution:
  • Ensure the document has reached the PROCESSED label
  • Check if the processing pipeline includes data extraction steps
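
In an automated pipeline it can help to catch this case before the file is passed downstream. A minimal sketch, assuming the export was saved with -o; the function name is_empty_export is hypothetical and is not part of kdx.

```shell
# is_empty_export is a hypothetical helper, not a kdx command.
# Usage: is_empty_export <export.json>
# Succeeds (exit 0) when the file is missing, blank, or contains only {}.
is_empty_export() {
  local compact
  [ -f "$1" ] || return 0
  # Remove all whitespace so "{ }" and "{}" followed by a newline both compare equal to "{}".
  compact=$(tr -d '[:space:]' < "$1")
  [ -z "$compact" ] || [ "$compact" = "{}" ]
}
```

A script might then run `is_empty_export data.json && echo "export is empty" >&2` right after the export step.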

API Error

Error: API error (404): Document family not found
Solution: Verify the document family ID is correct and you have access to it.