Document Families Overview


What is a Document Family?

A Document Family is Kodexa’s core entity representing a single document with all its versions, processing history, metadata, and extracted data. Every document uploaded to a store creates a document family.

Accessing Document Families

There are two ways to access documents in Kodexa:

1. Through Stores (File System Style)

Use the Stores API when you know the document’s path:

GET /api/stores/{orgSlug}/{slug}/fs/2024/invoice-001.pdf

Best for: Browsing, organizing, and managing documents by path

2. Direct Access (By ID)

Use the DocumentFamilies API when you have the document family ID:

GET /api/documentFamilies/{id}

Best for: Working with specific documents, processing results, external integrations
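
The two access styles differ only in how the request URL is built. The following is a minimal sketch, assuming the base URL shown elsewhere on this page; the helper names `store_path_url` and `family_url` are hypothetical and are not part of the Kodexa client:

```python
from urllib.parse import quote

# Assumed base URL; substitute the address of your own Kodexa instance.
BASE_URL = "https://platform.kodexa.com"


def store_path_url(org_slug: str, store_slug: str, path: str) -> str:
    """Build the file-system style URL for a document at a known path."""
    # quote() keeps "/" by default, so folder separators stay intact
    # while spaces and other unsafe characters are percent-encoded.
    clean_path = quote(path.lstrip("/"))
    return f"{BASE_URL}/api/stores/{quote(org_slug)}/{quote(store_slug)}/fs/{clean_path}"


def family_url(family_id: str) -> str:
    """Build the direct-access URL for a document family ID."""
    return f"{BASE_URL}/api/documentFamilies/{quote(family_id, safe='')}"


print(store_path_url("my-org", "invoices", "2024/invoice-001.pdf"))
print(family_url("550e8400-e29b-41d4-a716-446655440000"))
```

Keeping URL construction in one place makes it harder to mix the two styles up: paths go through the store, IDs go straight to the document family.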

When to Use DocumentFamilies API

Use the /api/documentFamilies endpoints when you need to:

Access a specific document by ID - When you have the UUID from processing results or webhooks
Get external data - Retrieve data from external systems that has been stored with the document
Check processing steps - View the complete processing pipeline and transformations
Update document status - Change workflow status (PROCESSING, COMPLETE, FAILED, etc.)
Manage knowledge features - Add or remove knowledge base entries linked to the document
Trigger events - Send document update notifications without modifying content

Key Operations

Get External Data

Documents can store data from external systems (ERP, CRM, databases):

from kodexa import KodexaPlatform

platform = KodexaPlatform(url="https://platform.kodexa.com", api_key="your-api-key")

# Get default external data
external_data = platform.get_document_external_data(
    family_id="550e8400-e29b-41d4-a716-446655440000"
)

# Get specific external data key
erp_data = platform.get_document_external_data(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    key="erp_system"
)

Update External Data

Store references or metadata from external systems:

# Update ERP reference
platform.update_document_external_data(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    data={
        "invoice_id": "INV-2024-001",
        "vendor_id": "V-12345",
        "posted_date": "2024-01-15",
        "status": "approved"
    },
    key="erp_system"
)

# Update CRM reference
platform.update_document_external_data(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    data={
        "opportunity_id": "OPP-789",
        "account_id": "ACC-456"
    },
    key="crm_system"
)

Get Processing Steps

View the complete processing pipeline:

# Get processing steps
steps = platform.get_document_steps(
    family_id="550e8400-e29b-41d4-a716-446655440000"
)

for step in steps:
    print(f"{step.step_type}: {step.status}")
    print(f"  Duration: {step.duration_ms}ms")
    if step.error:
        print(f"  Error: {step.error}")

Update Document Status

Change workflow status:

# Update status to processing
platform.update_document_status(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    status="PROCESSING"
)

# Update to complete
platform.update_document_status(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    status="COMPLETE"
)

# Mark as failed
platform.update_document_status(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    status="FAILED"
)

Add Knowledge Features

Link a document to the knowledge base:

# Add knowledge feature
knowledge_feature = platform.add_knowledge_feature(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    feature={
        "type": "vendor",
        "value": "ACME Corporation",
        "confidence": 0.95,
        "source": "extraction"
    }
)

# Remove knowledge feature
platform.remove_knowledge_feature(
    family_id="550e8400-e29b-41d4-a716-446655440000",
    feature=knowledge_feature
)

Touch Document

Trigger event listeners without changing the document:

# Touch document to trigger event listeners
platform.touch_document_family(
    family_id="550e8400-e29b-41d4-a716-446655440000"
)

External Data Use Cases

External data provides a bridge between Kodexa and your business systems:

ERP Integration

# Store invoice posting details
external_data = {
    "invoice_number": "INV-2024-001",
    "gl_account": "1200-5000",
    "cost_center": "CC-100",
    "posted_date": "2024-01-15T10:30:00Z",
    "batch_id": "BATCH-2024-01-15-001"
}

platform.update_document_external_data(
    family_id=family_id,
    data=external_data,
    key="erp_posting"
)

CRM Tracking

# Link document to CRM opportunity
crm_data = {
    "opportunity_id": "OPP-12345",
    "account_id": "ACC-67890",
    "contact_id": "CON-54321",
    "stage": "proposal_sent",
    "probability": 75
}

platform.update_document_external_data(
    family_id=family_id,
    data=crm_data,
    key="crm_link"
)

Workflow State

# Store workflow state
workflow_data = {
    "workflow_id": "WF-001",
    "current_step": "approval",
    "assigned_to": "user@example.com",
    "due_date": "2024-01-20",
    "priority": "high"
}

platform.update_document_external_data(
    family_id=family_id,
    data=workflow_data,
    key="workflow"
)

Processing Steps Explained

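Processing steps record each transformation applied to a document, along with its status, duration, timestamp, and optional step-specific metadata: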

[
  {
    "stepType": "UPLOAD",
    "status": "COMPLETE",
    "durationMs": 150,
    "timestamp": "2024-01-15T10:00:00Z"
  },
  {
    "stepType": "OCR",
    "status": "COMPLETE",
    "durationMs": 2300,
    "timestamp": "2024-01-15T10:00:01Z",
    "metadata": {
      "pages": 3,
      "confidence": 0.98
    }
  },
  {
    "stepType": "EXTRACTION",
    "status": "COMPLETE",
    "durationMs": 1500,
    "timestamp": "2024-01-15T10:00:03Z",
    "metadata": {
      "fieldsExtracted": 15,
      "assistant": "invoice-extractor-v2"
    }
  }
]
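
A step list in this shape is straightforward to summarize. The sketch below parses a shortened copy of the sample above with the standard library and reports the total processing time, the slowest step, and any steps that are not complete. The function name `summarize_steps` is hypothetical, and the field names are taken from the sample rather than from a confirmed schema:

```python
import json

# Shortened copy of the sample step list above.
STEPS_JSON = """
[
  {"stepType": "UPLOAD", "status": "COMPLETE", "durationMs": 150},
  {"stepType": "OCR", "status": "COMPLETE", "durationMs": 2300},
  {"stepType": "EXTRACTION", "status": "COMPLETE", "durationMs": 1500}
]
"""


def summarize_steps(steps):
    """Return total duration, the slowest step type, and non-complete steps."""
    total_ms = sum(step.get("durationMs", 0) for step in steps)
    # default=None keeps max() from raising on an empty step list.
    slowest = max(steps, key=lambda step: step.get("durationMs", 0), default=None)
    not_complete = [step["stepType"] for step in steps if step.get("status") != "COMPLETE"]
    return {
        "total_ms": total_ms,
        "slowest": slowest["stepType"] if slowest else None,
        "not_complete": not_complete,
    }


summary = summarize_steps(json.loads(STEPS_JSON))
print(summary)
```

For the sample data the total is 3950 ms and the slowest step is OCR.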

Document Status Values

Common status values for workflow management:

Status	Description	Use Case
`UPLOADED`	Document uploaded, awaiting processing	Initial state
`PROCESSING`	AI processing in progress	During extraction
`PROCESSED`	Processing complete, data extracted	Ready for review
`REVIEW`	Awaiting human review	Quality control
`APPROVED`	Reviewed and approved	Ready for export
`REJECTED`	Rejected during review	Needs correction
`FAILED`	Processing failed	Error handling
`ARCHIVED`	Archived for retention	Long-term storage

Best Practices

Use External Data for System Integration

✅ Good: Store external references
external_data = {
    "erp_id": "INV-2024-001",
    "posted": True,
    "post_date": "2024-01-15"
}

❌ Avoid: Duplicating document content
external_data = {
    "vendor": "ACME",  # Already in extracted data
    "amount": "1500"   # Already in extracted data
}
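
One way to catch this kind of duplication before writing external data is sketched below. It assumes the extracted fields are available as a plain dictionary; the helper `duplicated_keys` and the sample field names are hypothetical:

```python
def duplicated_keys(external_data: dict, extracted_data: dict) -> list:
    """Return the keys that appear in both dictionaries, sorted by name."""
    return sorted(external_data.keys() & extracted_data.keys())


# Hypothetical extracted fields for an invoice.
extracted = {"vendor": "ACME", "amount": "1500"}

good = {"erp_id": "INV-2024-001", "posted": True, "post_date": "2024-01-15"}
bad = {"erp_id": "INV-2024-001", "vendor": "ACME", "amount": "1500"}

print(duplicated_keys(good, extracted))  # no overlap with extracted data
print(duplicated_keys(bad, extracted))   # overlapping keys to drop
```

The check compares key names only, so it will not notice the same value stored under a different name.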

Choose the Right Access Method

✅ Good: Use store path when browsing
files = platform.list_store_files("my-org/invoices")

✅ Good: Use family ID when processing
data = platform.get_document_external_data(family_id)

❌ Avoid: Using family ID for browsing
# Don't iterate all families just to list documents

Status Workflow

✅ Good: Clear status progression
UPLOADED → PROCESSING → PROCESSED → REVIEW → APPROVED

❌ Avoid: Unclear status values
UPLOADED → DONE → FINISHED → OK
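
A clear progression can also be checked in code before a status update is sent. The sketch below uses the status names from the table above; the transition map itself is an assumption made for illustration, and nothing here implies that Kodexa enforces these rules:

```python
# Assumed transition map, for illustration only.
ALLOWED_TRANSITIONS = {
    "UPLOADED": {"PROCESSING"},
    "PROCESSING": {"PROCESSED", "FAILED"},
    "PROCESSED": {"REVIEW"},
    "REVIEW": {"APPROVED", "REJECTED"},
    "APPROVED": {"ARCHIVED"},
    "REJECTED": {"PROCESSING"},
    "FAILED": {"PROCESSING"},
    "ARCHIVED": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from current to new is allowed by the map."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def first_invalid_step(statuses):
    """Return the first disallowed (from, to) pair in a sequence, or None."""
    for current, new in zip(statuses, statuses[1:]):
        if not can_transition(current, new):
            return (current, new)
    return None


print(first_invalid_step(["UPLOADED", "PROCESSING", "PROCESSED", "REVIEW", "APPROVED"]))
print(first_invalid_step(["UPLOADED", "DONE", "FINISHED", "OK"]))
```

Unknown status names such as DONE fail the check immediately because they never appear as an allowed target.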
