
What are Stores?

Stores in Kodexa are containers for organizing and managing documents. They provide a hierarchical file system structure similar to folders and files on your computer, but with powerful document processing and AI capabilities built in.

Store Types

Kodexa supports three types of stores:

Document Stores

Store and organize document families with their content objects, metadata, and extracted data.
Common Use Cases:
  • Invoice processing pipelines
  • Contract management systems
  • Document archives and repositories
  • Content management workflows

Data Stores

Store extracted and structured data from documents, organized hierarchically with support for exceptions and validation.
Common Use Cases:
  • Extracted invoice line items
  • Form data from PDF documents
  • Structured business data
  • Validated and reviewed datasets

Model Stores

Store AI models, training data, and model implementations for document processing.
Common Use Cases:
  • Custom extraction models
  • Classification models
  • Model training datasets
  • Model versioning and deployment

File System Organization

Stores use a file system-style path structure for organizing documents:
store-name/
├── 2024/
│   ├── january/
│   │   ├── invoice-001.pdf
│   │   ├── invoice-002.pdf
│   │   └── invoice-003.pdf
│   └── february/
│       └── invoice-004.pdf
├── contracts/
│   └── vendor-agreement.pdf
└── archive/
    └── old-invoices.zip

Path Structure

  • Paths use forward slashes / as separators
  • Paths are case-sensitive
  • No leading slash (use 2024/invoice.pdf not /2024/invoice.pdf)
  • Special characters should be URL-encoded
Avoid Deep Nesting: While Kodexa supports nested paths, we recommend keeping path structures relatively flat (2-3 levels maximum). Deep nesting can impact performance and make document management more complex. Use metadata and labels for additional categorization instead of creating deeply nested folder structures.
Recommended Structure:
✅ Good: invoices/2024/invoice-001.pdf
✅ Good: contracts/vendor-agreements/acme-2024.pdf

❌ Avoid: invoices/2024/Q1/january/week1/day01/vendor-A/invoice-001.pdf
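
The path rules above can be checked before a document is uploaded. The following is a minimal sketch, not part of the Kodexa SDK: the function name `normalize_store_path`, the `MAX_FOLDER_DEPTH` constant, and the choice to count only folders (not the file name) toward the 2-3 level guideline are all assumptions made for illustration.

```python
from urllib.parse import quote

# Assumption: the "2-3 levels" guideline counts folders, not the file name.
MAX_FOLDER_DEPTH = 3


def normalize_store_path(path: str) -> str:
    """Return a URL-encoded store path, or raise ValueError if a rule is broken."""
    if path.startswith("/"):
        raise ValueError("store paths must not start with a slash")

    segments = path.split("/")
    # Assumption: empty segments ("a//b", trailing "/") are treated as mistakes.
    if any(segment == "" for segment in segments):
        raise ValueError("store paths must not contain empty segments")

    folder_depth = len(segments) - 1
    if folder_depth > MAX_FOLDER_DEPTH:
        raise ValueError(
            f"path has {folder_depth} folder levels; keep it to {MAX_FOLDER_DEPTH} or fewer"
        )

    # Encode each segment on its own so the "/" separators survive.
    return "/".join(quote(segment, safe="") for segment in segments)
```

Because paths are case-sensitive, the sketch deliberately leaves letter case untouched; it only encodes special characters such as spaces and `#`.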

Document Families

In Kodexa, a Document Family represents a single document with all its versions and processing history:
{
  "id": "family-uuid",
  "path": "2024/january/invoice-001.pdf",
  "nativeContentObject": {...},
  "contentObjects": [...],
  "metadata": {...},
  "labels": [...]
}
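
When working with this JSON in client code, it can help to map it onto a typed structure. The sketch below is a local illustration only: the `DocumentFamily` dataclass and its `from_json` helper are hypothetical names, not Kodexa SDK classes, and the inner shape of the content objects, metadata, and labels is left untyped because the example above elides it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DocumentFamily:
    """Illustrative local view of the family JSON; not a Kodexa SDK class."""

    id: str
    path: str
    native_content_object: Optional[Dict[str, Any]] = None
    content_objects: List[Dict[str, Any]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    labels: List[Any] = field(default_factory=list)

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "DocumentFamily":
        # "id" and "path" are required here; the other keys fall back to empty values.
        return cls(
            id=payload["id"],
            path=payload["path"],
            native_content_object=payload.get("nativeContentObject"),
            content_objects=list(payload.get("contentObjects", [])),
            metadata=dict(payload.get("metadata", {})),
            labels=list(payload.get("labels", [])),
        )
```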

Content Objects

Each document family contains one or more Content Objects representing different versions or transformations:
  • Native Content: Original uploaded file
  • Processed Content: Results from AI processing
  • Transformed Content: Format conversions (PDF → images, etc.)

Document Versioning

Content objects track the processing history:
# Upload creates initial content object
family = platform.upload_document("invoice.pdf")

# Processing creates new content object
family.process_with_assistant("extraction-assistant")

# Each transformation is preserved
for content in family.content_objects:
    print(f"{content.transition_type}: {content.content_type}")
# Output:
# UPLOAD: application/pdf
# PROCESSING: application/vnd.kodexa.document

Store Versions

Stores can have multiple versions for different environments or snapshots:
invoices:latest     # Current/production version
invoices:v2.0       # Specific version
invoices:staging    # Staging environment
Versions are useful for:
  • Testing: Test processing changes without affecting production
  • Rollback: Revert to previous state if needed
  • Snapshots: Freeze data at specific points in time
  • Environments: Separate dev/staging/production data
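
A `name:version` reference like the ones above is easy to split in client code. This is a minimal sketch under stated assumptions: `StoreRef` and `parse_store_ref` are hypothetical helpers, and falling back to `latest` when no version is given is an assumption, not documented Kodexa behavior.

```python
from typing import NamedTuple


class StoreRef(NamedTuple):
    name: str
    version: str


def parse_store_ref(ref: str, default_version: str = "latest") -> StoreRef:
    """Split "name:version" into its parts."""
    name, separator, version = ref.partition(":")
    if not name:
        raise ValueError("store reference is missing a store name")
    if separator and not version:
        raise ValueError("store reference has a ':' but no version")
    # Assumption: a bare store name refers to the default version.
    return StoreRef(name, version or default_version)
```

For example, `parse_store_ref("invoices:staging")` yields the name `invoices` and the version `staging`.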

File System Operations

The Stores API provides familiar file system operations:
| Operation | Endpoint | Description |
|---|---|---|
| Download | GET /fs/{path} | Download file content |
| Get Metadata | GET /fs/{path}?meta | Get document info without downloading |
| Upload | POST /fs/{path} | Upload new document or version |
| Rename/Move | PUT /fs/{path}?rename=newpath | Rename or move document |
| Delete | DELETE /fs/{path} | Delete document family |
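
The operations above map onto HTTP method and URL pairs. The sketch below only builds those pairs and sends nothing over the network. `BASE_URL` is a placeholder, since this page gives the endpoints only relative to `/fs/{path}`; the `fs_request` helper and the decision to percent-encode the `rename` value are likewise assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Hypothetical base URL; substitute the real address of your store.
BASE_URL = "https://kodexa.example/stores/invoices"

_METHODS = {
    "download": "GET",
    "metadata": "GET",
    "upload": "POST",
    "rename": "PUT",
    "delete": "DELETE",
}


def fs_request(operation: str, path: str, new_path: Optional[str] = None) -> Tuple[str, str]:
    """Return the (HTTP method, URL) pair for a file system operation."""
    if operation not in _METHODS:
        raise ValueError(f"unknown operation: {operation}")

    # Keep "/" unescaped so the folder structure stays visible in the URL.
    url = f"{BASE_URL}/fs/{quote(path, safe='/')}"
    if operation == "metadata":
        url += "?meta"
    elif operation == "rename":
        if not new_path:
            raise ValueError("rename requires new_path")
        # Assumption: the target path is passed as an encoded query value.
        url += "?rename=" + quote(new_path, safe="")
    return _METHODS[operation], url
```

The returned pair can be handed to any HTTP client, along with whatever authentication your environment requires.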

Metadata and Labels

Custom Metadata

Attach arbitrary metadata to documents:
platform.upload_document(
    path="invoice.pdf",
    metadata={
        "vendor": "ACME Corp",
        "amount": "1500.00",
        "department": "accounting",
        "invoice_number": "INV-2024-001"
    }
)
Metadata is:
  • Searchable and filterable
  • Preserved across versions
  • Returned with document info
  • Useful for categorization

Labels

Apply classification labels:
family.add_label("processed")
family.add_label("verified")
family.add_label("urgent")
Labels enable:
  • Document classification
  • Workflow routing
  • Status tracking
  • Bulk operations

Document Status and Workflow

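Track processing state through the family's status. Each line below shows one possible value; the lines are not meant to be run in sequence: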
family.status = "PROCESSING"   # AI processing in progress
family.status = "PROCESSED"    # Processing complete
family.status = "FAILED"       # Processing failed
family.status = "REVIEW"       # Awaiting human review
family.status = "APPROVED"     # Reviewed and approved

Assignment

Assign documents to users for review:
# Assign for review; the assignment auto-releases after 60 minutes
family.assign_to(user, release_in_minutes=60)

# Or release it manually before the timeout
family.remove_assignee()
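
The auto-release behavior can be pictured as a time-limited assignment. The following is a local model for illustration only, not how Kodexa implements it: the `Assignment` class and `current_assignee` function are hypothetical, and the assumption is that an assignment stops counting once `release_in_minutes` has elapsed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Assignment:
    user: str
    assigned_at: datetime
    release_in_minutes: int

    def is_active(self, now: datetime) -> bool:
        # The assignment holds until the release window has fully elapsed.
        return now < self.assigned_at + timedelta(minutes=self.release_in_minutes)


def current_assignee(assignment: Optional[Assignment], now: datetime) -> Optional[str]:
    """Return the assigned user, or None if unassigned or auto-released."""
    if assignment is None or not assignment.is_active(now):
        return None
    return assignment.user
```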

Locking

Lock documents to prevent modifications:
# Lock during processing
family.lock()

# Unlock when done
family.unlock()

Pagination and Filtering

List documents page by page, and filter them by metadata, status, or label:
# List the first page of documents (50 per page)
families = store.list_families(
    page=0,
    page_size=50
)

# Filter by metadata
families = store.list_families(
    filter={
        "metadata.department": "accounting",
        "status": "PROCESSED"
    }
)

# Search by label
families = store.list_families(
    labels=["urgent", "verified"]
)
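
To make the filter semantics concrete, here is a client-side sketch that applies the same kind of query to plain dictionaries. It does not describe the server's implementation. The helper names are hypothetical, and two behaviors are assumptions: all filter entries must match, and a family must carry every requested label.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence


def get_dotted(record: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a key such as "metadata.department" against nested dicts."""
    value: Any = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(record: Dict[str, Any], criteria: Dict[str, Any], labels: Sequence[str]) -> bool:
    # Assumption: every criterion and every requested label must match (AND).
    fields_match = all(get_dotted(record, key) == want for key, want in criteria.items())
    has_labels = set(labels) <= set(record.get("labels", []))
    return fields_match and has_labels


def list_families_locally(
    records: Iterable[Dict[str, Any]],
    criteria: Optional[Dict[str, Any]] = None,
    labels: Sequence[str] = (),
    page: int = 0,
    page_size: int = 50,
) -> List[Dict[str, Any]]:
    """Filter first, then return one zero-indexed page of the results."""
    selected = [r for r in records if matches(r, criteria or {}, labels)]
    start = page * page_size
    return selected[start:start + page_size]
```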

Integration with AI Assistants

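Stores work directly with Kodexa AI Assistants. You can reprocess documents, set the active assistant for a document, and retrieve its processing executions: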
# Reprocess documents with an assistant
store.reprocess(
    family_ids=["family-1", "family-2"],
    assistant_ids=["extraction-assistant"]
)

# Set active assistant for a document
family.set_active_assistant(assistant)

# Get processing executions
executions = family.get_executions()

Best Practices

Path Organization

✅ Good:
  invoices/2024/invoice-001.pdf
  contracts/vendor-agreements/acme-2024.pdf

❌ Avoid:
  invoice-2024-01-15-vendor-a-001.pdf (no hierarchy)
  INVOICES/Vendor A/Invoice #001.pdf (special chars, spaces)

Metadata Strategy

✅ Good:
  metadata = {
      "vendor_id": "V-12345",
      "amount": "1500.00",
      "date": "2024-01-15",
      "category": "office_supplies"
  }

❌ Avoid:
  metadata = {
      "info": "vendor acme 1500 dollars office supplies"  # unstructured
  }

Version Management

✅ Good:
  # Use numbered versions
  invoices:v1.0
  invoices:v2.0

  # Or environment names
  invoices:production
  invoices:staging

❌ Avoid:
  invoices:test123
  invoices:johns-version
