What are Stores?
Stores in Kodexa are containers for organizing and managing documents. They provide a hierarchical file system structure similar to folders and files on your computer, but with powerful document processing and AI capabilities built in.
Store Types
Kodexa supports three types of stores:
Document Stores
Store and organize document families with their content objects, metadata, and extracted data.
Common Use Cases:
- Invoice processing pipelines
- Contract management systems
- Document archives and repositories
- Content management workflows
Data Stores
Store extracted and structured data from documents, organized hierarchically with support for exceptions and validation.
Common Use Cases:
- Extracted invoice line items
- Form data from PDF documents
- Structured business data
- Validated and reviewed datasets
Model Stores
Store AI models, training data, and model implementations for document processing.
Common Use Cases:
- Custom extraction models
- Classification models
- Model training datasets
- Model versioning and deployment
File System Organization
Stores use a file system-style path structure for organizing documents:
store-name/
├── 2024/
│   ├── january/
│   │   ├── invoice-001.pdf
│   │   ├── invoice-002.pdf
│   │   └── invoice-003.pdf
│   └── february/
│       └── invoice-004.pdf
├── contracts/
│   └── vendor-agreement.pdf
└── archive/
    └── old-invoices.zip
Path Structure
- Paths use forward slashes (/) as separators
- Paths are case-sensitive
- No leading slash (use 2024/invoice.pdf, not /2024/invoice.pdf)
- Special characters should be URL-encoded
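The path rules above can be applied before a path is sent to a store. The following is a minimal sketch using only the Python standard library; `normalize_store_path` is a hypothetical helper, not part of the Kodexa SDK, and encoding every reserved character per segment is an assumption about what "URL-encoded" means here.

```python
from urllib.parse import quote


def normalize_store_path(raw_path: str) -> str:
    """Return a store-relative path with no leading slash and URL-encoded segments."""
    # Splitting on "/" and dropping empty parts removes leading, trailing,
    # and doubled slashes in one step.
    segments = [segment for segment in raw_path.split("/") if segment]
    if not segments:
        raise ValueError("path must contain at least one segment")
    # Encode each segment on its own so the "/" separators stay intact.
    # Case is left untouched because paths are case-sensitive.
    return "/".join(quote(segment, safe="") for segment in segments)


print(normalize_store_path("/2024/invoice.pdf"))
print(normalize_store_path("Vendor A/Invoice #001.pdf"))
```

Encoding segment by segment keeps the folder structure readable while still escaping spaces and characters such as `#`.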
Avoid Deep Nesting: While Kodexa supports nested paths, we recommend keeping path structures relatively flat (2-3 levels maximum). Deep nesting can impact performance and make document management more complex. Use metadata and labels for additional categorization instead of creating deeply nested folder structures.
Recommended Structure:
✅ Good: invoices/2024/invoice-001.pdf
✅ Good: contracts/vendor-agreements/acme-2024.pdf
❌ Avoid: invoices/2024/Q1/january/week1/day01/vendor-A/invoice-001.pdf
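A quick way to catch deep nesting before upload is to count the folders in a path. This is a sketch only: `folder_depth` and `is_too_deep` are hypothetical helpers, and the limit of 3 is taken from the "2-3 levels maximum" guidance above rather than from any enforced platform limit.

```python
from pathlib import PurePosixPath

# Assumption: mirrors the "2-3 levels maximum" recommendation; not a platform limit.
MAX_FOLDER_DEPTH = 3


def folder_depth(path: str) -> int:
    """Count the folders that sit above the file name in a store path."""
    return len(PurePosixPath(path).parts) - 1


def is_too_deep(path: str, max_depth: int = MAX_FOLDER_DEPTH) -> bool:
    """Return True when the path has more folders than the recommended limit."""
    return folder_depth(path) > max_depth


for candidate in (
    "invoices/2024/invoice-001.pdf",
    "invoices/2024/Q1/january/week1/day01/vendor-A/invoice-001.pdf",
):
    print(candidate, folder_depth(candidate), is_too_deep(candidate))
```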
Document Families
In Kodexa, a Document Family represents a single document with all its versions and processing history:
{
  "id": "family-uuid",
  "path": "2024/january/invoice-001.pdf",
  "nativeContentObject": {...},
  "contentObjects": [...],
  "metadata": {...},
  "labels": [...]
}
Content Objects
Each document family contains one or more Content Objects representing different versions or transformations:
- Native Content: Original uploaded file
- Processed Content: Results from AI processing
- Transformed Content: Format conversions (PDF → images, etc.)
Document Versioning
Content objects track the processing history:
# Upload creates initial content object
family = platform.upload_document("invoice.pdf")
# Processing creates new content object
family.process_with_assistant("extraction-assistant")
# Each transformation is preserved
for content in family.content_objects:
    print(f"{content.transition_type}: {content.content_type}")
# Output:
# UPLOAD: application/pdf
# PROCESSING: application/vnd.kodexa.document
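To make the versioning model concrete, here is a small in-memory sketch of a family that appends a content object for every transition instead of overwriting the previous one. The class names and the `add_content`, `native_content`, and `latest_content` members are illustrative assumptions; only `transition_type`, `content_type`, and `content_objects` come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    transition_type: str  # e.g. "UPLOAD" or "PROCESSING"
    content_type: str     # MIME type of this version


@dataclass
class DocumentFamily:
    path: str
    content_objects: list[ContentObject] = field(default_factory=list)

    def add_content(self, transition_type: str, content_type: str) -> ContentObject:
        # Append rather than replace, so earlier versions stay available.
        content = ContentObject(transition_type, content_type)
        self.content_objects.append(content)
        return content

    @property
    def native_content(self) -> ContentObject:
        # The first content object is the originally uploaded file.
        return self.content_objects[0]

    @property
    def latest_content(self) -> ContentObject:
        return self.content_objects[-1]


family = DocumentFamily(path="2024/january/invoice-001.pdf")
family.add_content("UPLOAD", "application/pdf")
family.add_content("PROCESSING", "application/vnd.kodexa.document")
for content in family.content_objects:
    print(f"{content.transition_type}: {content.content_type}")
```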
Store Versions
Stores can have multiple versions for different environments or snapshots:
invoices:latest # Current/production version
invoices:v2.0 # Specific version
invoices:staging # Staging environment
Versions are useful for:
- Testing: Test processing changes without affecting production
- Rollback: Revert to previous state if needed
- Snapshots: Freeze data at specific points in time
- Environments: Separate dev/staging/production data
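Client code often needs to split a `name:version` reference into its parts. The sketch below does this with plain string handling; `StoreRef` and `parse_store_ref` are hypothetical names, and falling back to `latest` when no version is given is an assumption, not documented behavior.

```python
from typing import NamedTuple


class StoreRef(NamedTuple):
    name: str
    version: str


def parse_store_ref(ref: str, default_version: str = "latest") -> StoreRef:
    """Split "name:version" into its parts, defaulting the version when absent."""
    name, separator, version = ref.partition(":")
    if not name:
        raise ValueError(f"missing store name in {ref!r}")
    if separator and not version:
        raise ValueError(f"missing version after ':' in {ref!r}")
    # Assumption: a bare store name refers to the "latest" version.
    return StoreRef(name, version or default_version)


print(parse_store_ref("invoices:v2.0"))
print(parse_store_ref("invoices"))
```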
File System Operations
The Stores API provides familiar file system operations:
Operation | Endpoint | Description |
---|---|---|
Download | GET /fs/{path} | Download file content |
Get Metadata | GET /fs/{path}?meta | Get document info without downloading |
Upload | POST /fs/{path} | Upload new document or version |
Rename/Move | PUT /fs/{path}?rename=newpath | Rename or move document |
Delete | DELETE /fs/{path} | Delete document family |
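The endpoints in the table map onto ordinary HTTP requests. The sketch below only builds the request objects with the standard library and does not send them. `STORE_URL` is a hypothetical prefix, authentication headers are omitted, and percent-encoding the `rename` value is an assumption about what the server expects. Upload is left out because it also needs a request body.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL for one store; substitute the real prefix for your deployment.
STORE_URL = "https://platform.example.com/api/stores/my-org/invoices"


def fs_request(method: str, path: str, query: str = "") -> Request:
    """Build (but do not send) a request against the /fs/{path} endpoint."""
    # quote() keeps "/" by default, so the folder structure survives encoding.
    url = f"{STORE_URL}/fs/{quote(path)}"
    if query:
        url = f"{url}?{query}"
    return Request(url, method=method)


path = "2024/january/invoice-001.pdf"
requests_to_send = [
    fs_request("GET", path),                      # download
    fs_request("GET", path, "meta"),              # metadata only
    fs_request("PUT", path, urlencode({"rename": "archive/invoice-001.pdf"})),
    fs_request("DELETE", path),                   # delete the document family
]
for request in requests_to_send:
    print(request.get_method(), request.full_url)
```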
Metadata
Attach arbitrary key-value metadata to documents:
platform.upload_document(
    path="invoice.pdf",
    metadata={
        "vendor": "ACME Corp",
        "amount": "1500.00",
        "department": "accounting",
        "invoice_number": "INV-2024-001"
    }
)
Metadata is:
- Searchable and filterable
- Preserved across versions
- Returned with document info
- Useful for categorization
Labels
Apply classification labels:
family.add_label("processed")
family.add_label("verified")
family.add_label("urgent")
Labels enable:
- Document classification
- Workflow routing
- Status tracking
- Bulk operations
Document Status and Workflow
Track processing state:
family.status = "PROCESSING" # AI processing in progress
family.status = "PROCESSED" # Processing complete
family.status = "FAILED" # Processing failed
family.status = "REVIEW" # Awaiting human review
family.status = "APPROVED" # Reviewed and approved
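When status values drive a workflow, it can help to model them as an enumeration with explicit transitions. The status names below are the five listed above; the transition table is an assumed workflow for illustration only, and the platform may permit other moves.

```python
from enum import Enum


class Status(str, Enum):
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    FAILED = "FAILED"
    REVIEW = "REVIEW"
    APPROVED = "APPROVED"


# Assumed workflow, for illustration only; not taken from the platform.
ALLOWED_TRANSITIONS = {
    Status.PROCESSING: {Status.PROCESSED, Status.FAILED},
    Status.PROCESSED: {Status.REVIEW},
    Status.FAILED: {Status.PROCESSING},
    Status.REVIEW: {Status.APPROVED, Status.PROCESSING},
    Status.APPROVED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True when the assumed workflow allows moving to the target status."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition(Status.PROCESSING, Status.PROCESSED))
print(can_transition(Status.APPROVED, Status.PROCESSING))
```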
Assignment
Assign documents to users for review:
# Assign for review; the assignment is released automatically after 60 minutes
family.assign_to(user, release_in_minutes=60)
# Or release it manually before the timeout expires
family.remove_assignee()
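The auto-release behavior can be pictured as a simple expiry check. This sketch models it locally; the `Assignment` class and its `is_active` method are hypothetical and say nothing about how the platform stores or enforces assignments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Assignment:
    user: str
    assigned_at: datetime
    release_in_minutes: int

    @property
    def expires_at(self) -> datetime:
        return self.assigned_at + timedelta(minutes=self.release_in_minutes)

    def is_active(self, now: datetime) -> bool:
        # The assignment lapses once the release window has fully elapsed.
        return now < self.expires_at


start = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
assignment = Assignment("reviewer@example.com", start, release_in_minutes=60)
print(assignment.is_active(start + timedelta(minutes=30)))
print(assignment.is_active(start + timedelta(minutes=90)))
```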
Locking
Lock documents to prevent modifications:
# Lock during processing
family.lock()
# Unlock when done
family.unlock()
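If processing raises an error between `lock()` and `unlock()`, the document stays locked. One way to guard against that is a context manager that always unlocks. In this sketch `StubFamily` is a stand-in object that only records lock state, and `locked` is a hypothetical helper; the only assumption about a real family is that it exposes `lock()` and `unlock()` as shown above.

```python
from contextlib import contextmanager


class StubFamily:
    """Stand-in for a document family; it only records the lock state."""

    def __init__(self) -> None:
        self.locked = False

    def lock(self) -> None:
        self.locked = True

    def unlock(self) -> None:
        self.locked = False


@contextmanager
def locked(family):
    """Lock the family for the duration of the block, unlocking even on error."""
    family.lock()
    try:
        yield family
    finally:
        family.unlock()


family = StubFamily()
with locked(family):
    print("locked during work:", family.locked)
print("locked afterwards:", family.locked)
```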
Pagination and Filtering
List documents page by page, and narrow the results by metadata, status, or label:
# List all documents
families = store.list_families(
    page=0,
    page_size=50
)
# Filter by metadata
families = store.list_families(
    filter={
        "metadata.department": "accounting",
        "status": "PROCESSED"
    }
)
# Search by label
families = store.list_families(
    labels=["urgent", "verified"]
)
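To show what a dotted filter key such as `metadata.department` and a zero-based `page` mean, here is a local sketch that applies the same idea to plain dictionaries. It does not call the platform; `get_dotted`, `matches`, and `paginate` are hypothetical helpers, and exact-match semantics for every filter key are an assumption.

```python
from typing import Any


def get_dotted(record: dict[str, Any], dotted_key: str) -> Any:
    """Resolve a key like "metadata.department" against nested dictionaries."""
    value: Any = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(record: dict[str, Any], filters: dict[str, Any]) -> bool:
    # Assumption: every filter must match exactly (logical AND).
    return all(get_dotted(record, key) == expected for key, expected in filters.items())


def paginate(items: list, page: int, page_size: int) -> list:
    """Return one zero-based page of items."""
    start = page * page_size
    return items[start:start + page_size]


families = [
    {"path": "2024/invoice-001.pdf", "status": "PROCESSED", "metadata": {"department": "accounting"}},
    {"path": "2024/invoice-002.pdf", "status": "FAILED", "metadata": {"department": "accounting"}},
    {"path": "contracts/acme.pdf", "status": "PROCESSED", "metadata": {"department": "legal"}},
]
filters = {"metadata.department": "accounting", "status": "PROCESSED"}
selected = [family for family in families if matches(family, filters)]
print([family["path"] for family in paginate(selected, page=0, page_size=50)])
```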
Integration with AI Assistants
Stores work with Kodexa AI Assistants: you can reprocess documents, set the active assistant for a document, and inspect processing executions:
# Reprocess documents with an assistant
store.reprocess(
    family_ids=["family-1", "family-2"],
    assistant_ids=["extraction-assistant"]
)
# Set active assistant for a document
family.set_active_assistant(assistant)
# Get processing executions
executions = family.get_executions()
Best Practices
Path Organization
✅ Good:
invoices/2024/vendor-a/invoice-001.pdf
contracts/clients/acme/agreement.pdf
❌ Avoid:
invoice-2024-01-15-vendor-a-001.pdf (no hierarchy)
INVOICES/Vendor A/Invoice #001.pdf (special chars, spaces)
Structured Metadata
✅ Good:
metadata = {
    "vendor_id": "V-12345",
    "amount": "1500.00",
    "date": "2024-01-15",
    "category": "office_supplies"
}
❌ Avoid:
metadata = {
    "info": "vendor acme 1500 dollars office supplies"  # unstructured
}
Version Management
✅ Good:
# Use semantic versions
invoices:v1.0
invoices:v2.0
# Or environment names
invoices:production
invoices:staging
❌ Avoid:
invoices:test123
invoices:johns-version
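A naming convention like this is easy to check automatically. The following sketch accepts `vMAJOR.MINOR` style tags (with an optional patch number) and a small set of environment names. Both the pattern and the `ENVIRONMENT_TAGS` set are assumptions made for illustration; adjust them to your own convention.

```python
import re

# Assumption: "v1.0" or "v1.0.3" style tags count as semantic versions here.
SEMANTIC_TAG = re.compile(r"v\d+\.\d+(\.\d+)?")
# Assumption: environment names your team has agreed on.
ENVIRONMENT_TAGS = {"latest", "production", "staging"}


def is_recommended_version(tag: str) -> bool:
    """Return True for semantic version tags or agreed environment names."""
    return tag in ENVIRONMENT_TAGS or SEMANTIC_TAG.fullmatch(tag) is not None


for tag in ("v1.0", "staging", "test123", "johns-version"):
    print(tag, is_recommended_version(tag))
```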