What is a Document Family?
A Document Family is Kodexa’s core entity representing a single document with all its versions, processing history, metadata, and extracted data. Every document uploaded to a store creates a document family.
Accessing Document Families
Access a document directly through the DocumentFamilies API using its document family ID.
When to Use DocumentFamilies API
Use the /api/document-families endpoints when you need to:
- Upload new content - Add new versions or documents to an existing family with knowledge features
- Access a specific document by ID - When you have the UUID from processing results or webhooks
- Get external data - Retrieve data from external systems associated with the document
- Check processing steps - View the complete processing pipeline and transformations
- Update document status - Change workflow status (PROCESSING, COMPLETE, FAILED, etc.)
- Manage knowledge features - Add or remove knowledge base entries linked to the document
- Trigger events - Send document update notifications without modifying content
Uploading Content to Document Families
The /api/document-families/{id}/newContent endpoint is the primary way to upload new content to an existing document family. This endpoint supports attaching knowledge features during upload.
Endpoint
Form Data Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| document | File | Yes | The Kodexa document file to upload |
| sourceContentObjectId | String | No | ID of the source content object for the transition (defaults to latest) |
| transitionType | String | No | Transition type: DERIVED, REVISED, etc. (defaults to DERIVED) |
| dataStoreRef | String | No | Reference to a data store for extraction (e.g., “org/slug/version”) |
| taxonomyRefs | String | No | Comma-separated taxonomy references for extraction |
| documentVersion | String | No | Version string for the new content object |
| actorType | String | No | Actor type for audit trail: USER, API, SYSTEM, ASSISTANT |
| actorId | String | No | Actor ID for audit trail (defaults to current user ID) |
| label | String | No | Label to add to the document family and content object |
Example: Upload with cURL
Example: Upload with Python
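The original snippet for this example is not available, so what follows is a minimal sketch rather than official client code. It assumes the endpoint accepts an HTTP POST with multipart form data, that authentication headers are supplied by the caller, and that the third-party requests package is installed. The helper names (new_content_url, new_content_fields, upload_new_content) are hypothetical; the form field names come from the parameter table.

```python
from typing import Optional


def new_content_url(base_url: str, family_id: str) -> str:
    """Build the newContent URL for a document family."""
    return f"{base_url.rstrip('/')}/api/document-families/{family_id}/newContent"


def new_content_fields(
    transition_type: Optional[str] = None,
    document_version: Optional[str] = None,
    actor_type: Optional[str] = None,
    label: Optional[str] = None,
) -> dict:
    """Collect the optional form fields, omitting any that were not supplied."""
    candidates = {
        "transitionType": transition_type,
        "documentVersion": document_version,
        "actorType": actor_type,
        "label": label,
    }
    return {name: value for name, value in candidates.items() if value is not None}


def upload_new_content(base_url, family_id, document_path, headers, **fields):
    """Send the document file plus optional fields (assumes POST + multipart)."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    with open(document_path, "rb") as handle:
        response = requests.post(
            new_content_url(base_url, family_id),
            headers=headers,  # assumption: the caller provides the auth headers
            files={"document": handle},  # "document" is the required file field
            data=new_content_fields(**fields),
        )
    response.raise_for_status()
    return response
```

Fields left as None are not sent, so the server-side defaults described in the table (for example, DERIVED for transitionType) still apply.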
Example: Upload with Data Extraction
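As a hedged sketch of the extraction-related form fields, the helper below builds the dataStoreRef and taxonomyRefs values described in the parameter table. The function name and the reference values are hypothetical; only the field names and the comma-separated format come from the table.

```python
from typing import Iterable


def extraction_fields(data_store_ref: str, taxonomy_refs: Iterable[str]) -> dict:
    """Build the form fields that request extraction during upload."""
    return {
        "dataStoreRef": data_store_ref,
        # taxonomyRefs is documented as a single comma-separated string
        "taxonomyRefs": ",".join(taxonomy_refs),
    }


# Hypothetical references, written in the org/slug/version form
fields = extraction_fields(
    "acme/invoice-store/1.0.0",
    ["acme/invoice-taxonomy/1.0.0", "acme/vendor-taxonomy/1.0.0"],
)
```

These fields would be sent alongside the document file in the same multipart request to the newContent endpoint.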
Managing Knowledge Features
Knowledge features allow you to attach structured metadata and classification information to document families. Features are linked to both the ContentObject and the DocumentFamily.
Knowledge Feature Structure
- knowledgeFeatureRef: The slug of an existing KnowledgeFeatureType in your organization
- properties: A map of key-value pairs specific to this feature instance
Example: Provider Feature
To set a provider knowledge feature with a providerId:
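A minimal sketch of such a feature, built from the two documented fields (knowledgeFeatureRef and properties). The helper name and the providerId value are hypothetical, and how the serialized feature is submitted to the API is not shown in this section, so only the payload itself is constructed here.

```python
import json


def knowledge_feature(feature_ref: str, properties: dict) -> dict:
    """Build a knowledge feature with the documented two-field structure."""
    return {"knowledgeFeatureRef": feature_ref, "properties": properties}


# "provider" must match the slug of an existing KnowledgeFeatureType;
# "PROV-001" is a made-up identifier used for illustration only.
provider_feature = knowledge_feature("provider", {"providerId": "PROV-001"})
payload = json.dumps(provider_feature)
```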
Example: Multiple Features
You can work with multiple knowledge features.
Add Knowledge Feature
Remove Knowledge Feature
Assess Document for Knowledge
Automatically assess a document family for applicable knowledge features and knowledge sets. The assessment:
- Extracts features from the content object
- Finds applicable knowledge sets based on organization and store-project relationships
- Associates new features with the document family
- Skips documents that are locked
Get Related Knowledge Items
Retrieve all knowledge items related to a document family through shared knowledge features.
Get Applied Knowledge Sets
Get all knowledge sets that have been applied to a document family.
Feature Deduplication
Knowledge features are deduplicated based on (featureType, properties):
- If a feature with the same type slug and identical properties already exists, the existing feature is reused
- If the properties differ, a new feature is created
- Features are linked to both the ContentObject and the DocumentFamily
For example, two documents that carry the same provider feature and providerId will share a single KnowledgeFeature record.
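The deduplication rule can be illustrated with a toy in-memory model. This is not Kodexa's implementation; the FeatureRegistry class and its integer record IDs are hypothetical and exist only to show how keying on (featureType, properties) leads to reuse.

```python
import json


class FeatureRegistry:
    """Toy in-memory model of deduplication by (featureType, properties)."""

    def __init__(self):
        self._features = {}

    def get_or_create(self, feature_type: str, properties: dict) -> int:
        # Serialize with sorted keys so key order does not affect equality
        key = (feature_type, json.dumps(properties, sort_keys=True))
        if key not in self._features:
            self._features[key] = len(self._features) + 1  # new record ID
        return self._features[key]


registry = FeatureRegistry()
first = registry.get_or_create("provider", {"providerId": "PROV-001"})
same = registry.get_or_create("provider", {"providerId": "PROV-001"})
other = registry.get_or_create("provider", {"providerId": "PROV-002"})
```

Here first and same refer to one record because the type and properties are identical, while other gets a new record because its properties differ.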
Prerequisites for Knowledge Features
Before working with knowledge features:
- Ensure the KnowledgeFeatureType exists (e.g., the provider type must be created first)
- The feature type slug in knowledgeFeatureRef must match exactly
- If the feature type doesn’t exist, linking will fail silently with a warning in the logs
Filtering by Knowledge Expression
The list endpoint supports filtering document families by knowledge expressions using boolean logic.
Expression Types
| Type | Description | Example |
|---|---|---|
| FEATURE | Match documents with a specific feature | {"type":"FEATURE","slug":"document-type-abc123"} |
| AND | Match documents with ALL specified features | {"type":"AND","children":[...]} |
| OR | Match documents with ANY specified features | {"type":"OR","children":[...]} |
| NOT | Match documents WITHOUT a feature | {"type":"NOT","children":[...]} |
Example: Filter by Single Feature
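A minimal sketch of building a single-feature expression, following the FEATURE shape in the expression table. The helper name is hypothetical, and the name of the request parameter that carries the expression on the list endpoint is not given in this section, so only the JSON string is produced.

```python
import json


def feature(slug: str) -> dict:
    """Build a FEATURE expression node for the given feature slug."""
    return {"type": "FEATURE", "slug": slug}


# Compact separators reproduce the form shown in the expression table
expression = json.dumps(feature("document-type-abc123"), separators=(",", ":"))
```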
Example: Filter by Multiple Features (AND)
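As a sketch under the same assumptions, an AND expression wraps several FEATURE nodes in a children list. The helper names and the second slug (provider-def456) are hypothetical; the node shapes follow the expression table.

```python
import json


def feature(slug: str) -> dict:
    """Build a FEATURE expression node for the given feature slug."""
    return {"type": "FEATURE", "slug": slug}


def all_of(*children: dict) -> dict:
    """Build an AND node that matches documents having every child feature."""
    return {"type": "AND", "children": list(children)}


expression = all_of(
    feature("document-type-abc123"),
    feature("provider-def456"),  # made-up slug for illustration
)
expression_json = json.dumps(expression)
```

OR and NOT nodes use the same children structure, so they can be composed in the same way.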
Key Operations
Get External Data
Documents can store data from external systems (ERP, CRM, databases).
Update External Data
Store references or metadata from external systems.
Get Processing Steps
View the complete processing pipeline.
Update Document Status
Change workflow status.
Touch Document
Trigger events without changes.
External Data Use Cases
External data provides a bridge between Kodexa and your business systems:
ERP Integration
CRM Tracking
Workflow State
Processing Steps Explained
Processing steps track every transformation.
Document Status Values
Common status values for workflow management:
| Status | Description | Use Case |
|---|---|---|
| UPLOADED | Document uploaded, awaiting processing | Initial state |
| PROCESSING | AI processing in progress | During extraction |
| PROCESSED | Processing complete, data extracted | Ready for review |
| REVIEW | Awaiting human review | Quality control |
| APPROVED | Reviewed and approved | Ready for export |
| REJECTED | Rejected during review | Needs correction |
| FAILED | Processing failed | Error handling |
| ARCHIVED | Archived for retention | Long-term storage |
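On the client side, the common status values can be captured in an enum so that typos are caught before a status update is sent. This is a sketch: the DocumentStatus class and is_known_status helper are hypothetical, and because the table lists only common values, the enum should not be treated as exhaustive.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Common status values taken from the status table."""

    UPLOADED = "UPLOADED"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    REVIEW = "REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    FAILED = "FAILED"
    ARCHIVED = "ARCHIVED"


def is_known_status(value: str) -> bool:
    """Return True if value is one of the common statuses (case-sensitive)."""
    try:
        DocumentStatus(value)
    except ValueError:
        return False
    return True
```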
Best Practices
Use External Data for System Integration
Choose the Right Access Method
Status Workflow
Reprocessing Documents
Trigger reprocessing of a document family with specific assistants.
Exporting Document Families
Export a document family as a .dfm file.
Getting Data Exports
Export data objects from a document family in various formats:
- json - Standard JSON format
- csv - Comma-separated values
- xml - XML format
- datalake - NDJSON for lakehouse/S3 storage with metadata wrapper
Next Steps
- Get Document Family
- Upload New Content - Upload content to existing families
- Get External Data
- Update Document Status
- Touch Document Family
