ProcessingStep
A ProcessingStep represents a unit of work in a document processing pipeline. Steps form a DAG (directed acyclic graph) with parent-child relationships, enabling you to track the full lineage of how a document was processed.
Creating Steps
Fields
| Field | Type | Description |
|---|---|---|
| id | str | UUID (auto-generated) |
| name | str | Step name (required) |
| start_timestamp | datetime | When the step started |
| duration | int | Duration in milliseconds |
| metadata | dict | Arbitrary key-value metadata |
| presentation_metadata | dict | UI display hints |
| children | List[ProcessingStep] | Child steps |
| parents | List[ProcessingStep] | Parent steps |
| internal_steps | List[ProcessingStep] | Internal sub-steps |
| knowledge_items | List[KnowledgeItem] | Associated knowledge items |
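
As a rough illustration of how the children and parents fields support lineage tracking, the sketch below walks a step's parents breadth-first and lists every upstream step. StepNode, add_child, and lineage are hypothetical stand-ins written for this example; they are not the library's ProcessingStep API.

```python
from collections import deque


class StepNode:
    """Hypothetical stand-in for ProcessingStep, holding only the link fields."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parents = []

    def add_child(self, child):
        # Record the link in both directions so either side can be traversed.
        self.children.append(child)
        child.parents.append(self)


def lineage(step):
    """Return the names of all ancestors of `step`, nearest ancestors first."""
    seen = {id(step)}
    order = []
    queue = deque(step.parents)
    while queue:
        current = queue.popleft()
        if id(current) in seen:
            continue  # A shared ancestor can be reached by more than one path.
        seen.add(id(current))
        order.append(current.name)
        queue.extend(current.parents)
    return order


# A diamond-shaped DAG: load feeds ocr and layout, which both feed merge.
load, ocr, layout, merge = (StepNode(n) for n in ("load", "ocr", "layout", "merge"))
load.add_child(ocr)
load.add_child(layout)
ocr.add_child(merge)
layout.add_child(merge)

print(lineage(merge))  # ['ocr', 'layout', 'load']
```

The seen set matters because load is reachable through both ocr and layout; without it the shared ancestor would be reported twice.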
Parent-Child Relationships
Build processing hierarchies.
Merging Steps
Combine multiple processing branches.
Serialization
Steps serialize to JSON with circular reference handling. The to_dict() method uses a seen set to handle circular references from bidirectional parent-child links. The from_dict() method uses a step_cache to reconstruct these references.
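
The seen-set and step_cache idea can be sketched as follows. This is a minimal, self-contained illustration built on a hypothetical Step class with only id, name, children, and parents. The real ProcessingStep carries more fields, and its to_dict() and from_dict() signatures and output shape may differ.

```python
import json
import uuid


class Step:
    """Hypothetical stand-in for ProcessingStep; not the library class."""

    def __init__(self, name, id=None):
        self.id = id or str(uuid.uuid4())
        self.name = name
        self.children = []
        self.parents = []

    def add_child(self, child):
        # Bidirectional link: this is what creates the circular references.
        self.children.append(child)
        child.parents.append(self)

    def to_dict(self, seen=None):
        seen = set() if seen is None else seen
        if self.id in seen:
            # Already written in full elsewhere: emit a stub to break the cycle.
            return {"id": self.id, "name": self.name}
        seen.add(self.id)
        return {
            "id": self.id,
            "name": self.name,
            "children": [c.to_dict(seen) for c in self.children],
            "parents": [p.to_dict(seen) for p in self.parents],
        }

    @classmethod
    def from_dict(cls, data, step_cache=None):
        step_cache = {} if step_cache is None else step_cache
        step = step_cache.get(data["id"])
        if step is None:
            step = cls(data["name"], id=data["id"])
            step_cache[step.id] = step
        # Stubs carry no link lists, so they resolve to the cached instance only.
        for key in ("children", "parents"):
            links = getattr(step, key)
            for item in data.get(key, []):
                links.append(cls.from_dict(item, step_cache))
        return step


# Diamond DAG: the merge step is shared by two branches.
load, ocr, layout, merge = Step("load"), Step("ocr"), Step("layout"), Step("merge")
load.add_child(ocr)
load.add_child(layout)
ocr.add_child(merge)
layout.add_child(merge)

payload = json.dumps(load.to_dict())
restored = Step.from_dict(json.loads(payload))
```

After the round trip, both branches of restored point at the same merge object rather than two copies, because the cache returns the existing instance whenever an id reappears.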
KnowledgeItem
A KnowledgeItem represents a piece of knowledge produced or consumed during processing.
Fields
| Field | Type | Description |
|---|---|---|
| id | str | UUID (auto-generated) |
| title | str | Display title |
| description | str | Description |
| slug | str | URL-friendly identifier |
| sequence_order | int | Ordering within a set |
| knowledge_item_type_ref | str | Reference to the item type |
| properties | dict | Arbitrary properties |
| features | List[KnowledgeFeature] | Associated features |
KnowledgeFeature
A KnowledgeFeature attaches structured metadata to knowledge items.
Fields
| Field | Type | Description |
|---|---|---|
| id | str | UUID (auto-generated) |
| feature_type_ref | str | Reference to the feature type |
| slug | str | URL-friendly identifier |
| active | bool | Whether this feature is active (default: True) |
| properties | dict | Core properties |
| extended_properties | dict | Additional properties |
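
To make the two field tables concrete, here is a rough sketch that models them as Python dataclasses. ItemSketch, FeatureSketch, and active_features are hypothetical names invented for this example. Which fields are required, the non-documented defaults, and the reference strings are all assumptions; only the auto-generated UUID and active defaulting to True come from the tables.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


def _new_id() -> str:
    # Mirrors the "UUID (auto-generated)" note in the tables.
    return str(uuid.uuid4())


@dataclass
class FeatureSketch:
    """Hypothetical model of the KnowledgeFeature fields."""
    feature_type_ref: str
    slug: str
    id: str = field(default_factory=_new_id)
    active: bool = True  # Documented default.
    properties: dict = field(default_factory=dict)
    extended_properties: dict = field(default_factory=dict)


@dataclass
class ItemSketch:
    """Hypothetical model of the KnowledgeItem fields."""
    title: str
    slug: str
    knowledge_item_type_ref: str
    id: str = field(default_factory=_new_id)
    description: str = ""      # Assumed default.
    sequence_order: int = 0    # Assumed default.
    properties: dict = field(default_factory=dict)
    features: List[FeatureSketch] = field(default_factory=list)

    def active_features(self) -> List[FeatureSketch]:
        # Illustrative helper: keep only the features flagged active.
        return [f for f in self.features if f.active]


# The reference strings below are made up for illustration.
item = ItemSketch(
    title="Invoice total",
    slug="invoice-total",
    knowledge_item_type_ref="example/item-type",
)
item.features.append(
    FeatureSketch("example/currency", "currency", properties={"code": "USD"})
)
item.features.append(FeatureSketch("example/legacy", "legacy", active=False))

print([f.slug for f in item.active_features()])  # ['currency']
```

Using default_factory for the id and the dict and list fields gives every instance its own UUID and its own containers instead of sharing one mutable default.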
Attaching to Processing Steps
Knowledge items are associated with processing steps through each step's knowledge_items field.
PipelineContext
PipelineContext tracks the state of a running execution pipeline. It is primarily used by module developers building custom processing steps.
Fields
| Field | Type | Description |
|---|---|---|
| execution_id | str | UUID for this execution (auto-generated if not provided) |
| statistics | PipelineStatistics | Tracks documents processed |
| output_document | Document | The output document |
| content_objects | list | Content objects in the pipeline |
| stop_on_exception | bool | Whether to halt on errors (default: True) |
| current_document | Document | The document currently being processed |
| document_family | DocumentFamily | The document family context |
| document_store | Store | The associated document store |
Status Updates
Report progress during execution.
Cancellation
Check for user-initiated cancellation.
RemoteStep
RemoteStep wraps a reference to a module on the platform and can process documents remotely.
