Data Flow Step Conditionals

Data flow steps in Kodexa pipelines can include conditional expressions that determine whether a step should execute. This allows you to create dynamic pipelines that adapt their behavior based on document properties, metadata, or execution context.

Overview

When a pipeline step has a conditional property set, the expression is evaluated before the step executes. If the expression evaluates to true (or any truthy value), the step runs normally. If it evaluates to false (or any falsy value), the step is skipped and the pipeline continues to the next step.
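
The run-or-skip rule can be sketched in a few lines of Python. This is an illustration only, not the Kodexa implementation: `should_run` is a hypothetical helper, plain `eval` with builtins removed stands in for simpleeval, and the sketch assumes that a step with no conditional always runs.

```python
def should_run(conditional, variables):
    """Decide whether a step executes, following the rule described above."""
    if conditional is None:
        # Assumption: a step with no conditional always runs.
        return True
    # Stand-in for simpleeval (illustration only): plain eval with builtins
    # removed. Do not use this on untrusted expressions.
    result = eval(conditional, {"__builtins__": {}}, dict(variables))
    # Any truthy result runs the step; any falsy result skips it.
    return bool(result)


variables = {
    "context": {"environment": "production"},
    "metadata": {"page_count": 0},
    "external_data": {},
}

print(should_run("context.get('environment') == 'production'", variables))  # True
print(should_run("metadata.get('page_count')", variables))  # False, since 0 is falsy
print(should_run(None, variables))  # True
```

Note that the second expression skips the step even though the key exists, because its value is falsy.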

Syntax

Conditionals use Python-like expression syntax powered by simpleeval. This provides a safe, sandboxed environment for evaluating expressions.

Supported Operators

Category	Operators
Comparison	`==`, `!=`, `<`, `>`, `<=`, `>=`
Logical	`and`, `or`, `not`
Membership	`in`, `not in`
Arithmetic	`+`, `-`, `*`, `/`, `%`
Boolean literals	`True`, `False`
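
As a quick illustration, each operator category can be tried out in ordinary Python, since the conditional syntax is described as Python-like. The `metadata` values here are made up for the example, and the sketch assumes the expressions behave the same way inside a conditional.

```python
# Hypothetical sample metadata for the example.
metadata = {"document_type": "invoice", "page_count": 12}

# Comparison
is_invoice = metadata.get('document_type') == 'invoice'

# Logical: `not` applies to the whole comparison that follows it
small_invoice = is_invoice and not metadata.get('page_count', 0) > 50

# Membership
has_no_customer = 'customer_id' not in metadata

# Arithmetic combined with comparison
even_page_count = metadata.get('page_count', 0) % 2 == 0

print(is_invoice, small_invoice, has_no_customer, even_page_count)
# True True True True
```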

Parentheses

Use parentheses to group expressions and control evaluation order:

(metadata.get('type') == 'invoice' or metadata.get('type') == 'receipt') and context.get('process_financials')
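
The grouping matters because, in Python syntax, `and` binds more tightly than `or`. The following sketch evaluates the expression above with and without parentheses in plain Python; the sample `metadata` and `context` values are assumptions chosen to make the difference visible.

```python
# Hypothetical sample values.
metadata = {"type": "invoice"}
context = {"process_financials": False}

# With parentheses: (invoice or receipt) must hold AND the flag must be set.
grouped = (
    (metadata.get('type') == 'invoice' or metadata.get('type') == 'receipt')
    and context.get('process_financials')
)

# Without parentheses: parsed as invoice OR (receipt AND flag).
ungrouped = (
    metadata.get('type') == 'invoice'
    or metadata.get('type') == 'receipt' and context.get('process_financials')
)

print(grouped)    # False: the flag is off, so the step would be skipped
print(ungrouped)  # True: any invoice runs regardless of the flag
```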

Available Variables

The conditional expression has access to three variables:

`context`

The execution context: a dictionary that can be populated when the execution is created or by the project configuration.

# Check if a context value exists
'process_all' in context

# Check context value
context.get('environment') == 'production'

# Access nested values
context.get('settings', {}).get('enable_ocr', False)

`metadata`

The document’s metadata dictionary. This contains any metadata properties stored on the document being processed.

# Check document type
metadata.get('document_type') == 'invoice'

# Check if metadata field exists
'customer_id' in metadata

# Check multiple conditions
metadata.get('status') == 'ready' and metadata.get('page_count', 0) > 0

`external_data`

A dictionary containing all external data stored on the document, keyed by storage key. External data is commonly used to store processing results, taxonomy outputs, or any other structured data associated with a document.

# Check default external data
external_data.get('default', {}).get('processed') == True

# Check if specific external data key exists
'extraction_results' in external_data

# Access nested external data
external_data.get('taxonomy', {}).get('document_classification') == 'financial'

# Check data from a specific step's output
external_data.get('ocr_results', {}).get('confidence', 0) > 0.9
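
To make the shape of the three variables concrete, the sketch below builds sample dictionaries and evaluates several of the expressions from this section in plain Python. All keys and values are invented for the example; real documents will carry whatever your pipeline stores.

```python
# Hypothetical sample data for the three variables.
context = {"environment": "production", "settings": {"enable_ocr": True}}
metadata = {"document_type": "invoice", "status": "ready", "page_count": 3}
external_data = {
    "default": {"processed": True},
    "ocr_results": {"confidence": 0.95},
}

checks = {
    "nested context value": context.get('settings', {}).get('enable_ocr', False),
    "metadata conditions": metadata.get('status') == 'ready'
    and metadata.get('page_count', 0) > 0,
    "extraction key present": 'extraction_results' in external_data,
    # 'taxonomy' is absent, so the chained .get() falls back to {} and then None.
    "taxonomy is financial": external_data.get('taxonomy', {}).get('document_classification')
    == 'financial',
    "ocr confidence": external_data.get('ocr_results', {}).get('confidence', 0) > 0.9,
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

With this data the first, second, and last checks are true, while the two checks that rely on missing keys are false rather than raising an error.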

Examples

Basic Conditionals

Skip step if document is already processed:

steps:
  - ref: kodexa/ocr-processor
    conditional: "metadata.get('ocr_complete') != True"

Only run for specific document types:

steps:
  - ref: kodexa/invoice-extractor
    conditional: "metadata.get('document_type') == 'invoice'"

Check if external data exists:

steps:
  - ref: kodexa/data-publisher
    conditional: "'extraction_results' in external_data"

Combining Conditions

Multiple metadata checks:

steps:
  - ref: kodexa/advanced-processor
    conditional: "metadata.get('status') == 'ready' and metadata.get('page_count', 0) <= 100"

Context and metadata combined:

steps:
  - ref: kodexa/expensive-operation
    conditional: "context.get('enable_full_processing', False) or metadata.get('priority') == 'high'"

Check external data quality:

steps:
  - ref: kodexa/validation-step
    conditional: "external_data.get('extraction', {}).get('confidence', 0) >= 0.85"

Environment-Based Conditions

Production-only steps:

steps:
  - ref: kodexa/audit-logger
    conditional: "context.get('environment') == 'production'"

Feature flags:

steps:
  - ref: kodexa/experimental-feature
    conditional: "context.get('feature_flags', {}).get('new_algorithm', False)"

Document State Checks

Check processing history:

steps:
  - ref: kodexa/reprocess-step
    conditional: "metadata.get('last_processed_version', 0) < 2"

Run only for multi-page documents:

steps:
  - ref: kodexa/multi-page-handler
    conditional: "metadata.get('page_count', 1) > 1"

Default Values

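When accessing dictionary values that might not exist, always use `.get()` with a default value. A missing key then yields the default instead of raising a KeyError, which would otherwise cause the conditional to fail and the step to be skipped (see Error Handling).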

# Good - uses default value
metadata.get('field_name', 'default_value')
external_data.get('key', {}).get('nested', False)

# Risky - may cause KeyError if field doesn't exist
metadata['field_name']  # Avoid this pattern
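
The difference between the two patterns is easy to see in plain Python. The sketch also shows one edge case worth knowing: the default passed to `.get()` applies only when the key is absent, not when the key is present with a value of None. The sample `metadata` is made up for the example.

```python
# Hypothetical metadata: 'summary' exists but holds None, 'page_count' is absent.
metadata = {"status": "ready", "summary": None}

# Good: a missing key falls back to the default.
page_count = metadata.get('page_count', 0)
print(page_count)  # 0

# Risky: indexing a missing key raises KeyError.
try:
    metadata['page_count']
except KeyError as exc:
    print(f"KeyError for key {exc}")

# The default applies only when the key is absent. 'summary' is present but
# None, so .get('summary', {}) returns None rather than {}.
print(metadata.get('summary', {}))  # None

# Falling back with `or` covers both a missing key and a None value.
language = (metadata.get('summary') or {}).get('language', 'unknown')
print(language)  # unknown
```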

Step Behavior When Skipped

When a step’s conditional evaluates to false:

The step status is set to SKIPPED
The document from the previous step (if any) is preserved and passed to the next step
Processing continues with the next step in the pipeline
The skip reason is logged for debugging
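
The pass-through behavior can be sketched as a small loop. This is a simplified model under stated assumptions, not Kodexa's code: `run_pipeline`, the `example/...` step references, the `action` callables, and the `COMPLETED` status name are all hypothetical, the variables are held fixed for the whole run, and plain `eval` with builtins removed stands in for simpleeval.

```python
def run_pipeline(steps, document, variables):
    """Run steps in order, passing the document through skipped steps unchanged."""
    statuses = []
    for step in steps:
        conditional = step.get("conditional")
        # Stand-in for simpleeval (illustration only, unsafe for untrusted input).
        if conditional is not None and not eval(
            conditional, {"__builtins__": {}}, dict(variables)
        ):
            statuses.append((step["ref"], "SKIPPED"))
            print(f"Skipped {step['ref']}: conditional {conditional!r} was falsy")
            continue  # the document from the previous step is kept as-is
        document = step["action"](document)
        statuses.append((step["ref"], "COMPLETED"))  # hypothetical status name
    return document, statuses


steps = [
    {"ref": "example/ocr", "action": lambda doc: doc + ["ocr"]},
    {
        "ref": "example/invoice-extractor",
        "conditional": "metadata.get('document_type') == 'invoice'",
        "action": lambda doc: doc + ["extracted"],
    },
    {"ref": "example/publisher", "action": lambda doc: doc + ["published"]},
]
variables = {"context": {}, "metadata": {"document_type": "receipt"}, "external_data": {}}

document, statuses = run_pipeline(steps, [], variables)
print(document)  # ['ocr', 'published']
print(statuses)
```

The publisher step receives the OCR step's output directly, because the skipped extractor left the document untouched.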

Error Handling

If a conditional expression fails to evaluate (syntax error, runtime error, etc.):

The step is automatically skipped
The error is logged with details
The step status indicates “Conditional evaluation failed”
Pipeline execution continues

This fail-safe behavior ensures that malformed conditionals don’t crash the entire pipeline.
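
A minimal sketch of this fail-safe pattern follows. It is an assumption-laden illustration rather than the actual implementation: `evaluate_conditional` is a hypothetical helper, the exact wording of the reason string beyond "Conditional evaluation failed" is invented, and plain `eval` with builtins removed stands in for simpleeval.

```python
def evaluate_conditional(expression, variables):
    """Return (should_run, failure_reason); any evaluation error means skip."""
    try:
        # Stand-in for simpleeval (illustration only, unsafe for untrusted input).
        result = eval(expression, {"__builtins__": {}}, dict(variables))
    except Exception as exc:
        # Syntax errors and runtime errors are both caught here.
        return False, f"Conditional evaluation failed: {type(exc).__name__}"
    return bool(result), None


variables = {"context": {}, "metadata": {"status": "ready"}, "external_data": {}}

print(evaluate_conditional("metadata.get('status') == 'ready'", variables))
# (True, None)
print(evaluate_conditional("metadata['missing'] == 1", variables))
# (False, 'Conditional evaluation failed: KeyError')
print(evaluate_conditional("metadata.get('status' ==", variables))
# (False, 'Conditional evaluation failed: SyntaxError')
print(evaluate_conditional("undefined_name > 1", variables))
# (False, 'Conditional evaluation failed: NameError')
```

Because a failed conditional and a false conditional both result in a skipped step, the status and logs are the only way to tell them apart.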

Best Practices

Use descriptive metadata keys: Name your metadata fields clearly so conditionals are self-documenting.
Always provide defaults: Use .get(key, default) to handle missing values gracefully.
Keep expressions simple: Complex logic should be handled in step code, not conditionals.
Test conditionals: Verify your conditional logic works with various document states before deploying.
Document your conditionals: Add comments in your pipeline configuration explaining why each conditional exists.

Debugging Tips

Check the execution logs to see conditional evaluation results
Verify the document has the expected metadata/external data before the step runs
Use simple test expressions to isolate issues
Remember that None, empty strings "", empty dicts {}, and 0 are all falsy values
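
The falsy-value reminder is worth checking in plain Python, because a key that exists with a falsy value behaves differently from what a presence check suggests. The sample values below are made up for the example.

```python
# Truthiness of values that commonly appear in metadata.
samples = {
    "None": None,
    "empty string": "",
    "empty dict": {},
    "zero": 0,
    "string '0'": "0",   # a non-empty string is truthy
    "dict with a key": {"a": None},
}
for label, value in samples.items():
    print(f"{label}: {bool(value)}")

# A key can be present and still make a bare .get() conditional falsy.
metadata = {"page_count": 0}
value_check = bool(metadata.get('page_count'))
presence_check = 'page_count' in metadata
print(value_check)     # False
print(presence_check)  # True
```

If the intent is "the field exists", test with `in`; if the intent is "the field has a meaningful value", the bare `.get()` form is appropriate.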
