
Data Flow Step Conditionals

Data flow steps in Kodexa pipelines can include conditional expressions that determine whether a step should execute. This allows you to create dynamic pipelines that adapt their behavior based on document properties, metadata, or execution context.

Overview

When a pipeline step has a conditional property set, the expression is evaluated before the step executes. If the expression evaluates to true (or any truthy value), the step runs normally. If it evaluates to false (or any falsy value), the step is skipped and the pipeline continues to the next step.

Syntax

Conditionals use Python-like expression syntax powered by simpleeval. This provides a safe, sandboxed environment for evaluating expressions.

Supported Operators

| Category | Operators |
| --- | --- |
| Comparison | ==, !=, <, >, <=, >= |
| Logical | and, or, not |
| Membership | in, not in |
| Arithmetic | +, -, *, /, % |
| Boolean literals | True, False |
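simpleeval is a third-party package, so the sketch below does not use it. It is a minimal stand-in built on Python's standard `ast` module. It evaluates only the operators in the table above, plus `dict.get()` calls and empty `{}` defaults, and rejects everything else. The function name `evaluate` is hypothetical. The sketch illustrates the whitelisting idea behind a sandboxed evaluator; it does not reproduce simpleeval or Kodexa's implementation.

```python
import ast
import operator

# Hypothetical stand-in for simpleeval: only these node types are evaluated.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_CMP_OPS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.Gt: operator.gt,
    ast.LtE: operator.le,
    ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def evaluate(expression, names):
    """Evaluate a conditional expression against a dict of variable names."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):  # numbers, strings, True, False, None
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in names:
                raise ValueError(f"unknown name: {node.id}")
            return names[node.id]
        if isinstance(node, ast.Dict) and not node.keys:
            return {}  # allow the empty-dict default used with .get()
        if isinstance(node, ast.BoolOp):
            # Short-circuit like Python: 'and' stops at the first falsy value,
            # 'or' stops at the first truthy value.
            result = None
            for value_node in node.values:
                result = ev(value_node)
                if isinstance(node.op, ast.And) and not result:
                    return result
                if isinstance(node.op, ast.Or) and result:
                    return result
            return result
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Compare):
            # Supports chained comparisons such as 0 < x <= 100.
            left = ev(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in _CMP_OPS:
                    raise ValueError(f"operator not allowed: {type(op).__name__}")
                right = ev(right_node)
                if not _CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and not node.keywords
        ):
            # The only callable allowed is dict.get(key[, default]).
            target = ev(node.func.value)
            if not isinstance(target, dict):
                raise ValueError("get() is only allowed on dicts")
            return target.get(*[ev(arg) for arg in node.args])
        raise ValueError(f"syntax not allowed: {type(node).__name__}")

    return ev(ast.parse(expression, mode="eval"))


names = {
    "context": {"process_financials": True},
    "metadata": {"type": "receipt"},
    "external_data": {},
}
print(evaluate(
    "(metadata.get('type') == 'invoice' or metadata.get('type') == 'receipt') "
    "and context.get('process_financials')",
    names,
))  # True
```

In this sketch, expressions such as `__import__('os')` never run. The evaluator rejects every node type it does not recognize, rather than trying to block specific dangerous ones.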

Parentheses

Use parentheses to group expressions and control evaluation order:
(metadata.get('type') == 'invoice' or metadata.get('type') == 'receipt') and context.get('process_financials')
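simpleeval parses expressions with Python's own grammar, so the grouping rules should match Python's: `not` binds tighter than `and`, and `and` binds tighter than `or`. The plain-Python sketch below evaluates the expression above with and without parentheses, using hypothetical sample values.

```python
# Hypothetical sample values.
metadata = {"type": "invoice"}
context = {"process_financials": False}

# Without parentheses this parses as: invoice_check or (receipt_check and flag)
ungrouped = (
    metadata.get("type") == "invoice"
    or metadata.get("type") == "receipt"
    and context.get("process_financials")
)

# With parentheses the two type checks are combined first, then the flag applies.
grouped = (
    metadata.get("type") == "invoice" or metadata.get("type") == "receipt"
) and context.get("process_financials")

print(ungrouped)  # True: the invoice check alone satisfies the 'or'
print(grouped)    # False: the flag is off, so the whole expression is falsy
```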

Available Variables

The conditional expression has access to three variables:

context

The execution context: a dictionary that can be populated when the execution is created or through the project configuration.
# Check if a context value exists
'process_all' in context

# Check context value
context.get('environment') == 'production'

# Access nested values
context.get('settings', {}).get('enable_ocr', False)
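The expressions above behave like ordinary Python dictionary operations. The sketch below evaluates them directly in Python against a hypothetical context. The keys `environment` and `settings` are illustrative, not values Kodexa provides.

```python
# Hypothetical execution context.
context = {"environment": "staging", "settings": {"enable_ocr": True}}


def ocr_enabled(ctx):
    # A missing 'settings' falls back to {}, and a missing 'enable_ocr' to False.
    return ctx.get("settings", {}).get("enable_ocr", False)


print("process_all" in context)                    # False
print(context.get("environment") == "production")  # False
print(ocr_enabled(context))                        # True
print(ocr_enabled({}))                             # False
```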

metadata

The document’s metadata dictionary. This contains any metadata properties stored on the document being processed.
# Check document type
metadata.get('document_type') == 'invoice'

# Check if metadata field exists
'customer_id' in metadata

# Check multiple conditions
metadata.get('status') == 'ready' and metadata.get('page_count', 0) > 0
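The combined check can be written as a plain Python function to see how it reacts to different document states. The name `is_ready` and the sample metadata are hypothetical.

```python
def is_ready(metadata):
    # Mirrors: metadata.get('status') == 'ready' and metadata.get('page_count', 0) > 0
    return metadata.get("status") == "ready" and metadata.get("page_count", 0) > 0


print(is_ready({"status": "ready", "page_count": 3}))    # True
print(is_ready({"status": "ready"}))                     # False: page_count defaults to 0
print(is_ready({"status": "pending", "page_count": 3}))  # False
```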

external_data

A dictionary containing all external data stored on the document, keyed by storage key. External data commonly holds processing results, taxonomy outputs, or other structured data associated with a document.
# Check default external data
external_data.get('default', {}).get('processed') == True

# Check if specific external data key exists
'extraction_results' in external_data

# Access nested external data
external_data.get('taxonomy', {}).get('document_classification') == 'financial'

# Check data from a specific step's output
external_data.get('ocr_results', {}).get('confidence', 0) > 0.9
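The sketch below runs the external data checks above in plain Python against a hypothetical `external_data` dictionary. The storage keys and values are invented for illustration. Real keys depend on what earlier steps stored.

```python
# Hypothetical external data, keyed by storage key.
external_data = {
    "default": {"processed": True},
    "taxonomy": {"document_classification": "financial"},
    "ocr_results": {"confidence": 0.82},
}

checks = {
    "processed": external_data.get("default", {}).get("processed") == True,
    "has_extraction": "extraction_results" in external_data,
    "is_financial": external_data.get("taxonomy", {}).get("document_classification") == "financial",
    "high_confidence": external_data.get("ocr_results", {}).get("confidence", 0) > 0.9,
}
print(checks)
# {'processed': True, 'has_extraction': False, 'is_financial': True, 'high_confidence': False}
```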

Examples

Basic Conditionals

Skip step if document is already processed:
steps:
  - ref: kodexa/ocr-processor
    conditional: "metadata.get('ocr_complete') != True"

Only run for specific document types:
steps:
  - ref: kodexa/invoice-extractor
    conditional: "metadata.get('document_type') == 'invoice'"

Check if external data exists:
steps:
  - ref: kodexa/data-publisher
    conditional: "'extraction_results' in external_data"
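To check which of these steps would run for a given document, you can evaluate each conditional string locally. The sketch below copies the steps into Python data to avoid a YAML dependency, and it uses Python's built-in `eval` as a stand-in for simpleeval. `eval` is not a sandbox, so use this only to experiment with your own expressions. The helper `steps_to_run` is hypothetical.

```python
# Hypothetical: the steps from the YAML above, written as Python data.
STEPS = [
    {"ref": "kodexa/ocr-processor",
     "conditional": "metadata.get('ocr_complete') != True"},
    {"ref": "kodexa/invoice-extractor",
     "conditional": "metadata.get('document_type') == 'invoice'"},
    {"ref": "kodexa/data-publisher",
     "conditional": "'extraction_results' in external_data"},
]


def steps_to_run(steps, metadata, external_data=None, context=None):
    """Return the refs of steps whose conditional is truthy or absent."""
    names = {
        "context": context or {},
        "metadata": metadata,
        "external_data": external_data or {},
    }
    selected = []
    for step in steps:
        expression = step.get("conditional")
        # Steps without a conditional always run.
        # NOTE: eval with empty builtins is NOT a sandbox; local testing only.
        if expression is None or eval(expression, {"__builtins__": {}}, names):
            selected.append(step["ref"])
    return selected


# A hypothetical invoice that has already been OCR'd but not yet extracted.
print(steps_to_run(STEPS, {"ocr_complete": True, "document_type": "invoice"}))
# ['kodexa/invoice-extractor']
```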

Combining Conditions

Multiple metadata checks:
steps:
  - ref: kodexa/advanced-processor
    conditional: "metadata.get('status') == 'ready' and metadata.get('page_count', 0) <= 100"

Context and metadata combined:
steps:
  - ref: kodexa/expensive-operation
    conditional: "context.get('enable_full_processing', False) or metadata.get('priority') == 'high'"

Check external data quality:
steps:
  - ref: kodexa/validation-step
    conditional: "external_data.get('extraction', {}).get('confidence', 0) >= 0.85"
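The context-and-metadata example uses `or`, so the step runs when either side is truthy. This plain-Python sketch enumerates the four combinations of the flag and the priority. The function name and values are hypothetical.

```python
from itertools import product


def runs_expensive_operation(context, metadata):
    # Mirrors: context.get('enable_full_processing', False) or metadata.get('priority') == 'high'
    return bool(
        context.get("enable_full_processing", False)
        or metadata.get("priority") == "high"
    )


for flag, priority in product([False, True], ["normal", "high"]):
    runs = runs_expensive_operation({"enable_full_processing": flag}, {"priority": priority})
    print(f"enable_full_processing={flag!s:<5}  priority={priority:<6}  runs={runs}")
```

Only a disabled flag combined with a non-high priority skips the step.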

Environment-Based Conditions

Production-only steps:
steps:
  - ref: kodexa/audit-logger
    conditional: "context.get('environment') == 'production'"

Feature flags:
steps:
  - ref: kodexa/experimental-feature
    conditional: "context.get('feature_flags', {}).get('new_algorithm', False)"

Document State Checks

Check processing history:
steps:
  - ref: kodexa/reprocess-step
    conditional: "metadata.get('last_processed_version', 0) < 2"

Run only for multi-page documents:
steps:
  - ref: kodexa/multi-page-handler
    conditional: "metadata.get('page_count', 1) > 1"
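In both document state checks, the default passed to `.get()` decides what happens when the key is missing. A plain-Python sketch with hypothetical metadata:

```python
def should_reprocess(metadata):
    # Mirrors: metadata.get('last_processed_version', 0) < 2
    return metadata.get("last_processed_version", 0) < 2


def is_multi_page(metadata):
    # Mirrors: metadata.get('page_count', 1) > 1
    return metadata.get("page_count", 1) > 1


print(should_reprocess({}))                             # True: treated as version 0
print(should_reprocess({"last_processed_version": 2}))  # False
print(is_multi_page({}))                                # False: treated as one page
print(is_multi_page({"page_count": 5}))                 # True
```

A document with no recorded version is reprocessed. A document with no page count is treated as single-page and skipped by the multi-page handler.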

Default Values

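When accessing dictionary values that might not exist, always use .get() with a default value. Subscript access raises KeyError for a missing key. Calling .get() without a default returns None, and comparisons such as > 0 then fail: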
# Good - uses default value
metadata.get('field_name', 'default_value')
external_data.get('key', {}).get('nested', False)

# Risky - may cause KeyError if field doesn't exist
metadata['field_name']  # Avoid this pattern
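The difference is easy to see in plain Python. simpleeval applies Python's operators, so the same exceptions would be expected inside a conditional. Under the error handling rules described below, a failed evaluation skips the step. The metadata below is hypothetical.

```python
metadata = {"document_type": "invoice"}  # hypothetical: no 'page_count' key

# Good: .get() with a default handles the missing key.
with_default = metadata.get("page_count", 0) > 0
print(with_default)  # False

# Risky: subscript access raises KeyError for a missing key.
try:
    metadata["page_count"] > 0
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'page_count'

# Also risky: .get() without a default returns None, and None > 0 raises
# TypeError in Python 3.
try:
    metadata.get("page_count") > 0
except TypeError as exc:
    print("TypeError:", exc)
```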

Step Behavior When Skipped

When a step’s conditional evaluates to false (or any falsy value):
  1. The step status is set to SKIPPED
  2. The document from the previous step (if any) is preserved and passed to the next step
  3. Processing continues with the next step in the pipeline
  4. The skip reason is logged for debugging
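The four rules above can be sketched as a simple sequential loop. This is not Kodexa's executor. The `Step` dataclass, `run_pipeline`, the `SUCCEEDED` status, and the log format are all hypothetical. Conditionals are written as Python callables rather than expression strings to keep the focus on skip behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """Hypothetical step model: a name, an action, and an optional conditional."""
    name: str
    action: Callable[[dict], dict]
    conditional: Optional[Callable[[dict], bool]] = None
    status: str = "PENDING"


def run_pipeline(steps: List[Step], document: dict, log: List[str]) -> dict:
    for step in steps:
        if step.conditional is not None and not step.conditional(document):
            step.status = "SKIPPED"                                    # rule 1
            log.append(f"{step.name}: skipped, conditional was falsy")  # rule 4
            continue  # rules 2-3: the current document goes to the next step unchanged
        document = step.action(document)
        step.status = "SUCCEEDED"  # hypothetical status name
    return document


steps = [
    Step("ocr-processor",
         action=lambda doc: {**doc, "text": "recognized text"},
         conditional=lambda doc: doc["metadata"].get("ocr_complete") != True),
    Step("invoice-extractor",
         action=lambda doc: {**doc, "fields": {"total": 42}},
         conditional=lambda doc: doc["metadata"].get("document_type") == "invoice"),
]
log: List[str] = []
result = run_pipeline(
    steps, {"metadata": {"ocr_complete": True, "document_type": "invoice"}}, log)
print([step.status for step in steps])  # ['SKIPPED', 'SUCCEEDED']
print(log)  # ['ocr-processor: skipped, conditional was falsy']
```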

Error Handling

If a conditional expression fails to evaluate (syntax error, runtime error, etc.):
  1. The step is automatically skipped
  2. The error is logged with details
  3. The step status indicates “Conditional evaluation failed”
  4. Pipeline execution continues
This fail-safe behavior ensures that malformed conditionals don’t crash the entire pipeline.
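A minimal sketch of this fail-safe pattern: any exception raised while evaluating the expression, including a syntax error, is logged and turned into a skip. The function `evaluate_conditional` and the logger name are hypothetical. Python's `eval` stands in for simpleeval here and is not sandboxed.

```python
import logging

logger = logging.getLogger("pipeline.conditionals")  # hypothetical logger name


def evaluate_conditional(expression, names):
    """Return (should_run, status). Evaluation failures skip the step instead of raising."""
    try:
        # NOTE: Python's eval stands in for simpleeval here and is NOT sandboxed.
        result = eval(expression, {"__builtins__": {}}, names)
    except Exception as exc:  # SyntaxError, KeyError, TypeError, NameError, ...
        logger.warning("Conditional %r failed to evaluate: %s", expression, exc)
        return False, "Conditional evaluation failed"
    return bool(result), None


names = {"context": {}, "metadata": {}, "external_data": {}}
print(evaluate_conditional("metadata['status'] == 'ready'", names))  # (False, 'Conditional evaluation failed')
print(evaluate_conditional("metadata.get('status' ==", names))       # (False, 'Conditional evaluation failed')
print(evaluate_conditional("'status' not in metadata", names))       # (True, None)
```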

Best Practices

  1. Use descriptive metadata keys: Name your metadata fields clearly so conditionals are self-documenting.
  2. Always provide defaults: Use .get(key, default) to handle missing values gracefully.
  3. Keep expressions simple: Complex logic should be handled in step code, not conditionals.
  4. Test conditionals: Verify your conditional logic works with various document states before deploying.
  5. Document your conditionals: Add comments in your pipeline configuration explaining why each conditional exists.
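For practice 4, one lightweight approach is a table of document states paired with the expected decision. The sketch below is a hypothetical harness, not a Kodexa tool. It uses Python's `eval` as an unsandboxed stand-in for simpleeval, so run it only on expressions you wrote.

```python
# A conditional copied from the pipeline YAML, and hypothetical test cases:
# (metadata, expected decision).
CONDITIONAL = "metadata.get('status') == 'ready' and metadata.get('page_count', 0) <= 100"

CASES = [
    ({"status": "ready", "page_count": 10}, True),
    ({"status": "ready", "page_count": 250}, False),
    ({"status": "ready"}, True),  # page_count defaults to 0
    ({"status": "draft", "page_count": 10}, False),
    ({}, False),
]


def check(expression, cases):
    """Return (metadata, expected, actual) for every case that does not match."""
    failures = []
    for metadata, expected in cases:
        names = {"context": {}, "metadata": metadata, "external_data": {}}
        # NOTE: eval is NOT a sandbox; it only stands in for simpleeval here.
        actual = bool(eval(expression, {"__builtins__": {}}, names))
        if actual != expected:
            failures.append((metadata, expected, actual))
    return failures


print(check(CONDITIONAL, CASES))  # [] when every case matches
```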

Debugging Tips

  • Check the execution logs to see conditional evaluation results
  • Verify the document has the expected metadata/external data before the step runs
  • Use simple test expressions to isolate issues
  • Remember that None, empty strings "", empty dicts {}, and 0 are all falsy values
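The check below is plain Python. It assumes the conditional's result is interpreted with Python's truth rules, as the truthy/falsy wording in the Overview suggests. Note that the strings "False" and "0" are truthy, so a flag stored as a string would not disable a step.

```python
# Values that would make a conditional skip its step.
falsy = [None, False, 0, 0.0, "", {}, []]
print([bool(v) for v in falsy])   # every entry is False

# Common surprises: these are truthy, so a step guarded by them would run.
truthy = ["False", "0", {"enabled": False}, [0]]
print([bool(v) for v in truthy])  # every entry is True
```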