Python modules (moduleType: model) are executable code units that process documents on the Kodexa platform. They are the core building blocks for inference, transformation, and event-driven workflows.

Project Structure

A Python module has a simple structure:
my-module/
├── module/
│   ├── __init__.py
│   └── module.py          # Main module implementation
├── module.yml              # Kodexa module definition
├── pyproject.toml          # Python project configuration
└── makefile                # Common tasks (optional)
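The `pyproject.toml` carries standard Python project configuration. A minimal sketch might look like the following — the package name, Python version, and dependency list are assumptions for illustration (the `kodexa_document` dependency matches the imports used in the examples in this guide; your runtime may provide it for you):

```toml
[project]
name = "my-module"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = [
    "kodexa_document",
]
```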

Module Definition (module.yml)

Every module needs a module.yml that describes it to the platform:
slug: my-module
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: My Module
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
Key fields:
  • slug: Unique identifier within your organization
  • orgSlug: Your Kodexa organization slug
  • moduleRuntimeRef: The runtime environment that executes your module (see Module Runtimes)
  • contents: Glob patterns for files to include in the deployment

Writing an Inference Module

The most common module type processes documents. The entry point is an infer function in module/module.py:
import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")

    # Access document content
    all_text = document.content_node.get_all_content()

    # Add a label to the document
    document.add_label("processed")

    return document
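Because this `infer` function only touches `document.uuid`, `content_node.get_all_content()`, and `add_label`, you can smoke-test it locally before deploying. The stub below is a hand-rolled stand-in that mimics only those attributes — it is not the real `kodexa_document` API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class StubContentNode:
    """Stand-in for the content node, exposing only get_all_content()."""
    def get_all_content(self):
        return "Invoice #123 - amount due: $500"

class StubDocument:
    """Stub mimicking only the Document attributes infer() uses."""
    def __init__(self):
        self.uuid = "test-0001"
        self.content_node = StubContentNode()
        self.labels = []

    def add_label(self, label):
        self.labels.append(label)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")
    all_text = document.content_node.get_all_content()
    document.add_label("processed")
    return document

result = infer(StubDocument())
print(result.labels)  # ['processed']
```

This keeps the feedback loop tight: you validate the function's control flow locally and reserve platform deployments for integration testing.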

Magic Parameter Injection

The Kodexa bridge automatically injects parameters based on your function signature. Declare only what you need:
def infer(document, project, pipeline_context, assistant, status_reporter=None):
    # All parameters are automatically injected
    ...
Available parameters:
  • document (Document): The Kodexa document being processed
  • document_family (DocumentFamily): The document family, if available
  • model_base (str): Path to the model's base directory on disk
  • pipeline_context (PipelineContext): Pipeline execution context
  • model_store (ModelStore): For persisting/loading model artifacts
  • assistant (Assistant): The assistant for LLM interaction
  • project (Project): The project this execution belongs to
  • execution_id (str): Current execution ID
  • status_reporter (StatusReporter): For posting live status updates to the UI
  • event (dict): The triggering event (event handlers only)

Inference Options

You can define options that are displayed in the UI and passed to your function:
# module.yml
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.85
      description: "Minimum confidence score for classification"
    - name: document_type
      type: string
      default: "invoice"
      description: "Expected document type"
  contents:
    - module/*
Options are injected as parameters by name:
def infer(document, confidence_threshold, document_type):
    logger.info(f"Classifying as {document_type} with threshold {confidence_threshold}")
    return document
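Option values are configured through the UI, so it can be prudent to coerce them defensively at the top of your function — the possibility of a numeric option arriving as a string is an assumption here, not documented platform behavior. One sketch of such a guard:

```python
import logging

logger = logging.getLogger(__name__)

def coerce_option(value, cast, default):
    """Return value coerced with cast, falling back to default on failure."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return default

def infer(document, confidence_threshold=0.85, document_type="invoice"):
    # Guard against the option arriving as a string (e.g. "0.85")
    confidence_threshold = coerce_option(confidence_threshold, float, 0.85)
    logger.info(f"Classifying as {document_type} with threshold {confidence_threshold}")
    return document
```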

Status Reporting

Use status_reporter for live UI updates during processing:
def infer(document, status_reporter=None, pipeline_context=None):
    if status_reporter:
        status_reporter.update("Extracting tables", status_type="processing")

    # Process pages with progress tracking
    pages = document.get_nodes()
    for i, page in enumerate(pages):
        if pipeline_context:
            pipeline_context.status_handler(
                f"Processing page {i+1}",
                i + 1,
                len(pages)
            )
        # ... process page ...

    return document
Status types: thinking, searching, planning, reviewing, processing, analyzing, writing, waiting.
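The guard pattern above (checking `if status_reporter` and `if pipeline_context` before every call) can be factored into a small null-safe helper. This is a local convenience sketch, not a platform API:

```python
class ProgressReporter:
    """Null-safe wrapper around the injected status handlers (sketch only)."""

    def __init__(self, status_reporter=None, pipeline_context=None):
        self.status_reporter = status_reporter
        self.pipeline_context = pipeline_context

    def status(self, message, status_type="processing"):
        # Silently no-ops when no status_reporter was injected
        if self.status_reporter:
            self.status_reporter.update(message, status_type=status_type)

    def progress(self, message, current, total):
        # Silently no-ops when no pipeline_context was injected
        if self.pipeline_context:
            self.pipeline_context.status_handler(message, current, total)

# Usage inside infer():
#   reporter = ProgressReporter(status_reporter, pipeline_context)
#   reporter.status("Extracting tables")
#   reporter.progress("Processing page 1", 1, len(pages))
```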

Writing an Event Handler Module

Event handler modules react to platform events (e.g., document uploads, status changes). Add the eventAware flag and implement handle_event:
# module.yml
metadata:
  eventAware: true
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
import logging

logger = logging.getLogger(__name__)

def handle_event(event):
    logger.info(f"Received event: {event}")
    # React to the event - e.g., trigger processing, update state
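If a handler reacts to several kinds of events, a simple dispatch table keeps it tidy. The payload shape assumed here (a dict with a "type" key) and the event type names are illustrative only — consult the event documentation for the actual schema:

```python
import logging

logger = logging.getLogger(__name__)

def on_document_uploaded(event):
    logger.info(f"Document uploaded: {event}")

def on_status_changed(event):
    logger.info(f"Status changed: {event}")

# Hypothetical mapping from event type to handler
HANDLERS = {
    "document-uploaded": on_document_uploaded,
    "status-changed": on_status_changed,
}

def handle_event(event):
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        logger.info(f"Ignoring unhandled event: {event}")
        return
    handler(event)
```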
See Event Handling with Modules for more details.

Using Module Sidecars

Sidecars let you share code between modules. Reference other modules as sidecars in your module.yml:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  moduleSidecars:
    - kodexa/kodexa-langchain-module
  contents:
    - module/*
You can then import from the sidecar module in your code:
from kodexa_langchain.utils import get_bedrock_client
bedrock_client = get_bedrock_client(region="us-east-1")
See Module Sidecars for more details.

Overriding Entry Points

By default, the runtime expects a module package with an infer function. Override this with moduleRuntimeParameters:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  moduleRuntimeParameters:
    module: my_custom_module
    function: custom_infer
  contents:
    - my_custom_module/*
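The Python side then exposes the named function from the named package. The file path below mirrors the default `module/module.py` layout — where exactly the function must live inside the package is an assumption here:

```python
# my_custom_module/my_custom_module.py (layout assumed to mirror the default)
import logging

logger = logging.getLogger(__name__)

def custom_infer(document):
    """Custom entry point named in moduleRuntimeParameters."""
    logger.info(f"Custom inference for document: {document.uuid}")
    document.add_label("custom-processed")
    return document
```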

Example: Document Classifier

Here’s a complete example of an inference module that classifies documents:

module.yml

slug: document-classifier
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: Document Classifier
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.7
      description: "Minimum confidence for classification"
  contents:
    - module/*

module/module.py

import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document, confidence_threshold=0.7, status_reporter=None):
    """Classify a document based on its content."""
    logger.info(f"Processing document: {document.uuid}")

    if status_reporter:
        status_reporter.update("Analyzing document content", status_type="analyzing")

    all_text = document.content_node.get_all_content().lower()

    # Simple keyword-based classification
    classifications = {
        "invoice": ["invoice", "total", "amount due", "bill to"],
        "contract": ["agreement", "parties", "terms and conditions"],
        "resume": ["resume", "curriculum vitae", "experience", "education"],
    }

    best_match = "unknown"
    best_score = 0

    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match = doc_type
            best_score = score

    document.add_label("document_type", best_match)
    logger.info(f"Classified as: {best_match} (score: {best_score:.2f})")

    return document
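Because the keyword scoring is plain Python, it can be factored out of `infer` and exercised without the platform at all. This refactoring sketch keeps the same logic as the example above:

```python
def classify_text(all_text, classifications, confidence_threshold=0.7):
    """Return (best_type, best_score) using keyword-based scoring."""
    all_text = all_text.lower()
    best_match, best_score = "unknown", 0.0
    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match, best_score = doc_type, score
    return best_match, best_score

rules = {
    "invoice": ["invoice", "total", "amount due", "bill to"],
    "contract": ["agreement", "parties", "terms and conditions"],
}

print(classify_text("Invoice total: $100, amount due, bill to ACME", rules))
# → ('invoice', 1.0)
```

Splitting pure logic from platform plumbing this way also makes the scoring easy to cover with unit tests.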

Deploying

Deploy with the KDX CLI:
kdx apply -f module.yml
This will:
  1. Create or update the module metadata in the platform
  2. Package all files matching metadata.contents patterns
  3. Upload the implementation

Debugging a Deployment

To keep the deployment ZIP for inspection, add keepZip: true to your metadata:
metadata:
  keepZip: true
  moduleRuntimeRef: kodexa/base-module-runtime
  ...
To download a deployed module’s metadata:
kdx get module my-org/my-module -o yaml

Using Your Module

Once deployed, you can add your module to an Assistant in your project through Studio. The module will appear in the list of available modules for your organization. See Working with Modules for more details.

Troubleshooting

Module import errors

  • Ensure your Python environment has the correct dependencies installed
  • Verify import statements use the correct package name
  • Check that your metadata.contents patterns match your project structure

Deployment failures

  • Verify KDX CLI is configured: kdx config --list
  • Check the orgSlug in module.yml matches your organization
  • Ensure module.yml is valid YAML
  • Run kdx apply from the directory containing module.yml

Module not working as expected

  • Add logging to your function to trace execution
  • Check that you’re returning the document from your infer function
  • Verify the module is receiving the expected document format
  • Review execution logs in Studio