Python modules (moduleType: model) are executable code units that process documents on the Kodexa platform. They are the core building blocks for inference, transformation, and event-driven workflows.

Project Structure

A Python module has a simple structure:
my-module/
├── module/
│   ├── __init__.py
│   └── module.py          # Main module implementation
├── module.yml              # Kodexa module definition
├── pyproject.toml          # Python project configuration
└── makefile                # Common tasks (optional)
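The `pyproject.toml` carries standard Python project configuration. A minimal sketch might look like the following — the package name, Python version, and dependency list are assumptions for illustration (the `kodexa_document` dependency matches the imports used in the examples in this guide; your runtime may provide it for you):

```toml
[project]
name = "my-module"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = [
    "kodexa_document",
]
```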

Module Definition (module.yml)

Every module needs a module.yml that describes it to the platform:
slug: my-module
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: My Module
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
Key fields:
  • slug: Unique identifier within your organization
  • orgSlug: Your Kodexa organization slug
  • moduleRuntimeRef: The runtime environment that executes your module (see Module Runtimes)
  • contents: Glob patterns for files to include in the deployment

Writing an Inference Module

The most common module type processes documents. The entry point is an infer function in module/module.py:
import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")

    # Access document content
    all_text = document.content_node.get_all_content()

    # Add a label to the document
    document.add_label("processed")

    return document
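Because this `infer` function only touches `document.uuid`, `content_node.get_all_content()`, and `add_label`, you can smoke-test it locally before deploying. The stub below is a hand-rolled stand-in that mimics only those attributes — it is not the real `kodexa_document` API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class StubContentNode:
    """Stand-in for the content node, exposing only get_all_content()."""
    def get_all_content(self):
        return "Invoice #123 - amount due: $500"

class StubDocument:
    """Stub mimicking only the Document attributes infer() uses."""
    def __init__(self):
        self.uuid = "test-0001"
        self.content_node = StubContentNode()
        self.labels = []

    def add_label(self, label):
        self.labels.append(label)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")
    all_text = document.content_node.get_all_content()
    document.add_label("processed")
    return document

result = infer(StubDocument())
print(result.labels)  # ['processed']
```

This keeps the feedback loop tight: you validate the function's control flow locally and reserve platform deployments for integration testing.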

Magic Parameter Injection

The Kodexa bridge automatically injects parameters based on your function signature. Declare only what you need:
def infer(document, project, pipeline_context, assistant, status_reporter=None):
    # All parameters are automatically injected
    ...
Available parameters:
  • document (Document): The Kodexa document being processed
  • document_family (DocumentFamily): The document family, if available
  • model_base (str): Path to the model's base directory on disk
  • pipeline_context (PipelineContext): Pipeline execution context
  • model_store (ModelStore): For persisting/loading model artifacts
  • assistant (Assistant): The assistant for LLM interaction
  • project (Project): The project this execution belongs to
  • execution_id (str): Current execution ID
  • status_reporter (StatusReporter): For posting live status updates to the UI
  • event (dict): The triggering event (event handlers only)

Inference Options

You can define options that are displayed in the UI and passed to your function:
# module.yml
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.85
      description: "Minimum confidence score for classification"
    - name: document_type
      type: string
      default: "invoice"
      description: "Expected document type"
  contents:
    - module/*
Options are injected as parameters by name:
def infer(document, confidence_threshold, document_type):
    logger.info(f"Classifying as {document_type} with threshold {confidence_threshold}")
    return document
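Option values are configured through the UI, so it can be prudent to coerce them defensively at the top of your function — the possibility of a numeric option arriving as a string is an assumption here, not documented platform behavior. One sketch of such a guard:

```python
import logging

logger = logging.getLogger(__name__)

def coerce_option(value, cast, default):
    """Return value coerced with cast, falling back to default on failure."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return default

def infer(document, confidence_threshold=0.85, document_type="invoice"):
    # Guard against the option arriving as a string (e.g. "0.85")
    confidence_threshold = coerce_option(confidence_threshold, float, 0.85)
    logger.info(f"Classifying as {document_type} with threshold {confidence_threshold}")
    return document
```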

Status Reporting

Use status_reporter for live UI updates during processing:
def infer(document, status_reporter=None, pipeline_context=None):
    if status_reporter:
        status_reporter.update("Extracting tables", status_type="processing")

    # Process pages with progress tracking
    pages = document.get_nodes()
    for i, page in enumerate(pages):
        if pipeline_context:
            pipeline_context.status_handler(
                f"Processing page {i+1}",
                i + 1,
                len(pages)
            )
        # ... process page ...

    return document
Status types: thinking, searching, planning, reviewing, processing, analyzing, writing, waiting.
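The guard pattern above (checking `if status_reporter` and `if pipeline_context` before every call) can be factored into a small null-safe helper. This is a local convenience sketch, not a platform API:

```python
class ProgressReporter:
    """Null-safe wrapper around the injected status handlers (sketch only)."""

    def __init__(self, status_reporter=None, pipeline_context=None):
        self.status_reporter = status_reporter
        self.pipeline_context = pipeline_context

    def status(self, message, status_type="processing"):
        # Silently no-ops when no status_reporter was injected
        if self.status_reporter:
            self.status_reporter.update(message, status_type=status_type)

    def progress(self, message, current, total):
        # Silently no-ops when no pipeline_context was injected
        if self.pipeline_context:
            self.pipeline_context.status_handler(message, current, total)

# Usage inside infer():
#   reporter = ProgressReporter(status_reporter, pipeline_context)
#   reporter.status("Extracting tables")
#   reporter.progress("Processing page 1", 1, len(pages))
```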

Writing an Event Handler Module

Event handler modules react to platform events (e.g., document uploads, status changes). Add the eventAware flag and implement handle_event:
# module.yml
metadata:
  eventAware: true
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
import logging

logger = logging.getLogger(__name__)

def handle_event(event):
    logger.info(f"Received event: {event}")
    # React to the event - e.g., trigger processing, update state
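If a handler reacts to several kinds of events, a simple dispatch table keeps it tidy. The payload shape assumed here (a dict with a "type" key) and the event type names are illustrative only — consult the event documentation for the actual schema:

```python
import logging

logger = logging.getLogger(__name__)

def on_document_uploaded(event):
    logger.info(f"Document uploaded: {event}")

def on_status_changed(event):
    logger.info(f"Status changed: {event}")

# Hypothetical mapping from event type to handler
HANDLERS = {
    "document-uploaded": on_document_uploaded,
    "status-changed": on_status_changed,
}

def handle_event(event):
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        logger.info(f"Ignoring unhandled event: {event}")
        return
    handler(event)
```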
See Event Handling with Modules for more details.

Using Module Sidecars

Sidecars let you share code between modules. Reference other modules as sidecars in your module.yml:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  moduleSidecars:
    - kodexa/kodexa-langchain-module
  contents:
    - module/*
You can then import from the sidecar module in your code:
from kodexa_langchain.utils import get_bedrock_client
bedrock_client = get_bedrock_client(region="us-east-1")
See Module Sidecars for more details.

Overriding Entry Points

By default, the runtime expects a module package with an infer function. Override this with moduleRuntimeParameters:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  moduleRuntimeParameters:
    module: my_custom_module
    function: custom_infer
  contents:
    - my_custom_module/*
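The Python side then exposes the named function from the named package. The file path below mirrors the default `module/module.py` layout — where exactly the function must live inside the package is an assumption here:

```python
# my_custom_module/my_custom_module.py (layout assumed to mirror the default)
import logging

logger = logging.getLogger(__name__)

def custom_infer(document):
    """Custom entry point named in moduleRuntimeParameters."""
    logger.info(f"Custom inference for document: {document.uuid}")
    document.add_label("custom-processed")
    return document
```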

Example: Document Classifier

Here’s a complete example of an inference module that classifies documents:

module.yml

slug: document-classifier
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: Document Classifier
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.7
      description: "Minimum confidence for classification"
  contents:
    - module/*

module/module.py

import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document, confidence_threshold=0.7, status_reporter=None):
    """Classify a document based on its content."""
    logger.info(f"Processing document: {document.uuid}")

    if status_reporter:
        status_reporter.update("Analyzing document content", status_type="analyzing")

    all_text = document.content_node.get_all_content().lower()

    # Simple keyword-based classification
    classifications = {
        "invoice": ["invoice", "total", "amount due", "bill to"],
        "contract": ["agreement", "parties", "terms and conditions"],
        "resume": ["resume", "curriculum vitae", "experience", "education"],
    }

    best_match = "unknown"
    best_score = 0

    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match = doc_type
            best_score = score

    document.add_label("document_type", best_match)
    logger.info(f"Classified as: {best_match} (score: {best_score:.2f})")

    return document
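Because the keyword scoring is plain Python, it can be factored out of `infer` and exercised without the platform at all. This refactoring sketch keeps the same logic as the example above:

```python
def classify_text(all_text, classifications, confidence_threshold=0.7):
    """Return (best_type, best_score) using keyword-based scoring."""
    all_text = all_text.lower()
    best_match, best_score = "unknown", 0.0
    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match, best_score = doc_type, score
    return best_match, best_score

rules = {
    "invoice": ["invoice", "total", "amount due", "bill to"],
    "contract": ["agreement", "parties", "terms and conditions"],
}

print(classify_text("Invoice total: $100, amount due, bill to ACME", rules))
# → ('invoice', 1.0)
```

Splitting pure logic from platform plumbing this way also makes the scoring easy to cover with unit tests.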

Deploying

Deploy with the KDX CLI:
kdx apply -f module.yml
This will:
  1. Create or update the module metadata in the platform
  2. Package all files matching metadata.contents patterns
  3. Upload the implementation

Debugging a Deployment

To keep the deployment ZIP for inspection, add keepZip: true to your metadata:
metadata:
  keepZip: true
  moduleRuntimeRef: kodexa/base-module-runtime
  ...
To download a deployed module’s metadata:
kdx get module my-org/my-module -o yaml

Using Your Module

Once deployed, you can add your module to an Assistant in your project through Studio. The module will appear in the list of available modules for your organization. See Working with Modules for more details.

Troubleshooting

Module import errors

  • Ensure your Python environment has the correct dependencies installed
  • Verify import statements use the correct package name
  • Check that your metadata.contents patterns match your project structure

Deployment failures

  • Verify KDX CLI is configured: kdx config --list
  • Check the orgSlug in module.yml matches your organization
  • Ensure module.yml is valid YAML
  • Run kdx apply from the directory containing module.yml

Module not working as expected

  • Add logging to your function to trace execution
  • Check that you’re returning the document from your infer function
  • Verify the module is receiving the expected document format
  • Review execution logs in Studio