Python modules (storeType: MODEL) are executable code that processes documents on the Kodexa platform. They are the core building block for inference, transformation, and event-driven workflows.

Project Structure

A Python module has a simple structure:
my-module/
├── module/
│   ├── __init__.py
│   └── module.py          # Main module implementation
├── module.yml              # Kodexa module definition
├── pyproject.toml          # Python project configuration
└── makefile                # Common tasks (optional)
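The layout above can be scaffolded with a few shell commands (file names mirror the tree; pyproject.toml and makefile contents are up to you):

```shell
mkdir -p my-module/module
touch my-module/module/__init__.py
touch my-module/module/module.py
touch my-module/module.yml
touch my-module/pyproject.toml
ls my-module/module
```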

Module Definition (module.yml)

Every module needs a module.yml that describes it to the platform:
slug: my-module
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: My Module
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
Key fields:
  • slug: Unique identifier within your organization
  • orgSlug: Your Kodexa organization slug
  • moduleRuntimeRef: The runtime environment that executes your module (see Module Runtimes)
  • contents: Glob patterns for files to include in the deployment
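As a quick sanity check before deploying, the definition can be parsed locally and the required fields verified (this sketch assumes PyYAML is available; the field list follows the example above):

```python
import yaml  # PyYAML

MODULE_YML = """\
slug: my-module
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: My Module
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
"""

definition = yaml.safe_load(MODULE_YML)

# Check the top-level fields the platform expects
for field in ("slug", "version", "orgSlug", "type", "storeType", "name", "metadata"):
    assert field in definition, f"missing field: {field}"

# metadata must name a runtime and include deployable contents
assert definition["metadata"]["moduleRuntimeRef"] == "kodexa/base-module-runtime"
assert definition["metadata"]["contents"] == ["module/*"]
print("module.yml looks valid")
```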

Writing an Inference Module

The most common module type processes documents. The entry point is an infer function in module/module.py:
import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")

    # Access document content
    all_text = document.content_node.get_all_content()

    # Add a label to the document
    document.add_label("processed")

    return document
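Because infer only touches a few attributes of the document, it can be exercised locally with a lightweight stand-in before deploying (FakeDocument and FakeContentNode below are purely illustrative; they are not part of the Kodexa SDK):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class FakeContentNode:
    """Minimal stand-in for a Kodexa content node."""
    def __init__(self, text):
        self._text = text

    def get_all_content(self):
        return self._text

class FakeDocument:
    """Minimal stand-in for kodexa_document.Document, for local testing only."""
    def __init__(self, text):
        self.uuid = "local-test"
        self.content_node = FakeContentNode(text)
        self.labels = []

    def add_label(self, label):
        self.labels.append(label)

def infer(document):
    logger.info(f"Processing document: {document.uuid}")
    all_text = document.content_node.get_all_content()
    document.add_label("processed")
    return document

doc = infer(FakeDocument("Hello, Kodexa"))
print(doc.labels)  # ['processed']
```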

Magic Parameter Injection

The Kodexa bridge automatically injects parameters based on your function signature. Declare only what you need:
def infer(document, project, pipeline_context, assistant, status_reporter=None):
    # All parameters are automatically injected
    ...
Available parameters:
  • document (Document): The Kodexa document being processed
  • model_base (str): Path to the model’s base directory on disk
  • pipeline_context (PipelineContext): Pipeline execution context
  • module_ref (str): The module reference being executed
  • module_options (dict): The full resolved module options map
  • assistant (Assistant): The assistant for LLM interaction
  • assistant_id (str): The assistant ID for this execution
  • project (Project): The project this execution belongs to
  • execution_id (str): Current execution ID
  • status_reporter (StatusReporter): For posting live status updates to the UI
For document family, content object, and raw event metadata, use pipeline_context:
  • pipeline_context.document_family
  • pipeline_context.content_object
  • pipeline_context.document_store
  • pipeline_context.context
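Conceptually, the bridge inspects the function signature and supplies only the parameters it declares. A simplified sketch of that dispatch (not the actual bridge code) might look like:

```python
import inspect

def call_with_injection(func, available):
    """Call func, supplying only the parameters its signature declares."""
    sig = inspect.signature(func)
    kwargs = {
        name: available[name]
        for name in sig.parameters
        if name in available
    }
    return func(**kwargs)

# The bridge holds a superset of injectable values...
available = {
    "document": {"uuid": "doc-1"},
    "project": {"name": "demo"},
    "execution_id": "exec-42",
}

# ...but this infer only asks for two of them.
def infer(document, execution_id):
    return (document["uuid"], execution_id)

result = call_with_injection(infer, available)
print(result)  # ('doc-1', 'exec-42')
```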

Inference Options

You can define options that are displayed in the UI and passed to your function:
# module.yml
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.85
      description: "Minimum confidence score for classification"
    - name: document_type
      type: string
      default: "invoice"
      description: "Expected document type"
  contents:
    - module/*
Options are injected as parameters by name:
def infer(document, confidence_threshold, document_type):
    logger.info(f"Classifying as {document_type} with threshold {confidence_threshold}")
    return document

Status Reporting

Use status_reporter for live UI updates during processing:
def infer(document, status_reporter=None, pipeline_context=None):
    if status_reporter:
        status_reporter.update("Extracting tables", status_type="processing")

    # Process pages with progress tracking
    pages = document.get_nodes()
    for i, page in enumerate(pages):
        if pipeline_context:
            pipeline_context.status_handler(
                f"Processing page {i+1}",
                i + 1,
                len(pages)
            )
        # ... process page ...

    return document
Status types: thinking, searching, planning, reviewing, processing, analyzing, writing, waiting.
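Since status_reporter is optional, modules should degrade gracefully when it is absent. A tiny stand-in like the one below (illustrative only, not a Kodexa class) keeps local runs self-contained and lets you assert on the updates that would have been posted:

```python
class NullStatusReporter:
    """Stand-in for the injected StatusReporter; records updates instead of posting them."""
    def __init__(self):
        self.updates = []

    def update(self, message, status_type="processing"):
        self.updates.append((message, status_type))

def infer(document, status_reporter=None):
    # Fall back to the recording stub when the bridge did not inject a reporter
    reporter = status_reporter or NullStatusReporter()
    reporter.update("Extracting tables", status_type="processing")
    reporter.update("Reviewing results", status_type="reviewing")
    return document

reporter = NullStatusReporter()
infer({"uuid": "doc-1"}, status_reporter=reporter)
print(reporter.updates)
```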

Writing an Event Handler Module

Event handler modules react to platform events (e.g., document uploads or status changes). Add the eventAware flag and implement handle_event:
# module.yml
metadata:
  eventAware: true
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  contents:
    - module/*
# module/module.py
import logging

logger = logging.getLogger(__name__)

def handle_event(document=None, pipeline_context=None):
    context = pipeline_context.context if pipeline_context else {}
    logger.info(
        "Received event %s for document family %s",
        context.get("eventType"),
        context.get("documentFamilyId"),
    )
    return document
The bridge does not inject a standalone event parameter. Read raw event fields from pipeline_context.context. See Event Handling with Modules for more details.
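The handler above can be exercised locally with a stand-in pipeline context that carries a raw event payload (FakePipelineContext is illustrative, not part of the SDK):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class FakePipelineContext:
    """Minimal stand-in exposing the .context dict the bridge provides."""
    def __init__(self, context):
        self.context = context

def handle_event(document=None, pipeline_context=None):
    context = pipeline_context.context if pipeline_context else {}
    logger.info(
        "Received event %s for document family %s",
        context.get("eventType"),
        context.get("documentFamilyId"),
    )
    return document

ctx = FakePipelineContext({"eventType": "DOCUMENT_UPLOADED", "documentFamilyId": "fam-1"})
result = handle_event(document={"uuid": "doc-1"}, pipeline_context=ctx)
print(result)
```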

Using Module Sidecars

Sidecars let you share code between modules. Reference other modules as sidecars in your module.yml:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  moduleSidecars:
    - kodexa/kodexa-langchain-module
  contents:
    - module/*
You can then import from the sidecar module in your code:
from kodexa_langchain.utils import get_bedrock_client
bedrock_client = get_bedrock_client(region="us-east-1")
See Module Sidecars for more details.

Targeting a Custom Package

By convention, the runtime imports the module/ package and calls infer. If your ZIP contains multiple Python packages, set metadata.modelRuntimeParameters.module so the bridge imports the correct one:
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  modelRuntimeParameters:
    module: my_custom_module
  contents:
    - my_custom_module/*

Example: Document Classifier

Here’s a complete example of an inference module that classifies documents:

module.yml

slug: document-classifier
version: 1.0.0
orgSlug: my-org
type: store
storeType: MODEL
name: Document Classifier
metadata:
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: confidence_threshold
      type: number
      default: 0.7
      description: "Minimum confidence for classification"
  contents:
    - module/*

module/module.py

import logging
from kodexa_document import Document

logger = logging.getLogger(__name__)

def infer(document, confidence_threshold=0.7, status_reporter=None):
    """Classify a document based on its content."""
    logger.info(f"Processing document: {document.uuid}")

    if status_reporter:
        status_reporter.update("Analyzing document content", status_type="analyzing")

    all_text = document.content_node.get_all_content().lower()

    # Simple keyword-based classification
    classifications = {
        "invoice": ["invoice", "total", "amount due", "bill to"],
        "contract": ["agreement", "parties", "terms and conditions"],
        "resume": ["resume", "curriculum vitae", "experience", "education"],
    }

    best_match = "unknown"
    best_score = 0

    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match = doc_type
            best_score = score

    document.add_label(best_match)
    logger.info(f"Classified as: {best_match} (score: {best_score:.2f})")

    return document
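The keyword-scoring logic is easy to verify in isolation before wiring it to real documents; the helper below repeats the same scoring loop outside infer for a quick local check:

```python
def classify(all_text, confidence_threshold=0.7):
    """Same keyword scoring as the infer function above, isolated for testing."""
    classifications = {
        "invoice": ["invoice", "total", "amount due", "bill to"],
        "contract": ["agreement", "parties", "terms and conditions"],
        "resume": ["resume", "curriculum vitae", "experience", "education"],
    }
    best_match, best_score = "unknown", 0
    for doc_type, keywords in classifications.items():
        matches = sum(1 for kw in keywords if kw in all_text)
        score = matches / len(keywords)
        if score > best_score and score >= confidence_threshold:
            best_match, best_score = doc_type, score
    return best_match, best_score

text = "invoice no. 42, total: $1,200, amount due on receipt, bill to ACME"
print(classify(text.lower()))  # ('invoice', 1.0)
```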

Deploying

Deploy with the KDX CLI:
kdx apply -f module.yml
This will:
  1. Create or update the module metadata in the platform
  2. Package all files matching metadata.contents patterns
  3. Upload the implementation

Debugging a Deployment

To keep the deployment ZIP for inspection, add keepZip: true to your metadata:
metadata:
  keepZip: true
  moduleRuntimeRef: kodexa/base-module-runtime
  ...
To download a deployed module’s metadata:
kdx get module my-org/my-module -o yaml

Using Your Module

Once deployed, you can add your module to an Assistant in your project through Studio. The module will appear in the list of available modules for your organization. See Working with Modules for more details.

Troubleshooting

Module import errors

  • Ensure your Python environment has the correct dependencies installed
  • Verify import statements use the correct package name
  • Check that your metadata.contents patterns match your project structure

Deployment failures

  • Verify KDX CLI is configured: kdx config --list
  • Check the orgSlug in module.yml matches your organization
  • Ensure module.yml is valid YAML
  • Run kdx apply from the directory containing module.yml

Module not working as expected

  • Add logging to your function to trace execution
  • Check that you’re returning the document from your infer function
  • Verify the module is receiving the expected document format
  • Review execution logs in Studio