The LLM module provides a unified interface for calling large language models from your Python modules. All LLM calls are routed through the Kodexa AI Gateway, which handles provider routing, credential management, rate limiting, and cost tracking centrally.
Architecture
Instead of calling provider APIs directly (OpenAI, Bedrock, Gemini, etc.), all LLM requests go through a single gateway:
Your Module Code
↓
ModelManager (discovers available models)
↓
POST /api/ai/chat/completions (X-API-Key auth)
↓
Kodexa AI Gateway (routes to correct provider)
↓
Response (OpenAI-compatible format)
This means your module code never needs provider-specific API keys or SDKs. The platform manages all credentials centrally.
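To make the flow concrete, here is a sketch of what a gateway request could look like if assembled by hand. The endpoint path and `X-API-Key` header come from the diagram above; the payload follows the standard OpenAI chat format, and the URL and token values are placeholders:

```python
import json

def build_gateway_request(base_url: str, api_key: str, model: str, messages: list) -> tuple:
    """Assemble the URL, headers, and JSON body for a gateway chat call."""
    url = f"{base_url}/api/ai/chat/completions"
    headers = {
        "X-API-Key": api_key,  # platform credential, never a provider key
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_gateway_request(
    "https://platform.kodexa.ai", "my-token", "gpt-4o",
    [{"role": "user", "content": "What is 2+2?"}],
)
```

In practice you never build this request yourself; `ModelManager` and the model providers do it for you.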
Quick Start
from kodexa_document.llm import ModelManager, ChatMessage
# Get the singleton manager
manager = ModelManager()
# Fetch a model by name
model = manager.get_model("gpt-4o")
# Build messages
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is 2+2?"),
]
# Invoke
response_text, thinking_output, metrics = model.invoke(messages)
print(f"Response: {response_text}")
print(f"Tokens: {metrics.input_tokens} in, {metrics.output_tokens} out")
ModelManager
ModelManager is a singleton that discovers available models from the platform at runtime. On first use, it queries the platform’s cloud-models API and creates a GatewayModelProvider for each model.
Getting Models
from kodexa_document.llm import ModelManager
manager = ModelManager()
# Get a specific model by gateway name
model = manager.get_model("gpt-4o")
# Or by provider model ID (backward compatible)
model = manager.get_model("anthropic.claude-sonnet-4-20250514-v1:0")
# List all available models
for m in manager.get_models():
    print(f"{m.name} ({m.get_model_id()})")
Environment Variables
| Variable | Required | Description |
|---|---|---|
| KODEXA_URL | Yes | Platform base URL (e.g., https://platform.kodexa.ai) |
| KODEXA_ACCESS_TOKEN | Yes | API key or access token |
| KODEXA_LLM_READ_TIMEOUT | No | HTTP read timeout in seconds (default: 300) |
| KODEXA_LLM_CONNECT_TIMEOUT | No | HTTP connect timeout in seconds (default: 30) |
When running inside a Kodexa module execution, KODEXA_URL and KODEXA_ACCESS_TOKEN are automatically set by the platform. You do not need to configure them manually.
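The defaults in the table can be expressed as a small resolution helper. This is a sketch of the documented behavior, not the library's actual internals; the function name is illustrative:

```python
import os

def resolve_gateway_config(env=None) -> dict:
    """Read gateway settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    url = env.get("KODEXA_URL")
    token = env.get("KODEXA_ACCESS_TOKEN")
    if not url or not token:
        raise RuntimeError("KODEXA_URL and KODEXA_ACCESS_TOKEN must be set")
    return {
        "url": url.rstrip("/"),
        "token": token,
        "read_timeout": float(env.get("KODEXA_LLM_READ_TIMEOUT", "300")),
        "connect_timeout": float(env.get("KODEXA_LLM_CONNECT_TIMEOUT", "30")),
    }

cfg = resolve_gateway_config({
    "KODEXA_URL": "https://platform.kodexa.ai/",
    "KODEXA_ACCESS_TOKEN": "my-token",
})
```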
ChatMessage
Represents a message in a conversation with an LLM.
from kodexa_document.llm import ChatMessage
# Text message
msg = ChatMessage(role="user", content="Analyze this document")
# System instruction
system = ChatMessage(role="system", content="You are a document analyst")
# Image message (multimodal)
with open("page.jpg", "rb") as f:
    image_msg = ChatMessage(
        role="user",
        content=f.read(),
        media_type="image/jpeg"
    )
Fields
| Field | Type | Description |
|---|---|---|
| role | str | Message role: "user", "assistant", or "system" |
| content | str \| bytes | Text content or image bytes |
| media_type | str | MIME type for image content (e.g., "image/jpeg") |
| page_key | int | Optional page reference for image content |
| cache_control | dict | Optional cache control directives |
Invoking Models
Basic Invocation
invoke() sends messages and returns the response synchronously.
response_text, thinking_output, metrics = model.invoke(
    messages=[
        ChatMessage(role="user", content="Summarize this text: ...")
    ],
    note="Document summary for invoice processing"
)
Parameters:
| Parameter | Type | Description |
|---|---|---|
| messages | List[ChatMessage] | The conversation messages |
| note | str | Optional label for cost tracking |
| enable_thinking_mode | bool | Enable extended thinking (default: False) |
Returns: Tuple[str, Optional[str], LLMUsageMetrics]
- str — The response text
- Optional[str] — Thinking output (if thinking mode enabled and supported)
- LLMUsageMetrics — Token usage and timing
Async Invocation
response_text, thinking_output, metrics = await model.ainvoke(
    messages=[ChatMessage(role="user", content="Hello")],
    note="Async greeting"
)
Async invocation requires the httpx package. Install it with: pip install httpx
Streaming
stream_invoke() yields text chunks as they arrive from the model, useful for real-time display.
for chunk in model.stream_invoke(
    messages=[ChatMessage(role="user", content="Write a long analysis...")],
    note="Streaming analysis"
):
    print(chunk, end="", flush=True)
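If you also need the complete response afterwards, accumulate the chunks as you display them. A minimal sketch, with a stand-in generator in place of `stream_invoke()`:

```python
def fake_stream():
    """Stand-in for model.stream_invoke(), yielding text chunks."""
    yield from ["The total ", "amount is ", "$1,250.00."]

chunks = []
for chunk in fake_stream():
    print(chunk, end="", flush=True)  # real-time display
    chunks.append(chunk)              # keep for later use

full_text = "".join(chunks)
```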
Thinking Mode
Some models (Claude 3.7+, Gemini 2.5+) support extended thinking, where the model shows its reasoning process.
response, thinking, metrics = model.invoke(
    messages=[ChatMessage(role="user", content="Solve this complex problem...")],
    enable_thinking_mode=True
)
if thinking:
    print(f"Reasoning: {thinking}")
print(f"Answer: {response}")
Function Calling / Structured Output
Use invoke_function() to extract structured data using a JSON schema. The model is instructed to call a function with arguments matching your schema.
schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "The vendor's name"},
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"}
                }
            }
        }
    },
    "required": ["vendor_name", "invoice_number", "total_amount"]
}
result, metrics = model.invoke_function(
    messages=[
        ChatMessage(role="user", content=f"Extract invoice data from:\n{document_text}")
    ],
    schema=schema,
    note="Invoice data extraction"
)
# result is a parsed dict matching your schema
print(f"Vendor: {result['vendor_name']}")
print(f"Total: ${result['total_amount']:.2f}")
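Because the parsed dict ultimately comes from a model, it can be worth checking the schema's `required` fields before trusting it. A hedged sketch (the helper name is illustrative, not part of the library):

```python
def check_required(result: dict, schema: dict) -> list:
    """Return the names of required schema properties missing from the result."""
    return [key for key in schema.get("required", []) if key not in result]

schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["vendor_name", "total_amount"],
}

# A result missing a required field is flagged before downstream use
missing = check_required({"vendor_name": "Acme"}, schema)
```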
An async version is also available:
result, metrics = await model.ainvoke_function(messages, schema)
Multimodal Messages
Send images alongside text for visual document analysis:
# From file
with open("invoice_page1.jpg", "rb") as f:
    image_bytes = f.read()
messages = [
    ChatMessage(role="system", content="You are a document analysis expert."),
    ChatMessage(
        role="user",
        content=image_bytes,
        media_type="image/jpeg"
    ),
    ChatMessage(role="user", content="Extract the total amount from this invoice page."),
]
response, _, metrics = model.invoke(messages, note="Visual invoice extraction")
Images are automatically base64-encoded and sent in the OpenAI multimodal format.
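The encoding step the library performs for you looks roughly like this. A sketch of the OpenAI multimodal wire format; the exact serialization inside `ChatMessage` may differ:

```python
import base64

def to_multimodal_part(image_bytes: bytes, media_type: str) -> dict:
    """Encode raw image bytes as an OpenAI-style image_url content part."""
    data = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{data}"},
    }

# Tiny placeholder bytes stand in for real JPEG data
part = to_multimodal_part(b"\xff\xd8\xff", "image/jpeg")
```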
LLMUsageMetrics
Every invocation returns usage metrics for cost tracking and monitoring.
@dataclass
class LLMUsageMetrics:
    model_id: str       # Provider model ID
    input_tokens: int   # Input token count
    output_tokens: int  # Output token count
    duration_ms: int    # Total invocation duration

    def to_dict(self) -> dict:
        """Convert to dictionary for ProcessingStep metadata."""
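Given the fields above, `to_dict()` can be thought of as a plain field-to-key mapping. A self-contained sketch using `dataclasses.asdict` (the real implementation may differ):

```python
from dataclasses import dataclass, asdict

@dataclass
class LLMUsageMetrics:
    model_id: str
    input_tokens: int
    output_tokens: int
    duration_ms: int

    def to_dict(self) -> dict:
        """Convert to dictionary for ProcessingStep metadata."""
        return asdict(self)

metrics = LLMUsageMetrics("gpt-4o", 120, 45, 830)
```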
Cost Tracking
Token usage is automatically recorded in the platform’s model interaction system. Use the note parameter to label interactions for billing visibility:
response, _, metrics = model.invoke(
    messages,
    note=f"Classification for task {task_id}"
)
# Metrics are also recorded by the platform for billing
print(f"Used {metrics.input_tokens + metrics.output_tokens} total tokens")
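For pipelines that make several calls, you can also tally usage locally before logging it. An illustrative helper (the platform records usage regardless, so this is purely for your own reporting); `SimpleNamespace` stands in for real `LLMUsageMetrics` objects:

```python
from types import SimpleNamespace

class UsageTally:
    """Accumulate token counts across multiple model invocations."""
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, metrics) -> None:
        """Fold one invocation's metrics into the running totals."""
        self.input_tokens += metrics.input_tokens
        self.output_tokens += metrics.output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

tally = UsageTally()
tally.add(SimpleNamespace(input_tokens=120, output_tokens=45))
tally.add(SimpleNamespace(input_tokens=80, output_tokens=30))
```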
Capability Checking
Check what a model supports before calling specialized methods:
model = manager.get_model("gpt-4o")

if model.is_function_capable():
    result, metrics = model.invoke_function(messages, schema)

if model.is_thinking_mode_capable():
    response, thinking, metrics = model.invoke(messages, enable_thinking_mode=True)

# Model properties
print(f"Max tokens: {model.get_max_tokens()}")
print(f"Context window: {model.context_window}")
print(f"Model ID: {model.get_model_id()}")
Complete Module Example
Here’s how to use the LLM module in a Kodexa processing module:
import logging
from kodexa_document.llm import ModelManager, ChatMessage
logger = logging.getLogger(__name__)
def infer(document, document_family=None, status_reporter=None):
    """Classify and summarize a document using the AI Gateway."""
    if status_reporter:
        status_reporter.update("Loading AI model", status_type="thinking")

    # Get an LLM model
    manager = ModelManager()
    model = manager.get_model("gpt-4o")
    if not model:
        logger.error("No LLM model available")
        return document

    # Extract document text
    root = document.content_node
    text = root.get_all_content(separator=" ") if root else ""
    if not text:
        logger.warning("Document has no text content")
        return document

    if status_reporter:
        status_reporter.update("Classifying document", status_type="analyzing")

    # Classify the document
    classification_schema = {
        "type": "object",
        "properties": {
            "document_type": {
                "type": "string",
                "enum": ["invoice", "contract", "receipt", "letter", "other"]
            },
            "confidence": {"type": "number"},
            "summary": {"type": "string"}
        },
        "required": ["document_type", "confidence", "summary"]
    }

    result, metrics = model.invoke_function(
        messages=[
            ChatMessage(
                role="system",
                content="Classify this document and provide a brief summary."
            ),
            ChatMessage(
                role="user",
                content=text[:5000]  # First 5000 chars
            ),
        ],
        schema=classification_schema,
        note=f"Document classification: {document_family.name if document_family else 'unknown'}"
    )

    # Store results in document metadata
    document.set_metadata("ai_document_type", result["document_type"])
    document.set_metadata("ai_confidence", result["confidence"])
    document.set_metadata("ai_summary", result["summary"])
    document.add_label(f"type-{result['document_type']}")

    logger.info(
        f"Classified as {result['document_type']} "
        f"(confidence: {result['confidence']:.2f}, "
        f"tokens: {metrics.input_tokens + metrics.output_tokens})"
    )

    return document