Inference Model
Getting started with Kodexa using the Inference Model Cookie Cutter
Introduction
The cookie-cutter-kodexa-infer-model is a project template that helps you quickly set up a new Kodexa inference model project with the right structure and dependencies. It creates a model that can be deployed to the Kodexa platform for document processing and data extraction.
This documentation will guide you through:
- Installing the prerequisites
- Creating a new project from the template
- Understanding the project structure
- Setting up your development environment in VS Code
- Example usage scenarios
Prerequisites
Before using this cookiecutter template, ensure you have the following installed:
- Python 3.11+: The template is designed to work with Python 3.11 or higher
- Cookiecutter: The templating tool that will create your project
- Git: For version control
- Visual Studio Code: For development (recommended)
- Poetry: For dependency management (recommended)
- Kodexa CLI: For deploying models to Kodexa platform
Installing Required Tools
You can install the required tools using pip:
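All of these tools are available from PyPI. A minimal set of commands might look like the following (the kodexa-cli package name is an assumption here, and Poetry's official installer is an alternative to pip; consult each tool's documentation):

```bash
pip install cookiecutter
pip install poetry
pip install kodexa-cli
```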
Creating a New Project
Once you have the prerequisites installed, you can create a new project from the template by running:
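For example, assuming the template is hosted on GitHub under the kodexa-ai organization (adjust the path to wherever the template actually lives):

```bash
cookiecutter gh:kodexa-ai/cookie-cutter-kodexa-infer-model
```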
You’ll be prompted to provide several configuration values defined in the cookiecutter.json file:
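An example session might look like this (the defaults shown in brackets are illustrative; the real defaults come from cookiecutter.json):

```
project_name [Kodexa Inference Model]: My Document Classifier
project_slug [my-document-classifier]:
pkg_name [my_document_classifier]:
project_short_description [A Kodexa inference model]: Classifies incoming documents
full_name [Your Name]: Jane Smith
email [you@example.com]: jane@example.com
github_username [your-github-username]: janesmith
version [1.0.0]:
org_slug [your-org]: acme
```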
These values will be used to customize your project. Here’s what each prompt means:
- project_name: The human-readable name of your project
- project_slug: The slug for your model (automatically derived from project_name)
- pkg_name: The Python package name (automatically derived from project_name)
- project_short_description: A short description of what your model does
- full_name: Your name or your organization’s name
- email: Contact email for the project
- github_username: Your GitHub username or organization
- version: The initial version of your model
- org_slug: The Kodexa organization slug where your model will be hosted
Project Structure
After running the cookiecutter command, a new directory with your project_slug name will be created with the following structure:
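The exact layout may vary between template versions, but based on the key files described below it will look roughly like this (using the example project_slug and pkg_name from above):

```
my-document-classifier/
├── my_document_classifier/
│   ├── __init__.py
│   └── model.py
├── tests/
├── model.yml
├── pyproject.toml
├── makefile
└── README.md
```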
Key Files
model.py
This is the main file where you’ll implement your inference model. It comes with a sample implementation that:
- Receives a Kodexa Document as input
- Has access to the project, pipeline context, and assistant
- Can add labels to the document
- Can access the document’s source bytes
- Returns the processed document
model.yml
This file defines how your model will be deployed to the Kodexa platform, including:
- Model metadata
- Runtime configuration
- Access settings
- Content to include in the deployment package
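As a hypothetical sketch of what such a file might contain (field names here are illustrative only; the model.yml generated in your project is the source of truth):

```yaml
# Illustrative sketch only -- consult the generated model.yml
slug: my-document-classifier        # model metadata
name: My Document Classifier
version: 1.0.0
orgSlug: acme                       # the Kodexa organization slug
# ... runtime configuration, access settings, and the list of
# content to include in the deployment package follow here
```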
pyproject.toml
This file contains your project’s metadata and dependencies managed by Poetry, including:
- Project information
- Python version requirements
- Dependencies (including Kodexa)
- Development tools configuration (black, isort, flake8, mypy)
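An abridged sketch using the example values from earlier (the generated file pins concrete dependency versions):

```toml
[tool.poetry]
name = "my-document-classifier"
version = "1.0.0"
description = "Classifies incoming documents"
authors = ["Jane Smith <jane@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
kodexa = "*"

[tool.poetry.group.dev.dependencies]
black = "*"
isort = "*"
flake8 = "*"
mypy = "*"
```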
makefile
The makefile includes several useful commands:
- make format: Format code using isort and black
- make lint: Lint code using flake8 and mypy
- make test: Run formatting, linting, and unit tests
- make deploy: Deploy the model to the Kodexa platform
- make undeploy: Undeploy the model from the Kodexa platform
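As a rough sketch of how those targets map to the underlying tools (the recipes below are illustrative, and the kodexa deploy/undeploy subcommand invocations are assumptions; the generated makefile is authoritative):

```makefile
format:
	poetry run isort . && poetry run black .

lint:
	poetry run flake8 && poetry run mypy .

test: format lint
	poetry run pytest

deploy:
	kodexa deploy    # exact CLI arguments per the generated makefile

undeploy:
	kodexa undeploy  # exact CLI arguments per the generated makefile
```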
Setting Up in Visual Studio Code
To set up your new project in Visual Studio Code:
- Open VS Code
- Choose “File > Open Folder” and select your newly created project directory
- Open a terminal in VS Code (Terminal > New Terminal)
- Install dependencies using Poetry and activate the virtual environment, as shown below
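Poetry handles both steps out of the box:

```bash
# Install the project and its dependencies into a virtual environment
poetry install

# Spawn a shell inside that virtual environment
poetry shell
```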
Recommended VS Code Extensions
For the best development experience, install these VS Code extensions:
- Python: The official Python extension
- Pylance: Enhanced language support for Python
- Python Test Explorer: For running tests
- YAML: For editing YAML files like model.yml
- Docker: For containerization if needed
- Markdown All in One: For editing documentation
Implementing Your Model
The template creates a basic model implementation in pkg_name/model.py. The main entry point is the infer function:
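In outline, the generated function looks something like the sketch below; the add_label call reflects the sample capabilities listed earlier, so verify method names against the generated code:

```python
from kodexa import Document


def infer(document: Document, project, pipeline_context, assistant) -> Document:
    """Entry point the Kodexa platform calls for each document."""
    # The sample implementation can add labels to the document ...
    document.add_label("processed")

    # ... and can also access the document's source bytes, the project,
    # the pipeline context, and the assistant as needed.

    # The (possibly modified) document must be returned to the pipeline
    return document
```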
You should modify this function to implement your specific document processing logic. The function receives:
- document: The Kodexa Document to process
- project: The Kodexa project endpoint
- pipeline_context: Context information about the current pipeline
- assistant: The Kodexa assistant for interaction with large language models
Example: Implementing a Document Classifier
Here’s an example of how you might implement a simple document classifier:
1. Modify the model.py file
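A minimal sketch, assuming the document API described above (get_all_content and add_label are assumed helper names; check them against your installed kodexa version):

```python
from kodexa import Document


def infer(document: Document, project, pipeline_context, assistant) -> Document:
    """Classify the document with simple keyword rules."""
    # Gather all of the document's text content
    text = document.get_root().get_all_content().lower()

    # Apply naive keyword-based classification as a label on the document
    if "invoice" in text:
        document.add_label("invoice")
    elif "purchase order" in text:
        document.add_label("purchase-order")
    else:
        document.add_label("unclassified")

    return document
```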
2. Deploy your model
Once you’re satisfied with your model, you can deploy it to the Kodexa platform:
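From the project root, this is the make target described earlier:

```bash
make deploy
```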
This will use the Kodexa CLI to deploy your model according to the configuration in model.yml.
Working with the Kodexa Platform
Deploying Your Model
The template includes commands to deploy and undeploy your model:
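Both are make targets that wrap the Kodexa CLI:

```bash
make deploy     # push the model to the Kodexa platform
make undeploy   # remove the model from the Kodexa platform
```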
These commands use the Kodexa CLI and the configuration in model.yml to manage your model on the Kodexa platform.
Using Your Model in Data Flow
You can now go to Studio and add your model to an Assistant in your project.
Troubleshooting
Common Issues
“Cannot find module” errors
If you encounter module import errors, make sure:
- Your Poetry environment is activated (poetry shell)
- The package is installed in development mode (poetry install)
- Your import statements use the correct package name
Deployment failures
If your model fails to deploy:
- Check if your Kodexa CLI is configured correctly
- Verify that the org_slug in model.yml is correct
- Look for syntax errors in your Python code
- Check if your model.yml is properly formatted
Model not working as expected
If your deployed model doesn’t work as expected:
- Add more logging in your infer function to understand what’s happening
- Check if your model is receiving the correct document format
- Verify that you’re returning the document object from your infer function