Training a Module

This section covers how a Module Runtime trains a module that we have deployed. To allow a module to be trained, we need to add a few things to our module.yml file.

# A very simple first module that IS trainable

slug: my-module
version: 1.0.0
orgSlug: kodexa
type: store
storeType: MODEL
name: My Module
metadata:
  atomic: true
  trainable: true
  moduleRuntimeRef: kodexa/base-module-runtime
  type: module
  inferenceOptions:
    - name: my_option
      type: string
      default: "Hello World"
      description: "A simple option"
  trainingOptions:
    - name: my_training_option
      type: string
      default: "Hello World"
      description: "A simple option"
  contents:
    - module/*

The important change here is trainable: true. With this set, the module assistant lets the user define a training store and captures the training options. To allow our module to be trained, we also need to provide a train function.

import logging
logger = logging.getLogger(__name__)

def train(document, training_store, training_options, module_data):
    logger.info(f"Training option is {training_options['my_training_option']}")
    logger.info(f"Training store is {training_store} and I can store my data in {module_data}")
    return document

📘 Training Options Note that in inference the options are passed to the function as individual parameters, whereas in training the training options are passed as a single dictionary.

The train function is called by the module runtime when the module is trained. The training_store is the document store that the user has selected for training. The training_options are the options that the user has selected for training. The module_data is a directory that the module can use to store the “trained module”. A module can store anything in the module_data directory; its contents will be stored in the module store as a Module Training.

We place the responsibility for iterating over the documents in the training store, and for training the module, on the module itself. This provides flexibility in how the module processes the training documents. Once completed, the module can save any “trained materials” in module_data.
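As a rough illustration of this flow, the sketch below iterates over a training store, collects something from each document, and writes the result into module_data. FakeDocumentFamily and FakeTrainingStore are hypothetical stand-ins so the example runs without a Kodexa connection; they mimic only the query(...).content and path usage shown elsewhere on this page. The file name trained.json is an arbitrary choice, not a Kodexa convention.

```python
import json
import logging
import os
import tempfile
from dataclasses import dataclass, field
from types import SimpleNamespace

logger = logging.getLogger(__name__)


@dataclass
class FakeDocumentFamily:
    # Hypothetical stand-in for a document family in a training store
    path: str


@dataclass
class FakeTrainingStore:
    # Hypothetical stand-in exposing only query(...).content
    families: list = field(default_factory=list)

    def query(self, page_size=1000):
        return SimpleNamespace(content=self.families[:page_size])


def train(document, training_store, training_options, module_data):
    # Collect the paths we "trained" on; a real module would fit a model here
    seen_paths = [family.path for family in training_store.query(page_size=1000).content]

    trained = {
        "option": training_options["my_training_option"],
        "paths": seen_paths,
    }

    # Write the trained materials into the module_data directory
    with open(os.path.join(module_data, "trained.json"), "w") as f:
        json.dump(trained, f)

    logger.info("Trained on %d documents", len(seen_paths))
    return document


def run_demo():
    # Exercise train() against the stand-in store and a temporary module_data
    store = FakeTrainingStore([FakeDocumentFamily("a.pdf"), FakeDocumentFamily("b.pdf")])
    with tempfile.TemporaryDirectory() as module_data:
        train(None, store, {"my_training_option": "Hello World"}, module_data)
        with open(os.path.join(module_data, "trained.json")) as f:
            return json.load(f)
```

Calling run_demo() trains against two stand-in documents and reads back what was saved. In a deployed module the runtime supplies the real training_store and module_data, so only the body of train would carry over.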

Supporting Module Testing in the UI

One of the powerful features in Kodexa is support for the Robotic Assistant, which allows a user to label a document and then test the module against that document. This lets the user see how the module performs against a document that they have labeled. To support this, we need to include an extra parameter in the train function: additional_training_document. This will be an instance of a KodexaDocument.

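import logging

# KodexaDocument is not defined on this page; this import assumes it refers to the kodexa Document class
from kodexa import Document as KodexaDocument

logger = logging.getLogger(__name__)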

def train(document, training_store, training_options, module_data, additional_training_document):
    logger.info(f"Training option is {training_options['my_training_option']}")
    logger.info(f"Training store is {training_store} and I can store my data in {module_data}")

    if isinstance(additional_training_document, KodexaDocument):
        logger.info(f"Additional training document is {additional_training_document}")
    return document

Adding this allows you to determine in the module how you wish to handle this additional training document.

📘 Additional Training Document The additional training document should always be a KodexaDocument. However, the same document may also be present in the training store, so you need to check (using the path of the Kodexa Document) that you don’t pick it up a second time from there.

The following example shows how you might write the training loop so that it skips the additional training document:

for document_family in training_store.query(page_size=1000).content:
    logger.info(f'Using document {document_family.path}')

    if document_family.path == additional_training_document.metadata['path']:
        logger.info('Skipping additional training document')
        continue

    # Continue and train on Document
    pass
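The same check can be pulled into a small helper that also copes with the additional training document being absent. This is a minimal sketch, not the Kodexa API: it assumes metadata behaves like a dictionary with a 'path' entry, as in the loop above, and make_store is a hypothetical stand-in for a training store so the example can run on its own.

```python
from types import SimpleNamespace


def families_to_train_on(training_store, additional_training_document=None):
    """Return the document families to train on, excluding the additional document."""
    skip_path = None
    if additional_training_document is not None:
        # Assumption: metadata is dict-like and holds the document's path
        skip_path = additional_training_document.metadata.get("path")

    return [
        family
        for family in training_store.query(page_size=1000).content
        if skip_path is None or family.path != skip_path
    ]


def make_store(paths):
    # Hypothetical stand-in exposing only query(...).content with .path items
    families = [SimpleNamespace(path=p) for p in paths]
    return SimpleNamespace(query=lambda page_size=1000: SimpleNamespace(content=families))
```

Keeping the filter separate from the training logic makes it easy to test the de-duplication without running any training.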

Using files you deployed with the Module

When you deploy a module, you can include files that will be deployed with it. These files can be used by the module at runtime. To access them, add the parameter module_base to your function; it will be the folder where the module code has been deployed.

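import logging

# KodexaDocument is not defined on this page; this import assumes it refers to the kodexa Document class
from kodexa import Document as KodexaDocument

logger = logging.getLogger(__name__)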

def train(document, training_store, training_options, module_data, additional_training_document, module_base):
    logger.info(f"Training option is {training_options['my_training_option']}")
    logger.info(f"Training store is {training_store} and I can store my data in {module_data}")

    logger.info(f"Module base is {module_base}")
    if isinstance(additional_training_document, KodexaDocument):
        logger.info(f"Additional training document is {additional_training_document}")
    return document
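To make the use of module_base more concrete, the sketch below reads a JSON file that was shipped with the module. The file name config.json and the helper load_module_config are hypothetical, and the sketch assumes the file sits directly under module_base; depending on how the contents patterns are resolved, it may instead live in a subfolder.

```python
import json
import os
import tempfile


def load_module_config(module_base, filename="config.json"):
    """Load a JSON file deployed with the module; return {} if it is absent."""
    config_path = os.path.join(module_base, filename)
    if not os.path.exists(config_path):
        return {}
    with open(config_path) as f:
        return json.load(f)


def run_demo():
    # Simulate a deployed module folder containing a config file
    with tempfile.TemporaryDirectory() as module_base:
        with open(os.path.join(module_base, "config.json"), "w") as f:
            json.dump({"threshold": 0.5}, f)
        return load_module_config(module_base)
```

Inside train, the helper would be called as load_module_config(module_base) using the module_base value passed in by the runtime.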
