Modules are a key component in Kodexa. They bring intelligent processing to documents, supporting pluggable parsing, transformation, and labeling of documents, and they can also be trained.
Many products have a concept of a module, but in Kodexa a module is more than its code and processing. It also covers how the user will experience training the module. When we develop a module, we therefore consider three things together: the user experience, the training data, and the module code.
Module Training Workflow
Although Kodexa can run modules in several ways, we use a standard workflow to train them.
The workflow is designed so that a module you build can be trained and deployed in several different ways.
Anatomy of a Module
A module is made up of the following parts:
- Module Code: the code that parses, transforms, and labels documents
- Training Options: the options used when training the module
- Inference Options: the options used when running the module
- Module Taxonomy: the taxonomy that defines the structure of the labels the module uses to “guide” the user and the training process
- Additional Taxonomy Options: additional options the module can add to the taxonomies used for extraction
Together, these parts let you build and deploy modules that support flexible training and inference, and that also give you rich ways to capture the user's knowledge about the documents.
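To make the anatomy more concrete, here is a minimal Python sketch that models the five parts as a plain dataclass. It is not the Kodexa SDK: `ModuleDefinition`, `Taxon`, `keyword_labeler`, the module name, and every option key are hypothetical names chosen for illustration, and the "module code" is a toy keyword matcher standing in for real parsing, transformation, or labeling logic. Using the taxonomy to restrict which labels the module may emit is one assumed way of letting it “guide” labeling.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Taxon:
    """A node in the module taxonomy: one label, optionally with child labels."""
    name: str
    children: List["Taxon"] = field(default_factory=list)

    def paths(self, parent: str = "") -> List[str]:
        """Flatten this node and its descendants into slash-separated label paths."""
        path = f"{parent}/{self.name}" if parent else self.name
        result = [path]
        for child in self.children:
            result.extend(child.paths(path))
        return result


@dataclass
class ModuleDefinition:
    """Hypothetical container for the five parts of a module (not a Kodexa type)."""
    name: str
    # Module code: takes document text plus inference options, returns label paths.
    code: Callable[[str, Dict[str, Any]], List[str]]
    training_options: Dict[str, Any] = field(default_factory=dict)
    inference_options: Dict[str, Any] = field(default_factory=dict)
    taxonomy: List[Taxon] = field(default_factory=list)
    # Extra options the module contributes to taxonomies used for extraction.
    additional_taxonomy_options: Dict[str, Any] = field(default_factory=dict)

    def label_paths(self) -> List[str]:
        """All label paths defined by the module taxonomy."""
        paths: List[str] = []
        for taxon in self.taxonomy:
            paths.extend(taxon.paths())
        return paths

    def run(self, text: str) -> List[str]:
        """Run the module code, keeping only labels that exist in the taxonomy."""
        allowed = set(self.label_paths())
        return [label for label in self.code(text, self.inference_options) if label in allowed]


def keyword_labeler(text: str, options: Dict[str, Any]) -> List[str]:
    """Toy module code: emit a label whenever its keyword appears in the text."""
    keywords = options.get("keywords", {})
    lowered = text.lower()
    return [label for label, word in keywords.items() if word in lowered]


invoice_module = ModuleDefinition(
    name="toy-invoice-labeler",
    code=keyword_labeler,
    inference_options={
        "keywords": {"Invoice/Total": "total", "Invoice/Date": "date", "Receipt": "receipt"}
    },
    taxonomy=[Taxon("Invoice", [Taxon("Total"), Taxon("Date")])],
)

print(invoice_module.label_paths())  # ['Invoice', 'Invoice/Total', 'Invoice/Date']
print(invoice_module.run("Invoice date: 2024-01-31, Total due: $120"))  # ['Invoice/Total', 'Invoice/Date']
```

In this sketch the "Receipt" keyword can match, but its label is dropped because the taxonomy does not define it. Keeping the taxonomy separate from the module code means the same code could be reused with a different label structure.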
Trainable vs Non-Trainable Modules
Modules in Kodexa can be either trainable, meaning the user can label and train the module themselves using the UI, or “pre-trained”, meaning the module is already trained and the user can use it to label documents. An example of a pre-trained module is the Azure Invoice Form Recognizer module: it is already trained, and the user can use it to label invoices but cannot train it themselves.
How the user interacts with a module depends on whether it is trainable. To use a trainable module, the user must add it to the project. To use a non-trainable module, the user adds it to an assistant; it does not need to be added to the project.
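The rule above can be sketched as a small helper that turns a module's trainability into setup steps and flags trainable modules that have not yet been added to a project. This is a hypothetical illustration, not a Kodexa API: `ModuleRef`, `setup_steps`, `missing_from_project`, and the module slugs are invented names, and the code only encodes the two rules stated in this section.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ModuleRef:
    """Hypothetical reference to a module (not a Kodexa SDK type)."""
    slug: str
    trainable: bool


def setup_steps(module: ModuleRef) -> List[str]:
    """Describe how a user gets started with a module, based on trainability."""
    if module.trainable:
        # Trainable: the user labels and trains it in the UI, so it must be
        # added to the project first.
        return [
            f"add {module.slug} to the project",
            f"label and train {module.slug} in the UI",
        ]
    # Pre-trained: used from an assistant, with no need to add it to the project.
    return [f"use {module.slug} in an assistant"]


def missing_from_project(modules: Iterable[ModuleRef], project_modules: Set[str]) -> List[str]:
    """Return slugs of trainable modules that have not been added to the project."""
    return [m.slug for m in modules if m.trainable and m.slug not in project_modules]


# Hypothetical slugs, for illustration only.
custom = ModuleRef("my-org/contract-labeler", trainable=True)
pretrained = ModuleRef("azure-invoice-recognizer", trainable=False)

print(setup_steps(pretrained))  # ['use azure-invoice-recognizer in an assistant']
print(missing_from_project([custom, pretrained], project_modules=set()))  # ['my-org/contract-labeler']
```

A pre-trained module never appears in the output of `missing_from_project`, because this section only requires trainable modules to be added to the project.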