Building Data Classes

A key idea when working with a taxonomy and models is that you can build Python data classes, which let you take the output that has been bound to a document and apply rules to it.

This is a powerful way to work with the Kodexa LLM Data Labeling models.

If you look at an LLM Data Labeling model in a pipeline (in Manage Project / Data Flows), you will see in its options that you can set the external data. This stores the data captured by the LLM in a special format that includes lineage to the document.

image

You can then use this structure if you want to apply rules or transform it.

To do this, we can use the Kodexa CLI (see Installing the Kodexa CLI).

Go to your project and find the data definition that your LLM model uses. If you have Developer Tools enabled in the UI, you will see a copy-and-paste option; use it to copy the JSON for your data definition (taxonomy).

image

Then paste the content into a file (we will call it taxonomy.json in our example).

image
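
Before running the CLI, it can help to confirm that the pasted file is well-formed JSON, since a partial copy is an easy mistake to make. The following is a minimal sketch using only the Python standard library; check_taxonomy_file is a hypothetical helper name and is not part of Kodexa.

```python
import json
from pathlib import Path


def check_taxonomy_file(path):
    """Return the parsed JSON from path, or raise ValueError if it is malformed.

    Hypothetical helper for illustration; it only checks JSON syntax and
    knows nothing about the taxonomy schema itself.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

Calling check_taxonomy_file("taxonomy.json") either returns the parsed content or reports where the JSON is broken.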

If you have the Kodexa CLI installed, you can open a terminal and run the command:

$ kodexa dataclasses taxonomy.json       

This will create a dataclasses.py file that will contain the structure of the objects.

image
💡 Always rename dataclasses.py, since it can clash with Python's standard-library dataclasses module.

You can now use the dataclasses to read the data that your model stored in external data, for example:

def infer(document):
    # Take the first bank statement the model stored in external data
    bank_statement = document.get_external_data()['bank_statement'][0]
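
The lookup above raises an error if the key is missing or the list is empty. A more defensive version might look like the sketch below. It assumes, as the example suggests, that get_external_data() returns a dict whose values are lists; first_external_item and FakeDocument are hypothetical names used only for illustration, with FakeDocument standing in for a real Kodexa document.

```python
def first_external_item(document, key):
    """Return the first item stored under key in external data, or None.

    Assumes document.get_external_data() returns a dict of lists.
    """
    items = document.get_external_data().get(key) or []
    return items[0] if items else None


class FakeDocument:
    """Hypothetical stand-in for a Kodexa document, for local testing only."""

    def __init__(self, external_data):
        self._external_data = external_data

    def get_external_data(self):
        return self._external_data


doc = FakeDocument({"bank_statement": [{"account": "12345"}]})
statement = first_external_item(doc, "bank_statement")
```

Returning None rather than raising lets a rule skip documents where the model captured nothing.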

If you wish to change a value, always update normalized_text, not value.
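
As an illustration of that guidance, the sketch below applies a simple rule that writes its result to normalized_text and leaves value untouched. AmountAttribute is a hypothetical stand-in: the classes generated for your taxonomy will have their own names and fields, and only the value and normalized_text names are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AmountAttribute:
    """Hypothetical stand-in for a generated attribute class."""

    value: Optional[str] = None
    normalized_text: Optional[str] = None


def strip_currency_formatting(attribute):
    """Rule: drop a leading currency symbol and thousands separators.

    The cleaned text goes into normalized_text; value is never modified.
    """
    source = attribute.normalized_text
    if source is None:
        source = attribute.value
    if source is None:
        return attribute
    attribute.normalized_text = source.strip().lstrip("$£€").replace(",", "")
    return attribute


amount = strip_currency_formatting(AmountAttribute(value="$1,250.00"))
```

After the rule runs, amount.value still holds the text as captured, while amount.normalized_text holds the cleaned form.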
