Taxonomies define the structure of the data that is extracted from a document.
A taxonomy is made up of nodes that we call Taxons (a made-up word). Each Taxon represents either a piece of data or a grouping of other nodes, and together they form the hierarchy that defines the structure of the extracted data.
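To make the hierarchy concrete, here is a minimal sketch in plain Python. It is not the Kodexa SDK: the `Taxon` dataclass, its fields (`name`, `label`, `group`, `children`), the `walk` helper, and the invoice example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Taxon:
    """Hypothetical node: either a piece of data or a grouping of child nodes."""
    name: str
    label: str
    group: bool = False  # True when the node only groups other nodes
    children: List["Taxon"] = field(default_factory=list)

    def walk(self, prefix: str = "") -> Iterator[str]:
        """Yield the slash-separated path of this node and all its descendants."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        yield path
        for child in self.children:
            yield from child.walk(path)


# Illustrative taxonomy: an invoice with one field and one group of line-item fields.
invoice = Taxon(
    name="invoice", label="Invoice", group=True,
    children=[
        Taxon(name="invoice_number", label="Invoice Number"),
        Taxon(name="line_item", label="Line Item", group=True, children=[
            Taxon(name="description", label="Description"),
            Taxon(name="amount", label="Amount"),
        ]),
    ],
)

paths = list(invoice.walk())
```

Walking the tree lists every node by its path, which shows how grouping nodes give the extracted data its shape.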
Note that in the UI, taxonomies are referred to as Data Definitions and Taxons are called Data Elements.
Taxonomy Types
Usually, a user will think of a taxonomy in terms of the data they want to extract from a document. However, Kodexa uses several types of taxonomies. This is because we see labeling as more than identifying the data you would like to extract: it also covers labeling concepts, markers, or other information that can help the extraction process.
While these labels serve different purposes, they are all defined in a taxonomy. The different types of taxonomies are:
- Content Taxonomy: defines the structure of the data that is extracted from a document. Usually, this represents the structure as it is understood by the business or in the use case.
- Processing Taxonomy: used in processing. Most commonly, these taxonomies are provided by assistants. If you add an assistant to a project, you will usually be able to label using its taxonomy.
- Model Taxonomy: provided by a model, for use when training or running that model. These taxonomies become available for labeling either when you add a model to the project or when you refer to a model through an assistant.
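The three types above can be sketched as a simple classification in plain Python. This is an illustration only: the `TaxonomyType` enum, its string values, the `Taxonomy` dataclass, and the example taxonomy names are assumptions, not identifiers taken from Kodexa.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class TaxonomyType(Enum):
    """Hypothetical names for the three taxonomy types described above."""
    CONTENT = "content"        # structure of the extracted data
    PROCESSING = "processing"  # usually provided by an assistant
    MODEL = "model"            # provided by a model


@dataclass
class Taxonomy:
    name: str
    taxonomy_type: TaxonomyType


def group_by_type(taxonomies: Iterable[Taxonomy]) -> Dict[TaxonomyType, List[str]]:
    """Group taxonomy names by type, keeping an empty list for unused types."""
    grouped: Dict[TaxonomyType, List[str]] = {t: [] for t in TaxonomyType}
    for taxonomy in taxonomies:
        grouped[taxonomy.taxonomy_type].append(taxonomy.name)
    return grouped


# Illustrative project: one taxonomy of each type (names are made up).
project_taxonomies = [
    Taxonomy("Invoice Data", TaxonomyType.CONTENT),
    Taxonomy("Table Markers", TaxonomyType.PROCESSING),
    Taxonomy("Classifier Labels", TaxonomyType.MODEL),
]

grouped = group_by_type(project_taxonomies)
```

Whatever their type, all of these are still taxonomies, so a labeling tool can treat them uniformly and use the type only to indicate where each set of labels came from.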