There are situations where we want to be able add validation rules to a document.It is common to want to add validation rules to data elements in a data definition (in the API these refer to taxons in a Taxonomy), however sometimes based on analysis of a document we might want to add additional validation rules that are dynamic. The validation rules will be carried through the extraction from the document and into the data objects.This ensures that decisions made by a model while processing show up later in the user experience for human-in-the-loop processing. To begin, let’s consider a simple document created using Kodexa’s Document class:
Copy
Ask AI
from kodexa import Documentdocument = Document.from_text("Hello, world!")
Next, we’ll define a taxonomy with a nested taxon structure:
In this example, we’ve created a taxonomy called “Testing” with a parent taxon “person” and a child taxon “name”. The “person” taxon is marked as a group, allowing it to contain child taxons. Now, let’s add a validation rule to the “person” taxon:
Copy
Ask AI
from kodexa.model.objects import TaxonValidation, DocumentTaxonValidationvalidation_rule = TaxonValidation( name="NameRequired", description="Name is required", rule_formula="ifnull(name, '') != ''")document_validation = DocumentTaxonValidation( taxonomy_ref="test/test-taxonomy:1.0.0", taxon_path="person", validation=validation_rule)
This validation rule, named “NameRequired”, ensures that the “name” field is not empty for any instance of the “person” taxon. To apply this validation to our document, we use the set_validations method:
Copy
Ask AI
document.set_validations([document_validation])
We can then retrieve and verify the applied validations:
Copy
Ask AI
validations = document.get_validations()assert len(validations) == 1assert validations[0].taxonomy_ref == "test/test-taxonomy:1.0.0"assert validations[0].taxon_path == "person"assert validations[0].validation.name == "NameRequired"assert validations[0].validation.description == "Name is required"assert validations[0].validation.rule == "name is not None"
These assertions confirm that our validation has been successfully added to the document and can be retrieved as expected.By implementing such validations on nested taxon structures, we can ensure that specific data requirements are met within our documents, even for complex hierarchical data. This approach is particularly useful when dealing with documents that require consistent nested data structures or when implementing data quality checks in document processing pipelines.The nested structure allows for more granular control over data validation. For example, we could add additional validations to the “name” taxon specifically, or create more complex rules that span multiple levels of the taxon hierarchy.In conclusion, Kodexa’s ability to add validations to nested taxons provides a powerful mechanism for maintaining data integrity in document processing workflows. By leveraging these features, developers can create more robust and reliable document management systems that can handle complex, hierarchical data structures with ease.