Document-Based Validation

There are situations where we want to be able to add validation rules directly to a document.

Validation rules are usually added to data elements in a data definition (in the API, these are the taxons in a Taxonomy). Sometimes, however, analysis of a document suggests additional validation rules that need to be added dynamically. Rules attached to the document are carried through extraction and into the resulting data objects.

This ensures that decisions made by a model while processing show up later in the user experience for human-in-the-loop processing. To begin, let's consider a simple document created using Kodexa's Document class:

from kodexa import Document

document = Document.from_text("Hello, world!")

Next, we'll define a taxonomy with a nested taxon structure:

from kodexa.model.objects import Taxonomy, Taxon

taxonomy = Taxonomy(type="taxonomy", name="Testing", slug="testing", version="1.0.0")
person = Taxon(path="person", name="person", group=True)
person_name = Taxon(path="person/name", name="name")
person.children = [person_name]
taxonomy.taxons = [person]

In this example, we've created a taxonomy called "Testing" with a parent taxon "person" and a child taxon "name". The "person" taxon is marked as a group, allowing it to contain child taxons. Now, let's add a validation rule to the "person" taxon:

from kodexa.model.objects import TaxonValidation, DocumentTaxonValidation

validation_rule = TaxonValidation(
    name="NameRequired",
    description="Name is required",
    rule_formula="ifnull(name, '') != ''"
)

document_validation = DocumentTaxonValidation(
    taxonomy_ref="test/test-taxonomy:1.0.0",
    taxon_path="person",
    validation=validation_rule
)

This validation rule, named "NameRequired", ensures that the "name" field is not empty for any instance of the "person" taxon. To apply this validation to our document, we use the set_validations method:

document.set_validations([document_validation])
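To make the intent of the rule formula concrete, here is a plain-Python sketch of the check that "NameRequired" describes. It does not use Kodexa's formula engine. The `ifnull` helper and `name_required` function are hypothetical stand-ins, and the sketch assumes `ifnull(a, b)` returns `b` when `a` is null, as the similarly named SQL function does.

```python
from typing import Any, Mapping


def ifnull(value: Any, default: Any) -> Any:
    """Hypothetical stand-in for the formula function: fall back to default when value is None."""
    return default if value is None else value


def name_required(person: Mapping[str, Any]) -> bool:
    """Mirror of the formula ifnull(name, '') != '' for a dict-shaped person record."""
    return ifnull(person.get("name"), "") != ""


# A missing, null, or empty name fails the check; any non-empty name passes.
print(name_required({"name": "Ada"}))
print(name_required({"name": None}))
```

Under that assumption, the rule treats a missing name and an empty-string name the same way, which is why the formula wraps `name` in `ifnull` before comparing.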

We can then retrieve and verify the applied validations:

validations = document.get_validations()

assert len(validations) == 1
assert validations[0].taxonomy_ref == "test/test-taxonomy:1.0.0"
assert validations[0].taxon_path == "person"
assert validations[0].validation.name == "NameRequired"
assert validations[0].validation.description == "Name is required"
assert validations[0].validation.rule_formula == "ifnull(name, '') != ''"

These assertions confirm that our validation has been successfully added to the document and can be retrieved as expected.

By implementing such validations on nested taxon structures, we can ensure that specific data requirements are met within our documents, even for complex hierarchical data. This approach is particularly useful when dealing with documents that require consistent nested data structures or when implementing data quality checks in document processing pipelines.

The nested structure allows for more granular control over data validation. For example, we could add additional validations to the "name" taxon specifically, or create more complex rules that span multiple levels of the taxon hierarchy.
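As a rough illustration of path-scoped rules, the sketch below keys plain-Python predicates by taxon path and checks them against nested data. This is not how Kodexa evaluates validations. The `rules_by_path` table, the `resolve` and `failed_rules` helpers, and the "NameMaxLength" rule are all hypothetical and exist only to show rules attached at different levels of the hierarchy.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule table: taxon path -> list of (rule name, predicate).
rules_by_path: Dict[str, List[Tuple[str, Callable[[Any], bool]]]] = {
    # Rule on the group taxon: the person must carry a non-empty name.
    "person": [("NameRequired", lambda person: bool((person or {}).get("name")))],
    # Rule on the child taxon only (illustrative): the name must be at most 50 characters.
    "person/name": [("NameMaxLength", lambda name: len(name or "") <= 50)],
}


def resolve(data: Any, path: str) -> Any:
    """Walk a slash-separated path through nested dicts; return None if a step is missing."""
    current = data
    for part in path.split("/"):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def failed_rules(data: dict) -> List[str]:
    """Return 'path:rule name' for every rule whose predicate rejects the data at that path."""
    failures = []
    for path, rules in rules_by_path.items():
        target = resolve(data, path)
        for rule_name, predicate in rules:
            if not predicate(target):
                failures.append(f"{path}:{rule_name}")
    return failures


print(failed_rules({"person": {}}))
```

Keying rules by path keeps a check on the group ("person") separate from a check on one of its children ("person/name"), which is the same scoping that `taxon_path` expresses in a `DocumentTaxonValidation`.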

In conclusion, adding validations to nested taxons at the document level lets rules identified during processing travel with the document, which helps maintain data integrity in document processing workflows that handle hierarchical data.
