Tagging in Kodexa lets you mark and annotate content within your document nodes. Tags can be applied to entire nodes or to specific spans of text, and can carry additional metadata as well as relationships between tagged elements.
Tags can be grouped together using UUIDs to show they are related:
    import uuid

    tag_uuid = str(uuid.uuid4())

    # Tag multiple related elements with the same UUID
    document.content_node.tag('person_name', fixed_position=[0, 10], tag_uuid=tag_uuid)
    document.content_node.tag('person_age', fixed_position=[15, 17], tag_uuid=tag_uuid)
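To illustrate what a shared UUID makes possible, the following is a minimal sketch in plain Python, independent of the Kodexa API. It assumes tag records have already been read into dictionaries with `name`, `value`, and `uuid` keys; that record shape and the `group_by_uuid` helper are hypothetical and are not Kodexa's storage format.

```python
import uuid
from collections import defaultdict


def group_by_uuid(tag_records):
    """Collect records that share a UUID into one {tag name: value} dict per group."""
    groups = defaultdict(dict)
    for record in tag_records:
        groups[record["uuid"]][record["name"]] = record["value"]
    return dict(groups)


# Hypothetical records for two people; each person's tags share one UUID.
person_a = str(uuid.uuid4())
person_b = str(uuid.uuid4())
records = [
    {"name": "person_name", "value": "Ada", "uuid": person_a},
    {"name": "person_age", "value": "36", "uuid": person_a},
    {"name": "person_name", "value": "Alan", "uuid": person_b},
    {"name": "person_age", "value": "41", "uuid": person_b},
]

people = group_by_uuid(records)
for group_id, fields in people.items():
    print(group_id, fields)
```

Without the shared UUID, two names and two ages on the same node could not be paired reliably; with it, each name stays attached to the right age.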
    # Get all tags on a node
    tags = node.get_tags()

    # Get specific tag values
    values = node.get_tag_values('address')

    # Get related tag values
    related_values = node.get_related_tag_values('person')

    # Tag document type based on content
    document.content_node.tag('document_type', value='invoice', data={
        'confidence': 0.98,
        'classifier': 'invoice_classifier_v1'
    })

    # Tag form fields with metadata
    document.content_node.tag('field', fixed_position=[100, 150], data={
        'field_name': 'total_amount',
        'field_type': 'currency',
        'required': True
    })
When working with tags, consider these common issues:
Position Errors: Ensure fixed positions are within content bounds
Regular Expression Matching: Test patterns thoroughly
Node Selection: Verify node existence before tagging
Content Accessibility: Check content availability before tagging
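One way to act on the regular-expression advice is to try a pattern against known samples before using it for tagging. The sketch below uses only Python's standard `re` module; the `check_pattern` helper and the sample amount pattern are illustrative assumptions. It uses `re.search` and makes no claim about how Kodexa itself applies a pattern.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of failure messages; an empty list means every sample behaved as expected."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if compiled.search(text) is None:
            failures.append(f"expected a match in {text!r}")
    for text in should_not_match:
        if compiled.search(text) is not None:
            failures.append(f"unexpected match in {text!r}")
    return failures


# Illustrative pattern for amounts such as 1,234.56 (two decimal places required).
amount_pattern = r"\b\d{1,3}(?:,\d{3})*\.\d{2}\b"

problems = check_pattern(
    amount_pattern,
    should_match=["Total: 1,234.56", "Due 99.00"],
    should_not_match=["Invoice 2024", "Page 3 of 7"],
)
print(problems or "pattern behaves as expected on the samples")
```

Keeping a small set of positive and negative samples next to each pattern makes it cheap to re-check the pattern whenever it changes.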
    # Example of safe tagging with error handling
    try:
        if node.content:  # Check if content exists
            if len(node.content) >= end_position:  # Verify position
                node.tag('field', fixed_position=[start_position, end_position])
    except Exception as e:
        print(f"Tagging error: {str(e)}")
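The bounds check can also be pulled into a small reusable function. This is a sketch in plain Python that runs without Kodexa; `validate_fixed_position` is a hypothetical helper, and it assumes the end position is an exclusive upper bound that may equal the content length, which matches the `len(node.content) >= end_position` check in the example above.

```python
def validate_fixed_position(content, start, end):
    """Return None if [start, end] is usable for the content, else a message describing the problem."""
    if not content:
        return "node has no content"
    if not (isinstance(start, int) and isinstance(end, int)):
        return "positions must be integers"
    if start < 0:
        return f"start {start} is negative"
    if start >= end:
        return f"start {start} must be less than end {end}"
    if end > len(content):
        return f"end {end} exceeds content length {len(content)}"
    return None


# Example: check a span before handing it to a tagging call.
text = "Total: 42.00"
error = validate_fixed_position(text, 7, 12)
if error is None:
    print("span is valid:", text[7:12])
else:
    print("skipping tag:", error)
```

Returning a message instead of raising lets the caller decide whether an invalid span should be logged, skipped, or treated as a hard failure.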
To keep tagging efficient:
Use node_only=True when possible to reduce processing overhead.
Batch related tags together using tag_uuid.
Use specific selectors to limit the scope of tagging operations.
Consider using tag instances for large groups of related nodes.
Remember that tags are stored as features in the document’s persistence layer, so efficient tagging can improve overall document processing performance.