Overview
Tagging in Kodexa lets you mark and annotate specific portions of content within your document nodes. Tags can be applied to entire nodes or to specific portions of text, and can carry additional metadata and relationships between tagged elements.
Tag Structure
A tag in Kodexa consists of the following components:
- Name: The identifier for the tag (e.g., ‘name’, ‘address’, ‘phone’)
- Value: The actual content being tagged
- Start/End Positions: Optional positions within the node’s content (if tagging specific text)
- UUID: Unique identifier for the tag instance
- Confidence: A score between 0 and 1 indicating tagging certainty
- Group UUID: Links related tags together
- Data: Additional JSON-serializable metadata
- Owner URI: Identifies the source that created the tag (e.g., a model reference)
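To make the component list concrete, the sketch below models a tag as a plain Python dataclass. It is not the Kodexa class: TagSketch, its field names, and the example owner URI are hypothetical and only mirror the list above. The range check on confidence and the JSON check on data are assumptions drawn from the descriptions ("between 0 and 1", "JSON-serializable").

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TagSketch:
    """Illustrative stand-in for a tag; not the Kodexa implementation."""
    name: str
    value: Optional[str] = None
    start: Optional[int] = None   # None when the whole node is tagged
    end: Optional[int] = None
    tag_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    confidence: Optional[float] = None
    group_uuid: Optional[str] = None
    data: Optional[dict] = None
    owner_uri: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: confidence, when given, lies in [0, 1].
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        # Assumed rule: data must survive JSON serialization.
        if self.data is not None:
            json.dumps(self.data)  # raises TypeError if not serializable
        # Positions only make sense as a pair.
        if (self.start is None) != (self.end is None):
            raise ValueError("start and end must be given together")


tag = TagSketch(
    name="phone",
    value="555-0100",
    start=10,
    end=18,
    confidence=0.92,
    data={"format": "local"},
    owner_uri="model://example/phone-tagger",  # hypothetical URI
)
```

Leaving start and end unset corresponds to tagging a whole node, which is why both fields are optional here.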
Tagging Methods
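The three methods in this section differ in which part of a node's content the tag covers. The following is a minimal, library-free sketch of that difference, not Kodexa's own code. The function tag_spans is hypothetical; its parameter names fixed_position and content_re are borrowed from the Tag Options Reference, and treating the end position as exclusive (Python slicing) is an assumption.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[Optional[int], Optional[int], str]  # (start, end, value)


def tag_spans(content: str,
              fixed_position: Optional[List[int]] = None,
              content_re: Optional[str] = None) -> List[Span]:
    """Return the spans a tag would cover under each method (illustrative)."""
    if fixed_position is not None:
        # Fixed position tagging: one span given by [start, end].
        start, end = fixed_position
        return [(start, end, content[start:end])]
    if content_re is not None:
        # Regular expression tagging: one span per non-overlapping match.
        return [(m.start(), m.end(), m.group(0))
                for m in re.finditer(content_re, content)]
    # Basic node tagging: no positions, the whole content is the value.
    return [(None, None, content)]


content = "Call 555-0100 or 555-0199"
whole = tag_spans(content)
fixed = tag_spans(content, fixed_position=[0, 4])
matched = tag_spans(content, content_re=r"\d{3}-\d{4}")
```

Here whole holds a single span without positions, fixed covers the word "Call", and matched holds one span for each of the two phone numbers.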
1. Basic Node Tagging
The simplest form of tagging applies a tag to an entire node.
2. Fixed Position Tagging
Tag specific portions of text using start and end positions.
3. Regular Expression Tagging
Tag content that matches a specific pattern (Python).
Advanced Tagging Features
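The features in this section (groups, metadata, confidence, owner URI) all attach extra fields to a tag. As a rough sketch of how those fields can be used once tags exist, the example below represents tags as plain dictionaries rather than Kodexa objects. The key names follow the Python options in the Tag Options Reference; the tag values, the owner URIs, and the 0.8 threshold are made up for illustration.

```python
import uuid
from collections import defaultdict

# One shared group UUID marks the first two tags as parts of the same record.
group = str(uuid.uuid4())

tags = [
    {"name": "first_name", "value": "Ada", "group_uuid": group,
     "confidence": 0.97, "owner_uri": "model://example/name-tagger"},
    {"name": "last_name", "value": "Lovelace", "group_uuid": group,
     "confidence": 0.58, "owner_uri": "model://example/name-tagger",
     "data": {"source_page": 1}},  # extra JSON-serializable metadata
    {"name": "phone", "value": "555-0100", "group_uuid": str(uuid.uuid4()),
     "confidence": 0.88, "owner_uri": "model://example/phone-tagger"},
]

# Collect tag names per group to recover which tags belong together.
by_group = defaultdict(list)
for tag in tags:
    by_group[tag["group_uuid"]].append(tag["name"])

# Keep only the tags whose confidence clears a hypothetical threshold.
confident = [t["name"] for t in tags if t["confidence"] >= 0.8]

# Owner URIs let you trace which source produced each tag.
owners = sorted({t["owner_uri"] for t in tags})
```

Grouping by UUID rather than by position keeps related tags linked even when they sit on different nodes.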
Tag Groups
Tags can be grouped together using group UUIDs to show they are related.
Tag Metadata
Additional data can be associated with tags (Python).
Tag Confidence
You can specify confidence levels for tags.
Tag Owner URI
Identify the source that created a tag.
Working with Tagged Content
Retrieving Tags
Removing Tags
Tag Instances
Tag instances allow you to group multiple nodes under a single tag. This is useful when a piece of information spans multiple nodes.
Finding Tagged Nodes
You can use selectors to find nodes with specific tags.
Diagrams
Basic Tag Structure
Tag Relationships
Best Practices
- Use Meaningful Tag Names: Choose descriptive names that reflect the content being tagged.
- Group Related Tags: Use group_uuid (Python) or groupId (TypeScript) to group related pieces of information.
- Include Confidence: When using automated tagging, include confidence scores.
- Add Metadata: Use the data parameter to store additional context about the tag.
- Set Owner URI: When tagging from models or automated processes, set the owner_uri to track the tag source.
Common Patterns
Document Classification
Entity Extraction
Form Field Extraction
Tag Options Reference
The tag() method in Python accepts these keyword arguments:
| Option | Type | Description |
|---|---|---|
| content_re | str | Regular expression to match content |
| fixed_position | list | [start, end] positions in content |
| tag_uuid | str | UUID for the tag instance |
| group_uuid | str | UUID to group related tags |
| parent_group_uuid | str | Parent group UUID for hierarchical grouping |
| confidence | float | Confidence score (0-1) |
| value | str | Tagged value |
| data | dict | Additional metadata |
| cell_index | int | Cell index for table structures |
| owner_uri | str | Source identifier for the tag |
In TypeScript, use tagWithOptions(name, options) with the TagOptions interface:
| Option | Type | Description |
|---|---|---|
| start | number | Start position in content |
| end | number | End position in content |
| confidence | number | Confidence score (0-1) |
| groupId | number | Group ID for related tags |
| parentGroupId | number | Parent group ID |
| cellIndex | number | Cell index for table structures |
Error Handling
When working with tags, consider these common issues:
- Position Errors: Ensure fixed positions are within content bounds
- Regular Expression Matching: Test patterns thoroughly
- Node Selection: Verify node existence before tagging
- Content Accessibility: Check content availability before tagging
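The position, pattern, and content checks above can be run before any tag is applied. The helper below is one possible pre-flight check, written without the Kodexa library. check_tag_request is a hypothetical name, its parameters reuse fixed_position and content_re from the Tag Options Reference, and a missing node or missing content is represented simply as None. How Kodexa itself reports these conditions is not covered here.

```python
import re
from typing import List, Optional


def check_tag_request(content: Optional[str],
                      fixed_position: Optional[List[int]] = None,
                      content_re: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    if content is None:
        # Content accessibility: nothing to tag against.
        return ["node has no content"]

    problems: List[str] = []

    if fixed_position is not None:
        if len(fixed_position) != 2:
            problems.append("fixed_position must be [start, end]")
        else:
            start, end = fixed_position
            # Position errors: both offsets must fall inside the content.
            if not 0 <= start <= end <= len(content):
                problems.append("fixed_position is outside the content bounds")

    if content_re is not None:
        try:
            # Pattern testing: catch bad syntax and patterns that never match.
            if re.search(content_re, content) is None:
                problems.append("pattern does not match the content")
        except re.error as exc:
            problems.append(f"invalid regular expression: {exc}")

    return problems


ok = check_tag_request("Invoice 1042", fixed_position=[0, 7], content_re=r"\d+")
bad = check_tag_request("abc", fixed_position=[0, 5])
```

Returning a list of messages instead of raising on the first failure lets a caller log every problem with a request in one pass.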
Performance Considerations
- Batch related tags together using group_uuid
- Use specific selectors to limit the scope of tagging operations
- Consider using tag instances for large groups of related nodes
- Use transactions when performing many tag operations together
