Early Access: GitOps sync functionality and the metadata structure may change. We recommend testing sync operations in non-production environments first.
Overview
The kdx sync command enables GitOps workflows for Kodexa platform metadata. Store your organization and project configurations in version control, review changes through pull requests, and promote configurations across environments with confidence.
Key Benefits
Version Control - Track all metadata changes in Git with full history and audit trails
Code Review - Review infrastructure changes through pull requests before deployment
Multi-Environment - Promote tested configurations from dev → staging → production
Offline Validation - Validate changes locally before pushing to remote environments
Repository Layout
Metadata is organized under a kodexa-metadata/ root directory with a hierarchical structure:
kodexa-metadata/
├── sync-config.yaml
├── README.md
└── {organization-slug}/
    ├── organization.yaml
    ├── taxonomies/
    │   └── document-classification.yaml
    ├── stores/
    │   ├── document-store.yaml
    │   └── vector-store.yaml
    ├── knowledge-types/
    │   └── conditional-knowledge.yaml
    ├── feature-types/
    │   └── text-extraction.yaml
    ├── feature-instances/
    │   ├── ocr-text-extraction-prod.yaml
    │   └── fast-text-extraction-dev.yaml
    ├── models/
    │   └── doc-classifier-v3.yaml
    └── {project-slug}/
        ├── project.yaml
        ├── knowledge-items/
        │   └── us-invoice-tax-calc.yaml
        └── assistants/
            └── invoice-assistant.yaml
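If you want to prepare a repository by hand before the first pull, you can scaffold the directory skeleton above with a few mkdir calls. This is a minimal sketch. scaffold_metadata is a hypothetical helper, not part of kdx. It creates only the empty directories shown in the tree. The YAML files themselves are normally materialized by kdx sync pull.

```shell
# Hypothetical helper (not part of kdx): create the empty kodexa-metadata
# directory skeleton for one organization and one of its projects.
scaffold_metadata() {
  local root="$1" org="$2" project="$3"
  local org_dir="$root/$org"

  # Organization-level resource directories (shared across projects)
  mkdir -p \
    "$org_dir/taxonomies" \
    "$org_dir/stores" \
    "$org_dir/knowledge-types" \
    "$org_dir/feature-types" \
    "$org_dir/feature-instances" \
    "$org_dir/models"

  # Project-level resource directories
  mkdir -p \
    "$org_dir/$project/knowledge-items" \
    "$org_dir/$project/assistants"
}

# Usage: scaffold_metadata kodexa-metadata acme finance-automation
```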
Organization-Level Resources
These resources are shared across multiple projects:
Taxonomies - Classification hierarchies and metadata schemas
Stores - Document, vector, or graph storage configurations
Knowledge Types - Templates for rule definitions
Feature Types - Generic capability definitions
Feature Instances - Environment-specific deployments of features
Models - Machine learning models for classification or extraction
Project-Level Resources
Project-specific configurations:
Project metadata - References to shared resources and environment mappings
Knowledge Items - Project-specific knowledge implementations
Assistants - AI assistant configurations
Configuration File
sync-config.yaml
The configuration file defines which organizations and projects participate in GitOps:
metadata_dir: kodexa-metadata
source:
  profile: prod
destination:
  url: https://target.kodexa.ai
  api_key: ${KODEXA_TARGET_API_KEY}
organizations:
  - slug: acme
    projects:
      - finance-automation
      - operations-bot
  - slug: research
Configuration options:
metadata_dir - Directory containing metadata (default: kodexa-metadata)
source - Where to pull metadata from (profile or url + api_key)
destination - Where to push metadata to (profile or url + api_key)
organizations - List of organizations to sync
Omit projects for an organization to sync all of its projects (as with research in the example above)
When no config file is provided, kdx sync searches parent directories for sync-config.yaml. If none is found, it defaults to kodexa-metadata/ in the working directory.
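The lookup described above can be thought of as a walk from the working directory up to the filesystem root. A wrapper script can reproduce it to report which config a job will pick up. The function below is an illustrative re-implementation that assumes this upward search order. It is not kdx's actual code, and find_sync_config is a hypothetical name.

```shell
# Illustrative sketch (not kdx's implementation): walk from a starting
# directory up to / and print the first sync-config.yaml found.
# Returns 1 when none exists; kdx then falls back to ./kodexa-metadata.
find_sync_config() {
  local dir
  dir=$(cd "${1:-.}" && pwd -P) || return 1

  while :; do
    if [ -f "$dir/sync-config.yaml" ]; then
      printf '%s\n' "$dir/sync-config.yaml"
      return 0
    fi
    # Stop once the filesystem root has been checked
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# Usage: find_sync_config || echo "no sync-config.yaml; using ./kodexa-metadata"
```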
Sync Commands
Pull
Download metadata from a Kodexa environment into your local repository:
# Pull all configured organizations and projects
kdx sync pull
# Pull from a specific profile
kdx sync pull --from-profile prod
# Pull specific organization
kdx sync pull --organization acme
# Pull specific project
kdx sync pull --project acme/finance-automation
What pull does:
Resolves organization slugs to IDs
Downloads organization-level metadata
Validates feature types and instances
Downloads project-level metadata
Materializes YAML files in the metadata directory
Cleans up obsolete files (when syncing all projects)
Targeted pulls (specific org/project) leave other resources untouched. Full pulls clean up stale files.
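A full pull deletes files for resources that no longer exist in the environment, so review those deletions before committing. The sketch below lists tracked metadata files that are now missing from disk. list_removed_metadata is a hypothetical helper. It assumes your previous sync was committed, so its files are still in the Git index.

```shell
# Hypothetical helper: after a full "kdx sync pull", list tracked files under
# the metadata directory that no longer exist in the working tree.
list_removed_metadata() {
  local metadata_dir="${1:-kodexa-metadata}"
  # --deleted shows files present in the index but absent on disk
  git ls-files --deleted -- "$metadata_dir"
}

# Usage:
#   kdx sync pull
#   list_removed_metadata   # review before "git add . && git commit"
```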
Push
Upload local metadata changes to a Kodexa environment:
# Push to configured destination
kdx sync push
# Push to a specific profile
kdx sync push --to-profile staging
# Dry run (validate without pushing)
kdx sync push --dry-run
# Push specific organization
kdx sync push --organization acme
What push does:
Reads metadata files from disk
Validates structure and references offline
Resolves organization and project slugs to IDs
Pushes organization metadata first
Pushes project metadata second
Validation
Push performs comprehensive validation before contacting the API:
All feature types and instances include slugs
Feature instances reference existing feature types
Project features entries map to valid feature types
Project feature_instances entries map to known instances
Environment flags are boolean values
No duplicate slugs within resource types
Validation failures abort the command with actionable error messages. Fix the YAML and re-run.
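Push aborts on validation failures, so some teams add a cheap pre-commit check for a common mistake: duplicate slugs. The sketch below is a hypothetical helper, not a kdx feature. It checks one resource-type directory at a time. It assumes each resource file declares its slug on a top-level line of the form slug: value, as in the text-extraction feature type example later on this page. kdx sync push --dry-run remains the authoritative check.

```shell
# Hypothetical pre-commit helper: report slugs that appear more than once
# among the YAML files of a single resource-type directory.
# Assumes each file has a top-level line "slug: <value>".
find_duplicate_slugs() {
  local dir="$1"
  local dupes

  # Extract slug values, trim whitespace, keep only repeated ones
  dupes=$(grep -h '^slug:' "$dir"/*.yaml 2>/dev/null \
    | sed 's/^slug:[[:space:]]*//; s/[[:space:]]*$//' \
    | sort | uniq -d)

  if [ -n "$dupes" ]; then
    printf 'duplicate slug: %s\n' $dupes
    return 1
  fi
  return 0
}

# Usage: find_duplicate_slugs kodexa-metadata/acme/feature-instances
```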
Common Workflows
Initial Setup
Set up GitOps for an existing environment:
# Initialize a new Git repository
mkdir kodexa-infra && cd kodexa-infra
git init
# Create sync configuration
cat > sync-config.yaml << EOF
metadata_dir: kodexa-metadata
source:
  profile: prod
organizations:
  - slug: my-org
EOF
# Pull existing metadata
kdx sync pull --from-profile prod
# Commit to version control
git add .
git commit -m "Initial metadata snapshot from production"
git branch -M main   # git init may create "master", depending on your Git defaults
git remote add origin git@github.com:company/kodexa-infra.git
git push -u origin main
Development Workflow
Make and test changes locally:
# Create feature branch
git checkout -b feature/add-new-taxonomy
# Pull latest metadata
kdx sync pull --from-profile dev
# Make changes
vim kodexa-metadata/my-org/taxonomies/new-classification.yaml
# Validate changes locally
kdx sync push --dry-run
# If validation passes, commit
git add .
git commit -m "Add new document classification taxonomy"
git push origin feature/add-new-taxonomy
Promote changes across environments:
# Test in development
kdx sync push --to-profile dev
# After testing, promote to staging
kdx sync push --to-profile staging
# Verify in staging
kdx sync pull --from-profile staging
git diff # Should show no changes
# Promote to production
kdx sync push --to-profile prod
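You can also script the staging verification step. A plain git diff ignores untracked files. If staging has a resource that your branch lacks, the pull writes it as a new, untracked file, and git diff will not report it. git status --porcelain reports modified, deleted, and untracked files. check_no_drift is a hypothetical helper. This sketch assumes it runs from the repository root right after kdx sync pull.

```shell
# Hypothetical helper: after "kdx sync pull --from-profile staging", confirm
# the metadata directory matches what is committed on the current branch.
# git status --porcelain lists modified, deleted AND untracked files.
check_no_drift() {
  local metadata_dir="${1:-kodexa-metadata}"
  local changes

  changes=$(git status --porcelain -- "$metadata_dir") || return 2
  if [ -n "$changes" ]; then
    echo "Environment differs from branch:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
  echo "No drift detected in $metadata_dir"
}

# Usage:
#   kdx sync pull --from-profile staging && check_no_drift
```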
Pull Request Review
# Reviewer checks out branch
git checkout feature/add-new-taxonomy
# Validate locally
kdx sync push --dry-run
# Review changes
git diff main...feature/add-new-taxonomy
# Approve and merge if valid
Rollback Changes
# Revert to previous version
git revert HEAD
# Or reset to a specific commit (discards later commits; avoid on shared branches)
git reset --hard abc1234
# Push reverted state back to environment
kdx sync push --to-profile prod
Advanced Usage
Environment-Specific Configurations
Use branches for environment-specific configurations:
# Main branch = production
git checkout main
kdx sync push --to-profile prod
# Staging branch
git checkout staging
# ... make staging-specific changes ...
kdx sync push --to-profile staging
# Development branch
git checkout dev
# ... make dev-specific changes ...
kdx sync push --to-profile dev
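When you automate this branch-per-environment model, derive the push target from the branch name. That way a job can never push staging content to production. A minimal sketch, assuming the main/staging/dev to prod/staging/dev mapping shown above. profile_for_branch is a hypothetical helper, and the profiles must already exist (check with kdx config list-profiles).

```shell
# Hypothetical helper: map a Git branch to the kdx profile it deploys to.
# Unknown branches (e.g. feature branches) map to nothing and return 1.
profile_for_branch() {
  case "$1" in
    main)    echo prod ;;
    staging) echo staging ;;
    dev)     echo dev ;;
    *)       return 1 ;;
  esac
}

# Usage (refuses to push from unmapped branches):
#   branch=$(git rev-parse --abbrev-ref HEAD)
#   profile=$(profile_for_branch "$branch") || { echo "no profile for $branch" >&2; exit 1; }
#   kdx sync push --to-profile "$profile"
```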
CI/CD Integration
Automate sync in GitHub Actions:
name: Sync to Production
on:
  push:
    branches: [main]
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install KDX CLI
        run: |
          curl -L https://github.com/kodexa-ai/kdx-cli/releases/latest/download/kdx_linux_amd64.tar.gz | tar xz
          sudo mv kdx /usr/local/bin/
      - name: Validate metadata
        run: kdx sync push --dry-run
        env:
          KODEXA_TARGET_API_KEY: ${{ secrets.PROD_API_KEY }}
      # Push to the destination in sync-config.yaml (its api_key reads KODEXA_TARGET_API_KEY)
      - name: Push to production
        run: kdx sync push
        env:
          KODEXA_TARGET_API_KEY: ${{ secrets.PROD_API_KEY }}
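GitHub Actions substitutes an empty string for a secret that is not defined. If PROD_API_KEY is missing, the ${KODEXA_TARGET_API_KEY} placeholder in sync-config.yaml has nothing to expand to. A small guard at the start of a run step makes the job fail fast with a clear message. This is a generic shell sketch. require_env is a hypothetical name, and you would define it in the step's run script or in a file that step sources.

```shell
# Hypothetical guard: fail fast when a required environment variable is
# unset or empty.
require_env() {
  local name value
  for name in "$@"; do
    value=$(printenv "$name") || value=""
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Usage: require_env KODEXA_TARGET_API_KEY && kdx sync push --dry-run
```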
Multi-Organization Management
Manage multiple organizations in one repository:
# sync-config.yaml
organizations:
  - slug: engineering
    projects:
      - ml-platform
      - data-pipeline
  - slug: finance
    projects:
      - invoice-automation
  - slug: operations
# Sync all organizations
kdx sync pull
# Sync specific organization
kdx sync pull --organization engineering
Partial Syncs
Sync only specific parts of your infrastructure:
# Pull only finance organization
kdx sync pull --organization finance
# Push only ml-platform project
kdx sync push --project engineering/ml-platform
# Pull multiple specific projects
kdx sync pull \
--project engineering/ml-platform \
--project engineering/data-pipeline
Validation Errors
Common validation errors and resolutions:
| Error | Cause | Resolution |
| --- | --- | --- |
| feature instance references unknown feature type | Feature instance YAML references a feature type that doesn’t exist | Add the missing feature type under feature-types/ |
| duplicate slug | Two resources have the same slug | Ensure all slugs are unique within their resource type |
| missing required field: slug | Resource definition missing slug field | Add slug: field to the YAML |
| environments mapping invalid | Environment flags are not boolean | Use true/false for environment mappings |
| project references undefined feature | Project lists feature not in feature types | Add feature type or remove from project |
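You can catch the missing required field: slug error before a dry run. List the feature type and feature instance files that have no top-level slug line; these are the two resource kinds the validation rules require slugs for. This is a hypothetical grep-based sketch. It assumes the slug: key starts at the beginning of a line, as in the text-extraction example that follows. It is no substitute for kdx sync push --dry-run.

```shell
# Hypothetical helper: list feature type / feature instance YAML files that
# have no top-level "slug:" line. Prints nothing and returns 0 when all pass.
find_missing_slugs() {
  local org_dir="$1"
  local missing

  # grep -L prints the names of files that do NOT contain a match
  missing=$(grep -L '^slug:' \
    "$org_dir"/feature-types/*.yaml \
    "$org_dir"/feature-instances/*.yaml 2>/dev/null)

  if [ -n "$missing" ]; then
    printf '%s\n' "$missing" | while IFS= read -r file; do
      echo "missing required field: slug in $file"
    done
    return 1
  fi
  return 0
}

# Usage: find_missing_slugs kodexa-metadata/my-org
```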
Example: Fixing Feature Instance Error
Error:
Error: feature instance "ocr-extraction-prod" references unknown feature type "text-extraction"
Resolution:
# Add the missing feature type
cat > kodexa-metadata/my-org/feature-types/text-extraction.yaml << EOF
slug: text-extraction
name: Text Extraction
description: Extract text from documents
EOF
# Validate fix
kdx sync push --dry-run
Troubleshooting
Metadata directory not found
Create the directory or specify its location:
# Create with pull
kdx sync pull
# Or specify location
kdx sync pull --metadata-dir /path/to/metadata
“Profile not found”
Ensure the profile exists:
kdx config list-profiles
# Create if missing
kdx config set-profile prod --url https://platform.kodexa.ai --api-key your-key
“API authentication failed”
Check your credentials:
# Verify profile
kdx config current-profile
# Test connection
kdx get workspaces --profile prod
# Update API key if needed
kdx config set-profile prod --url https://platform.kodexa.ai --api-key new-key
Changes not syncing
Clear cache and retry:
rm -rf ~/.kodexa/cache/
kdx api-resources --refresh
kdx sync push --dry-run
Merge conflicts
Resolve conflicts manually:
# Pull latest from environment
kdx sync pull --from-profile prod
# Resolve conflicts in YAML files
vim kodexa-metadata/my-org/taxonomies/conflicted-file.yaml
# Test resolution
kdx sync push --dry-run
# Commit resolution
git add .
git commit -m "Resolve merge conflict in taxonomy"
Best Practices
1. Use Dry Run First
Always validate before pushing:
kdx sync push --dry-run
# Review output
kdx sync push
2. Commit After Each Pull
Track what changed in the remote environment:
kdx sync pull --from-profile prod
git add .
git commit -m "Sync from prod: $(date)"
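Running this on a schedule has one catch. If the environment has not changed since the last sync, git commit finds nothing staged and exits with an error, which fails the job. The sketch below commits only when the pull actually changed something. commit_if_changed is a hypothetical helper name.

```shell
# Hypothetical helper: stage the metadata directory and commit only if the
# pull added, modified, or removed files. Returns 0 when there is nothing to do.
commit_if_changed() {
  local message="$1"
  local metadata_dir="${2:-kodexa-metadata}"

  # -A stages additions, modifications AND deletions (full pulls remove stale files)
  git add -A -- "$metadata_dir"

  if git diff --cached --quiet -- "$metadata_dir"; then
    echo "No changes from environment; nothing to commit"
    return 0
  fi
  git commit -q -m "$message"
}

# Usage:
#   kdx sync pull --from-profile prod && commit_if_changed "Sync from prod: $(date)"
```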
3. Small, Focused Changes
Make incremental changes rather than large batches:
# Good: one taxonomy at a time
git commit -m "Add invoice classification taxonomy"
# Avoid: many unrelated changes
git commit -m "Update all taxonomies, stores, and projects"
4. Test in Lower Environments
Test changes in dev/staging before production:
# Test in dev
kdx sync push --to-profile dev
# ... test ...
# Promote to staging
kdx sync push --to-profile staging
# ... test ...
# Finally production
kdx sync push --to-profile prod
5. Document Dependencies
Add comments to YAML files:
# project.yaml
slug: invoice-automation
name: Invoice Automation
# Depends on:
#   - taxonomy: invoice-fields
#   - store: invoice-documents
#   - model: invoice-classifier-v2
6. Use Meaningful Commit Messages
# Good
git commit -m "Add OCR feature instance for production environment"
# Better
git commit -m "feat: add OCR feature instance for production
- Configured with high-accuracy model
- Enabled for invoice-automation project
- Memory limit set to 4GB"
Next Steps