Early Access: GitOps sync functionality and metadata structure may evolve. We recommend testing sync operations in non-production environments first.

Overview

The kdx sync command enables GitOps workflows for Kodexa platform metadata. Store your organization and project configurations in version control, review changes through pull requests, and promote configurations across environments with confidence.

Key Benefits

Version Control

Track all metadata changes in Git with full history and audit trails

Code Review

Review infrastructure changes through pull requests before deployment

Multi-Environment

Promote tested configurations from dev → staging → production

Offline Validation

Validate changes locally before pushing to remote environments

Repository Layout

Metadata is organized under a kodexa-metadata/ root directory with a hierarchical structure:
kodexa-metadata/
├── sync-config.yaml
├── README.md
└── {organization-slug}/
    ├── organization.yaml
    ├── taxonomies/
    │   └── document-classification.yaml
    ├── stores/
    │   ├── document-store.yaml
    │   └── vector-store.yaml
    ├── knowledge-types/
    │   └── conditional-knowledge.yaml
    ├── feature-types/
    │   └── text-extraction.yaml
    ├── feature-instances/
    │   ├── ocr-text-extraction-prod.yaml
    │   └── fast-text-extraction-dev.yaml
    ├── models/
    │   └── doc-classifier-v3.yaml
    └── {project-slug}/
        ├── project.yaml
        ├── knowledge-items/
        │   └── us-invoice-tax-calc.yaml
        └── assistants/
            └── invoice-assistant.yaml

Organization-Level Resources

These resources are shared across multiple projects:
  • Taxonomies - Classification hierarchies and metadata schemas
  • Stores - Document, vector, or graph storage configurations
  • Knowledge Types - Templates for rule definitions
  • Feature Types - Generic capability definitions
  • Feature Instances - Environment-specific deployments of features
  • Models - Machine learning models for classification or extraction

Project-Level Resources

Project-specific configurations:
  • Project metadata - References to shared resources and environment mappings
  • Knowledge Items - Project-specific knowledge implementations
  • Assistants - AI assistant configurations

Configuration File

sync-config.yaml

The configuration file defines which organizations and projects participate in GitOps:
metadata_dir: kodexa-metadata

source:
  profile: prod

destination:
  url: https://target.kodexa.ai
  api_key: ${KODEXA_TARGET_API_KEY}

organizations:
  - slug: acme
    projects:
      - finance-automation
      - operations-bot
  - slug: research
Configuration options:
  • metadata_dir - Directory containing metadata (default: kodexa-metadata)
  • source - Where to pull metadata from (profile or url + api_key)
  • destination - Where to push metadata to (profile or url + api_key)
  • organizations - List of organizations to sync
    • Omit projects to sync all projects in the organization
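Either form can be used on either side. As a hedged sketch, the helper below writes a config that pulls from a URL with an API key and pushes to a named profile. The `write_sync_config` function, the source URL, and the `KODEXA_SOURCE_API_KEY` variable name are illustrative assumptions; only the keys (`url`, `api_key`, `profile`) come from the options above. The heredoc delimiter is quoted so that the shell writes the `${...}` placeholder literally instead of expanding it when the file is created.

```shell
# Hypothetical helper: write a sync-config.yaml to the path given as $1.
write_sync_config() {
  # 'EOF' is quoted, so ${KODEXA_SOURCE_API_KEY} stays a literal placeholder.
  cat > "$1" << 'EOF'
metadata_dir: kodexa-metadata

source:
  url: https://source.kodexa.ai          # hypothetical URL
  api_key: ${KODEXA_SOURCE_API_KEY}      # hypothetical variable name

destination:
  profile: staging

organizations:
  - slug: acme
EOF
}
```

For example, `write_sync_config sync-config.yaml` creates the file in the current directory. With an unquoted `EOF`, the shell would substitute the variable's current value (or an empty string) into the file, which would put the key itself into version control.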
When no config file is provided, kdx sync searches parent directories for sync-config.yaml. If none is found, it defaults to kodexa-metadata/ in the working directory.
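The lookup described above can be pictured with a small shell function. This is an illustration of the documented search order, not kdx's actual implementation; `find_sync_config` is a hypothetical name, and starting the search in the working directory itself is an assumption.

```shell
# Hypothetical helper: print the nearest sync-config.yaml, walking upward
# from the directory given as $1 (default: the current directory).
find_sync_config() {
  local dir
  dir="$(cd "${1:-.}" && pwd)" || return 1
  while :; do
    if [ -f "$dir/sync-config.yaml" ]; then
      printf '%s\n' "$dir/sync-config.yaml"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}
```

A non-zero return status corresponds to the case where no file is found and kdx falls back to kodexa-metadata/ in the working directory.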

Sync Commands

Pull Metadata

Download metadata from a Kodexa environment into your local repository:
# Pull all configured organizations and projects
kdx sync pull

# Pull from a specific profile
kdx sync pull --from-profile prod

# Pull specific organization
kdx sync pull --organization acme

# Pull specific project
kdx sync pull --project acme/finance-automation
What pull does:
  1. Resolves organization slugs to IDs
  2. Downloads organization-level metadata
  3. Validates feature types and instances
  4. Downloads project-level metadata
  5. Materializes YAML files in the metadata directory
  6. Cleans up obsolete files (when syncing all projects)
Targeted pulls (specific org/project) leave other resources untouched. Full pulls clean up stale files.

Push Metadata

Upload local metadata changes to a Kodexa environment:
# Push to configured destination
kdx sync push

# Push to a specific profile
kdx sync push --to-profile staging

# Dry run (validate without pushing)
kdx sync push --dry-run

# Push specific organization
kdx sync push --organization acme
What push does:
  1. Reads metadata files from disk
  2. Validates structure and references offline
  3. Resolves organization and project slugs to IDs
  4. Pushes organization metadata first
  5. Pushes project metadata second

Validation

Before contacting the API, push runs these offline checks:
  • All feature types and instances include slugs
  • Feature instances reference existing feature types
  • Project features entries map to valid feature types
  • Project feature_instances entries map to known instances
  • Environment flags are boolean values
  • No duplicate slugs within resource types
Validation failures abort the command with actionable error messages. Fix the YAML and re-run.
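To make two of these checks concrete, here is a rough sketch of a slug check over one resource-type directory. It is not how kdx validates metadata, and `kdx sync push --dry-run` remains the authoritative check. The `check_slugs` name and its messages are hypothetical, and the sketch assumes each resource file carries a top-level `slug: value` line.

```shell
# Hypothetical pre-check: report missing and duplicate slugs in one
# resource-type directory, e.g. kodexa-metadata/my-org/feature-types.
check_slugs() {
  local dir="$1" status=0 seen="" file slug
  for file in "$dir"/*.yaml; do
    [ -e "$file" ] || continue   # the directory has no YAML files
    # Take the value of the first top-level "slug:" line, if any.
    slug="$(awk '/^slug:/ { print $2; exit }' "$file")"
    if [ -z "$slug" ]; then
      echo "missing slug: $file"
      status=1
    elif printf '%s\n' "$seen" | grep -qxF -- "$slug"; then
      echo "duplicate slug \"$slug\": $file"
      status=1
    else
      # Remember the slug, one per line, for later comparisons.
      seen="$seen
$slug"
    fi
  done
  return "$status"
}
```

Running `check_slugs kodexa-metadata/my-org/feature-types` returns a non-zero status if any file lacks a slug or repeats one, which makes it usable as a quick guard in a pre-commit hook.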

Common Workflows

Initial Setup

Set up GitOps for an existing environment:
# Initialize a new Git repository
mkdir kodexa-infra && cd kodexa-infra
git init

# Create sync configuration
cat > sync-config.yaml << EOF
metadata_dir: kodexa-metadata
source:
  profile: prod
organizations:
  - slug: my-org
EOF

# Pull existing metadata
kdx sync pull --from-profile prod

# Commit to version control
git add .
git commit -m "Initial metadata snapshot from production"
git branch -M main  # make sure the branch is named main before pushing
git remote add origin git@github.com:company/kodexa-infra.git
git push -u origin main

Development Workflow

Make and test changes locally:
# Create feature branch
git checkout -b feature/add-new-taxonomy

# Pull latest metadata
kdx sync pull --from-profile dev

# Make changes
vim kodexa-metadata/my-org/taxonomies/new-classification.yaml

# Validate changes locally
kdx sync push --dry-run

# If validation passes, commit
git add .
git commit -m "Add new document classification taxonomy"
git push origin feature/add-new-taxonomy

Promotion Workflow

Promote changes across environments:
# Test in development
kdx sync push --to-profile dev

# After testing, promote to staging
kdx sync push --to-profile staging

# Verify in staging
kdx sync pull --from-profile staging
git status --short  # Should list nothing; unlike git diff, this also shows new untracked files

# Promote to production
kdx sync push --to-profile prod

Pull Request Review

# Reviewer checks out branch
git checkout feature/add-new-taxonomy

# Validate locally
kdx sync push --dry-run

# Review changes
git diff main...feature/add-new-taxonomy

# Approve and merge if valid

Rollback Changes

# Revert to previous version
git revert HEAD

# Or reset to a specific commit (rewrites history; avoid on shared branches)
git reset --hard abc1234

# Push reverted state back to environment
kdx sync push --to-profile prod

Advanced Usage

Environment-Specific Configurations

Use branches for environment-specific configurations:
# Main branch = production
git checkout main
kdx sync push --to-profile prod

# Staging branch
git checkout staging
# ... make staging-specific changes ...
kdx sync push --to-profile staging

# Development branch
git checkout dev
# ... make dev-specific changes ...
kdx sync push --to-profile dev

CI/CD Integration

Automate sync in GitHub Actions:
name: Sync to Production

on:
  push:
    branches: [main]

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install KDX CLI
        run: |
          curl -L https://github.com/kodexa-ai/kdx-cli/releases/latest/download/kdx_linux_amd64.tar.gz | tar xz
          sudo mv kdx /usr/local/bin/
      
      - name: Validate metadata
        run: kdx sync push --dry-run
        env:
          KODEXA_TARGET_API_KEY: ${{ secrets.PROD_API_KEY }}
      
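      - name: Push to production
        # Assumes a prod profile is configured on the runner; without one, plain
        # `kdx sync push` uses the destination defined in sync-config.yaml.
        run: kdx sync push --to-profile prod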
        env:
          KODEXA_TARGET_API_KEY: ${{ secrets.PROD_API_KEY }}

Multi-Organization Management

Manage multiple organizations in one repository:
# sync-config.yaml
organizations:
  - slug: engineering
    projects:
      - ml-platform
      - data-pipeline
  - slug: finance
    projects:
      - invoice-automation
  - slug: operations
Then sync all organizations or a single one:
# Sync all organizations
kdx sync pull

# Sync specific organization
kdx sync pull --organization engineering

Partial Syncs

Sync only specific parts of your infrastructure:
# Pull only finance organization
kdx sync pull --organization finance

# Push only ml-platform project
kdx sync push --project engineering/ml-platform

# Pull multiple specific projects
kdx sync pull \
  --project engineering/ml-platform \
  --project engineering/data-pipeline
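When the project list comes from a script or a file, the repeated `--project` flags can be assembled programmatically. The wrapper below is a sketch; `pull_projects` is a hypothetical name, and it relies only on the repeated `--project` form shown above. It refuses to run with an empty list, because a plain `kdx sync pull` is a full pull that also cleans up stale files.

```shell
# Hypothetical wrapper: pull_projects org/project [org/project ...]
pull_projects() {
  local project
  if [ "$#" -eq 0 ]; then
    echo "usage: pull_projects <org/project>..." >&2
    return 2
  fi
  # Rewrite the argument list in place: each "org/project" becomes
  # "--project org/project".
  for project in "$@"; do
    shift
    set -- "$@" --project "$project"
  done
  kdx sync pull "$@"
}
```

For example, `pull_projects engineering/ml-platform engineering/data-pipeline` runs the same command as the multi-project pull above.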

Validation Errors

Common validation errors and resolutions:
| Error | Cause | Resolution |
|---|---|---|
| feature instance references unknown feature type | Feature instance YAML references a feature type that doesn’t exist | Add the missing feature type under feature-types/ |
| duplicate slug | Two resources have the same slug | Ensure all slugs are unique within their resource type |
| missing required field: slug | Resource definition is missing the slug field | Add a slug: field to the YAML |
| environments mapping invalid | Environment flags are not boolean | Use true/false for environment mappings |
| project references undefined feature | Project lists a feature that is not among the feature types | Add the feature type or remove it from the project |

Example: Fixing Feature Instance Error

Error:
Error: feature instance "ocr-extraction-prod" references unknown feature type "text-extraction"
Resolution:
# Add the missing feature type
cat > kodexa-metadata/my-org/feature-types/text-extraction.yaml << EOF
slug: text-extraction
name: Text Extraction
description: Extract text from documents
EOF

# Validate fix
kdx sync push --dry-run

Troubleshooting

“Metadata directory not found”

Create the directory or specify its location:
# Create with pull
kdx sync pull

# Or specify location
kdx sync pull --metadata-dir /path/to/metadata

“Profile not found”

Ensure the profile exists:
kdx config list-profiles

# Create if missing
kdx config set-profile prod --url https://platform.kodexa.ai --api-key your-key

“API authentication failed”

Check your credentials:
# Verify profile
kdx config current-profile

# Test connection
kdx get workspaces --profile prod

# Update API key if needed
kdx config set-profile prod --url https://platform.kodexa.ai --api-key new-key

Changes not syncing

Clear cache and retry:
rm -rf ~/.kodexa/cache/
kdx api-resources --refresh
kdx sync push --dry-run

Merge conflicts

Resolve conflicts manually:
# Pull latest from environment
kdx sync pull --from-profile prod

# Resolve conflicts in YAML files
vim kodexa-metadata/my-org/taxonomies/conflicted-file.yaml

# Test resolution
kdx sync push --dry-run

# Commit resolution
git add .
git commit -m "Resolve merge conflict in taxonomy"

Best Practices

1. Use Dry Run First

Always validate before pushing:
kdx sync push --dry-run
# Review output
kdx sync push
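To make the habit hard to skip, the two steps can be wrapped in one function. This is a minimal sketch: `safe_push` is a hypothetical name, and it assumes kdx exits with a non-zero status when validation fails, which the documentation implies but does not state outright.

```shell
# Hypothetical wrapper: push only when the offline dry run succeeds.
safe_push() {
  # $1: target profile, e.g. dev, staging, or prod
  if [ -z "${1:-}" ]; then
    echo "usage: safe_push <profile>" >&2
    return 2
  fi
  # Assumption: a failed validation makes kdx return a non-zero status.
  if ! kdx sync push --dry-run; then
    echo "Dry run failed; nothing was pushed to $1" >&2
    return 1
  fi
  kdx sync push --to-profile "$1"
}
```

Calling `safe_push staging` validates first and pushes to the staging profile only if validation passes.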

2. Commit After Each Pull

Track what changed in the remote environment:
kdx sync pull --from-profile prod
git add .
git commit -m "Sync from prod: $(date)"

3. Small, Focused Changes

Make incremental changes rather than large batches:
# Good: one taxonomy at a time
git commit -m "Add invoice classification taxonomy"

# Avoid: many unrelated changes
git commit -m "Update all taxonomies, stores, and projects"

4. Test in Lower Environments

Test changes in dev/staging before production:
# Test in dev
kdx sync push --to-profile dev
# ... test ...

# Promote to staging
kdx sync push --to-profile staging
# ... test ...

# Finally production
kdx sync push --to-profile prod

5. Document Dependencies

Add comments to YAML files:
# project.yaml
slug: invoice-automation
name: Invoice Automation
# Depends on:
#   - taxonomy: invoice-fields
#   - store: invoice-documents
#   - model: invoice-classifier-v2

6. Use Meaningful Commit Messages

# Good
git commit -m "Add OCR feature instance for production environment"

# Better
git commit -m "feat: add OCR feature instance for production

- Configured with high-accuracy model
- Enabled for invoice-automation project
- Memory limit set to 4GB"
