
Customizing Extraction by Document Type

This guide walks through a complete example: using different extraction prompts for SEC 10K vs 10Q filings. The same pattern applies to any scenario where you need different processing for different document types.

The Goal

When processing SEC filings:
  • 10K documents should use prompts optimized for annual reports
  • 10Q documents should use prompts optimized for quarterly reports

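What You’ll Create

  • A Feature Type (sec-filing-type) and one Feature per filing type (10K, 10Q)
  • An Item Type (extraction-prompt-override) and one Knowledge Item per custom prompt
  • One Knowledge Set per filing type that links its Feature to its Knowledge Items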

Step 1: Create the Feature Type

First, define a Feature Type to classify documents by their SEC filing type.

Via YAML

# kodexa-resources/knowledge-feature-types/sec-filing-type.yaml
slug: sec-filing-type
name: SEC Filing Type
description: Classification of SEC filing documents (10K, 10Q, 8K, etc.)
active: true

options:
  - name: filingType
    type: string
    label: Filing Type
    description: The SEC filing type code
    required: true

extendedOptions:
  - name: filingName
    type: string
    label: Filing Name
    description: Human-readable filing name
  - name: frequency
    type: string
    label: Reporting Frequency
    description: Annual, Quarterly, etc.

Via UI

  1. Go to Knowledge > Feature Types
  2. Click Create Feature Type
  3. Enter:
    • Slug: sec-filing-type
    • Name: SEC Filing Type
    • Description: Classification of SEC filing documents
  4. Add option: filingType (string, required)
  5. Add extended options: filingName (string) and frequency (string)
  6. Save
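To make the role of `required` concrete, here is a minimal Python sketch that models the Feature Type locally and checks a feature's properties against it. The `Option` and `FeatureType` classes and the `validate_properties` helper are hypothetical illustrations, not part of any Kodexa SDK, and the mapping from YAML `type` names to Python types is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical local model of a Feature Type option (not a Kodexa SDK class).
@dataclass
class Option:
    name: str
    type: str
    required: bool = False

@dataclass
class FeatureType:
    slug: str
    options: List[Option]
    extended_options: List[Option] = field(default_factory=list)

# Assumed mapping from the YAML `type` names used in this guide to Python types.
PY_TYPES = {"string": str, "text": str, "boolean": bool}

def validate_properties(ftype: FeatureType, properties: dict) -> List[str]:
    """Return a list of problems; an empty list means the properties look valid.

    Only `options` are checked; extended options are treated as optional metadata.
    """
    problems = []
    for opt in ftype.options:
        if opt.name not in properties:
            if opt.required:
                problems.append(f"missing required option: {opt.name}")
            continue
        if not isinstance(properties[opt.name], PY_TYPES[opt.type]):
            problems.append(f"{opt.name} should be a {opt.type}")
    return problems

SEC_FILING_TYPE = FeatureType(
    slug="sec-filing-type",
    options=[Option("filingType", "string", required=True)],
    extended_options=[Option("filingName", "string"), Option("frequency", "string")],
)

print(validate_properties(SEC_FILING_TYPE, {"filingType": "10K"}))  # []
print(validate_properties(SEC_FILING_TYPE, {}))  # ['missing required option: filingType']
```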

Step 2: Create Features

Create features for each filing type you want to handle.

Via YAML

# These are typically created via API or UI during document processing
# Here's the structure for reference:

# 10K Feature
featureType: sec-filing-type
properties:
  filingType: "10K"
extendedProperties:
  filingName: "Annual Report"
  frequency: "Annual"
active: true
---
# 10Q Feature
featureType: sec-filing-type
properties:
  filingType: "10Q"
extendedProperties:
  filingName: "Quarterly Report"
  frequency: "Quarterly"
active: true

Via UI

  1. Go to Knowledge > Features
  2. Click Create Feature
  3. Select Feature Type: SEC Filing Type
  4. Enter properties:
    • Filing Type: 10K
    • Filing Name: Annual Report
    • Frequency: Annual
  5. Save
  6. Repeat for 10Q (Filing Name: Quarterly Report, Frequency: Quarterly)
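If you handle more than a couple of filing types, one option is to generate these feature structures from a small table. The Python sketch below is a hypothetical helper that only builds plain dicts in the shape shown in the YAML above; how you submit them (API, UI, or kdx sync) is outside its scope.

```python
# Hypothetical table of filing types; adding another type (e.g. 8K) is one new entry.
FILING_TYPES = {
    "10K": ("Annual Report", "Annual"),
    "10Q": ("Quarterly Report", "Quarterly"),
}

def build_feature(filing_type: str, filing_name: str, frequency: str) -> dict:
    """Build one sec-filing-type feature in the structure shown above."""
    return {
        "featureType": "sec-filing-type",
        "properties": {"filingType": filing_type},
        "extendedProperties": {"filingName": filing_name, "frequency": frequency},
        "active": True,
    }

features = [build_feature(code, name, freq) for code, (name, freq) in FILING_TYPES.items()]
for f in features:
    print(f["properties"]["filingType"], "->", f["extendedProperties"]["filingName"])
# 10K -> Annual Report
# 10Q -> Quarterly Report
```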

Step 3: Create the Item Type

Define an Item Type for customizing extraction prompts.

Via YAML

# kodexa-resources/knowledge-item-types/extraction-prompt-override.yaml
slug: extraction-prompt-override
name: Extraction Prompt Override
description: Customize the extraction prompt for specific data elements

options:
  - name: targetTaxon
    type: string
    label: Target Taxon
    description: The taxon path this prompt applies to (e.g., "financial/revenue")
    required: true

  - name: promptText
    type: text
    label: Prompt Text
    description: The custom extraction prompt
    required: true

  - name: includeContext
    type: boolean
    label: Include Document Context
    description: Whether to include surrounding context in extraction
    default: true

  - name: confidenceThreshold
    type: number
    label: Confidence Threshold
    description: Minimum confidence score (0-1)
    default: 0.8

Via UI

  1. Go to Knowledge > Item Types
  2. Click Create Item Type
  3. Enter:
    • Slug: extraction-prompt-override
    • Name: Extraction Prompt Override
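  4. Add the options from the YAML above: targetTaxon (string, required), promptText (text, required), includeContext (boolean, default true), and confidenceThreshold (number, default 0.8)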
  5. Save
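The `default` values mean an item only needs to set the options it wants to change. As a sketch, assuming defaults are filled in by a simple merge (an assumption, since this guide does not describe how the platform applies them), an item's effective properties could be resolved like this. `effective_properties` is a hypothetical helper.

```python
# Defaults declared on the extraction-prompt-override Item Type (from the YAML above).
ITEM_TYPE_DEFAULTS = {"includeContext": True, "confidenceThreshold": 0.8}
REQUIRED_OPTIONS = ("targetTaxon", "promptText")

def effective_properties(properties: dict) -> dict:
    """Hypothetical helper: merge an item's properties over the type defaults."""
    missing = [name for name in REQUIRED_OPTIONS if not properties.get(name)]
    if missing:
        raise ValueError(f"missing required options: {missing}")
    # Values set on the item take precedence over the type defaults.
    resolved = {**ITEM_TYPE_DEFAULTS, **properties}
    threshold = resolved["confidenceThreshold"]
    if not 0 <= threshold <= 1:
        raise ValueError(f"confidenceThreshold must be between 0 and 1, got {threshold}")
    return resolved

base = {"targetTaxon": "financial/total_revenue", "promptText": "Extract the Total Revenue."}
print(effective_properties(base)["confidenceThreshold"])  # 0.8
print(effective_properties({**base, "confidenceThreshold": 0.85})["confidenceThreshold"])  # 0.85
```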

Step 4: Create Knowledge Items

Create specific prompt configurations for each filing type.

Via YAML

# 10K Revenue Extraction Item
title: 10K Revenue Extraction Prompt
description: Optimized prompt for extracting revenue from annual reports
knowledgeItemType: extraction-prompt-override
active: true
properties:
  targetTaxon: "financial/total_revenue"
  promptText: |
    Extract the Total Revenue (also called Net Revenue or Total Sales) from this SEC 10K annual report.

    Instructions:
    1. Look in the Consolidated Statements of Operations
    2. Find the most recent fiscal year column
    3. Extract the Total Revenue line item
    4. Include the fiscal year end date

    Return format: {"value": <number>, "fiscalYearEnd": "<date>", "currency": "USD"}
  includeContext: true
  confidenceThreshold: 0.85
---
# 10Q Revenue Extraction Item
title: 10Q Revenue Extraction Prompt
description: Optimized prompt for extracting revenue from quarterly reports
knowledgeItemType: extraction-prompt-override
active: true
properties:
  targetTaxon: "financial/total_revenue"
  promptText: |
    Extract the Total Revenue from this SEC 10Q quarterly report.

    Instructions:
    1. Look in the Consolidated Statements of Operations
    2. Find the current quarter column (not year-to-date)
    3. Extract the Total Revenue line item
    4. Include the quarter end date

    Return format: {"value": <number>, "quarterEnd": "<date>", "currency": "USD"}
  includeContext: true
  confidenceThreshold: 0.8

Via UI

  1. Go to Knowledge > Items
  2. Click Create Item
  3. Select Item Type: Extraction Prompt Override
  4. Enter title, description, and properties
  5. Save
  6. Repeat for the 10Q prompt
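Both prompts ask for a JSON result, and each item carries a `confidenceThreshold`. The Python sketch below shows one way a consumer might validate a response against the requested "Return format" and discard low-confidence results. Where the confidence score comes from is not specified in this guide, so it is passed in as an assumption; `parse_revenue` and the sample response are hypothetical.

```python
import json
from typing import Optional

def parse_revenue(response_text: str, date_key: str, confidence: float,
                  threshold: float) -> Optional[dict]:
    """Hypothetical check: return the parsed result, or None if it is rejected.

    date_key is "fiscalYearEnd" for the 10K prompt and "quarterEnd" for the 10Q prompt.
    """
    if confidence < threshold:  # below the item's confidenceThreshold
        return None
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    required = {"value", date_key, "currency"}
    if not required.issubset(data) or not isinstance(data["value"], (int, float)):
        return None
    return data

# Made-up sample response in the 10K return format.
resp = '{"value": 1250000000, "fiscalYearEnd": "2023-12-31", "currency": "USD"}'
print(parse_revenue(resp, "fiscalYearEnd", confidence=0.9, threshold=0.85))
print(parse_revenue(resp, "fiscalYearEnd", confidence=0.82, threshold=0.85))  # None
```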

Step 5: Create Knowledge Sets

Knowledge Sets connect features to items: when a document has the features listed in a set, the items in that set are applied.

Via YAML

# 10K Processing Rules
name: 10K Document Processing Rules
description: Apply 10K-specific extraction prompts
status: ACTIVE

# Conditions: when document has this feature
features:
  - featureTypeSlug: sec-filing-type
    properties:
      filingType: "10K"

# Actions: apply these items
items:
  - itemSlug: 10k-revenue-extraction-prompt
---
# 10Q Processing Rules
name: 10Q Document Processing Rules
description: Apply 10Q-specific extraction prompts
status: ACTIVE

features:
  - featureTypeSlug: sec-filing-type
    properties:
      filingType: "10Q"

items:
  - itemSlug: 10q-revenue-extraction-prompt

Via UI

  1. Go to Knowledge > Sets
  2. Click Create Knowledge Set
  3. Enter:
    • Name: 10K Document Processing Rules
    • Status: Active
  4. Add Feature condition: SEC Filing Type = 10K
  5. Add Item: 10K Revenue Extraction Prompt
  6. Save
  7. Repeat for 10Q
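The YAML comments describe features as conditions and items as actions. A minimal Python sketch of that matching rule follows, assuming that only ACTIVE sets apply and that a set matches when each feature condition is met by at least one feature linked to the document (same feature type, all listed properties equal). The real matching happens inside the platform; `condition_met` and `set_matches` are hypothetical.

```python
from typing import Dict, List

def condition_met(condition: Dict, doc_features: List[Dict]) -> bool:
    """True if some document feature has the condition's type and property values."""
    return any(
        f["featureType"] == condition["featureTypeSlug"]
        and all(f["properties"].get(k) == v for k, v in condition["properties"].items())
        for f in doc_features
    )

def set_matches(knowledge_set: Dict, doc_features: List[Dict]) -> bool:
    """Assumed rule: the set is ACTIVE and every feature condition is met."""
    return knowledge_set["status"] == "ACTIVE" and all(
        condition_met(c, doc_features) for c in knowledge_set["features"]
    )

rules_10k = {
    "name": "10K Document Processing Rules",
    "status": "ACTIVE",
    "features": [{"featureTypeSlug": "sec-filing-type", "properties": {"filingType": "10K"}}],
    "items": [{"itemSlug": "10k-revenue-extraction-prompt"}],
}
doc_features = [{"featureType": "sec-filing-type", "properties": {"filingType": "10K"}}]
print(set_matches(rules_10k, doc_features))  # True
```

Note that under this rule a set with an empty `features` list would match every document, so it is worth keeping at least one condition per set.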

How It Works at Runtime

  1. Document is uploaded and classified as 10K
  2. Feature “10K” is linked to the document
  3. The processor queries Knowledge Sets whose feature conditions match the document's features
  4. Knowledge Set “10K Document Processing Rules” matches
  5. Item “10K Revenue Extraction Prompt” is retrieved
  6. Custom prompt is used for extraction
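The runtime steps above can be sketched end to end in Python with in-memory stand-ins for the resources from Steps 4 and 5. Everything here (`ITEMS`, `KNOWLEDGE_SETS`, `resolve_prompt`) is hypothetical and not the Kodexa API; the prompt texts are shortened placeholders, and the matching rule is the same assumption described for Knowledge Sets.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for the Knowledge Items (prompt texts shortened).
ITEMS: Dict[str, Dict] = {
    "10k-revenue-extraction-prompt": {
        "targetTaxon": "financial/total_revenue", "promptText": "10K revenue prompt"},
    "10q-revenue-extraction-prompt": {
        "targetTaxon": "financial/total_revenue", "promptText": "10Q revenue prompt"},
}

def rules(filing_type: str) -> Dict:
    """Build a Knowledge Set like the YAML in Step 5 for one filing type."""
    return {
        "name": f"{filing_type} Document Processing Rules",
        "status": "ACTIVE",
        "features": [{"featureTypeSlug": "sec-filing-type",
                      "properties": {"filingType": filing_type}}],
        "items": [{"itemSlug": f"{filing_type.lower()}-revenue-extraction-prompt"}],
    }

KNOWLEDGE_SETS = [rules("10K"), rules("10Q")]

def matches(ks: Dict, doc_features: List[Dict]) -> bool:
    # Assumed rule: every condition must be met by some feature on the document.
    return ks["status"] == "ACTIVE" and all(
        any(f["featureType"] == c["featureTypeSlug"]
            and all(f["properties"].get(k) == v for k, v in c["properties"].items())
            for f in doc_features)
        for c in ks["features"]
    )

def resolve_prompt(doc_features: List[Dict], taxon: str) -> Optional[str]:
    for ks in KNOWLEDGE_SETS:                  # step 3: query Knowledge Sets
        if not matches(ks, doc_features):      # step 4: keep matching sets only
            continue
        for ref in ks["items"]:                # step 5: retrieve the linked items
            item = ITEMS[ref["itemSlug"]]
            if item["targetTaxon"] == taxon:
                return item["promptText"]      # step 6: use the custom prompt
    return None  # no override found; fallback behavior is not covered by this guide

# Steps 1-2: a document classified as 10Q, with its Feature linked.
doc = [{"featureType": "sec-filing-type", "properties": {"filingType": "10Q"}}]
print(resolve_prompt(doc, "financial/total_revenue"))  # 10Q revenue prompt
```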

Complete GitOps Example

Here’s the full set of files for deploying via kdx sync:
kodexa-resources/
├── knowledge-feature-types/
│   └── sec-filing-type.yaml
├── knowledge-item-types/
│   └── extraction-prompt-override.yaml
└── projects/
    └── sec-processing/
        ├── knowledge-items/
        │   ├── 10k-revenue-prompt.yaml
        │   └── 10q-revenue-prompt.yaml
        └── knowledge-sets/
            ├── 10k-rules.yaml
            └── 10q-rules.yaml

Manifest:
# manifests/sec-processing.yaml
resources:
  knowledge-feature-types:
    - sec-filing-type
  knowledge-item-types:
    - extraction-prompt-override
  projects:
    - sec-processing

Deploy:
kdx sync deploy --target my-org --env prod
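Before deploying, it can help to confirm that every slug in the manifest has a matching file or directory. The Python sketch below is a hypothetical pre-deploy check, not a kdx feature; the path convention (YAML files named after the slug for feature and item types, a directory per project) is inferred from the example tree above and may not hold for every kdx setup.

```python
from pathlib import Path
from typing import Dict, List

# The manifest above, expressed as a dict (parsing the YAML file is left out).
MANIFEST: Dict[str, List[str]] = {
    "knowledge-feature-types": ["sec-filing-type"],
    "knowledge-item-types": ["extraction-prompt-override"],
    "projects": ["sec-processing"],
}

def missing_resources(root: Path, manifest: Dict[str, List[str]]) -> List[str]:
    """Return manifest entries with no matching path under root (assumed layout)."""
    missing = []
    for kind, slugs in manifest.items():
        for slug in slugs:
            # Projects are directories; the other kinds are single YAML files.
            path = root / kind / slug if kind == "projects" else root / kind / f"{slug}.yaml"
            if not path.exists():
                missing.append(path.relative_to(root).as_posix())
    return missing

if __name__ == "__main__":
    problems = missing_resources(Path("kodexa-resources"), MANIFEST)
    print("OK" if not problems else f"Missing: {problems}")
```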

Next Steps