# Customizing Extraction by Document Type

> Customize document extraction in Kodexa by routing different document types through different prompts, models, and processing rules at runtime.

This guide walks through a complete example: using different extraction prompts for SEC 10K vs 10Q filings. The same pattern applies to any scenario where you need different processing for different document types.

## The Goal

When processing SEC filings:

* **10K documents** should use prompts optimized for annual reports
* **10Q documents** should use prompts optimized for quarterly reports

## What You'll Create

```mermaid
flowchart TD
    subgraph types["1. Define Types"]
        FT[Feature Type<br/>"SEC Filing Type"]
        IT[Item Type<br/>"Extraction Prompt"]
    end

    subgraph instances["2. Create Instances"]
        F1[Feature: "10K"]
        F2[Feature: "10Q"]
        I1[Item: "10K Prompt"]
        I2[Item: "10Q Prompt"]
    end

    subgraph sets["3. Connect via Sets"]
        KS1[Set: "10K Rules"]
        KS2[Set: "10Q Rules"]
    end

    FT --> F1
    FT --> F2
    IT --> I1
    IT --> I2

    F1 --> KS1
    KS1 --> I1

    F2 --> KS2
    KS2 --> I2
```

## Step 1: Create the Feature Type

First, define a Feature Type to classify documents by their SEC filing type.

### Via YAML

```yaml
# kodexa-resources/knowledge-feature-types/sec-filing-type.yaml
slug: sec-filing-type
name: SEC Filing Type
description: Classification of SEC filing documents (10K, 10Q, 8K, etc.)
active: true

options:
  - name: filingType
    type: string
    label: Filing Type
    description: The SEC filing type code
    required: true

extendedOptions:
  - name: filingName
    type: string
    label: Filing Name
    description: Human-readable filing name
  - name: frequency
    type: string
    label: Reporting Frequency
    description: Annual, Quarterly, etc.
```

### Via UI

1. Go to **Knowledge > Feature Types**
2. Click **Create Feature Type**
3. Enter:
   * Slug: `sec-filing-type`
   * Name: `SEC Filing Type`
   * Description: `Classification of SEC filing documents`
4. Add option: `filingType` (string, required)
5. Add extended options: `filingName` (string) and `frequency` (string)
6. Save

## Step 2: Create Features

Create a feature for each filing type you want to handle.

### Via YAML

```yaml
# Features are typically created via the API or UI during document processing.
# The structure is shown here for reference, one YAML document per feature.

# 10K Feature
featureType: sec-filing-type
properties:
  filingType: "10K"
extendedProperties:
  filingName: "Annual Report"
  frequency: "Annual"
active: true

---
# 10Q Feature
featureType: sec-filing-type
properties:
  filingType: "10Q"
extendedProperties:
  filingName: "Quarterly Report"
  frequency: "Quarterly"
active: true
```
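To make the relationship between the Feature Type and its features concrete, here is a minimal Python sketch that checks a feature against the options declared in Step 1. It is illustrative only: `FEATURE_TYPE` and `validate_feature` are hypothetical names rather than part of the Kodexa API, and flagging undeclared properties is an assumption about how strict you might want such a check to be.

```python
# Hypothetical in-memory mirror of sec-filing-type.yaml; not a Kodexa API.
FEATURE_TYPE = {
    "slug": "sec-filing-type",
    "options": [{"name": "filingType", "type": "string", "required": True}],
    "extendedOptions": [
        {"name": "filingName", "type": "string"},
        {"name": "frequency", "type": "string"},
    ],
}


def validate_feature(feature, feature_type):
    """Return a list of problems; an empty list means the feature looks consistent."""
    problems = []
    if feature.get("featureType") != feature_type["slug"]:
        problems.append(f"unexpected featureType: {feature.get('featureType')!r}")

    # Compare each declared section with the matching section of the feature.
    for options_key, props_key in (("options", "properties"),
                                   ("extendedOptions", "extendedProperties")):
        declared = {opt["name"]: opt for opt in feature_type.get(options_key, [])}
        props = feature.get(props_key, {})
        for name, opt in declared.items():
            if opt.get("required") and name not in props:
                problems.append(f"missing required {props_key}.{name}")
        for name in props:
            if name not in declared:
                problems.append(f"undeclared {props_key}.{name}")
    return problems


feature_10k = {
    "featureType": "sec-filing-type",
    "properties": {"filingType": "10K"},
    "extendedProperties": {"filingName": "Annual Report", "frequency": "Annual"},
}
print(validate_feature(feature_10k, FEATURE_TYPE))  # []
```

A check like this can run before features are created through the API, so that a misspelled property name is caught early.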

### Via UI

1. Go to **Knowledge > Features**
2. Click **Create Feature**
3. Select Feature Type: `SEC Filing Type`
4. Enter properties:
   * Filing Type: `10K`
   * Filing Name: `Annual Report`
   * Reporting Frequency: `Annual`
5. Save
6. Repeat for 10Q, using `Quarterly Report` and `Quarterly`

## Step 3: Create the Item Type

Define an Item Type for customizing extraction prompts.

### Via YAML

```yaml
# kodexa-resources/knowledge-item-types/extraction-prompt-override.yaml
slug: extraction-prompt-override
name: Extraction Prompt Override
description: Customize the extraction prompt for specific data elements

options:
  - name: targetTaxon
    type: string
    label: Target Taxon
    description: The taxon path this prompt applies to (e.g., "financial/revenue")
    required: true

  - name: promptText
    type: text
    label: Prompt Text
    description: The custom extraction prompt
    required: true

  - name: includeContext
    type: boolean
    label: Include Document Context
    description: Whether to include surrounding context in extraction
    default: true

  - name: confidenceThreshold
    type: number
    label: Confidence Threshold
    description: Minimum confidence score (0-1)
    default: 0.8
```
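The `required` and `default` settings above imply a small amount of resolution logic whenever an item is read. The following Python sketch shows one way that could work. `ITEM_TYPE_OPTIONS` and `resolve_properties` are hypothetical names, and the range check on `confidenceThreshold` is an assumption based on the "0-1" note in the option description, not documented platform behavior.

```python
# Hypothetical mirror of the options in extraction-prompt-override.yaml.
ITEM_TYPE_OPTIONS = [
    {"name": "targetTaxon", "required": True},
    {"name": "promptText", "required": True},
    {"name": "includeContext", "default": True},
    {"name": "confidenceThreshold", "default": 0.8},
]


def resolve_properties(properties, options=ITEM_TYPE_OPTIONS):
    """Fill in declared defaults and reject items missing required options."""
    resolved = dict(properties)
    for opt in options:
        name = opt["name"]
        if name in resolved:
            continue
        if opt.get("required"):
            raise ValueError(f"missing required option: {name}")
        if "default" in opt:
            resolved[name] = opt["default"]

    # Assumed rule: the threshold is a score between 0 and 1 inclusive.
    threshold = resolved["confidenceThreshold"]
    if not 0 <= threshold <= 1:
        raise ValueError(f"confidenceThreshold must be within 0-1, got {threshold}")
    return resolved


item = resolve_properties({
    "targetTaxon": "financial/total_revenue",
    "promptText": "Extract the Total Revenue from this filing.",
})
print(item["includeContext"], item["confidenceThreshold"])  # True 0.8
```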

### Via UI

1. Go to **Knowledge > Item Types**
2. Click **Create Item Type**
3. Enter:
   * Slug: `extraction-prompt-override`
   * Name: `Extraction Prompt Override`
4. Add the four options defined in the YAML above: `targetTaxon`, `promptText`, `includeContext`, and `confidenceThreshold`
5. Save

## Step 4: Create Knowledge Items

Create specific prompt configurations for each filing type.

### Via YAML

```yaml
# 10K Revenue Extraction Item
title: 10K Revenue Extraction Prompt
description: Optimized prompt for extracting revenue from annual reports
knowledgeItemType: extraction-prompt-override
active: true
properties:
  targetTaxon: "financial/total_revenue"
  promptText: |
    Extract the Total Revenue (also called Net Revenue or Total Sales) from this SEC 10K annual report.

    Instructions:
    1. Look in the Consolidated Statements of Operations
    2. Find the most recent fiscal year column
    3. Extract the Total Revenue line item
    4. Include the fiscal year end date

    Return format: {"value": <number>, "fiscalYearEnd": "<date>", "currency": "USD"}
  includeContext: true
  confidenceThreshold: 0.85
```

```yaml
# 10Q Revenue Extraction Item
title: 10Q Revenue Extraction Prompt
description: Optimized prompt for extracting revenue from quarterly reports
knowledgeItemType: extraction-prompt-override
active: true
properties:
  targetTaxon: "financial/total_revenue"
  promptText: |
    Extract the Total Revenue from this SEC 10Q quarterly report.

    Instructions:
    1. Look in the Consolidated Statements of Operations
    2. Find the current quarter column (not year-to-date)
    3. Extract the Total Revenue line item
    4. Include the quarter end date

    Return format: {"value": <number>, "quarterEnd": "<date>", "currency": "USD"}
  includeContext: true
  confidenceThreshold: 0.8
```
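The two prompts ask for slightly different JSON: the 10K prompt returns `fiscalYearEnd`, while the 10Q prompt returns `quarterEnd`. Downstream code therefore needs to know which key to expect. The Python sketch below shows one way to normalize both shapes. The function name, the `periodEnd` field, and the sample reply values are all invented for illustration.

```python
import json

# Date key each prompt asks for in its "Return format" line.
DATE_KEY = {"10K": "fiscalYearEnd", "10Q": "quarterEnd"}


def parse_revenue_response(raw, filing_type):
    """Parse a model reply and normalize the period-end field name."""
    data = json.loads(raw)
    date_key = DATE_KEY[filing_type]
    missing = [key for key in ("value", date_key, "currency") if key not in data]
    if missing:
        raise ValueError(f"{filing_type} response is missing: {', '.join(missing)}")
    return {
        "filingType": filing_type,
        "value": data["value"],
        "periodEnd": data[date_key],
        "currency": data["currency"],
    }


# Invented sample reply in the shape the 10Q prompt requests.
reply = '{"value": 1250000000, "quarterEnd": "2024-03-31", "currency": "USD"}'
print(parse_revenue_response(reply, "10Q")["periodEnd"])  # 2024-03-31
```

Parsing the same reply as a 10K raises a `ValueError`, which is a cheap way to notice that the wrong prompt was applied to a document.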

### Via UI

1. Go to **Knowledge > Items**
2. Click **Create Item**
3. Select Item Type: `Extraction Prompt Override`
4. Enter the title, description, and properties from the YAML above
5. Save
6. Repeat for the 10Q prompt

## Step 5: Create Knowledge Sets

Connect features to items with Knowledge Sets. Each set pairs feature conditions with the items to apply when a document matches them.

### Via YAML

```yaml
# 10K Processing Rules
name: 10K Document Processing Rules
description: Apply 10K-specific extraction prompts
status: ACTIVE

# Conditions: when document has this feature
features:
  - featureTypeSlug: sec-filing-type
    properties:
      filingType: "10K"

# Actions: apply these items. Each itemSlug must match the slug of an item from Step 4.
items:
  - itemSlug: 10k-revenue-extraction-prompt
```

```yaml
# 10Q Processing Rules
name: 10Q Document Processing Rules
description: Apply 10Q-specific extraction prompts
status: ACTIVE

features:
  - featureTypeSlug: sec-filing-type
    properties:
      filingType: "10Q"

items:
  - itemSlug: 10q-revenue-extraction-prompt
```
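The condition logic in these sets can be read as a predicate over a document's features. Here is a minimal Python sketch of that predicate. It assumes that every feature condition must be satisfied (AND semantics) and that only sets with status `ACTIVE` are considered. Neither rule is confirmed by this guide, and `set_matches` is a hypothetical helper.

```python
def set_matches(knowledge_set, document_features):
    """True when the set is ACTIVE and every feature condition is met."""
    if knowledge_set.get("status") != "ACTIVE":
        return False
    return all(
        any(
            feature["featureType"] == condition["featureTypeSlug"]
            and all(feature["properties"].get(key) == value
                    for key, value in condition.get("properties", {}).items())
            for feature in document_features
        )
        for condition in knowledge_set.get("features", [])
    )


SET_10K = {
    "name": "10K Document Processing Rules",
    "status": "ACTIVE",
    "features": [{"featureTypeSlug": "sec-filing-type",
                  "properties": {"filingType": "10K"}}],
    "items": [{"itemSlug": "10k-revenue-extraction-prompt"}],
}
SET_10Q = {
    "name": "10Q Document Processing Rules",
    "status": "ACTIVE",
    "features": [{"featureTypeSlug": "sec-filing-type",
                  "properties": {"filingType": "10Q"}}],
    "items": [{"itemSlug": "10q-revenue-extraction-prompt"}],
}

# A document that was classified as a quarterly report.
doc_features = [{"featureType": "sec-filing-type",
                 "properties": {"filingType": "10Q"}}]
matched = [s["name"] for s in (SET_10K, SET_10Q) if set_matches(s, doc_features)]
print(matched)  # ['10Q Document Processing Rules']
```

Under these assumptions, a set with no feature conditions matches every document, so you may want an explicit guard if that is not the behavior you intend.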

### Via UI

1. Go to **Knowledge > Sets**
2. Click **Create Knowledge Set**
3. Enter:
   * Name: `10K Document Processing Rules`
   * Status: `Active`
4. Add Feature condition: SEC Filing Type = 10K
5. Add Item: 10K Revenue Extraction Prompt
6. Save
7. Repeat for 10Q

## How It Works at Runtime

```mermaid
sequenceDiagram
    participant Doc as Document
    participant Proc as Processor
    participant KS as Knowledge System

    Doc->>Proc: Document uploaded
    Proc->>KS: Check features
    KS-->>Proc: Document has "10K" feature
    Proc->>KS: Find matching Knowledge Sets
    KS-->>Proc: Return "10K Document Processing Rules"
    Proc->>KS: Get Items from Set
    KS-->>Proc: Return "10K Revenue Extraction Prompt"
    Proc->>Doc: Apply customized extraction
```

1. Document is uploaded and classified as 10K
2. Feature "10K" is linked to the document
3. Processor queries Knowledge Sets for matching features
4. Knowledge Set "10K Document Processing Rules" matches
5. Item "10K Revenue Extraction Prompt" is retrieved
6. Custom prompt is used for extraction
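The steps above can be sketched end to end in Python. This is a simplified model and not the processor's actual implementation. Sets are keyed directly by filing type, the prompt texts are shortened, and falling back to a generic `DEFAULT_PROMPT` when no override exists is an assumption, since this guide does not describe the no-match case.

```python
# Hypothetical stores keyed by the slugs used in this guide.
ITEMS = {
    "10k-revenue-extraction-prompt": {
        "targetTaxon": "financial/total_revenue",
        "promptText": "Extract the Total Revenue from this SEC 10K annual report.",
        "confidenceThreshold": 0.85,
    },
    "10q-revenue-extraction-prompt": {
        "targetTaxon": "financial/total_revenue",
        "promptText": "Extract the Total Revenue from this SEC 10Q quarterly report.",
        "confidenceThreshold": 0.8,
    },
}
SETS = [
    {"filingType": "10K", "items": ["10k-revenue-extraction-prompt"]},
    {"filingType": "10Q", "items": ["10q-revenue-extraction-prompt"]},
]
# Assumed fallback for taxons or filing types without an override.
DEFAULT_PROMPT = "Extract the value for {taxon}."


def prompt_for(taxon, filing_type):
    """Return (prompt, threshold) for a taxon, preferring a matching override."""
    for knowledge_set in SETS:
        if knowledge_set["filingType"] != filing_type:
            continue
        for slug in knowledge_set["items"]:
            item = ITEMS[slug]
            if item["targetTaxon"] == taxon:
                return item["promptText"], item["confidenceThreshold"]
    return DEFAULT_PROMPT.format(taxon=taxon), None


prompt, threshold = prompt_for("financial/total_revenue", "10K")
print(threshold)  # 0.85
```

Both items target the same taxon, so the filing-type feature is the only thing that decides which prompt runs.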

## Complete GitOps Example

Here's the full set of files for deploying via `kdx sync`:

```text
kodexa-resources/
├── knowledge-feature-types/
│   └── sec-filing-type.yaml
├── knowledge-item-types/
│   └── extraction-prompt-override.yaml
└── projects/
    └── sec-processing/
        ├── knowledge-items/
        │   ├── 10k-revenue-prompt.yaml
        │   └── 10q-revenue-prompt.yaml
        └── knowledge-sets/
            ├── 10k-rules.yaml
            └── 10q-rules.yaml
```

Manifest:

```yaml
# manifests/sec-processing.yaml
resources:
  knowledge-feature-types:
    - sec-filing-type
  knowledge-item-types:
    - extraction-prompt-override
  projects:
    - sec-processing
```
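Before deploying, it can help to confirm that every manifest entry has a matching file or directory. The Python sketch below assumes the mapping suggested by the layout above: project entries are directories under `projects/`, and every other entry is a `<name>.yaml` file in the folder named after its resource kind. The manifest is written as a dict to keep the example dependency-free, and `missing_resources` is a hypothetical helper, not a `kdx` feature.

```python
from pathlib import Path

# The manifest from this guide, written as a dict instead of parsed YAML.
MANIFEST = {
    "knowledge-feature-types": ["sec-filing-type"],
    "knowledge-item-types": ["extraction-prompt-override"],
    "projects": ["sec-processing"],
}


def missing_resources(root, manifest=MANIFEST):
    """List manifest entries that have no matching file or directory under root."""
    root = Path(root)
    missing = []
    for kind, names in manifest.items():
        for name in names:
            # Assumed mapping: projects are directories, other kinds are YAML files.
            if kind == "projects":
                target = root / kind / name
            else:
                target = root / kind / f"{name}.yaml"
            if not target.exists():
                missing.append(target.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # An empty list means every manifest entry was found.
    print(missing_resources("kodexa-resources"))
```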

Deploy:

```bash
kdx sync deploy --target my-org --env prod
```

## Next Steps

* [Knowledge Feature Types](/concepts/knowledge_feature_types) - Full reference
* [Knowledge Item Types](/concepts/knowledge_item_types) - Full reference
* [Adding Validation Rules](/concepts/adding_validation_rules) - Another common use case
