Slug: llm-document-classifier
Version: 1.0.0
Infer: Yes

Overview

LLM Document Classifier Model

The LLM Document Classifier model uses a large language model to categorize documents by their content. It examines the document text and selects the most appropriate document type from a predefined list, so that documents can be sorted and routed automatically.

How It Works

  1. The model analyzes document content at the page level
  2. It compares the content against a list of document types with optional hints
  3. Using the configured language model, it determines the most appropriate classification
  4. It optionally generates a concise summary of the document content
  5. It adds the classification and summary as features to the document

Options Configuration

| Option | Description |
|---|---|
| document_types | List of possible document types to classify against, with optional hints in parentheses |
| create_summary | When enabled, generates a one-sentence summary of the document content |
| classification_model | The AI model used for classification and summarization |

Process Flow

Document Type Configuration

The document_types option accepts a list of document types with optional hints to guide classification. For example:

- Invoice (hints: total amount, bill to, payment terms)
- Receipt (hints: paid, payment received, thank you)
- Contract (hints: agreement, terms and conditions, parties)
- Unknown

Including hints improves classification accuracy by telling the AI which keywords or patterns to look for when identifying each document type.
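
To make the list format concrete, the following is a minimal Python sketch of how entries such as "- Invoice (hints: total amount, bill to)" could be split into a type name and a list of hints. It is an illustration only: the DocumentType tuple, the parse_document_types function, and the regular expression are hypothetical names introduced here, and the model's actual parsing logic is not documented on this page.

```python
import re
from typing import NamedTuple


class DocumentType(NamedTuple):
    """Hypothetical container for one entry of the document_types option."""
    name: str
    hints: list[str]


# Assumed line shape: "- Name (hints: a, b, c)"; the "(hints: ...)" part is optional.
_LINE = re.compile(
    r"^\s*-\s*(?P<name>[^()]+?)\s*(?:\(hints:\s*(?P<hints>[^)]*)\))?\s*$"
)


def parse_document_types(text: str) -> list[DocumentType]:
    """Turn the multi-line document_types string into structured entries."""
    types = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        raw_hints = match.group("hints") or ""
        # Split the comma-separated hints and drop empty fragments.
        hints = [h.strip() for h in raw_hints.split(",") if h.strip()]
        types.append(DocumentType(match.group("name").strip(), hints))
    return types


if __name__ == "__main__":
    sample = "- Invoice (hints: total amount, bill to, payment terms)\n- Unknown"
    for doc_type in parse_document_types(sample):
        print(doc_type.name, doc_type.hints)
```

An entry without hints, such as "- Unknown", simply yields an empty hint list in this sketch.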

Classification Process

The classification process follows these steps:

  1. Content Extraction: Extracts text content from each page
  2. Prompt Creation: Constructs a prompt with your document types and instructions
  3. LLM Analysis: Sends the prompt and content to the AI model
  4. Response Processing: Parses the AI’s JSON response to extract the document type and summary
  5. Feature Addition: Adds the classification and summary as document features
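
The prompt creation and response processing steps above can be sketched roughly as follows. This is a hedged illustration, not the model's actual implementation: the prompt wording, the JSON keys "document_type" and "summary", the fallback to "Unknown", and the function names build_prompt and parse_response are all assumptions made for the example.

```python
import json


def build_prompt(type_names: list[str], create_summary: bool) -> str:
    """Assemble a classification prompt (hypothetical wording)."""
    lines = ["Classify the page below as exactly one of these document types:"]
    lines += [f"- {name}" for name in type_names]
    # Only ask for a summary when the create_summary option is enabled.
    keys = '"document_type"' + (' and "summary"' if create_summary else "")
    lines.append(f"Respond with a JSON object containing {keys}.")
    return "\n".join(lines)


def parse_response(raw: str, type_names: list[str], fallback: str = "Unknown") -> dict:
    """Extract the document type and summary from the model's JSON reply.

    Assumption: malformed replies or types outside the configured list are
    mapped to the fallback type rather than raising an error.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    doc_type = data.get("document_type")
    if doc_type not in type_names:
        doc_type = fallback
    return {"document_type": doc_type, "summary": data.get("summary")}


if __name__ == "__main__":
    names = ["Invoice", "Receipt", "Unknown"]
    print(build_prompt(names, create_summary=True))
    print(parse_response('{"document_type": "Receipt", "summary": "A paid receipt."}', names))
```

Validating the returned type against the configured list is a design choice in this sketch; it keeps unexpected model output from producing a document type that was never configured.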

Example Usage

This model is particularly useful for:

  • Automatically sorting incoming documents by type
  • Creating document metadata for search and filtering
  • Routing documents to appropriate processing pipelines
  • Generating concise document summaries for quick review
  • Building intelligent document management systems

Configuration Example

To classify business documents with summaries:

document_types: |
  - Invoice (hints: invoice number, amount due, bill to)
  - Purchase Order (hints: PO number, ordered items, quantities)
  - Shipping Manifest (hints: tracking numbers, weights, ship dates)
  - Contract (hints: agreement, party names, signatures)
  - Credit Memo (hints: credit, return, refund)
  - Unknown
create_summary: true
classification_model: "anthropic.claude-3-5-sonnet-20240620-v1:0"

Inference Options

The following options can be configured when using this model for inference:

| Name | Label | Type | Description | Default | Required |
|---|---|---|---|---|---|
| document_types | Document Types | string | The possible document types to use for classification. You can include hints in parentheses. | - Document Type 1 (hints:) | No |
| create_summary | Create Summary | boolean | Whether to create a summary of the document | True | No |
| classification_model | Model | cloudModel | The model to use to classify and summarize each page of the document | anthropic.claude-3-5-sonnet-20240620-v1:0 | No |
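
Because none of the options are required, any option that is omitted falls back to its default. The Python sketch below shows one way a caller might combine its own settings with the defaults listed above. The DEFAULTS dictionary and the resolve_options function are hypothetical and introduced only for illustration; how the platform itself merges options is not described on this page.

```python
# Defaults copied from the inference options listed above.
DEFAULTS = {
    "document_types": "- Document Type 1 (hints:)",
    "create_summary": True,
    "classification_model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
}


def resolve_options(overrides: dict) -> dict:
    """Merge caller-supplied options over the defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject misspelled option names instead of silently ignoring them.
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    options = resolve_options({"create_summary": False})
    print(options["create_summary"])        # caller's value
    print(options["classification_model"])  # default value
```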

Model Details

  • Provider: Kodexa