Document Preprocessor Model - Kodexa Developer Portal

Slug: document-preprocessor
Version: 1.0.0
Infer: Yes

Overview

No overview provided.

Inference Options

The following options can be configured when using this model for inference:

Name	Label	Type	Description	Default	Required
`document_types`	Document Types	string	The possible document types to use for classification. You can include hints in parentheses.	- Document Type 1 (hints:)	No
`should_split_multiple_documents_in_one_page`	Split multiple documents scanned into one page?	boolean	Whether to split a page that contains multiple scanned documents into separate pages	False	No
`should_correct_rotation`	Correct rotation of documents?	boolean	Whether to correct the rotation of a page that is rotated by 90, 180, or 270 degrees	False	No
`should_deskew_images`	Deskew images?	boolean	Whether to deskew images with small rotation/skew issues	False	No
`should_exclude_blank_pages`	Exclude blank pages?	boolean	Whether to exclude blank pages	False	No
`should_exclude_duplicate_pages`	Exclude duplicate pages?	boolean	Whether to exclude duplicate pages	False	No
`will_multipage_documents_be_sequential`	Will multi-page documents always be sequential?	boolean	Whether the pages of a multi-page document always appear consecutively in the source document. If false, they may be out of order or separated by other pages	False	No
`should_find_multiple_page_documents`	Find multi-page documents?	boolean	Whether to find multi-page documents (feature classification depends on this being true)	False	No
`should_reorder_pages`	Reorder pages?	boolean	Whether to reorder pages based on the multi-page results	False	No
`should_process_each_document_separately`	Process each document separately?	boolean	Whether to process each document separately after identifying multi-page documents	False	No
`multiple_document_in_one_page_identifier_model`	Multiple documents in one page identifier LLM model	string	The model to use to identify pages with multiple documents scanned into them	gemini-2.0-flash-001	No
`document_classifier_model`	Document classifier LLM model	string	The model to use to classify documents	gemini-2.0-flash-001	No
`multiple_page_document_identifier_model`	Multiple page document identifier LLM model	string	The model to use to identify documents that are scanned across multiple pages	gemini-2.0-flash-001	No
`target_store`	Target Store	documentStore	The store that should receive the processed document	-	No
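
As an illustration of how the options above fit together, here is a minimal Python sketch that builds an options dictionary from the table's defaults and applies overrides. The names `DEFAULT_OPTIONS`, `build_options`, and `dependency_warnings` are hypothetical helpers, not part of any Kodexa API, and the sketch does not show how the options are passed to the platform. The multi-line `document_types` value is an assumption extrapolated from the default `- Document Type 1 (hints:)`. The dependency check is only a reading of the descriptions for `should_reorder_pages` and `should_process_each_document_separately`, which both refer to the multi-page results.

```python
from typing import Any, Dict, List

# Defaults copied from the options table. target_store has no default
# listed there, so it is represented as None in this sketch.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "document_types": "- Document Type 1 (hints:)",
    "should_split_multiple_documents_in_one_page": False,
    "should_correct_rotation": False,
    "should_deskew_images": False,
    "should_exclude_blank_pages": False,
    "should_exclude_duplicate_pages": False,
    "will_multipage_documents_be_sequential": False,
    "should_find_multiple_page_documents": False,
    "should_reorder_pages": False,
    "should_process_each_document_separately": False,
    "multiple_document_in_one_page_identifier_model": "gemini-2.0-flash-001",
    "document_classifier_model": "gemini-2.0-flash-001",
    "multiple_page_document_identifier_model": "gemini-2.0-flash-001",
    "target_store": None,
}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults (hypothetical helper).

    Rejects option names that are not in the table and values whose
    type differs from the default's type (boolean or string).
    """
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    for name, default in DEFAULT_OPTIONS.items():
        if default is not None and not isinstance(options[name], type(default)):
            raise TypeError(f"{name} must be of type {type(default).__name__}")
    return options


def dependency_warnings(options: Dict[str, Any]) -> List[str]:
    """Flag options that appear to rely on multi-page detection.

    Assumption: inferred from the option descriptions, not from
    documented platform behaviour.
    """
    warnings: List[str] = []
    if not options["should_find_multiple_page_documents"]:
        for name in ("should_reorder_pages",
                     "should_process_each_document_separately"):
            if options[name]:
                warnings.append(
                    f"{name} is set but should_find_multiple_page_documents is False"
                )
    return warnings


# Example: classify invoices and receipts, fix rotation, and reorder pages.
options = build_options(
    document_types="- Invoice (hints: invoice number, total due)\n"
                   "- Receipt (hints: paid, change)",
    should_correct_rotation=True,
    should_find_multiple_page_documents=True,
    should_reorder_pages=True,
)
```

Every option is marked as not required, so overriding only the values that differ from the defaults keeps a configuration short and makes the intended behaviour easier to review.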

Model Details

Provider: Kodexa