Document Preprocessor Model
Preprocesses documents by splitting, rotating, deskewing, and classifying pages, among other steps.
Slug: document-preprocessor
Version: 1.0.0
Infer: Yes
Overview
No overview provided.
Inference Options
The following options can be configured when using this model for inference:
Name | Label | Type | Description | Default | Required |
---|---|---|---|---|---|
document_types | Document Types | string | The document types to choose from during classification. Hints can be included in parentheses. | - Document Type 1 (hints:) | No |
should_split_multiple_documents_in_one_page | Split multiple documents scanned into one page? | boolean | Whether to split a page that contains several scanned documents into separate pages | False | No |
should_correct_rotation | Correct rotation of documents? | boolean | Whether to correct pages that are rotated by 90, 180, or 270 degrees | False | No |
should_deskew_images | Deskew images? | boolean | Whether to deskew images that are slightly rotated or skewed | False | No |
should_exclude_blank_pages | Exclude blank pages? | boolean | Whether to exclude blank pages | False | No |
should_exclude_duplicate_pages | Exclude duplicate pages? | boolean | Whether to exclude duplicate pages | False | No |
will_multipage_documents_be_sequential | Will multi-page documents always be sequential? | boolean | Whether the pages of each multi-page document always appear consecutively in the source document, rather than out of order or interleaved with other pages | False | No |
should_find_multiple_page_documents | Find multi-page documents? | boolean | Whether to find documents that span multiple pages (the classification feature depends on this being true) | False | No |
should_reorder_pages | Reorder pages? | boolean | Whether to reorder pages based on the results of multi-page document detection | False | No |
should_process_each_document_separately | Process each document separately? | boolean | Whether to process each document separately after identifying multi-page documents | False | No |
multiple_document_in_one_page_identifier_model | Multiple documents in one page identifier LLM model | string | The model to use to identify pages with multiple documents scanned into them | gemini-2.0-flash-001 | No |
document_classifier_model | Document classifier LLM model | string | The model to use to classify documents | gemini-2.0-flash-001 | No |
multiple_page_document_identifier_model | Multiple page document identifier LLM model | string | The model to use to identify documents that are scanned across multiple pages | gemini-2.0-flash-001 | No |
target_store | Target Store | documentStore | The store that should receive the processed document | - | No |
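As a rough illustration of how these options fit together, the Python sketch below collects them into a dataclass whose field names and defaults are copied from the table above. The class `PreprocessorOptions`, its `dependency_warnings` check, and the `parse_document_types` helper are hypothetical and are not part of any Kodexa SDK. The check encodes an assumption that `will_multipage_documents_be_sequential`, `should_reorder_pages`, and `should_process_each_document_separately` only take effect when `should_find_multiple_page_documents` is true, and the parser assumes one `- Name (hints: ...)` entry per line, following the shape of the default `document_types` value. Representing `target_store` as a plain string is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


# Hypothetical helper: field names and defaults mirror the Inference Options table.
@dataclass
class PreprocessorOptions:
    document_types: str = "- Document Type 1 (hints:)"
    should_split_multiple_documents_in_one_page: bool = False
    should_correct_rotation: bool = False
    should_deskew_images: bool = False
    should_exclude_blank_pages: bool = False
    should_exclude_duplicate_pages: bool = False
    will_multipage_documents_be_sequential: bool = False
    should_find_multiple_page_documents: bool = False
    should_reorder_pages: bool = False
    should_process_each_document_separately: bool = False
    multiple_document_in_one_page_identifier_model: str = "gemini-2.0-flash-001"
    document_classifier_model: str = "gemini-2.0-flash-001"
    multiple_page_document_identifier_model: str = "gemini-2.0-flash-001"
    # Assumption: a documentStore is referenced by a string; None means no target store.
    target_store: Optional[str] = None

    def dependency_warnings(self) -> List[str]:
        """Flag options that (by assumption) only matter once multi-page detection is on."""
        if self.should_find_multiple_page_documents:
            return []
        dependent = (
            "will_multipage_documents_be_sequential",
            "should_reorder_pages",
            "should_process_each_document_separately",
        )
        return [
            f"{name} relies on should_find_multiple_page_documents"
            for name in dependent
            if getattr(self, name)
        ]


# Assumed format: one "- Name (hints: ...)" entry per line, as in the default value.
_TYPE_LINE = re.compile(
    r"^\s*-?\s*(?P<name>[^(]+?)\s*(?:\((?:hints:)?\s*(?P<hints>.*)\))?\s*$"
)


def parse_document_types(text: str) -> List[Tuple[str, str]]:
    """Split a document_types string into (name, hints) pairs, skipping blank lines."""
    types = []
    for line in text.splitlines():
        match = _TYPE_LINE.match(line)
        if match and match.group("name").strip():
            hints = (match.group("hints") or "").strip()
            types.append((match.group("name").strip(), hints))
    return types


options = PreprocessorOptions(should_correct_rotation=True, should_reorder_pages=True)
print(options.dependency_warnings())
# ['should_reorder_pages relies on should_find_multiple_page_documents']
print(parse_document_types(options.document_types))
# [('Document Type 1', '')]
```

Calling `asdict(options)` yields a plain dictionary keyed by the option names in the table; how that dictionary is submitted to the model depends on your Kodexa setup and is not shown here.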
Model Details
- Provider: Kodexa