Slug: document-preprocessor
Version: 1.0.0
Infer: Yes

Overview

No overview provided.


Inference Options

The following options can be configured when using this model for inference:

| Name | Label | Type | Description | Default | Required |
|---|---|---|---|---|---|
| document_types | Document Types | string | The possible document types to use for classification. You can include hints in parentheses. | - Document Type 1 (hints:) | No |
| should_split_multiple_documents_in_one_page | Split multiple documents scanned into one page? | boolean | Whether to split multiple documents scanned into one page out into separate pages | False | No |
| should_correct_rotation | Correct rotation of documents? | boolean | Whether to correct the rotation of a page that is rotated by 90, 180, or 270 degrees | False | No |
| should_deskew_images | Deskew images? | boolean | Whether to deskew images with small rotation/skew issues | False | No |
| should_exclude_blank_pages | Exclude blank pages? | boolean | Whether to exclude blank pages | False | No |
| should_exclude_duplicate_pages | Exclude duplicate pages? | boolean | Whether to exclude duplicate pages | False | No |
| will_multipage_documents_be_sequential | Will multi-page documents always be sequential? | boolean | Whether the multi-page documents are always sequential in the source document or could possibly be out of order or separated by other pages | False | No |
| should_find_multiple_page_documents | Find multi-page documents? | boolean | Whether to find multi-page documents (feature classification depends on this being true) | False | No |
| should_reorder_pages | Reorder pages? | boolean | Whether to reorder pages based on the multi-page results | False | No |
| should_process_each_document_separately | Process each document separately? | boolean | Whether to process each document separately after identifying multi-page documents | False | No |
| multiple_document_in_one_page_identifier_model | Multiple documents in one page identifier LLM model | string | The model to use to identify pages with multiple documents scanned into them | gemini-2.0-flash-001 | No |
| document_classifier_model | Document classifier LLM model | string | The model to use to classify documents | gemini-2.0-flash-001 | No |
| multiple_page_document_identifier_model | Multiple page document identifier LLM model | string | The model to use to identify documents that are scanned across multiple pages | gemini-2.0-flash-001 | No |
| target_store | Target Store | documentStore | The store that should receive the processed document | - | No |
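
As an illustration of how these options fit together, the sketch below collects the documented names and defaults into a plain Python dictionary and applies overrides on top. The helper names `build_options` and `check_dependencies` are hypothetical and are not part of any Kodexa API, and how the resulting dictionary is handed to the model is not covered by this page. The dependency check rests on an assumption drawn from the descriptions above: page reordering and per-document processing both appear to rely on multi-page detection being enabled.

```python
# Option names and defaults are copied from the options table.
# Everything else (function names, the dependency rule) is illustrative only.
DEFAULT_OPTIONS = {
    "document_types": "- Document Type 1 (hints:)",
    "should_split_multiple_documents_in_one_page": False,
    "should_correct_rotation": False,
    "should_deskew_images": False,
    "should_exclude_blank_pages": False,
    "should_exclude_duplicate_pages": False,
    "will_multipage_documents_be_sequential": False,
    "should_find_multiple_page_documents": False,
    "should_reorder_pages": False,
    "should_process_each_document_separately": False,
    "multiple_document_in_one_page_identifier_model": "gemini-2.0-flash-001",
    "document_classifier_model": "gemini-2.0-flash-001",
    "multiple_page_document_identifier_model": "gemini-2.0-flash-001",
    "target_store": None,  # the table lists "-" (no default)
}


def build_options(**overrides):
    """Merge overrides into the defaults, rejecting unknown names and non-boolean flags."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    for name, default in DEFAULT_OPTIONS.items():
        if isinstance(default, bool) and not isinstance(options[name], bool):
            raise TypeError(f"{name} must be a boolean")
    return options


def check_dependencies(options):
    """Return warnings for flags that (by assumption) need multi-page detection."""
    dependents = ("should_reorder_pages", "should_process_each_document_separately")
    warnings = []
    if not options["should_find_multiple_page_documents"]:
        for name in dependents:
            if options[name]:
                warnings.append(
                    f"{name} is set but should_find_multiple_page_documents is False"
                )
    return warnings


if __name__ == "__main__":
    options = build_options(should_correct_rotation=True, should_reorder_pages=True)
    for warning in check_dependencies(options):
        print("warning:", warning)
```

Because every option is optional, an empty call to `build_options()` returns the documented defaults unchanged; only the flags you want to turn on need to be passed.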

Model Details

  • Provider: Kodexa