Splits, rotates and classifies the document
splitter-rotator-classifier
Version: 1.0.0
Infer: Yes
Option | Description |
---|---|
document_types | List of document types to classify against (can include hints in parentheses) |
split_multiple_documents_in_one_page | When enabled, detects and splits pages containing multiple distinct documents |
correct_rotation | Analyzes and corrects page orientation for proper reading |
exclude_blank_pages | Filters out completely blank or mostly white pages |
exclude_duplicate_pages | Removes duplicate pages based on visual similarity |
reorder_pages | Intelligently groups pages that belong to the same multi-page document |
multiple_document_in_one_page_identifier_model | LLM used to verify whether a page contains multiple documents |
document_classifier_model | LLM used to determine document types |
multiple_page_document_identifier_model | LLM used to identify pages belonging to the same document |
target_store | Document store to receive the processed documents |

Name | Label | Type | Description | Default | Required |
---|---|---|---|---|---|
document_types | Document Types | string | The possible document types to use for classification. You can include hints in parentheses. | - Document Type 1 (hints:) | No |
split_multiple_documents_in_one_page | Split multiple documents in one page? | boolean | Whether to detect pages that contain multiple distinct documents and split them into separate documents | True | No |
correct_rotation | Correct rotation of documents? | boolean | Whether to detect and correct the page orientation | True | No |
exclude_blank_pages | Exclude blank pages? | boolean | Whether to exclude pages that are completely blank or mostly white | True | No |
exclude_duplicate_pages | Exclude duplicate pages? | boolean | Whether to remove duplicate pages, detected by visual similarity | True | No |
reorder_pages | Reorder pages? | boolean | Whether to reorder pages so that pages belonging to the same multi-page document are grouped together | True | No |
multiple_document_in_one_page_identifier_model | Multiple documents in one page identifier LLM model | string | The model to use to identify pages with multiple documents scanned into them | gemini-2.0-flash-001 | No |
document_classifier_model | Document classifier LLM model | string | The model to use to classify documents | gemini-2.0-flash-001 | No |
multiple_page_document_identifier_model | Multiple page document identifier LLM model | string | The model to use to identify documents that are scanned across multiple pages | gemini-2.0-flash-001 | No |
target_store | Target Store | documentStore | The store that should receive the processed document | - | No |
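
The parameters above can be collected into a plain options mapping. The following is a minimal sketch, not the module's actual API: `DEFAULTS`, `format_document_types`, and `build_options` are hypothetical helper names, and the one-line-per-type `- Type (hints: ...)` layout is an assumption inferred from the default value of `document_types`. Only the option names and default values are taken from the table.

```python
from typing import Any

# Option names and defaults copied from the parameter table.
# target_store has no default in the table, so it is left as None here.
DEFAULTS: dict[str, Any] = {
    "document_types": "- Document Type 1 (hints:)",
    "split_multiple_documents_in_one_page": True,
    "correct_rotation": True,
    "exclude_blank_pages": True,
    "exclude_duplicate_pages": True,
    "reorder_pages": True,
    "multiple_document_in_one_page_identifier_model": "gemini-2.0-flash-001",
    "document_classifier_model": "gemini-2.0-flash-001",
    "multiple_page_document_identifier_model": "gemini-2.0-flash-001",
    "target_store": None,
}


def format_document_types(types: dict[str, str]) -> str:
    """Render {type name: hint} as one '- Name (hints: hint)' line per type.

    The exact string layout is an assumption based on the default value.
    """
    lines = []
    for name, hint in types.items():
        lines.append(f"- {name} (hints: {hint})" if hint else f"- {name}")
    return "\n".join(lines)


def build_options(**overrides: Any) -> dict[str, Any]:
    """Return the defaults with overrides applied, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Example: classify against two document types and keep duplicate pages.
options = build_options(
    document_types=format_document_types(
        {"Invoice": "has line items and a total", "Receipt": ""}
    ),
    exclude_duplicate_pages=False,
)
```

Rejecting unknown names catches typos in the long option names early, which is easy to get wrong with names such as `multiple_document_in_one_page_identifier_model` and `multiple_page_document_identifier_model`.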