Slug: tif-to-pdf-model
Version: 1.0.0
Infer: Yes

Overview

TIF to PDF Model

The TIF to PDF model converts TIFF image files to PDF documents while preserving the visual content and enabling text extraction. This model is particularly useful for integrating scanned TIFF documents into PDF-based workflows and making their content searchable and processable.

How It Works

1. The model reads the input TIFF file (single-page or multi-page).
2. For each page in the TIFF file:
   - The image is extracted as a separate frame
   - Image quality is preserved during conversion
3. All frames are combined into a single PDF document.
4. The PDF is processed with pdfplumber to extract text and structure.
5. A fully structured Kodexa document is created with:
   - Page nodes representing each page
   - Content area nodes containing the text
   - Line and word nodes capturing the text content and positioning
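
The frame extraction and PDF assembly steps can be sketched roughly as follows. This is a minimal illustration using Pillow and img2pdf, not the model's actual source: the function names (`default_pdf_path`, `iter_frames_as_png`, `tiff_to_pdf`) are hypothetical, and re-encoding each frame as PNG is an assumption made here to keep the conversion lossless. Frames with an alpha channel, or in a mode that PNG cannot store, would need converting first.

```python
import io
from pathlib import Path


def default_pdf_path(tiff_path):
    """Derive the output path by swapping the extension for .pdf."""
    return Path(tiff_path).with_suffix(".pdf")


def iter_frames_as_png(tiff_path):
    """Yield every TIFF frame re-encoded losslessly as PNG bytes."""
    from PIL import Image, ImageSequence  # Pillow, imported where it is used

    with Image.open(tiff_path) as tiff:
        for frame in ImageSequence.Iterator(tiff):
            save_kwargs = {"format": "PNG"}
            if "dpi" in frame.info:
                # Carry the scan resolution over when the TIFF records it.
                save_kwargs["dpi"] = tuple(float(v) for v in frame.info["dpi"])
            buffer = io.BytesIO()
            frame.save(buffer, **save_kwargs)
            yield buffer.getvalue()


def tiff_to_pdf(tiff_path, pdf_path=None):
    """Write all frames of a TIFF into one PDF; return (path, page count)."""
    import img2pdf  # imported where it is used

    pdf_path = Path(pdf_path) if pdf_path else default_pdf_path(tiff_path)
    # Assumption: the encoded pages fit in memory at once.
    pages = list(iter_frames_as_png(tiff_path))
    if not pages:
        raise ValueError(f"no frames found in {tiff_path}")
    pdf_path.write_bytes(img2pdf.convert(*pages))
    return pdf_path, len(pages)
```

Calling `tiff_to_pdf("scan.tif")` would write `scan.pdf` next to the input, with one PDF page per TIFF frame.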

Document Structure

The resulting document will have the following structure:

document
└── page (for each TIFF frame)
    └── content-area
        └── line
            └── word

Each node includes:

- Bounding box coordinates: Precise positioning information
- Text content: For word nodes, the extracted text
- PDF mixin: PDF-specific features and capabilities
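
To make the hierarchy concrete, here is a small stand-alone sketch that builds a page → content-area → line → word tree from a list of word dictionaries. It does not use the Kodexa SDK: the `Node` class, `group_into_lines`, `build_page`, and the 3-point line tolerance are all hypothetical, and the input is assumed to be shaped like the dictionaries returned by pdfplumber's `extract_words()` (keys `text`, `x0`, `x1`, `top`, `bottom`).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom)


@dataclass
class Node:
    node_type: str
    bbox: Optional[BBox] = None
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def union(boxes):
    """Smallest box that encloses every box in the iterable."""
    boxes = list(boxes)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def group_into_lines(words, tolerance=3.0):
    """Group words whose 'top' is within `tolerance` of the line's first word."""
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(lines[-1][0]["top"] - word["top"]) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w["x0"]) for line in lines]


def build_page(words, width, height):
    """Build one page subtree; line and area boxes enclose their children."""
    page = Node("page", bbox=(0, 0, width, height))
    area = page.add(Node("content-area"))
    for line_words in group_into_lines(words):
        line = area.add(Node("line"))
        for w in line_words:
            box = (w["x0"], w["top"], w["x1"], w["bottom"])
            line.add(Node("word", bbox=box, text=w["text"]))
        line.bbox = union(child.bbox for child in line.children)
    if area.children:
        area.bbox = union(child.bbox for child in area.children)
    return page
```

In this sketch only word nodes carry text, and each parent's bounding box is derived from its children, which mirrors the structure described above.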

Use Cases

This model is particularly useful for:

- Legacy Document Conversion: Converting TIFF archives to the more usable PDF format
- Document Standardization: Standardizing mixed-format documents to PDF
- OCR Integration: Preparing scanned documents for OCR processing
- Workflow Integration: Incorporating TIFF-based documents into PDF workflows
- Document Processing Pipelines: Enabling further processing of TIFF-based content

Technical Details

- The model uses img2pdf for high-quality TIFF to PDF conversion
- PIL/Pillow is used for TIFF frame extraction and processing
- The conversion preserves the original image quality and resolution
- Text extraction is performed using pdfplumber on the converted PDF
- The model works with both single-page and multi-page TIFF files
- Processing is optimized for memory efficiency even with large TIFF files
- The original document metadata is preserved in the resulting PDF
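
As a rough illustration of the pdfplumber extraction step, the sketch below collects the words found on each page of the converted PDF. The helper names (`words_by_page`, `pages_without_text`) are hypothetical and not part of the model. pdfplumber reads text that is already present in the PDF, so a page that yields no words is a candidate for a separate OCR pass.

```python
def words_by_page(pdf_path):
    """Return {page_number: [word dicts]} for every page in the PDF."""
    import pdfplumber  # imported where it is used

    with pdfplumber.open(pdf_path) as pdf:
        return {page.page_number: page.extract_words() for page in pdf.pages}


def pages_without_text(extracted):
    """Page numbers for which no words were extracted."""
    return sorted(number for number, words in extracted.items() if not words)
```

For example, `pages_without_text(words_by_page("scan.pdf"))` would list the pages where extraction found nothing.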

Limitations

- Text extraction quality depends on the clarity of the original TIFF image
- Very large TIFF files may require additional processing time
- Compression artifacts in the original TIFF may affect text extraction quality

Model Details

Provider: Kodexa
