TIF to PDF Model
Creates a PDF from the given TIF
Slug: tif-to-pdf-model
Version: 1.0.0
Infer: Yes
Overview
The TIF to PDF model converts TIFF image files to PDF documents while preserving the visual content and enabling text extraction. This model is particularly useful for integrating scanned TIFF documents into PDF-based workflows and making their content searchable and processable.
How It Works
- The model reads the input TIFF file (supporting both single and multi-page TIFFs)
- For each page in the TIFF file:
  - The image is extracted as a separate frame
  - Image quality is preserved during conversion
- All frames are combined into a single PDF document
- The PDF is processed with pdfplumber to extract text and structure
- A fully structured Kodexa document is created with:
  - Page nodes representing each page
  - Content area nodes containing the text
  - Line and word nodes capturing the text content and positioning
Process Flow
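The conversion steps listed under How It Works can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the model's actual implementation: it assumes Pillow and img2pdf are installed and that each frame is in a mode PNG can store (CMYK frames, for example, would need converting first). The helper names `pdf_path_for`, `tiff_to_pdf_bytes`, and `convert_file`, and the output naming rule, are hypothetical.

```python
import io
from pathlib import Path


def pdf_path_for(tiff_path):
    """Return the output path: same name with a .pdf suffix (assumed naming rule)."""
    return Path(tiff_path).with_suffix(".pdf")


def tiff_to_pdf_bytes(tiff_path):
    """Convert every frame of a TIFF into one PDF, one page per frame."""
    # Third-party imports are kept local so the module loads without them.
    import img2pdf
    from PIL import Image, ImageSequence

    pages = []
    with Image.open(tiff_path) as tiff:
        # ImageSequence.Iterator visits each frame of a multi-page TIFF in order.
        for frame in ImageSequence.Iterator(tiff):
            buffer = io.BytesIO()
            # Forward the resolution when the TIFF declares a usable one, so the
            # page is sized from it; otherwise img2pdf uses its default.
            dpi = frame.info.get("dpi")
            save_args = {"dpi": dpi} if dpi and min(dpi) > 1 else {}
            # PNG is lossless, so re-encoding a frame does not degrade it.
            frame.save(buffer, format="PNG", **save_args)
            pages.append(buffer.getvalue())
    # With no output stream given, img2pdf returns the finished PDF as bytes.
    return img2pdf.convert(*pages)


def convert_file(tiff_path):
    """Write the PDF next to the TIFF and return the new path."""
    out = pdf_path_for(tiff_path)
    out.write_bytes(tiff_to_pdf_bytes(tiff_path))
    return out
```

Calling `convert_file("scan.tif")` would write `scan.pdf` beside the input. Text extraction and document building then run on that PDF.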
Document Structure
The resulting document is a tree: each page node holds content area nodes, which in turn hold the line nodes and their word nodes.
Each node includes:
- Bounding box coordinates: Precise positioning information
- Text content: For word nodes, the extracted text
- PDF mixin: PDF-specific features and capabilities
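To make the node layout concrete, here is a small stand-in for the tree using plain Python dataclasses. It is an illustration only and does not use the Kodexa SDK: the `Node` class, the node type strings ("page", "content-area", "line", "word"), and the `(x0, y0, x1, y1)` bounding box order are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed order: (x0, y0, x1, y1)


@dataclass
class Node:
    node_type: str                 # "page", "content-area", "line" or "word"
    bbox: BBox                     # position of the node on the page
    text: Optional[str] = None     # only word nodes carry text in this sketch
    children: List["Node"] = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def page_text(page):
    """Join the words of each line with spaces and the lines with newlines."""
    lines = [node for node in page.walk() if node.node_type == "line"]
    return "\n".join(
        " ".join(word.text for word in line.children if word.text)
        for line in lines
    )


def build_example():
    """Build a one-page tree with a single line holding two words."""
    page = Node("page", (0, 0, 612, 792))
    area = page.add(Node("content-area", (50, 50, 562, 742)))
    line = area.add(Node("line", (50, 50, 200, 62)))
    line.add(Node("word", (50, 50, 110, 62), text="Invoice"))
    line.add(Node("word", (115, 50, 140, 62), text="42"))
    return page
```

Here `page_text(build_example())` returns `"Invoice 42"`, showing how the text of a page can be rebuilt by walking from page to word.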
Use Cases
Typical applications include:
- Legacy Document Conversion: Converting TIFF archives to more usable PDF format
- Document Standardization: Standardizing mixed-format documents to PDF
- OCR Integration: Preparing scanned documents for OCR processing
- Workflow Integration: Incorporating TIFF-based documents into PDF workflows
- Document Processing Pipelines: Enabling further processing of TIFF-based content
Technical Details
- The model uses img2pdf for high-quality TIFF to PDF conversion
- PIL/Pillow is used for TIFF frame extraction and processing
- The conversion preserves the original image quality and resolution
- Text extraction is performed using pdfplumber on the converted PDF
- The model works with both single-page and multi-page TIFF files
- Processing is optimized for memory efficiency even with large TIFF files
- The original document metadata is preserved in the resulting PDF
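The pdfplumber step listed above can be illustrated with a short sketch. It assumes pdfplumber is installed. The helper names `extract_page_words` and `group_into_lines` and the 3-point line tolerance are hypothetical, and the grouping rule is one simple choice rather than the model's own logic. Note that pdfplumber reads text objects stored in the PDF, so a page with no text layer produces an empty word list.

```python
def extract_page_words(pdf_path):
    """Return one list of word dicts per page, as produced by pdfplumber."""
    # Third-party import kept local so group_into_lines has no dependencies.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # Each word dict includes "text", "x0", "x1", "top" and "bottom".
        return [page.extract_words() for page in pdf.pages]


def group_into_lines(words, tolerance=3.0):
    """Group word dicts into lines by comparing their "top" coordinates."""
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # A word joins the current line when its top edge is within
        # `tolerance` points of that line's first word.
        if lines and abs(lines[-1][0]["top"] - word["top"]) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order the words of each line from left to right.
    for line in lines:
        line.sort(key=lambda w: w["x0"])
    return lines
```

Each group returned by `group_into_lines` would correspond to one line node, with its members becoming the word nodes beneath it.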
Limitations
- Text extraction quality depends on the clarity of the original TIFF image
- Very large TIFF files may require additional processing time
- Compression artifacts in the original TIFF may affect text extraction quality
Model Details
- Provider: Kodexa