Large File Uploads

Direct uploads to an intake endpoint are limited to 256 MB by the API gateway. For larger files, Kodexa provides two mechanisms:
  • Presigned URL upload — the server generates a temporary S3 URL; the client uploads directly to S3, bypassing the API gateway entirely
  • Multipart upload — the file is split into chunks (parts), each uploaded independently via presigned URLs, then assembled server-side
Both require the Enable large uploads option to be turned on in the intake settings.

Enabling Large Uploads

In Kodexa Studio, open the intake settings and check Enable large uploads. This exposes the presigned and multipart upload endpoints for that intake. When disabled (the default), these endpoints return 403 Forbidden. Large uploads use the same intake API tokens as direct uploads — no additional credentials are needed.
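A quick way to verify the setting is to call one of the large-upload endpoints and check the status code. A minimal Python sketch (the org, intake, and token values are placeholders):

import requests

# Probe: request a presigned URL for a tiny dummy file.
# A 403 means Enable large uploads is still off for this intake.
resp = requests.post(
    "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request",
    headers={"x-api-key": "kit_your-intake-token"},
    params={"path": "probe.pdf", "contentType": "application/pdf", "contentLength": 1024},
)
if resp.status_code == 403:
    print("Enable large uploads is disabled for this intake")
else:
    resp.raise_for_status()
    print("Large uploads are enabled")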

Presigned URL Upload

Best for: files between 256 MB and 5 GB over reliable connections.
1. Request a presigned URL

Call the presigned upload request endpoint with the file details:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request?path=large-report.pdf&contentType=application/pdf&contentLength=524288000" \
  -H "x-api-key: kit_your-intake-token"
Response:
{
  "uploadUrl": "https://s3.amazonaws.com/bucket/presigned-uploads/...",
  "s3Key": "presigned-uploads/intake-id/upload-id/large-report.pdf",
  "uploadId": "upload-id",
  "expiresAt": "2026-05-05T15:00:00Z"
}
The presigned URL expires after 15 minutes.
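If there may be a delay before the PUT (for example, a queued upload job), check expiresAt first and request a fresh URL once it has passed. A small Python helper, assuming the timestamp format shown in the response above:

from datetime import datetime, timezone

def is_expired(expires_at: str) -> bool:
    # expiresAt is ISO 8601 with a trailing Z, e.g. "2026-05-05T15:00:00Z"
    return datetime.fromisoformat(expires_at.replace("Z", "+00:00")) <= datetime.now(timezone.utc)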
2. Upload to S3

PUT the file directly to the presigned URL. This bypasses the Kodexa API — the file goes straight to S3:
curl -X PUT "${uploadUrl}" \
  -H "Content-Type: application/pdf" \
  --data-binary @large-report.pdf
You can track upload progress using standard HTTP upload progress mechanisms.
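For example, in Python you can wrap the file object so every read reports progress while requests streams the body. A sketch (ProgressFile is illustrative, not part of any SDK; upload_url is the presigned URL from step 1):

import os
import requests

class ProgressFile:
    """Wraps a file so each read() reports upload progress."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._total = os.path.getsize(path)
        self._sent = 0
        self.len = self._total  # lets requests set Content-Length

    def read(self, size=-1):
        chunk = self._f.read(size)
        self._sent += len(chunk)
        print(f"\rUploaded {self._sent / self._total:.1%}", end="", flush=True)
        return chunk

resp = requests.put(upload_url,  # the presigned uploadUrl from step 1
                    headers={"Content-Type": "application/pdf"},
                    data=ProgressFile("large-report.pdf"))
resp.raise_for_status()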
3. Complete the upload

Tell Kodexa to process the uploaded file:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-complete?s3Key=${s3Key}&path=large-report.pdf" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {"department": "finance"},
    "labels": "REVIEW",
    "documentVersion": "1.0"
  }'
The response is the created document family, identical to a direct upload response. All intake features work normally: metadata merging, intake scripts, activity plans, knowledge features.

Complete Request Body

The presigned complete endpoint accepts an optional JSON body with the same parameters as a direct upload:
Field             | Type   | Description
metadata          | object | Per-upload metadata (highest priority in merge)
extraFields       | object | Additional metadata fields (lowest priority in merge)
externalData      | object | External data injected into the KDDB document
labels            | string | Comma-separated labels
statusId          | string | Document status ID
knowledgeFeatures | string | Knowledge feature IDs JSON
documentVersion   | string | Document version
filename          | string | Original filename (used by intake scripts; defaults to path basename)

Multipart Upload

Best for: files over 5 GB, unreliable connections, or when you need resumability. Multipart uploads split the file into chunks (minimum 5 MB each, except the last chunk) and upload each independently. If the connection drops, you only re-upload the failed chunks.
1. Initiate the upload

curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-request?path=huge-dataset.csv&contentType=text/csv&contentLength=10737418240" \
  -H "x-api-key: kit_your-intake-token"
Response:
{
  "uploadId": "multipart-upload-id",
  "s3Key": "presigned-uploads/intake-id/upload-id/huge-dataset.csv",
  "expiresAt": "2026-05-05T15:00:00Z"
}
2. Get part URLs (batched)

Request presigned URLs for multiple parts in one call. Each part should be at least 5 MB (except the last). You can request up to 500 part URLs per call.
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-part-urls?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3, 4, 5]'
Response:
{
  "parts": [
    {"partNumber": 1, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    {"partNumber": 2, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    ...
  ]
}
Each part URL expires after 15 minutes. Request part URLs just-in-time rather than all at once for large uploads.
3. Upload each part

PUT each chunk to its presigned URL. Record the ETag from each response header:
# Upload part 1 (first 50MB)
curl -X PUT "${partUrl1}" \
  -H "Content-Type: text/csv" \
  --data-binary @part1.bin \
  -D -  # Print response headers to capture ETag
The ETag header in the response identifies the uploaded part.
4. Complete the upload

Send all part numbers and ETags to finalize:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-complete?s3Key=${s3Key}&uploadId=${uploadId}&path=huge-dataset.csv" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {"partNumber": 1, "etag": "\"abc123...\""},
      {"partNumber": 2, "etag": "\"def456...\""},
      {"partNumber": 3, "etag": "\"ghi789...\""}
    ],
    "metadata": {"source": "data-pipeline"}
  }'
The response is the created document family.

Resuming an Interrupted Upload

If the upload is interrupted, you don’t need to start over (a Python sketch follows this list):
  1. Keep the uploadId and s3Key from step 1
  2. Request new part URLs for any parts that weren’t uploaded
  3. Upload only the missing parts
  4. Complete with all parts (already-uploaded parts remain in S3)
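A minimal Python sketch of the resume flow, reusing the BASE_URL, HEADERS, and PART_SIZE constants (and imports) from the Python example below. Persisting the completed-part records between runs is your client's responsibility, and resume_upload is illustrative, not an SDK helper:

def resume_upload(org, intake, file_path, s3_key, upload_id, completed_parts):
    # completed_parts: the {"partNumber", "etag"} records saved before the interruption
    file_size = os.path.getsize(file_path)
    total_parts = math.ceil(file_size / PART_SIZE)
    done = {p["partNumber"] for p in completed_parts}
    missing = [n for n in range(1, total_parts + 1) if n not in done]

    # Request fresh URLs only for the parts that never finished (first batch of up to 500)
    urls = requests.post(
        f"{BASE_URL}/api/intake/{org}/{intake}/multipart-upload-part-urls",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id},
        json=missing[:500],
    ).json()["parts"]

    with open(file_path, "rb") as f:
        for part in urls:
            f.seek((part["partNumber"] - 1) * PART_SIZE)
            resp = requests.put(part["uploadUrl"], data=f.read(PART_SIZE))
            resp.raise_for_status()
            completed_parts.append({"partNumber": part["partNumber"],
                                    "etag": resp.headers["ETag"]})
    return completed_parts  # send all parts to multipart-upload-complete as usual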

Aborting a Multipart Upload

To cancel an in-progress multipart upload and free S3 resources:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-abort?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token"
Returns 204 No Content.

S3 Constraints

Constraint            | Limit
Minimum part size     | 5 MB (except last part)
Maximum parts         | 10,000
Maximum file size     | 5 TB
Part URL expiry       | 15 minutes
Part URLs per request | 500
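These limits interact: at a fixed part size, a big enough file can exceed the 10,000-part cap (50 MB parts top out at about 500 GB). A small Python helper that picks a compliant part size (choose_part_size is illustrative):

import math

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (except the last part)
MAX_PARTS = 10_000           # S3 maximum number of parts

def choose_part_size(file_size: int, target: int = 50 * 1024 * 1024) -> int:
    # Grow past the target only when the file would otherwise
    # need more than 10,000 parts.
    return max(target, MIN_PART, math.ceil(file_size / MAX_PARTS))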

Python Example

import requests
import os
import math

BASE_URL = "https://platform.kodexa-enterprise.com"
TOKEN = "kit_your-intake-token"
HEADERS = {"x-api-key": TOKEN}
PART_SIZE = 50 * 1024 * 1024  # 50 MB per part

def upload_large_file(org_slug, intake_slug, file_path):
    file_size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)

    if file_size <= 256 * 1024 * 1024:
        # Direct upload for files up to 256 MB
        with open(file_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}",
                headers=HEADERS,
                files={"file": (filename, f)},
            )
        return resp.json()

    # Step 1: Initiate multipart upload
    resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-request",
        headers=HEADERS,
        params={"path": filename, "contentType": "application/octet-stream", "contentLength": file_size},
    )
    resp.raise_for_status()
    init = resp.json()
    s3_key = init["s3Key"]
    upload_id = init["uploadId"]

    # Step 2: Upload parts
    total_parts = math.ceil(file_size / PART_SIZE)
    completed_parts = []

    with open(file_path, "rb") as f:
        for batch_start in range(0, total_parts, 500):
            batch_end = min(batch_start + 500, total_parts)
            part_numbers = list(range(batch_start + 1, batch_end + 1))

            # Get presigned URLs for this batch
            urls_resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-part-urls",
                headers={**HEADERS, "Content-Type": "application/json"},
                params={"s3Key": s3_key, "uploadId": upload_id},
                json=part_numbers,
            )
            urls_resp.raise_for_status()
            part_urls = urls_resp.json()["parts"]
            # Sort by part number so sequential file reads line up with the URLs
            part_urls.sort(key=lambda p: p["partNumber"])

            for part_info in part_urls:
                chunk = f.read(PART_SIZE)
                put_resp = requests.put(part_info["uploadUrl"], data=chunk)
                put_resp.raise_for_status()
                completed_parts.append({
                    "partNumber": part_info["partNumber"],
                    "etag": put_resp.headers["ETag"],
                })
                print(f"  Uploaded part {part_info['partNumber']}/{total_parts}")

    # Step 3: Complete
    complete_resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-complete",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id, "path": filename},
        json={"parts": completed_parts},
    )
    complete_resp.raise_for_status()
    return complete_resp.json()
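Usage is a single call; the helper picks direct or multipart upload based on file size:

family = upload_large_file("my-org", "my-intake", "huge-dataset.csv")
print(family)  # the created document family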

TypeScript Example

const BASE_URL = "https://platform.kodexa-enterprise.com";
const TOKEN = "kit_your-intake-token";
const PART_SIZE = 50 * 1024 * 1024; // 50 MB

async function uploadLargeFile(orgSlug: string, intakeSlug: string, file: File) {
  const headers = { "x-api-key": TOKEN };

  if (file.size <= 256 * 1024 * 1024) {
    // Direct upload for files up to 256 MB
    const form = new FormData();
    form.append("file", file);
    const resp = await fetch(`${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}`, {
      method: "POST", headers, body: form,
    });
    return resp.json();
  }

  // Step 1: Initiate multipart upload
  const params = new URLSearchParams({
    path: file.name,
    contentType: file.type || "application/octet-stream",
    contentLength: String(file.size),
  });
  const initResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-request?${params}`,
    { method: "POST", headers },
  );
  const { s3Key, uploadId } = await initResp.json();

  // Step 2: Upload parts
  const totalParts = Math.ceil(file.size / PART_SIZE);
  const completedParts: { partNumber: number; etag: string }[] = [];

  for (let batchStart = 0; batchStart < totalParts; batchStart += 500) {
    const batchEnd = Math.min(batchStart + 500, totalParts);
    const partNumbers = Array.from({ length: batchEnd - batchStart }, (_, i) => batchStart + i + 1);

    const urlParams = new URLSearchParams({ s3Key, uploadId });
    const urlsResp = await fetch(
      `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-part-urls?${urlParams}`,
      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(partNumbers) },
    );
    const { parts: partUrls } = await urlsResp.json();

    for (const partInfo of partUrls) {
      const start = (partInfo.partNumber - 1) * PART_SIZE;
      const end = Math.min(start + PART_SIZE, file.size);
      const chunk = file.slice(start, end);

      const putResp = await fetch(partInfo.uploadUrl, { method: "PUT", body: chunk });
      completedParts.push({ partNumber: partInfo.partNumber, etag: putResp.headers.get("ETag")! });
    }
  }

  // Step 3: Complete
  const completeParams = new URLSearchParams({ s3Key, uploadId, path: file.name });
  const completeResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-complete?${completeParams}`,
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ parts: completedParts }),
    },
  );
  return completeResp.json();
}

Endpoint Reference

Endpoint                                            | Method | Description
/api/intake/{org}/{slug}/presigned-upload-request   | POST   | Get a presigned S3 PUT URL
/api/intake/{org}/{slug}/presigned-upload-complete  | POST   | Finalize a presigned upload
/api/intake/{org}/{slug}/multipart-upload-request   | POST   | Initiate a multipart upload
/api/intake/{org}/{slug}/multipart-upload-part-urls | POST   | Get presigned URLs for parts (batched)
/api/intake/{org}/{slug}/multipart-upload-complete  | POST   | Finalize a multipart upload
/api/intake/{org}/{slug}/multipart-upload-abort     | POST   | Cancel a multipart upload
All endpoints accept the same authentication as the direct intake upload: either an intake API token (kit_*) or a platform API key with appropriate permissions.