Large File Uploads

Direct uploads to an intake endpoint are limited to 256 MB by the API gateway. For larger files, Kodexa provides two mechanisms:
  • Presigned URL upload — the server generates a temporary S3 URL; the client uploads directly to S3, bypassing the API gateway entirely
  • Multipart upload — the file is split into chunks (parts), each uploaded independently via presigned URLs, then assembled server-side
Both require the Enable large uploads option to be turned on in the intake settings.

Enabling Large Uploads

In Kodexa Studio, open the intake settings and check Enable large uploads. This exposes the presigned and multipart upload endpoints for that intake. When disabled (the default), these endpoints return 403 Forbidden. Large uploads use the same intake API tokens as direct uploads — no additional credentials are needed.
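A quick way to verify the setting is to call one of the large-upload endpoints and check the status code. A minimal Python sketch (the org, intake, and token values are placeholders):

import requests

# Probe: request a presigned URL for a tiny dummy file.
# A 403 means Enable large uploads is still off for this intake.
resp = requests.post(
    "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request",
    headers={"x-api-key": "kit_your-intake-token"},
    params={"path": "probe.pdf", "contentType": "application/pdf", "contentLength": 1024},
)
if resp.status_code == 403:
    print("Enable large uploads is disabled for this intake")
else:
    resp.raise_for_status()
    print("Large uploads are enabled")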

Presigned URL Upload

Best for: files between 256 MB and 5 GB over reliable connections.
1. Request a presigned URL

Call the presigned upload request endpoint with the file details:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request?path=large-report.pdf&contentType=application/pdf&contentLength=524288000" \
  -H "x-api-key: kit_your-intake-token"
Response:
{
  "uploadUrl": "https://s3.amazonaws.com/bucket/presigned-uploads/...",
  "s3Key": "presigned-uploads/intake-id/upload-id/large-report.pdf",
  "uploadId": "upload-id",
  "expiresAt": "2026-05-05T15:00:00Z"
}
The presigned URL expires after 15 minutes.
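If there may be a delay before the PUT (for example, a queued upload job), check expiresAt first and request a fresh URL once it has passed. A small Python helper, assuming the timestamp format shown in the response above:

from datetime import datetime, timezone

def is_expired(expires_at: str) -> bool:
    # expiresAt is ISO 8601 with a trailing Z, e.g. "2026-05-05T15:00:00Z"
    return datetime.fromisoformat(expires_at.replace("Z", "+00:00")) <= datetime.now(timezone.utc)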
2. Upload to S3

PUT the file directly to the presigned URL. This bypasses the Kodexa API — the file goes straight to S3:
curl -X PUT "${uploadUrl}" \
  -H "Content-Type: application/pdf" \
  --data-binary @large-report.pdf
You can track upload progress using standard HTTP upload progress mechanisms.
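For example, in Python you can wrap the file object so every read reports progress while requests streams the body. A sketch (ProgressFile is illustrative, not part of any SDK; upload_url is the presigned URL from step 1):

import os
import requests

class ProgressFile:
    """Wraps a file so each read() reports upload progress."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._total = os.path.getsize(path)
        self._sent = 0
        self.len = self._total  # lets requests set Content-Length

    def read(self, size=-1):
        chunk = self._f.read(size)
        self._sent += len(chunk)
        print(f"\rUploaded {self._sent / self._total:.1%}", end="", flush=True)
        return chunk

resp = requests.put(upload_url,  # the presigned uploadUrl from step 1
                    headers={"Content-Type": "application/pdf"},
                    data=ProgressFile("large-report.pdf"))
resp.raise_for_status()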
3. Complete the upload

Tell Kodexa to process the uploaded file:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-complete?s3Key=${s3Key}&path=large-report.pdf" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {"department": "finance"},
    "labels": "REVIEW",
    "documentVersion": "1.0"
  }'
The response is the created document family, identical to a direct upload response. All intake features work normally: metadata merging, intake scripts, activity plans, knowledge features.

Complete Request Body

The presigned complete endpoint accepts an optional JSON body with the same parameters as a direct upload:
Field             | Type   | Description
metadata          | object | Per-upload metadata (highest priority in merge)
extraFields       | object | Additional metadata fields (lowest priority in merge)
externalData      | object | External data injected into the KDDB document
labels            | string | Comma-separated labels
statusId          | string | Document status ID
knowledgeFeatures | string | Knowledge feature IDs JSON
documentVersion   | string | Document version
filename          | string | Original filename (used by intake scripts; defaults to path basename)

Multipart Upload

Best for: files over 5 GB, unreliable connections, or when you need resumability. Multipart uploads split the file into chunks (minimum 5 MB each, except the last chunk) and upload each independently. If the connection drops, you only re-upload the failed chunks.
1. Initiate the upload

curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-request?path=huge-dataset.csv&contentType=text/csv&contentLength=10737418240" \
  -H "x-api-key: kit_your-intake-token"
Response:
{
  "uploadId": "multipart-upload-id",
  "s3Key": "presigned-uploads/intake-id/upload-id/huge-dataset.csv",
  "expiresAt": "2026-05-05T15:00:00Z"
}
2. Get part URLs (batched)

Request presigned URLs for multiple parts in one call. Each part should be at least 5 MB (except the last). You can request up to 500 part URLs per call.
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-part-urls?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3, 4, 5]'
Response:
{
  "parts": [
    {"partNumber": 1, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    {"partNumber": 2, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    ...
  ]
}
Each part URL expires after 15 minutes. Request part URLs just-in-time rather than all at once for large uploads.
3. Upload each part

PUT each chunk to its presigned URL. Record the ETag from each response header:
# Upload part 1 (first 50MB)
curl -X PUT "${partUrl1}" \
  -H "Content-Type: text/csv" \
  --data-binary @part1.bin \
  -D -  # Print response headers to capture ETag
The ETag header in the response identifies the uploaded part.
4. Complete the upload

Send all part numbers and ETags to finalize:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-complete?s3Key=${s3Key}&uploadId=${uploadId}&path=huge-dataset.csv" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {"partNumber": 1, "etag": "\"abc123...\""},
      {"partNumber": 2, "etag": "\"def456...\""},
      {"partNumber": 3, "etag": "\"ghi789...\""}
    ],
    "metadata": {"source": "data-pipeline"}
  }'
The response is the created document family.

Resuming an Interrupted Upload

If the upload is interrupted, you don’t need to start over (a Python sketch follows this list):
  1. Keep the uploadId and s3Key from step 1
  2. Request new part URLs for any parts that weren’t uploaded
  3. Upload only the missing parts
  4. Complete with all parts (already-uploaded parts remain in S3)
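A minimal Python sketch of the resume flow, reusing the BASE_URL, HEADERS, and PART_SIZE constants (and imports) from the Python example below. Persisting the completed-part records between runs is your client's responsibility, and resume_upload is illustrative, not an SDK helper:

def resume_upload(org, intake, file_path, s3_key, upload_id, completed_parts):
    # completed_parts: the {"partNumber", "etag"} records saved before the interruption
    file_size = os.path.getsize(file_path)
    total_parts = math.ceil(file_size / PART_SIZE)
    done = {p["partNumber"] for p in completed_parts}
    missing = [n for n in range(1, total_parts + 1) if n not in done]

    # Request fresh URLs only for the parts that never finished (first batch of up to 500)
    urls = requests.post(
        f"{BASE_URL}/api/intake/{org}/{intake}/multipart-upload-part-urls",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id},
        json=missing[:500],
    ).json()["parts"]

    with open(file_path, "rb") as f:
        for part in urls:
            f.seek((part["partNumber"] - 1) * PART_SIZE)
            resp = requests.put(part["uploadUrl"], data=f.read(PART_SIZE))
            resp.raise_for_status()
            completed_parts.append({"partNumber": part["partNumber"],
                                    "etag": resp.headers["ETag"]})
    return completed_parts  # send all parts to multipart-upload-complete as usual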

Aborting a Multipart Upload

To cancel an in-progress multipart upload and free S3 resources:
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-abort?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token"
Returns 204 No Content.

S3 Constraints

Constraint            | Limit
Minimum part size     | 5 MB (except last part)
Maximum parts         | 10,000
Maximum file size     | 5 TB
Part URL expiry       | 15 minutes
Part URLs per request | 500
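These limits interact: at a fixed part size, a big enough file can exceed the 10,000-part cap (50 MB parts top out at about 500 GB). A small Python helper that picks a compliant part size (choose_part_size is illustrative):

import math

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (except the last part)
MAX_PARTS = 10_000           # S3 maximum number of parts

def choose_part_size(file_size: int, target: int = 50 * 1024 * 1024) -> int:
    # Grow past the target only when the file would otherwise
    # need more than 10,000 parts.
    return max(target, MIN_PART, math.ceil(file_size / MAX_PARTS))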

Python Example

import requests
import os
import math

BASE_URL = "https://platform.kodexa-enterprise.com"
TOKEN = "kit_your-intake-token"
HEADERS = {"x-api-key": TOKEN}
PART_SIZE = 50 * 1024 * 1024  # 50 MB per part

def upload_large_file(org_slug, intake_slug, file_path):
    file_size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)

    if file_size <= 256 * 1024 * 1024:
        # Direct upload for files up to 256 MB
        with open(file_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}",
                headers=HEADERS,
                files={"file": (filename, f)},
            )
        return resp.json()

    # Step 1: Initiate multipart upload
    resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-request",
        headers=HEADERS,
        params={"path": filename, "contentType": "application/octet-stream", "contentLength": file_size},
    )
    resp.raise_for_status()
    init = resp.json()
    s3_key = init["s3Key"]
    upload_id = init["uploadId"]

    # Step 2: Upload parts
    total_parts = math.ceil(file_size / PART_SIZE)
    completed_parts = []

    with open(file_path, "rb") as f:
        for batch_start in range(0, total_parts, 500):
            batch_end = min(batch_start + 500, total_parts)
            part_numbers = list(range(batch_start + 1, batch_end + 1))

            # Get presigned URLs for this batch
            urls_resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-part-urls",
                headers={**HEADERS, "Content-Type": "application/json"},
                params={"s3Key": s3_key, "uploadId": upload_id},
                json=part_numbers,
            )
            urls_resp.raise_for_status()
            part_urls = urls_resp.json()["parts"]
            # Sort by part number so sequential file reads line up with the URLs
            part_urls.sort(key=lambda p: p["partNumber"])

            for part_info in part_urls:
                chunk = f.read(PART_SIZE)
                put_resp = requests.put(part_info["uploadUrl"], data=chunk)
                put_resp.raise_for_status()
                completed_parts.append({
                    "partNumber": part_info["partNumber"],
                    "etag": put_resp.headers["ETag"],
                })
                print(f"  Uploaded part {part_info['partNumber']}/{total_parts}")

    # Step 3: Complete
    complete_resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-complete",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id, "path": filename},
        json={"parts": completed_parts},
    )
    complete_resp.raise_for_status()
    return complete_resp.json()
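Usage is a single call; the helper picks direct or multipart upload based on file size:

family = upload_large_file("my-org", "my-intake", "huge-dataset.csv")
print(family)  # the created document family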

TypeScript Example

const BASE_URL = "https://platform.kodexa-enterprise.com";
const TOKEN = "kit_your-intake-token";
const PART_SIZE = 50 * 1024 * 1024; // 50 MB

async function uploadLargeFile(orgSlug: string, intakeSlug: string, file: File) {
  const headers = { "x-api-key": TOKEN };

  if (file.size <= 256 * 1024 * 1024) {
    // Direct upload for files up to 256 MB
    const form = new FormData();
    form.append("file", file);
    const resp = await fetch(`${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}`, {
      method: "POST", headers, body: form,
    });
    return resp.json();
  }

  // Step 1: Initiate multipart upload
  const params = new URLSearchParams({
    path: file.name,
    contentType: file.type || "application/octet-stream",
    contentLength: String(file.size),
  });
  const initResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-request?${params}`,
    { method: "POST", headers },
  );
  const { s3Key, uploadId } = await initResp.json();

  // Step 2: Upload parts
  const totalParts = Math.ceil(file.size / PART_SIZE);
  const completedParts: { partNumber: number; etag: string }[] = [];

  for (let batchStart = 0; batchStart < totalParts; batchStart += 500) {
    const batchEnd = Math.min(batchStart + 500, totalParts);
    const partNumbers = Array.from({ length: batchEnd - batchStart }, (_, i) => batchStart + i + 1);

    const urlParams = new URLSearchParams({ s3Key, uploadId });
    const urlsResp = await fetch(
      `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-part-urls?${urlParams}`,
      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(partNumbers) },
    );
    const { parts: partUrls } = await urlsResp.json();

    for (const partInfo of partUrls) {
      const start = (partInfo.partNumber - 1) * PART_SIZE;
      const end = Math.min(start + PART_SIZE, file.size);
      const chunk = file.slice(start, end);

      const putResp = await fetch(partInfo.uploadUrl, { method: "PUT", body: chunk });
      completedParts.push({ partNumber: partInfo.partNumber, etag: putResp.headers.get("ETag")! });
    }
  }

  // Step 3: Complete
  const completeParams = new URLSearchParams({ s3Key, uploadId, path: file.name });
  const completeResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-complete?${completeParams}`,
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ parts: completedParts }),
    },
  );
  return completeResp.json();
}

Endpoint Reference

Endpoint                                            | Method | Description
/api/intake/{org}/{slug}/presigned-upload-request   | POST   | Get a presigned S3 PUT URL
/api/intake/{org}/{slug}/presigned-upload-complete  | POST   | Finalize a presigned upload
/api/intake/{org}/{slug}/multipart-upload-request   | POST   | Initiate a multipart upload
/api/intake/{org}/{slug}/multipart-upload-part-urls | POST   | Get presigned URLs for parts (batched)
/api/intake/{org}/{slug}/multipart-upload-complete  | POST   | Finalize a multipart upload
/api/intake/{org}/{slug}/multipart-upload-abort     | POST   | Cancel a multipart upload
All endpoints accept the same authentication as the direct intake upload: either an intake API token (kit_*) or a platform API key with appropriate permissions.