# Large File Uploads

> Upload files over 256 MB to Kodexa intakes using presigned S3 URLs for single-file uploads or multipart uploads for resumable, chunked transfers.

Direct uploads to an intake endpoint are limited to **256 MB** by the API gateway. For larger files, Kodexa provides two mechanisms:

* **Presigned URL upload** — the server generates a temporary S3 URL; the client uploads directly to S3, bypassing the API gateway entirely
* **Multipart upload** — the file is split into chunks (parts), each uploaded independently via presigned URLs, then assembled server-side

Both require the **Enable large uploads** option to be turned on in the intake settings.
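
The size guidance given for each mechanism on this page can be folded into one small decision helper. This is a minimal sketch, not part of any Kodexa SDK: the function name and the `need_resume` flag are hypothetical, and the limits are read as binary units (1 MB = 1024 × 1024 bytes), which matches the 256 MB check in the Python example further down.

```python
DIRECT_LIMIT = 256 * 1024 * 1024            # API gateway limit for direct uploads
SINGLE_PUT_LIMIT = 5 * 1024 * 1024 * 1024   # upper bound suggested for presigned uploads


def choose_upload_method(file_size: int, need_resume: bool = False) -> str:
    """Return "direct", "presigned", or "multipart" for a file of the given size."""
    if file_size <= DIRECT_LIMIT:
        return "direct"
    if file_size <= SINGLE_PUT_LIMIT and not need_resume:
        return "presigned"
    # Anything larger, or anything that must survive a dropped connection.
    return "multipart"
```

For example, a 500 MB file maps to `"presigned"` unless resumability is requested, in which case it maps to `"multipart"`.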

## Enabling Large Uploads

In Kodexa Studio, open the intake settings and check **Enable large uploads**. This exposes the presigned and multipart upload endpoints for that intake. When disabled (the default), these endpoints return `403 Forbidden`.

Large uploads use the same intake API tokens as direct uploads — no additional credentials are needed.

## Presigned URL Upload

Best for: files between 256 MB and 5 GB over reliable connections.

<Steps>
  <Step title="Request a presigned URL">
    Call the presigned upload request endpoint with the file details:

    ```bash theme={null}
    curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request?path=large-report.pdf&contentType=application/pdf&contentLength=524288000" \
      -H "x-api-key: kit_your-intake-token"
    ```

    Response:

    ```json theme={null}
    {
      "uploadUrl": "https://s3.amazonaws.com/bucket/presigned-uploads/...",
      "s3Key": "presigned-uploads/intake-id/upload-id/large-report.pdf",
      "uploadId": "upload-id",
      "expiresAt": "2026-05-05T15:00:00Z"
    }
    ```

    The presigned URL expires after **15 minutes**.
  </Step>

  <Step title="Upload to S3">
    PUT the file directly to the presigned URL. This bypasses the Kodexa API — the file goes straight to S3:

    ```bash theme={null}
    curl -X PUT "${uploadUrl}" \
      -H "Content-Type: application/pdf" \
      --data-binary @large-report.pdf
    ```

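    Because this is a plain HTTP PUT, you can track progress with whatever your client offers, such as curl's `--progress-bar` option or an upload progress callback in your HTTP library.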
  </Step>

  <Step title="Complete the upload">
    Tell Kodexa to process the uploaded file:

    ```bash theme={null}
    curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-complete?s3Key=${s3Key}&path=large-report.pdf" \
      -H "x-api-key: kit_your-intake-token" \
      -H "Content-Type: application/json" \
      -d '{
        "metadata": {"department": "finance"},
        "labels": "REVIEW",
        "documentVersion": "1.0"
      }'
    ```

    The response is the created document family, identical to a direct upload response. All intake features work as they do for direct uploads: metadata merging, intake scripts, activity plans, and knowledge features.
  </Step>
</Steps>
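
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than an official client: `presigned_upload` is a hypothetical helper, and `http` is any object with `post` and `put` methods that behave like those of the `requests` library (the `requests` module itself works). The intake token is sent only to Kodexa, not to S3.

```python
import os


def presigned_upload(http, base_url, token, org_slug, intake_slug, file_path,
                     content_type="application/octet-stream", body=None):
    """Upload one file via the presigned-URL flow and return the parsed response."""
    api_headers = {"x-api-key": token}
    intake_url = f"{base_url}/api/intake/{org_slug}/{intake_slug}"
    filename = os.path.basename(file_path)

    # Step 1: ask Kodexa for a presigned S3 URL.
    resp = http.post(
        f"{intake_url}/presigned-upload-request",
        headers=api_headers,
        params={
            "path": filename,
            "contentType": content_type,
            "contentLength": os.path.getsize(file_path),
        },
    )
    resp.raise_for_status()
    ticket = resp.json()

    # Step 2: PUT the file straight to S3. Passing the open file object lets
    # a requests-style client stream it instead of loading it into memory.
    with open(file_path, "rb") as f:
        put_resp = http.put(ticket["uploadUrl"], data=f,
                            headers={"Content-Type": content_type})
    put_resp.raise_for_status()

    # Step 3: tell Kodexa to process the uploaded object.
    done = http.post(
        f"{intake_url}/presigned-upload-complete",
        headers=api_headers,
        params={"s3Key": ticket["s3Key"], "path": filename},
        json=body,  # optional JSON body; None sends no body
    )
    done.raise_for_status()
    return done.json()
```

With `requests` installed, a call might look like `presigned_upload(requests, "https://platform.kodexa-enterprise.com", "kit_your-intake-token", "my-org", "my-intake", "large-report.pdf", content_type="application/pdf")`.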

### Complete Request Body

The presigned complete endpoint accepts an optional JSON body with the same parameters as a direct upload:

| Field               | Type   | Description                                                           |
| ------------------- | ------ | --------------------------------------------------------------------- |
| `metadata`          | object | Per-upload metadata (highest priority in merge)                       |
| `extraFields`       | object | Additional metadata fields (lowest priority in merge)                 |
| `externalData`      | object | External data injected into the KDDB document                         |
| `labels`            | string | Comma-separated labels                                                |
| `statusId`          | string | Document status ID                                                    |
| `knowledgeFeatures` | string | Knowledge feature IDs, as a JSON-encoded string                       |
| `documentVersion`   | string | Document version                                                      |
| `filename`          | string | Original filename (used by intake scripts; defaults to path basename) |
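
As an illustration of the table above, the following sketch assembles the optional body and drops any field that was not supplied. `build_complete_body` is a hypothetical helper, not a Kodexa API. It joins a list of labels into the comma-separated string the table describes, and it leaves out `knowledgeFeatures` because this page does not spell out that field's JSON shape.

```python
def build_complete_body(metadata=None, extra_fields=None, external_data=None,
                        labels=None, status_id=None, document_version=None,
                        filename=None):
    """Build the optional JSON body, leaving out any field that was not supplied."""
    fields = {
        "metadata": metadata,
        "extraFields": extra_fields,
        "externalData": external_data,
        # The table lists labels as one comma-separated string.
        "labels": ",".join(labels) if labels else None,
        "statusId": status_id,
        "documentVersion": document_version,
        "filename": filename,
    }
    return {key: value for key, value in fields.items() if value is not None}
```

Calling `build_complete_body(metadata={"department": "finance"}, labels=["REVIEW"], document_version="1.0")` reproduces the body used in the curl example above.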

## Multipart Upload

Best for: files over 5 GB, unreliable connections, or when you need resumability.

Multipart uploads split the file into chunks (minimum 5 MB each, except the last chunk) and upload each independently. If the connection drops, you only re-upload the failed chunks.
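
To make the chunking concrete, here is a small sketch that computes the byte range of every part for a given file size and part size. The function name is hypothetical. Part numbers start at 1, matching the part numbers used in the requests below.

```python
def part_ranges(file_size, part_size):
    """Yield (part_number, offset, length) for each chunk; part numbers start at 1."""
    part_number = 1
    offset = 0
    while offset < file_size:
        # Every part is part_size bytes except possibly the last one.
        length = min(part_size, file_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1
```

With 50 MB parts, a 120 MB file yields three parts: two of 50 MB and a final part of 20 MB. Only that final part may be smaller than the 5 MB minimum.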

<Steps>
  <Step title="Initiate the upload">
    ```bash theme={null}
    curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-request?path=huge-dataset.csv&contentType=text/csv&contentLength=10737418240" \
      -H "x-api-key: kit_your-intake-token"
    ```

    Response:

    ```json theme={null}
    {
      "uploadId": "multipart-upload-id",
      "s3Key": "presigned-uploads/intake-id/upload-id/huge-dataset.csv",
      "expiresAt": "2026-05-05T15:00:00Z"
    }
    ```
  </Step>

  <Step title="Get part URLs (batched)">
    Request presigned URLs for multiple parts in one call. Each part must be at least **5 MB**, except the last. You can request up to **500** part URLs per call.

    ```bash theme={null}
    curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-part-urls?s3Key=${s3Key}&uploadId=${uploadId}" \
      -H "x-api-key: kit_your-intake-token" \
      -H "Content-Type: application/json" \
      -d '[1, 2, 3, 4, 5]'
    ```

    Response:

    ```json theme={null}
    {
      "parts": [
        {"partNumber": 1, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
        {"partNumber": 2, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
        ...
      ]
    }
    ```

    Each part URL expires after **15 minutes**. Request part URLs just-in-time rather than all at once for large uploads.
  </Step>

  <Step title="Upload each part">
    PUT each chunk to its presigned URL. Record the `ETag` from each response header:

    ```bash theme={null}
    # Upload part 1 (first 50MB)
    curl -X PUT "${partUrl1}" \
      -H "Content-Type: text/csv" \
      --data-binary @part1.bin \
      -D -  # Print response headers to capture ETag
    ```

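    The `ETag` header in the response identifies the uploaded part. Keep each value exactly as returned, including its surrounding double quotes, because the completion call needs the part number and ETag of every part.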
  </Step>

  <Step title="Complete the upload">
    Send all part numbers and ETags to finalize:

    ```bash theme={null}
    curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-complete?s3Key=${s3Key}&uploadId=${uploadId}&path=huge-dataset.csv" \
      -H "x-api-key: kit_your-intake-token" \
      -H "Content-Type: application/json" \
      -d '{
        "parts": [
          {"partNumber": 1, "etag": "\"abc123...\""},
          {"partNumber": 2, "etag": "\"def456...\""},
          {"partNumber": 3, "etag": "\"ghi789...\""}
        ],
        "metadata": {"source": "data-pipeline"}
      }'
    ```

    The response is the created document family.
  </Step>
</Steps>
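
Since part URLs expire after 15 minutes and a single call accepts at most 500 part numbers, it can help to request URLs in batches that you expect to finish uploading before they expire. The sketch below only groups the part numbers. The function name and the default batch size of 20 are assumptions for illustration, so tune the batch size to your own throughput.

```python
def part_number_batches(total_parts, batch_size=20, max_per_call=500):
    """Yield lists of 1-based part numbers, each list short enough for one call."""
    if not 1 <= batch_size <= max_per_call:
        raise ValueError(f"batch_size must be between 1 and {max_per_call}")
    for start in range(1, total_parts + 1, batch_size):
        yield list(range(start, min(start + batch_size, total_parts + 1)))
```

Each yielded list can be sent as the JSON body of one `multipart-upload-part-urls` request, and the next batch is requested only after the current one has finished uploading.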

### Resuming an Interrupted Upload

If the upload is interrupted, you don't need to start over:

1. Keep the `uploadId` and `s3Key` from step 1
2. Request new part URLs for any parts that weren't uploaded
3. Upload only the missing parts
4. Complete with all parts (already-uploaded parts remain in S3)
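
The resume steps above can be sketched as two helpers. They assume the client itself keeps a record of which parts succeeded, as a mapping from part number to ETag, because this page does not describe an endpoint for listing already-uploaded parts. Both function names are hypothetical.

```python
def missing_parts(total_parts, completed):
    """Return the part numbers, in ascending order, that still need uploading.

    `completed` maps part number -> ETag for parts that already succeeded.
    """
    return [n for n in range(1, total_parts + 1) if n not in completed]


def completion_payload(completed):
    """Build the `parts` list for the complete call, ordered by part number."""
    return [{"partNumber": n, "etag": completed[n]} for n in sorted(completed)]
```

If you persist `completed` as JSON between runs, remember that JSON object keys are strings, so convert them back to integers when loading.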

### Aborting a Multipart Upload

To cancel an in-progress multipart upload and free S3 resources:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-abort?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token"
```

Returns `204 No Content`.
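
One way to use the abort endpoint is to call it when an upload has failed and you have decided not to resume it. The following is a sketch under assumptions: `http` is any object with a `requests`-style `post` method, and both function names are hypothetical.

```python
def abort_multipart(http, base_url, token, org_slug, intake_slug, s3_key, upload_id):
    """Call the abort endpoint, which this page documents as returning 204 No Content."""
    resp = http.post(
        f"{base_url}/api/intake/{org_slug}/{intake_slug}/multipart-upload-abort",
        headers={"x-api-key": token},
        params={"s3Key": s3_key, "uploadId": upload_id},
    )
    resp.raise_for_status()


def run_with_abort(upload_parts, abort):
    """Run `upload_parts()`; if it raises, call `abort()` and re-raise the error."""
    try:
        return upload_parts()
    except Exception:
        abort()
        raise
```

Aborting discards the parts already stored, so use a wrapper like `run_with_abort` only when resuming is not an option.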

## S3 Constraints

| Constraint            | Limit                   |
| --------------------- | ----------------------- |
| Minimum part size     | 5 MB (except last part) |
| Maximum parts         | 10,000                  |
| Maximum file size     | 5 TB                    |
| Part URL expiry       | 15 minutes              |
| Part URLs per request | 500                     |
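
These limits interact: at a fixed part size of 50 MB, 10,000 parts cover only about 500 GB, so larger files need larger parts. The sketch below picks a part size that respects the table, reading MB and TB as binary units. The function name and the 50 MB default are assumptions for illustration.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB
MAX_PARTS = 10_000
MAX_FILE_SIZE = 5 * 1024 ** 4  # 5 TB, read as binary units


def choose_part_size(file_size, preferred=50 * MIB):
    """Return a part size in bytes that keeps the upload within the limits above."""
    if file_size > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 5 TB maximum")
    # Smallest part size that still fits the file into MAX_PARTS parts.
    needed = (file_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(preferred, needed, MIN_PART_SIZE)
    # Round up to a whole number of MiB to keep part boundaries tidy.
    return ((part_size + MIB - 1) // MIB) * MIB
```

For a 300 MB file this returns the preferred 50 MB. For a 1 TB file it returns a larger part size so that the part count stays at or below 10,000.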

## Python Example

```python theme={null}
import requests
import os
import math

BASE_URL = "https://platform.kodexa-enterprise.com"
TOKEN = "kit_your-intake-token"
HEADERS = {"x-api-key": TOKEN}
PART_SIZE = 50 * 1024 * 1024  # 50 MB per part

def upload_large_file(org_slug, intake_slug, file_path):
    file_size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)

    if file_size <= 256 * 1024 * 1024:
        # Direct upload for files up to 256 MB
        with open(file_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}",
                headers=HEADERS,
                files={"file": (filename, f)},
            )
        resp.raise_for_status()
        return resp.json()

    # Step 1: Initiate multipart upload
    resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-request",
        headers=HEADERS,
        params={"path": filename, "contentType": "application/octet-stream", "contentLength": file_size},
    )
    resp.raise_for_status()
    init = resp.json()
    s3_key = init["s3Key"]
    upload_id = init["uploadId"]

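    # Step 2: Upload parts
    total_parts = math.ceil(file_size / PART_SIZE)
    completed_parts = []
    # Part URLs expire after 15 minutes, so request them in small batches
    # (the API allows up to 500 per call) rather than all at once.
    batch_size = 10

    with open(file_path, "rb") as f:
        for batch_start in range(0, total_parts, batch_size):
            batch_end = min(batch_start + batch_size, total_parts)
            part_numbers = list(range(batch_start + 1, batch_end + 1))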

            # Get presigned URLs for this batch
            urls_resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-part-urls",
                headers={**HEADERS, "Content-Type": "application/json"},
                params={"s3Key": s3_key, "uploadId": upload_id},
                json=part_numbers,
            )
            urls_resp.raise_for_status()
            part_urls = urls_resp.json()["parts"]

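            for part_info in part_urls:
                # Seek by part number so the chunk matches its URL even if
                # the response lists parts out of order.
                f.seek((part_info["partNumber"] - 1) * PART_SIZE)
                chunk = f.read(PART_SIZE)
                put_resp = requests.put(part_info["uploadUrl"], data=chunk)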
                put_resp.raise_for_status()
                completed_parts.append({
                    "partNumber": part_info["partNumber"],
                    "etag": put_resp.headers["ETag"],
                })
                print(f"  Uploaded part {part_info['partNumber']}/{total_parts}")

    # Step 3: Complete
    complete_resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-complete",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id, "path": filename},
        json={"parts": completed_parts},
    )
    complete_resp.raise_for_status()
    return complete_resp.json()
```

## TypeScript Example

```typescript theme={null}
const BASE_URL = "https://platform.kodexa-enterprise.com";
const TOKEN = "kit_your-intake-token";
const PART_SIZE = 50 * 1024 * 1024; // 50 MB

async function uploadLargeFile(orgSlug: string, intakeSlug: string, file: File) {
  const headers = { "x-api-key": TOKEN };

  if (file.size <= 256 * 1024 * 1024) {
    // Direct upload for files up to 256 MB
    const form = new FormData();
    form.append("file", file);
    const resp = await fetch(`${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}`, {
      method: "POST", headers, body: form,
    });
    if (!resp.ok) throw new Error(`Direct upload failed: ${resp.status}`);
    return resp.json();
  }

  // Step 1: Initiate multipart upload
  const params = new URLSearchParams({
    path: file.name,
    contentType: file.type || "application/octet-stream",
    contentLength: String(file.size),
  });
  const initResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-request?${params}`,
    { method: "POST", headers },
  );
  if (!initResp.ok) throw new Error(`Initiate failed: ${initResp.status}`);
  const { s3Key, uploadId } = await initResp.json();

  // Step 2: Upload parts
  const totalParts = Math.ceil(file.size / PART_SIZE);
  const completedParts: { partNumber: number; etag: string }[] = [];
  // Part URLs expire after 15 minutes, so request them in small batches
  // (the API allows up to 500 per call) rather than all at once.
  const batchSize = 10;

  for (let batchStart = 0; batchStart < totalParts; batchStart += batchSize) {
    const batchEnd = Math.min(batchStart + batchSize, totalParts);
    const partNumbers = Array.from({ length: batchEnd - batchStart }, (_, i) => batchStart + i + 1);

    const urlParams = new URLSearchParams({ s3Key, uploadId });
    const urlsResp = await fetch(
      `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-part-urls?${urlParams}`,
      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(partNumbers) },
    );
    if (!urlsResp.ok) throw new Error(`Part URL request failed: ${urlsResp.status}`);
    const { parts: partUrls } = await urlsResp.json();

    for (const partInfo of partUrls) {
      const start = (partInfo.partNumber - 1) * PART_SIZE;
      const end = Math.min(start + PART_SIZE, file.size);
      const chunk = file.slice(start, end);

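      const putResp = await fetch(partInfo.uploadUrl, { method: "PUT", body: chunk });
      if (!putResp.ok) throw new Error(`Part ${partInfo.partNumber} failed: ${putResp.status}`);
      // In browsers, the bucket's CORS configuration must expose the ETag header.
      const etag = putResp.headers.get("ETag");
      if (!etag) throw new Error(`No ETag returned for part ${partInfo.partNumber}`);
      completedParts.push({ partNumber: partInfo.partNumber, etag });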
    }
  }

  // Step 3: Complete
  const completeParams = new URLSearchParams({ s3Key, uploadId, path: file.name });
  const completeResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-complete?${completeParams}`,
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ parts: completedParts }),
    },
  );
  if (!completeResp.ok) throw new Error(`Complete failed: ${completeResp.status}`);
  return completeResp.json();
}
```

## Endpoint Reference

| Endpoint                                              | Method | Description                            |
| ----------------------------------------------------- | ------ | -------------------------------------- |
| `/api/intake/{org}/{slug}/presigned-upload-request`   | POST   | Get a presigned S3 PUT URL             |
| `/api/intake/{org}/{slug}/presigned-upload-complete`  | POST   | Finalize a presigned upload            |
| `/api/intake/{org}/{slug}/multipart-upload-request`   | POST   | Initiate a multipart upload            |
| `/api/intake/{org}/{slug}/multipart-upload-part-urls` | POST   | Get presigned URLs for parts (batched) |
| `/api/intake/{org}/{slug}/multipart-upload-complete`  | POST   | Finalize a multipart upload            |
| `/api/intake/{org}/{slug}/multipart-upload-abort`     | POST   | Cancel a multipart upload              |

All endpoints accept the same authentication as the direct intake upload: either an intake API token (`kit_*`) or a platform API key with appropriate permissions.
