# Large File Uploads

> Upload files over 256 MB to Kodexa intakes using presigned S3 URLs for single-file uploads or multipart uploads for resumable, chunked transfers.

Direct uploads to an intake endpoint are limited to **256 MB** by the API gateway. For larger files, Kodexa provides two mechanisms:

* **Presigned URL upload**: the server generates a temporary S3 URL; the client uploads directly to S3, bypassing the API gateway entirely
* **Multipart upload**: the file is split into chunks (parts), each uploaded independently via presigned URLs, then assembled server-side

Both require the **Enable large uploads** option to be turned on in the intake settings.

## Enabling Large Uploads

In Kodexa Studio, open the intake settings and check **Enable large uploads**. This exposes the presigned and multipart upload endpoints for that intake. When disabled (the default), these endpoints return `403 Forbidden`.

Large uploads use the same intake API tokens as direct uploads; no additional credentials are needed.

## Presigned URL Upload

Best for: files between 256 MB and 5 GB over reliable connections.

Call the presigned upload request endpoint with the file details:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request?path=large-report.pdf&contentType=application/pdf&contentLength=524288000" \
  -H "x-api-key: kit_your-intake-token"
```

Response:

```json theme={null}
{
  "uploadUrl": "https://s3.amazonaws.com/bucket/presigned-uploads/...",
  "s3Key": "presigned-uploads/intake-id/upload-id/large-report.pdf",
  "uploadId": "upload-id",
  "expiresAt": "2026-05-05T15:00:00Z"
}
```

The presigned URL expires after **15 minutes**.

PUT the file directly to the presigned URL.
This bypasses the Kodexa API, and the file goes straight to S3:

```bash theme={null}
curl -X PUT "${uploadUrl}" \
  -H "Content-Type: application/pdf" \
  --data-binary @large-report.pdf
```

You can track upload progress using standard HTTP upload progress mechanisms.

Tell Kodexa to process the uploaded file:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-complete?s3Key=${s3Key}&path=large-report.pdf" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {"department": "finance"},
    "labels": "REVIEW",
    "documentVersion": "1.0"
  }'
```

The response is the created document family, identical to a direct upload response. All intake features work normally: metadata merging, intake scripts, activity plans, knowledge features.

### Complete Request Body

The presigned complete endpoint accepts an optional JSON body with the same parameters as a direct upload:

| Field               | Type   | Description                                                           |
| ------------------- | ------ | --------------------------------------------------------------------- |
| `metadata`          | object | Per-upload metadata (highest priority in merge)                       |
| `extraFields`       | object | Additional metadata fields (lowest priority in merge)                 |
| `externalData`      | object | External data injected into the KDDB document                         |
| `labels`            | string | Comma-separated labels                                                |
| `statusId`          | string | Document status ID                                                    |
| `knowledgeFeatures` | string | Knowledge feature IDs JSON                                            |
| `documentVersion`   | string | Document version                                                      |
| `filename`          | string | Original filename (used by intake scripts; defaults to path basename) |

## Multipart Upload

Best for: files over 5 GB, unreliable connections, or when you need resumability.

Multipart uploads split the file into chunks (minimum 5 MB each, except the last chunk) and upload each independently. If the connection drops, you only re-upload the failed chunks.
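To make the chunking concrete, here is a minimal sketch of how a client might plan part boundaries before uploading. The helper name `plan_parts` and the 50 MB default part size are illustrative assumptions, not part of the Kodexa API. The 5 MB minimum and the 10,000-part ceiling are the S3 limits listed under S3 Constraints.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=50 * 1024 * 1024):
    """Return (part_number, start, end) tuples; end is an exclusive byte offset.

    Hypothetical helper for illustration only.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MB")
    total_parts = math.ceil(file_size / part_size)
    if total_parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (n, (n - 1) * part_size, min(n * part_size, file_size))
        for n in range(1, total_parts + 1)
    ]


# A 10 GiB file (10,737,418,240 bytes) with 50 MB parts needs 205 parts;
# the last part is shorter than the rest.
parts = plan_parts(10_737_418_240)
print(len(parts), parts[0], parts[-1])
```

Only the final part may fall below the 5 MB minimum, which is why the sketch validates `part_size` rather than the size of each individual range.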
Initiate the multipart upload:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-request?path=huge-dataset.csv&contentType=text/csv&contentLength=10737418240" \
  -H "x-api-key: kit_your-intake-token"
```

Response:

```json theme={null}
{
  "uploadId": "multipart-upload-id",
  "s3Key": "presigned-uploads/intake-id/upload-id/huge-dataset.csv",
  "expiresAt": "2026-05-05T15:00:00Z"
}
```

Request presigned URLs for multiple parts in one call. Each part should be at least **5 MB** (except the last). You can request up to **500** part URLs per call.

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-part-urls?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3, 4, 5]'
```

Response:

```json theme={null}
{
  "parts": [
    {"partNumber": 1, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    {"partNumber": 2, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    ...
  ]
}
```

Each part URL expires after **15 minutes**. Request part URLs just-in-time rather than all at once for large uploads.

PUT each chunk to its presigned URL. Record the `ETag` from each response header:

```bash theme={null}
# Upload part 1 (first 50MB)
curl -X PUT "${partUrl1}" \
  -H "Content-Type: text/csv" \
  --data-binary @part1.bin \
  -D -  # Print response headers to capture ETag
```

The `ETag` header in the response identifies the uploaded part.
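Because parts can finish in any order, it helps to keep the recorded ETags keyed by part number and assemble the completion body only once every part is accounted for. The following is a minimal sketch under that assumption. `build_completion_body` is a hypothetical helper, and emitting parts in ascending order follows S3's own rule for completing multipart uploads; whether Kodexa reorders parts server-side is not stated here.

```python
import json


def build_completion_body(etags_by_part, total_parts, metadata=None):
    """Build the JSON body for the multipart-upload-complete call.

    etags_by_part maps part number -> ETag string exactly as S3 returned it
    (including the surrounding double quotes). Hypothetical helper.
    """
    missing = [n for n in range(1, total_parts + 1) if n not in etags_by_part]
    if missing:
        raise ValueError(f"missing ETags for parts: {missing}")
    body = {
        "parts": [
            {"partNumber": n, "etag": etags_by_part[n]}
            for n in range(1, total_parts + 1)  # ascending part order
        ]
    }
    if metadata is not None:
        body["metadata"] = metadata
    return body


# ETags recorded out of order are still emitted in ascending part order.
recorded = {2: '"def456"', 1: '"abc123"', 3: '"ghi789"'}
print(json.dumps(build_completion_body(recorded, 3, {"source": "data-pipeline"}), indent=2))
```

Failing early on a missing part avoids sending a completion request that S3 would reject or that would assemble an incomplete file.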
Send all part numbers and ETags to finalize:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-complete?s3Key=${s3Key}&uploadId=${uploadId}&path=huge-dataset.csv" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {"partNumber": 1, "etag": "\"abc123...\""},
      {"partNumber": 2, "etag": "\"def456...\""},
      {"partNumber": 3, "etag": "\"ghi789...\""}
    ],
    "metadata": {"source": "data-pipeline"}
  }'
```

The response is the created document family.

### Resuming an Interrupted Upload

If the upload is interrupted, you don't need to start over:

1. Keep the `uploadId` and `s3Key` from the initiation response
2. Request new part URLs for any parts that weren't uploaded
3. Upload only the missing parts
4. Complete with all parts (already-uploaded parts remain in S3)

### Aborting a Multipart Upload

To cancel an in-progress multipart upload and free S3 resources:

```bash theme={null}
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-abort?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token"
```

Returns `204 No Content`.
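Before aborting, a client can usually resume instead. As a sketch of step 2 of the resume procedure above, the snippet below works out which part numbers still need URLs and groups them into requests of at most 500, the per-call limit. `missing_part_batches` is a hypothetical helper, and it assumes the client has kept track of which parts already have a recorded ETag.

```python
def missing_part_batches(total_parts, uploaded_parts, batch_size=500):
    """Yield lists of part numbers that still need URLs, at most batch_size each.

    uploaded_parts is any iterable of part numbers whose ETags were already
    recorded. Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done = set(uploaded_parts)
    missing = [n for n in range(1, total_parts + 1) if n not in done]
    for i in range(0, len(missing), batch_size):
        yield missing[i:i + batch_size]


# 7 parts planned; parts 1, 2 and 5 finished before the connection dropped.
for batch in missing_part_batches(7, [1, 2, 5], batch_size=3):
    print(batch)  # prints [3, 4, 6] and then [7]
```

Each yielded list can be sent as the JSON body of a `multipart-upload-part-urls` request. Smaller batches also reduce the chance that a URL expires before its part is uploaded.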
## S3 Constraints

| Constraint            | Limit                   |
| --------------------- | ----------------------- |
| Minimum part size     | 5 MB (except last part) |
| Maximum parts         | 10,000                  |
| Maximum file size     | 5 TB                    |
| Part URL expiry       | 15 minutes              |
| Part URLs per request | 500                     |

## Python Example

```python theme={null}
import math
import os

import requests

BASE_URL = "https://platform.kodexa-enterprise.com"
TOKEN = "kit_your-intake-token"
HEADERS = {"x-api-key": TOKEN}
PART_SIZE = 50 * 1024 * 1024  # 50 MB per part


def upload_large_file(org_slug, intake_slug, file_path):
    file_size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)

    if file_size <= 256 * 1024 * 1024:
        # Direct upload for files up to 256 MB
        with open(file_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}",
                headers=HEADERS,
                files={"file": (filename, f)},
            )
        resp.raise_for_status()
        return resp.json()

    # Step 1: Initiate multipart upload
    resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-request",
        headers=HEADERS,
        params={"path": filename, "contentType": "application/octet-stream", "contentLength": file_size},
    )
    resp.raise_for_status()
    init = resp.json()
    s3_key = init["s3Key"]
    upload_id = init["uploadId"]

    # Step 2: Upload parts
    total_parts = math.ceil(file_size / PART_SIZE)
    completed_parts = []

    with open(file_path, "rb") as f:
        # Up to 500 URLs per call; on slow links use smaller batches so the
        # URLs do not expire (15 minutes) before their parts are uploaded
        for batch_start in range(0, total_parts, 500):
            batch_end = min(batch_start + 500, total_parts)
            part_numbers = list(range(batch_start + 1, batch_end + 1))

            # Get presigned URLs for this batch
            urls_resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-part-urls",
                headers={**HEADERS, "Content-Type": "application/json"},
                params={"s3Key": s3_key, "uploadId": upload_id},
                json=part_numbers,
            )
            urls_resp.raise_for_status()
            part_urls = urls_resp.json()["parts"]

            for part_info in part_urls:
                # Seek by part number so each chunk matches its URL even if
                # the response is not in ascending order
                f.seek((part_info["partNumber"] - 1) * PART_SIZE)
                chunk = f.read(PART_SIZE)
                put_resp = requests.put(part_info["uploadUrl"], data=chunk)
                put_resp.raise_for_status()
                completed_parts.append({
                    "partNumber": part_info["partNumber"],
                    "etag": put_resp.headers["ETag"],
                })
                print(f"  Uploaded part {part_info['partNumber']}/{total_parts}")

    # Step 3: Complete (parts listed in ascending order)
    completed_parts.sort(key=lambda p: p["partNumber"])
    complete_resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-complete",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id, "path": filename},
        json={"parts": completed_parts},
    )
    complete_resp.raise_for_status()
    return complete_resp.json()
```

## TypeScript Example

```typescript theme={null}
const BASE_URL = "https://platform.kodexa-enterprise.com";
const TOKEN = "kit_your-intake-token";
const PART_SIZE = 50 * 1024 * 1024; // 50 MB

async function uploadLargeFile(orgSlug: string, intakeSlug: string, file: File) {
  const headers = { "x-api-key": TOKEN };

  if (file.size <= 256 * 1024 * 1024) {
    // Direct upload for files up to 256 MB
    const form = new FormData();
    form.append("file", file);
    const resp = await fetch(`${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}`, {
      method: "POST",
      headers,
      body: form,
    });
    if (!resp.ok) throw new Error(`Direct upload failed: ${resp.status}`);
    return resp.json();
  }

  // Step 1: Initiate multipart upload
  const params = new URLSearchParams({
    path: file.name,
    contentType: file.type || "application/octet-stream",
    contentLength: String(file.size),
  });
  const initResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-request?${params}`,
    { method: "POST", headers },
  );
  if (!initResp.ok) throw new Error(`Initiate failed: ${initResp.status}`);
  const { s3Key, uploadId } = await initResp.json();

  // Step 2: Upload parts
  const totalParts = Math.ceil(file.size / PART_SIZE);
  const completedParts: { partNumber: number; etag: string }[] = [];

  for (let batchStart = 0; batchStart < totalParts; batchStart += 500) {
    const batchEnd = Math.min(batchStart + 500, totalParts);
    const partNumbers = Array.from({ length: batchEnd - batchStart }, (_, i) => batchStart + i + 1);

    const urlParams = new URLSearchParams({ s3Key, uploadId });
    const urlsResp = await fetch(
      `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-part-urls?${urlParams}`,
      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(partNumbers) },
    );
    if (!urlsResp.ok) throw new Error(`Part URL request failed: ${urlsResp.status}`);
    const { parts: partUrls } = await urlsResp.json();

    for (const partInfo of partUrls) {
      const start = (partInfo.partNumber - 1) * PART_SIZE;
      const end = Math.min(start + PART_SIZE, file.size);
      const chunk = file.slice(start, end);

      const putResp = await fetch(partInfo.uploadUrl, { method: "PUT", body: chunk });
      if (!putResp.ok) throw new Error(`Part ${partInfo.partNumber} failed: ${putResp.status}`);
      // In browsers, the bucket's CORS configuration must expose the ETag header
      const etag = putResp.headers.get("ETag");
      if (!etag) throw new Error(`No ETag returned for part ${partInfo.partNumber}`);
      completedParts.push({ partNumber: partInfo.partNumber, etag });
    }
  }

  // Step 3: Complete (parts listed in ascending order)
  completedParts.sort((a, b) => a.partNumber - b.partNumber);
  const completeParams = new URLSearchParams({ s3Key, uploadId, path: file.name });
  const completeResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-complete?${completeParams}`,
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ parts: completedParts }),
    },
  );
  if (!completeResp.ok) throw new Error(`Complete failed: ${completeResp.status}`);
  return completeResp.json();
}
```

## Endpoint Reference

| Endpoint                                              | Method | Description                            |
| ----------------------------------------------------- | ------ | -------------------------------------- |
| `/api/intake/{org}/{slug}/presigned-upload-request`   | POST   | Get a presigned S3 PUT URL             |
| `/api/intake/{org}/{slug}/presigned-upload-complete`  | POST   | Finalize a presigned upload            |
| `/api/intake/{org}/{slug}/multipart-upload-request`   | POST   | Initiate a multipart upload            |
| `/api/intake/{org}/{slug}/multipart-upload-part-urls` | POST   | Get presigned URLs for parts (batched) |
| `/api/intake/{org}/{slug}/multipart-upload-complete`  | POST   | Finalize a multipart upload            |
| `/api/intake/{org}/{slug}/multipart-upload-abort`     | POST   | Cancel a multipart upload              |

All endpoints accept the same authentication as the direct intake upload: either an intake API token (`kit_*`) or a platform API key with appropriate permissions.