Direct uploads to an intake endpoint are limited to 256 MB by the API gateway. For larger files, Kodexa provides two mechanisms:
- Presigned URL upload — the server generates a temporary S3 URL; the client uploads directly to S3, bypassing the API gateway entirely
- Multipart upload — the file is split into chunks (parts), each uploaded independently via presigned URLs, then assembled server-side
Both require the Enable large uploads option to be turned on in the intake settings.
## Enabling Large Uploads
In Kodexa Studio, open the intake settings and check Enable large uploads. This exposes the presigned and multipart upload endpoints for that intake. When disabled (the default), these endpoints return 403 Forbidden.
Large uploads use the same intake API tokens as direct uploads — no additional credentials are needed.
## Presigned URL Upload
**Best for:** files between 256 MB and 5 GB over reliable connections.
### Request a presigned URL

Call the presigned upload request endpoint with the file details:

```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-request?path=large-report.pdf&contentType=application/pdf&contentLength=524288000" \
  -H "x-api-key: kit_your-intake-token"
```
Response:

```json
{
  "uploadUrl": "https://s3.amazonaws.com/bucket/presigned-uploads/...",
  "s3Key": "presigned-uploads/intake-id/upload-id/large-report.pdf",
  "uploadId": "upload-id",
  "expiresAt": "2026-05-05T15:00:00Z"
}
```
The presigned URL expires after 15 minutes.

### Upload to S3
PUT the file directly to the presigned URL. This bypasses the Kodexa API entirely; the file goes straight to S3:

```bash
curl -X PUT "${uploadUrl}" \
  -H "Content-Type: application/pdf" \
  --data-binary @large-report.pdf
```
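If you are uploading from a script rather than curl, one way to surface progress is to wrap the file object so every `read()` reports bytes sent; `requests` streams any object that exposes `read()`. This is a sketch, not part of the Kodexa API, and the presigned URL is the one returned by the previous step:

```python
import os

import requests


class ProgressFile:
    """File-like wrapper that reports bytes sent as the body is streamed."""

    def __init__(self, path, callback):
        self._f = open(path, "rb")
        self.total = os.path.getsize(path)
        self.sent = 0
        self._callback = callback

    def __len__(self):
        # Lets requests send a Content-Length header instead of chunked
        # transfer encoding, which a presigned S3 PUT does not accept.
        return self.total

    def read(self, size=-1):
        chunk = self._f.read(size)
        self.sent += len(chunk)
        self._callback(self.sent, self.total)
        return chunk

    def close(self):
        self._f.close()


def put_with_progress(upload_url, path, content_type):
    body = ProgressFile(path, lambda sent, total: print(f"\r{sent}/{total} bytes", end=""))
    try:
        resp = requests.put(upload_url, data=body, headers={"Content-Type": content_type})
        resp.raise_for_status()
        return resp
    finally:
        body.close()
```

Passing the wrapper as `data=` makes `requests` pull the body through `read()` in chunks, so the callback fires as the upload advances.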
You can track upload progress using standard HTTP upload progress mechanisms.

### Complete the upload
Tell Kodexa to process the uploaded file:

```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/presigned-upload-complete?s3Key=${s3Key}&path=large-report.pdf" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {"department": "finance"},
    "labels": "REVIEW",
    "documentVersion": "1.0"
  }'
```
The response is the created document family, identical to a direct upload response. All intake features work normally: metadata merging, intake scripts, activity plans, knowledge features.
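Strung together, the three steps above look like this from Python; a sketch against the endpoints shown, with the slugs, token, content type, and metadata as placeholder values:

```python
import os

import requests

BASE_URL = "https://platform.kodexa-enterprise.com"
HEADERS = {"x-api-key": "kit_your-intake-token"}


def presigned_upload(org_slug, intake_slug, file_path, content_type="application/pdf"):
    filename = os.path.basename(file_path)

    # Step 1: request a presigned S3 PUT URL
    req = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/presigned-upload-request",
        headers=HEADERS,
        params={
            "path": filename,
            "contentType": content_type,
            "contentLength": os.path.getsize(file_path),
        },
    )
    req.raise_for_status()
    grant = req.json()

    # Step 2: PUT the file straight to S3, bypassing the Kodexa API
    with open(file_path, "rb") as f:
        put = requests.put(grant["uploadUrl"], data=f, headers={"Content-Type": content_type})
    put.raise_for_status()

    # Step 3: tell Kodexa to process the uploaded file
    done = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/presigned-upload-complete",
        headers=HEADERS,
        params={"s3Key": grant["s3Key"], "path": filename},
        json={"metadata": {"department": "finance"}},
    )
    done.raise_for_status()
    return done.json()  # the created document family
```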
## Complete Request Body
The presigned complete endpoint accepts an optional JSON body with the same parameters as a direct upload:
| Field | Type | Description |
|---|---|---|
| `metadata` | object | Per-upload metadata (highest priority in merge) |
| `extraFields` | object | Additional metadata fields (lowest priority in merge) |
| `externalData` | object | External data injected into the KDDB document |
| `labels` | string | Comma-separated labels |
| `statusId` | string | Document status ID |
| `knowledgeFeatures` | string | Knowledge feature IDs JSON |
| `documentVersion` | string | Document version |
| `filename` | string | Original filename (used by intake scripts; defaults to path basename) |
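The merge priorities in the table can be read as a layered dict update: `extraFields` sit at the bottom, and per-upload `metadata` wins on any key collision. A conceptual sketch only (metadata sources that sit between the two layers are omitted):

```python
def merge_metadata(extra_fields, metadata):
    """extraFields have the lowest merge priority, per-upload metadata the highest."""
    merged = dict(extra_fields)  # lowest priority goes in first
    merged.update(metadata)      # highest priority overwrites on collisions
    return merged
```

For example, `merge_metadata({"region": "emea", "department": "ops"}, {"department": "finance"})` keeps `region` from `extraFields` but `department` comes from the per-upload `metadata`.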
## Multipart Upload
**Best for:** files over 5 GB, unreliable connections, or when you need resumability.
Multipart uploads split the file into chunks (minimum 5 MB each, except the last chunk) and upload each independently. If the connection drops, you only re-upload the failed chunks.
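Because each part PUT is independent, a failed part can simply be retried on its own. A sketch of that idea, where `put_fn` is a hypothetical stand-in for a `requests.put` against a fresh presigned part URL:

```python
import time


def upload_part_with_retry(put_fn, part_number, chunk, max_attempts=3, base_delay=2.0):
    """Retry a single part upload with exponential backoff.

    put_fn(part_number, chunk) performs the actual PUT and returns the
    part's ETag on success; a failure here leaves all other parts intact.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return put_fn(part_number, chunk)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off before retrying
```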
### Initiate the upload
```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-request?path=huge-dataset.csv&contentType=text/csv&contentLength=10737418240" \
  -H "x-api-key: kit_your-intake-token"
```
Response:

```json
{
  "uploadId": "multipart-upload-id",
  "s3Key": "presigned-uploads/intake-id/upload-id/huge-dataset.csv",
  "expiresAt": "2026-05-05T15:00:00Z"
}
```
### Get part URLs (batched)
Request presigned URLs for multiple parts in one call. Each part should be at least 5 MB (except the last), and you can request up to 500 part URLs per call.

```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-part-urls?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3, 4, 5]'
```
Response:

```json
{
  "parts": [
    {"partNumber": 1, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    {"partNumber": 2, "uploadUrl": "https://s3...", "expiresAt": "2026-05-05T15:15:00Z"},
    ...
  ]
}
```
Each part URL expires after 15 minutes. For large uploads, request part URLs just-in-time rather than all at once.

### Upload each part
PUT each chunk to its presigned URL and record the ETag from each response header:

```bash
# Upload part 1 (first 50 MB)
curl -X PUT "${partUrl1}" \
  -H "Content-Type: text/csv" \
  --data-binary @part1.bin \
  -D -  # Print response headers to capture the ETag
```
The ETag header in the response identifies the uploaded part.

### Complete the upload
Send all part numbers and ETags to finalize:

```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-complete?s3Key=${s3Key}&uploadId=${uploadId}&path=huge-dataset.csv" \
  -H "x-api-key: kit_your-intake-token" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {"partNumber": 1, "etag": "\"abc123...\""},
      {"partNumber": 2, "etag": "\"def456...\""},
      {"partNumber": 3, "etag": "\"ghi789...\""}
    ],
    "metadata": {"source": "data-pipeline"}
  }'
```
The response is the created document family.
## Resuming an Interrupted Upload
If the upload is interrupted, you don't need to start over:

- Keep the `uploadId` and `s3Key` from the initiation step
- Request new part URLs for any parts that weren't uploaded
- Upload only the missing parts
- Complete with all parts (already-uploaded parts remain in S3)
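In code, resuming amounts to requesting URLs only for the gaps. A sketch where `fetch_part_urls` and `put_part` are hypothetical stand-ins for the part-URL endpoint and the S3 PUT:

```python
import math


def resume_upload(file_size, part_size, completed, fetch_part_urls, put_part):
    """Resume a multipart upload.

    completed maps already-uploaded part numbers to their ETags; only the
    missing parts get fresh presigned URLs and are re-uploaded.
    """
    total_parts = math.ceil(file_size / part_size)
    missing = [n for n in range(1, total_parts + 1) if n not in completed]

    for part in fetch_part_urls(missing):  # new presigned URLs, missing parts only
        completed[part["partNumber"]] = put_part(part)

    # Completion needs every part, including those uploaded before the interruption
    return [{"partNumber": n, "etag": completed[n]} for n in sorted(completed)]
```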
## Aborting a Multipart Upload
To cancel an in-progress multipart upload and free S3 resources:
```bash
curl -X POST "https://platform.kodexa-enterprise.com/api/intake/my-org/my-intake/multipart-upload-abort?s3Key=${s3Key}&uploadId=${uploadId}" \
  -H "x-api-key: kit_your-intake-token"
```
Returns `204 No Content`.
## S3 Constraints
| Constraint | Limit |
|---|---|
| Minimum part size | 5 MB (except last part) |
| Maximum parts | 10,000 |
| Maximum file size | 5 TB |
| Part URL expiry | 15 minutes |
| Part URLs per request | 500 |
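These limits interact: at the 5 MB minimum part size, 10,000 parts caps a file at roughly 50 GB, so larger files need larger parts. A small helper (not part of the API) that picks a part size satisfying both limits, derived from the table above:

```python
import math

MIN_PART = 5 * 1024 * 1024   # 5 MB minimum (except the last part)
MAX_PARTS = 10_000
MAX_FILE = 5 * 1024 ** 4     # 5 TB


def choose_part_size(file_size):
    """Smallest part size (rounded up to a whole MiB) that keeps the
    part count within 10,000 while respecting the 5 MB minimum."""
    if file_size > MAX_FILE:
        raise ValueError("file exceeds the 5 TB S3 limit")
    needed = math.ceil(file_size / MAX_PARTS)  # bytes per part to fit in 10,000 parts
    part_size = max(MIN_PART, needed)
    return math.ceil(part_size / (1024 * 1024)) * 1024 * 1024  # round up to MiB
```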
## Python Example
```python
import math
import os

import requests

BASE_URL = "https://platform.kodexa-enterprise.com"
TOKEN = "kit_your-intake-token"
HEADERS = {"x-api-key": TOKEN}
PART_SIZE = 50 * 1024 * 1024  # 50 MB per part


def upload_large_file(org_slug, intake_slug, file_path):
    file_size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)

    if file_size <= 256 * 1024 * 1024:
        # Direct upload for files up to 256 MB
        with open(file_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}",
                headers=HEADERS,
                files={"file": (filename, f)},
            )
        resp.raise_for_status()
        return resp.json()

    # Step 1: Initiate multipart upload
    resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-request",
        headers=HEADERS,
        params={"path": filename, "contentType": "application/octet-stream", "contentLength": file_size},
    )
    resp.raise_for_status()
    init = resp.json()
    s3_key = init["s3Key"]
    upload_id = init["uploadId"]

    # Step 2: Upload parts, requesting part URLs in batches of up to 500
    total_parts = math.ceil(file_size / PART_SIZE)
    completed_parts = []
    with open(file_path, "rb") as f:
        for batch_start in range(0, total_parts, 500):
            batch_end = min(batch_start + 500, total_parts)
            part_numbers = list(range(batch_start + 1, batch_end + 1))

            # Get presigned URLs for this batch
            urls_resp = requests.post(
                f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-part-urls",
                headers={**HEADERS, "Content-Type": "application/json"},
                params={"s3Key": s3_key, "uploadId": upload_id},
                json=part_numbers,
            )
            urls_resp.raise_for_status()
            part_urls = urls_resp.json()["parts"]

            # The file is read sequentially, so upload in part-number order
            for part_info in sorted(part_urls, key=lambda p: p["partNumber"]):
                chunk = f.read(PART_SIZE)
                put_resp = requests.put(part_info["uploadUrl"], data=chunk)
                put_resp.raise_for_status()
                completed_parts.append({
                    "partNumber": part_info["partNumber"],
                    "etag": put_resp.headers["ETag"],
                })
                print(f"Uploaded part {part_info['partNumber']}/{total_parts}")

    # Step 3: Complete
    complete_resp = requests.post(
        f"{BASE_URL}/api/intake/{org_slug}/{intake_slug}/multipart-upload-complete",
        headers={**HEADERS, "Content-Type": "application/json"},
        params={"s3Key": s3_key, "uploadId": upload_id, "path": filename},
        json={"parts": completed_parts},
    )
    complete_resp.raise_for_status()
    return complete_resp.json()
```
## TypeScript Example
```typescript
const BASE_URL = "https://platform.kodexa-enterprise.com";
const TOKEN = "kit_your-intake-token";
const PART_SIZE = 50 * 1024 * 1024; // 50 MB

async function uploadLargeFile(orgSlug: string, intakeSlug: string, file: File) {
  const headers = { "x-api-key": TOKEN };

  if (file.size <= 256 * 1024 * 1024) {
    // Direct upload for files up to 256 MB
    const form = new FormData();
    form.append("file", file);
    const resp = await fetch(`${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}`, {
      method: "POST",
      headers,
      body: form,
    });
    return resp.json();
  }

  // Step 1: Initiate multipart upload
  const params = new URLSearchParams({
    path: file.name,
    contentType: file.type || "application/octet-stream",
    contentLength: String(file.size),
  });
  const initResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-request?${params}`,
    { method: "POST", headers },
  );
  const { s3Key, uploadId } = await initResp.json();

  // Step 2: Upload parts, requesting part URLs in batches of up to 500
  const totalParts = Math.ceil(file.size / PART_SIZE);
  const completedParts: { partNumber: number; etag: string }[] = [];
  for (let batchStart = 0; batchStart < totalParts; batchStart += 500) {
    const batchEnd = Math.min(batchStart + 500, totalParts);
    const partNumbers = Array.from({ length: batchEnd - batchStart }, (_, i) => batchStart + i + 1);

    // Get presigned URLs for this batch
    const urlParams = new URLSearchParams({ s3Key, uploadId });
    const urlsResp = await fetch(
      `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-part-urls?${urlParams}`,
      {
        method: "POST",
        headers: { ...headers, "Content-Type": "application/json" },
        body: JSON.stringify(partNumbers),
      },
    );
    const { parts: partUrls } = await urlsResp.json();

    for (const partInfo of partUrls) {
      // Slice this part's byte range directly from the File
      const start = (partInfo.partNumber - 1) * PART_SIZE;
      const end = Math.min(start + PART_SIZE, file.size);
      const chunk = file.slice(start, end);
      const putResp = await fetch(partInfo.uploadUrl, { method: "PUT", body: chunk });
      completedParts.push({ partNumber: partInfo.partNumber, etag: putResp.headers.get("ETag")! });
    }
  }

  // Step 3: Complete
  const completeParams = new URLSearchParams({ s3Key, uploadId, path: file.name });
  const completeResp = await fetch(
    `${BASE_URL}/api/intake/${orgSlug}/${intakeSlug}/multipart-upload-complete?${completeParams}`,
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ parts: completedParts }),
    },
  );
  return completeResp.json();
}
```
## Endpoint Reference
| Endpoint | Method | Description |
|---|---|---|
| `/api/intake/{org}/{slug}/presigned-upload-request` | POST | Get a presigned S3 PUT URL |
| `/api/intake/{org}/{slug}/presigned-upload-complete` | POST | Finalize a presigned upload |
| `/api/intake/{org}/{slug}/multipart-upload-request` | POST | Initiate a multipart upload |
| `/api/intake/{org}/{slug}/multipart-upload-part-urls` | POST | Get presigned URLs for parts (batched) |
| `/api/intake/{org}/{slug}/multipart-upload-complete` | POST | Finalize a multipart upload |
| `/api/intake/{org}/{slug}/multipart-upload-abort` | POST | Cancel a multipart upload |
All endpoints accept the same authentication as the direct intake upload: either an intake API token (`kit_*`) or a platform API key with appropriate permissions.