Store Purposes
There are two main purposes for a document store:
- To hold documents that we will be using for training models
- To hold documents that we will be using to extract data
A document store has a storePurpose property that can be set to either TRAINING or OPERATIONAL. This is used to determine which documents are available for use in the store. The actual functionality of the store itself is the same regardless of the purpose.
Anatomy of a Document Family
A document family consists of a document and any derived documents that are created from it. Since a document family can contain both a native PDF and the Kodexa Documents derived from it, we use a stereotype called a content object. A content object points to something that contains content, which can be a file or a document; the content type on the content object is therefore either ‘Document’ or ‘Native’. Here ‘Native’ means the original file, which could be of any file type. The document family holds the list of content objects and also a concept called “Document Transitions”. A document transition is a link between two content objects that shows how one content object was derived from another, and which assistant (or user) was responsible for the derivation.
Store Options
The document store has a number of options that control how it behaves. These are set on the store object:
- highQualityPreview: If set to true, the store will generate high quality previews of the documents. This increases the time it takes to generate the previews but results in better quality previews. The default value is false. This setting is used in the UI.
- searchable: If set to true, the store will be searchable. This means that the platform will pass content from documents to indexing.
- deleteProtection: If set to true, the store will be protected from deletion. This means that you can’t delete the store or delete all its contents. However, you can still delete individual documents from the store.
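As a rough illustration, the options above could appear on a store object as follows. The three option names, the storePurpose values, and the highQualityPreview default come from this page. The overall object shape, the slug value, and the describeStore helper are assumptions made for this sketch and are not the platform's actual schema.

```javascript
// Hypothetical store object. Only the option names, the storePurpose values,
// and the highQualityPreview default (false) come from the documentation.
const store = {
  slug: "my-store",            // illustrative value
  storePurpose: "OPERATIONAL", // TRAINING or OPERATIONAL
  highQualityPreview: false,   // documented default
  searchable: true,
  deleteProtection: true,
};

// Summarize what the flags imply, following the option descriptions.
function describeStore(s) {
  return {
    slowerButBetterPreviews: s.highQualityPreview === true,
    contentIsIndexed: s.searchable === true,
    canDeleteStore: s.deleteProtection !== true,
    canDeleteAllContents: s.deleteProtection !== true,
    canDeleteSingleDocument: true, // still allowed under delete protection
  };
}

console.log(describeStore(store));
```

Note that delete protection blocks removing the store or all of its contents, while single-document deletion stays available.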
Document Properties
You can specify document properties, which are shown to the user as options when they upload a file to the document store. This is a good way to capture information in the document family metadata that you can use later.
Expression Labels
When a document (either a native file or a Kodexa document) is added to a document store, we may want to add a label to it. This can be achieved with label expressions. A label expression, defined on a document store, adds a specific label to a new document based on the result of an expression. The expression itself is a Spring Expression Language (https://docs.spring.io/spring-framework/docs/3.2.x/spring-framework-reference/html/expressions.html) expression. This supports a use case where the application uploading the document to the platform includes metadata with the upload. This metadata (as well as the document and document family) is then available for the expression to use. Let’s say we have an application that is uploading documents to an instance of Kodexa. The upload associates a metadata value called “ShouldPublishXml”, which can be True or False. As we load the document into the document store, we want to determine whether this metadata flag is present and, if it is present and not set to True, add the label dont_publish to the document. To do this, we create a label expression at the document store level with the following properties:
label: dont_publish
expression:
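The expression value itself is not reproduced on this page, so it is left as-is above. To make the intended rule concrete, here is a plain JavaScript predicate (not Spring Expression Language) that mirrors the logic described: add dont_publish when ShouldPublishXml is present and not set to True. The function name, the flat metadata object, and the string comparison are assumptions for illustration only.

```javascript
// Plain JavaScript illustration of the rule, NOT a Spring Expression Language
// expression. The metadata shape (a flat object of values) is assumed.
function labelsForUpload(metadata) {
  const labels = [];
  const hasFlag = Object.prototype.hasOwnProperty.call(metadata, "ShouldPublishXml");
  // Flag present and not set to "True": add the dont_publish label.
  if (hasFlag && String(metadata.ShouldPublishXml) !== "True") {
    labels.push("dont_publish");
  }
  return labels;
}

console.log(labelsForUpload({ ShouldPublishXml: "False" })); // [ 'dont_publish' ]
console.log(labelsForUpload({ ShouldPublishXml: "True" }));  // []
console.log(labelsForUpload({}));                            // []
```

The third call shows the case that is easy to miss: when the flag is absent, no label is added.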
File Upload API Documentation
This documentation demonstrates how to upload files to the document store using different programming languages and tools.
API Endpoint
- path: The target path for the uploaded file (query parameter)
- file: The file content
- document: (Optional) Document metadata in KDDB format
- Additional metadata can be included as form fields
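The parameters above can be assembled into a multipart request roughly as follows. This sketch only builds the URL and form body and does not send anything. The endpoint URL is a placeholder, since the actual endpoint is not given in this section, and the function name, file name, and metadata field are hypothetical. It assumes a runtime with global URL, Blob, and FormData, such as a browser or Node.js 18+. The optional document part is omitted.

```javascript
// Builds the URL and multipart body for an upload without sending it.
// endpointUrl is a placeholder: the real endpoint is not shown in this section.
function buildUploadRequest(endpointUrl, path, fileBlob, fileName, extraMetadata = {}) {
  const url = new URL(endpointUrl);
  url.searchParams.set("path", path); // target path is a query parameter

  const form = new FormData();
  form.append("file", fileBlob, fileName); // the file content
  for (const [key, value] of Object.entries(extraMetadata)) {
    form.append(key, String(value)); // additional metadata as form fields
  }
  return { url: url.toString(), form };
}

const request = buildUploadRequest(
  "https://example.invalid/upload", // hypothetical URL
  "invoices/2024/inv-001.pdf",
  new Blob(["placeholder file content"], { type: "application/pdf" }),
  "inv-001.pdf",
  { ShouldPublishXml: "False" }
);
console.log(request.url);
```

Keeping the request construction separate from the network call makes it easy to inspect what will be sent before any upload happens.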
Store Reference Format
The store reference follows the format: <org-slug>/<store-slug>/<store-version>
For example: demo-org/my-store/1.0.0
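A small helper can split a store reference into its three parts. This is a minimal sketch: the function name and returned property names are hypothetical, and the validation (exactly three non-empty segments, no slashes inside a segment) is an assumption rather than a documented rule.

```javascript
// Splits "<org-slug>/<store-slug>/<store-version>" into its three parts.
function parseStoreRef(ref) {
  const parts = ref.split("/");
  // Assumed rule: exactly three non-empty segments.
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`Invalid store reference: ${ref}`);
  }
  const [orgSlug, storeSlug, storeVersion] = parts;
  return { orgSlug, storeSlug, storeVersion };
}

const ref = parseStoreRef("demo-org/my-store/1.0.0");
console.log(ref.orgSlug);      // demo-org
console.log(ref.storeSlug);    // my-store
console.log(ref.storeVersion); // 1.0.0
```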
Authentication
All requests must include the X-ACCESS-TOKEN header with a valid access token.
Examples
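cURL/Bash
Basic file upload:
JavaScript
Using the Fetch API:
C#
Using HttpClient:
Response
A successful upload returns a JSON response containing the document family details:
Error Handling
The API uses standard HTTP status codes:
- 200: Success
- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid access token)
- 409: Conflict (file already exists when replace=false)
- 500: Server Error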
- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid access token)
- 409: Conflict (file already exists when replace=false)
- 500: Server Error
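The status codes above can be handled in client code along these lines. This is a sketch under stated assumptions, not the official client: the URL is a placeholder, the POST method is assumed, and the uploadFile name and injectable fetchImpl parameter are hypothetical conveniences so the function can be exercised without a network. Only the X-ACCESS-TOKEN header and the status code meanings come from this page.

```javascript
// Status codes and meanings as listed in the documentation.
const STATUS_MEANINGS = {
  400: "Bad Request (invalid parameters)",
  401: "Unauthorized (invalid access token)",
  409: "Conflict (file already exists when replace=false)",
  500: "Server Error",
};

// Sends a prepared multipart form and maps documented error codes to messages.
// url is a placeholder for the upload endpoint; fetchImpl can be replaced
// with a stub for testing.
async function uploadFile(url, accessToken, form, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    method: "POST", // assumed HTTP method
    headers: { "X-ACCESS-TOKEN": accessToken },
    body: form, // fetch sets the multipart boundary header itself
  });
  if (response.status === 200) {
    return response.json(); // document family details
  }
  const meaning = STATUS_MEANINGS[response.status] ?? "Unexpected status";
  throw new Error(`Upload failed: ${response.status} ${meaning}`);
}
```

A caller would await uploadFile with the endpoint URL, a token, and a FormData body, and catch the thrown error to decide whether to retry, for example after a 500, or to surface the problem to the user, for example after a 401 or 409.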