Documents and Data

Documents and Data Stores in Kodexa

The Kodexa platform uses a system called "stores" to manage the information it holds.

There are two primary types of stores:

  1. Document Stores: These hold files and their corresponding document representations.
  2. Data Stores: These hold the data objects and attributes extracted from documents in a Document Store.

Concept and Design

Stores in Kodexa manage both native files and their associated "Document" representations, which contain unstructured data. A Data Structure is defined and used to label documents, and the platform then converts the labeled documents into a structured format.

Detailed Explanation of Store Types

Document Stores

Document Stores manage the files that are parsed, labeled, and converted into structured data. The term "document" here means that when a file (such as a PDF) is uploaded, Kodexa creates a "container." This container holds:

  • The original file (referred to as the native file).
  • One or more Kodexa Documents representing the semi-structured version of the native file.

These containers are known as Document Families. Because a family holds both the native file and its document representations, documents can be labeled independently by models or by humans.
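The container idea can be sketched in a few lines of Python. This is only an illustrative model, not the Kodexa SDK: the class names `DocumentRepresentation` and `DocumentFamily`, their fields, and the sample file name are all assumptions made for this example. It also assumes that "independent labeling" means each representation carries its own labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentRepresentation:
    """Hypothetical stand-in for one document representation of a native file."""
    doc_id: str
    labels: Dict[str, str] = field(default_factory=dict)  # label name -> labeled text
    labeled_by: str = "unlabeled"  # assumed values: "model" or "human"


@dataclass
class DocumentFamily:
    """Hypothetical container: one native file plus its representations."""
    native_file: str
    documents: List[DocumentRepresentation] = field(default_factory=list)

    def add_document(self, document: DocumentRepresentation) -> None:
        # Each representation is stored separately, so its labels
        # never overwrite those of another representation.
        self.documents.append(document)


family = DocumentFamily(native_file="invoice-001.pdf")
family.add_document(
    DocumentRepresentation("doc-model", {"total": "120.00"}, labeled_by="model")
)
family.add_document(
    DocumentRepresentation("doc-human", {"total": "125.00"}, labeled_by="human")
)
```

In this sketch the model-labeled and human-labeled representations coexist in the same family, and both remain tied to the same native file.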

Data Stores

Data Stores manage the structured data extracted from labeled documents stored in a Document Store. Each is connected to a Data Structure (internally termed a Taxonomy), which:

  • Formalizes data structure into groups and individual data attributes.
  • Stores actual data points and their related groups in the Data Store, with a lineage tracing back to the document representation in the Document Store.
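The relationship between a Data Structure, the extracted data, and its lineage can be illustrated with a small Python sketch. Everything here is hypothetical and chosen for illustration: the `taxonomy` dictionary, the `Lineage`, `DataAttribute`, and `DataObject` classes, and the `build_data_object` helper are assumptions, not Kodexa's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lineage:
    """Hypothetical pointer back to the source document representation."""
    family_id: str
    document_id: str


@dataclass
class DataAttribute:
    name: str
    value: str
    lineage: Lineage


@dataclass
class DataObject:
    group: str
    attributes: List[DataAttribute]


# Hypothetical Data Structure: group name -> attribute names it defines.
taxonomy: Dict[str, List[str]] = {"invoice": ["number", "total"]}


def build_data_object(group: str, labels: Dict[str, str], lineage: Lineage) -> DataObject:
    """Keep only labels the Data Structure defines, attaching lineage to each."""
    allowed = taxonomy[group]
    attributes = [
        DataAttribute(name, labels[name], lineage)
        for name in allowed
        if name in labels
    ]
    return DataObject(group, attributes)


source = Lineage(family_id="family-1", document_id="doc-human")
record = build_data_object(
    "invoice",
    {"number": "INV-7", "total": "125.00", "note": "not in the structure"},
    source,
)
```

Here the `note` label is dropped because the assumed Data Structure does not define it, and every stored attribute keeps a reference to the document representation it came from.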

Interrelation of Store Types

Kodexa is designed to transform a document from a native file into a structured data set. This transformation involves both Document Stores and Data Stores. Below is an overview of how the two store types interact within Kodexa.

[Figure: interaction between Document Stores and Data Stores]