Projects in Kodexa are organizational units that combine resources to address a specific unstructured data use case. A project is a configuration of components such as Document Stores, Data Stores, and Data Definitions, together with project-specific features such as assistants. This article introduces Projects in Kodexa: their purpose, their structure, and how they support data processing workflows.
What is a Project?
A Project in Kodexa is essentially a framework that brings together different resources to solve a particular data-related challenge. It goes beyond the mere aggregation of these resources by introducing new concepts like assistants, which link resources together and create an active, operational workflow.
Creating and Managing Projects
Projects are owned by organizations within the Kodexa Platform. To work with projects, you first need to locate the organization that owns them.
Finding an Organization
To find an organization, you can use the KodexaClient:
from kodexa.platform import KodexaClient
client = KodexaClient()
# List organizations matching the query 'Philip'
organizations = client.organizations.list(query='Philip').to_df()
print(organizations)
Listing Projects in an Organization
Once you have an organization, you can list its projects:
philips_organization = client.organizations.find_by_slug('philips-world')
projects = philips_organization.projects.list().to_df()
print(projects)
Finding a Specific Project
You can find a project by its name or ID. The example below looks one up by name and prints its ID:
my_project = philips_organization.projects.find_by_name('Dae Similar Formats')
print(my_project.id)
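The two lookups above can be combined into one helper that fails with a clear message when either step finds nothing. This is a minimal sketch, not part of the Kodexa client: `find_project` is a hypothetical helper, the stand-in objects exist only so the example runs without a platform connection, and treating a falsy return value as "not found" is an assumption about how `find_by_slug` and `find_by_name` behave.

```python
from types import SimpleNamespace


def find_project(client, org_slug, project_name):
    """Resolve an organization by slug, then a project by name.

    Only the two lookups shown in this article are used. Treating a falsy
    return value as "not found" is an assumption made for this sketch.
    """
    organization = client.organizations.find_by_slug(org_slug)
    if not organization:
        raise LookupError(f"No organization with slug {org_slug!r}")

    project = organization.projects.find_by_name(project_name)
    if not project:
        raise LookupError(f"No project named {project_name!r} in {org_slug!r}")
    return project


# Stand-in objects (hypothetical) so the sketch runs without a Kodexa connection.
_fake_project = SimpleNamespace(id="project-123", name="Dae Similar Formats")
_fake_org = SimpleNamespace(
    projects=SimpleNamespace(
        find_by_name=lambda name: _fake_project if name == _fake_project.name else None
    )
)
fake_client = SimpleNamespace(
    organizations=SimpleNamespace(
        find_by_slug=lambda slug: _fake_org if slug == "philips-world" else None
    )
)

project = find_project(fake_client, "philips-world", "Dae Similar Formats")
print(project.id)
```

With a real connection, you would pass the `KodexaClient` instance in place of `fake_client`. If the actual client raises its own exception for a missing organization or project, the explicit checks become redundant but harmless.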
Associated Components
Projects in Kodexa are logical groupings of components. The project endpoint instance exposes an accessor for each kind of component, so you can work with them directly from the project object.
Document Stores
For example, to list the document stores associated with a project:
document_stores = my_project.document_stores.to_df()
print(document_stores)
This will display information about the document stores, including their names, types, purposes, and other metadata.
Project Structure
A typical project in Kodexa includes:
- Document Stores: For storing and managing documents
- Data Stores: For storing extracted or processed data
- Data Definitions: Schemas defining the structure of data
- Assistants: Helpers that link the project's resources together and perform tasks as part of an active workflow
- Pipelines: Automated workflows for data processing
- Model Services: Machine learning models for data analysis and extraction
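The structure listed above can be pictured as a simple data model. The sketch below is illustrative only and does not reflect how Kodexa stores projects internally: the `Project` and `Assistant` classes, their field names, and the idea that an assistant links one source resource to one target resource are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assistant:
    # Hypothetical model: an assistant links a source resource to a target resource.
    name: str
    source: str
    target: str


@dataclass
class Project:
    # Hypothetical model: each component kind is tracked by resource name.
    name: str
    document_stores: List[str] = field(default_factory=list)
    data_stores: List[str] = field(default_factory=list)
    data_definitions: List[str] = field(default_factory=list)
    assistants: List[Assistant] = field(default_factory=list)
    pipelines: List[str] = field(default_factory=list)
    model_services: List[str] = field(default_factory=list)

    def component_counts(self) -> Dict[str, int]:
        """Count how many components of each kind the project groups together."""
        return {
            "document_stores": len(self.document_stores),
            "data_stores": len(self.data_stores),
            "data_definitions": len(self.data_definitions),
            "assistants": len(self.assistants),
            "pipelines": len(self.pipelines),
            "model_services": len(self.model_services),
        }

    def dangling_links(self) -> List[str]:
        """Names of assistants whose source or target store is not in this project."""
        known = set(self.document_stores) | set(self.data_stores)
        return [
            a.name
            for a in self.assistants
            if a.source not in known or a.target not in known
        ]


invoices = Project(
    name="Invoices",
    document_stores=["incoming-invoices"],
    data_stores=["invoice-data"],
    data_definitions=["invoice-schema"],
    assistants=[Assistant("extractor", "incoming-invoices", "invoice-data")],
)
print(invoices.component_counts())
print(invoices.dangling_links())
```

The point of the model is that a project adds little data of its own. Its value comes from grouping named resources and from the assistants that connect them.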
Benefits of Using Projects
- Organized Workflow: Projects provide a structured approach to handling complex data processing tasks.
- Resource Integration: They allow seamless integration of various Kodexa resources.
- Collaboration: Team members can work together within the context of a project.
- Scalability: Projects can be easily scaled to handle larger datasets or more complex workflows.
- Reusability: Components and configurations can be reused across different projects.
Conclusion
Projects in the Kodexa Platform offer a comprehensive solution for managing unstructured data processing workflows. By combining various resources and introducing intelligent assistants, they provide a powerful framework for tackling complex data challenges. Whether you're extracting information from invoices or training models on similar document formats, Kodexa Projects offer the flexibility and functionality to streamline your data processing tasks.