The knowledge global exposes the knowledge features and items that have been attached to a document family. Scripts can read what the knowledge engine has resolved for a family without re-running resolution, then dispatch downstream behaviour from a single source of truth.
knowledge is available in script steps only — not in intake scripts, event subscriptions, or selection-option formulas. It is read-only: every returned object is Object.freeze-d, and there is no setter or mutate API.
How to think about it
Knowledge in Kodexa is a declarative way to describe what a document is and what should happen to it — kept separate from the script’s runtime state. Two pieces work together:
- Knowledge features are facts about a document family (vendor ID, language, document classification). They are attached to the family early in processing — typically by an upstream script step that reads upstream metadata, or by an intake module.
- Knowledge sets are org-level rules that match on features and produce knowledge items with rich, type-checked properties: configuration, prompts, validation rules, processing models, anything you want a downstream script to consume.
When a script step runs, the knowledge engine has already evaluated the project-scoped sets and materialised the resulting items into the document family. Your script reads those items via the knowledge global — it does not re-implement the matching logic.
The mental model is: declare configuration as knowledge, dispatch in the script.
[upstream]              [knowledge engine]                    [your script]
──────────              ──────────────────                    ─────────────
attach features   ──►   match against knowledge sets   ──►    read items, dispatch
Why we built it this way
Without knowledge, a script that needs vendor- or document-type-specific behaviour has to do one of:
- Hard-code lookup tables inside the script body.
- Round-trip to a service bridge for every document.
- Stash configuration in environment variables or module options.
- Re-implement set-matching logic by reading raw features and writing JS conditionals.
All of these put both logic and data in the script — making it harder to update one without redeploying the other. The knowledge global is designed so you can keep configuration in org metadata (versioned, reviewable, separately deployed) and keep the script thin enough that it rarely changes.
What knowledge is and isn’t
| Knowledge is good for | Knowledge is not for |
|---|---|
| Per-vendor / per-tenant configuration | Per-document runtime state (use script-local variables) |
| Lookup tables that change without code | Module-wide config (use module options) |
| Decoupling org-level policy from code | Cross-step communication (use the {features: [...]} return contract or doc metadata) |
| Audit trails of “what rules applied here” | High-frequency mutable state |
If the answer to “where does this value come from?” is “an org admin configures it in the UI”, knowledge is probably the right home. If the answer is “this script computes it for this document”, keep it in the script.
API surface
The global has four explicit accessors and four bare-form conveniences.
Bare accessors (single-family scripts)
When exactly one document family is in scope, scripts can use the bare forms. These are the 80% path for activity-plan script steps that operate on one family at a time.
| Accessor | Returns |
|---|---|
| knowledge.features | KnowledgeFeature[]: all features attached to the in-scope family |
| knowledge.items | KnowledgeItem[]: all knowledge items resolved on the in-scope family |
| knowledge.featuresByType(typeSlug) | KnowledgeFeature[] filtered by featureType.slug === typeSlug |
| knowledge.itemsByType(typeSlug) | KnowledgeItem[] filtered by itemType.slug === typeSlug |
Explicit accessors (always available)
| Accessor | Returns |
|---|---|
| knowledge.getFeatures(familyId) | KnowledgeFeature[] for the given family |
| knowledge.getItems(familyId) | KnowledgeItem[] for the given family |
| knowledge.getFeaturesByType(familyId, typeSlug) | filtered features |
| knowledge.getItemsByType(familyId, typeSlug) | filtered items |
The explicit forms are required when the script’s families slice has anything other than exactly one entry. The familyId you pass MUST be in the script’s families slice (see Family-access scoping).
Object shapes
Annotations below use TypeScript-style notation. Each returned array element is a frozen plain JS object with exactly these properties.
KnowledgeFeature
{
  id: string; // primary key of kdxa_knowledge_features
  uuid: string; // stable UUID
  slug: string; // computed from feature type + properties
  active: boolean; // is_active column; not filtered server-side
  properties: object | null; // free-form map; shape depends on featureType
  extendedProperties: object | null; // free-form map; shape depends on featureType
  featureType: KnowledgeFeatureType | null; // null if type was deleted
}
KnowledgeFeatureType (sub-record on each feature)
{
  slug: string; // type identifier (lowercase by convention)
  name: string; // display name
  description: string | null; // long-form description
  icon: string | null; // icon ref (e.g. "tag", "barcode")
  color: string | null; // hex or theme colour ref
  options: KnowledgeOption[] | null; // schema for `properties`
  extendedOptions: KnowledgeOption[] | null; // schema for `extendedProperties`
  labelJsonPath: string | null; // JSONPath / JSONata expression for human label
  useJSONata: boolean; // true => labelJsonPath is JSONata, not JSONPath
}
KnowledgeItem
{
  id: string; // KDDB-side identifier
  uuid: string; // stable UUID
  slug: string; // type-scoped slug
  title: string; // display title
  description: string | null; // long-form description
  active: boolean; // is_active flag from the resolved set
  sequenceOrder: number; // ordering hint within the type
  properties: object | null; // free-form map; shape depends on itemType
  knowledgeSetId: string | null; // owning knowledge set
  itemType: KnowledgeItemType | null; // null if type was deleted
}
KnowledgeItemType (sub-record on each item)
{
  slug: string; // type identifier (lowercase by convention)
  name: string; // display name
  description: string | null; // long-form description
  options: KnowledgeOption[] | null; // schema for `properties`
  supportsAttachment: boolean; // whether this type accepts file attachments
}
KnowledgeOption (schema entries inside options / extendedOptions)
{
  name: string; // property key
  type: string; // "string", "boolean", "selection", ...
  label: string; // display label
  description: string; // help text
  required: boolean;
  default: any;
  possibleValues: any[]; // present for "selection" type
}
properties is free-form. It is whatever shape the type author defined in options. If a type defines instructionMarkdown, domain, pipeline, etc., those keys live on properties; they are NOT separate top-level fields on the feature or item. To find what keys a type uses, inspect featureType.options (or itemType.options) and look at each option’s name.
// Right
const route = knowledge.itemsByType("processing-model")[0];
const pipeline = route.properties.pipeline; // correct
// Wrong -- will be undefined
const missing = route.pipeline; // there is no top-level pipeline field
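Because the keys on properties come from the type's options, a script can use that schema to read them defensively. The sketch below lists the expected keys from itemType.options and falls back to each option's default when a key is absent. The readProperties helper and the sample item are hypothetical, and whether the engine already applies defaults before a script sees properties is not stated here, so treat the fallback as an assumption.

```javascript
// Hypothetical helper: read an item's properties, using the type's option
// schema to list the expected keys and to supply defaults for missing ones.
function readProperties(item) {
  var props = item.properties || {};
  var options = (item.itemType && item.itemType.options) || [];
  var out = {};
  for (var i = 0; i < options.length; i++) {
    var opt = options[i];
    var hasValue = Object.prototype.hasOwnProperty.call(props, opt.name);
    out[opt.name] = hasValue ? props[opt.name] : opt.default;
  }
  return out;
}

// Sample data shaped like a KnowledgeItem (illustrative values only).
var sampleItem = Object.freeze({
  slug: "acme-route",
  properties: Object.freeze({ domain: "utilities", pipeline: "template" }),
  itemType: Object.freeze({
    slug: "processing-model",
    options: Object.freeze([
      { name: "domain", type: "string", default: null },
      { name: "pipeline", type: "selection", default: "llm" },
      { name: "hasLineItemsTemplate", type: "boolean", default: false }
    ])
  })
});

var resolved = readProperties(sampleItem);
```

Here resolved carries one key per option, so downstream code does not need to guess which keys the type author defined.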
Single-family rule
The bare accessors (knowledge.features, knowledge.items, knowledge.featuresByType, knowledge.itemsByType) auto-bind to families[0] when exactly one family is in scope. Anything else throws:
knowledge.<accessor> requires exactly one document family in scope (found N). Use knowledge.<explicit-accessor>(familyId) explicitly for multi-family scripts.
For multi-family script steps, you MUST iterate over families and use the explicit forms:
for (var i = 0; i < families.length; i++) {
  var fam = families[i];
  var feats = knowledge.getFeatures(fam.id);
  var route = knowledge.getItemsByType(fam.id, "processing-model")[0];
  // ...
}
Family-access scoping
Even with the explicit accessors, the familyId you pass must be present in the script’s families slice (the families attached to the script’s task). Calling with a foreign family throws:
knowledge: familyId "<id>" is not in scope for this script
This is the primary tenant-isolation defence: the underlying KDDB loader and feature query are keyed by document_family_id only, so this check prevents a script from reading another tenant’s data by guessing UUIDs.
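When a family ID arrives from outside the families slice (for example, from document metadata), a script can check it before calling an explicit accessor rather than relying on the thrown error. The following is a minimal sketch: makeScopedStub is a hypothetical stand-in that mimics the documented scoping rule so the snippet runs outside the platform, and featuresIfInScope is an illustrative helper, not part of the knowledge API.

```javascript
// Hypothetical stand-in for the platform `knowledge` global. It mimics the
// documented scoping rule: out-of-scope family IDs throw.
function makeScopedStub(families, featuresByFamily) {
  return {
    getFeatures: function (familyId) {
      var inScope = families.some(function (f) { return f.id === familyId; });
      if (!inScope) {
        throw new Error('knowledge: familyId "' + familyId +
          '" is not in scope for this script');
      }
      return featuresByFamily[familyId] || [];
    }
  };
}

// Hypothetical helper: validate an externally supplied family ID against the
// families slice before calling the explicit accessor.
function featuresIfInScope(knowledge, families, familyId) {
  for (var i = 0; i < families.length; i++) {
    if (families[i].id === familyId) {
      return knowledge.getFeatures(familyId);
    }
  }
  return null; // caller decides how to handle an out-of-scope ID
}

var scopedFamilies = [{ id: "fam-1" }];
var scopedStub = makeScopedStub(scopedFamilies, { "fam-1": [{ uuid: "f-1" }] });
var inScopeResult = featuresIfInScope(scopedStub, scopedFamilies, "fam-1");
var foreignResult = featuresIfInScope(scopedStub, scopedFamilies, "fam-9");
```

Returning null instead of throwing is a design choice for this sketch; a script that treats a foreign ID as a hard error can simply call the accessor and let the exception propagate.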
Write protection
Every object returned by knowledge.* is Object.freeze-d, recursively. Nested arrays and sub-records (e.g. featureType, itemType, properties) are also frozen.
| JS mode | Mutate attempt |
|---|---|
| Strict mode | throws TypeError |
| Sloppy mode | silently no-ops |
In both cases Object.isFrozen(obj) returns true. The contract is: knowledge is read-only from scripts. If you need to compute a derived value, build a new object instead of mutating the returned one:
var feats = knowledge.features;
// var enriched = feats[0]; // frozen, can't assign new keys
// enriched.computed = "x"; // throws in strict mode
var copy = Object.assign({}, feats[0], { computed: "x" }); // OK -- new object
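One caveat with Object.assign: it makes a shallow copy, so nested values such as properties are still the frozen originals. When a derived object needs a different nested key, copy the nested map as well. The sketch below uses a hand-frozen sample object in place of a real feature, and nestedWriteThrows is an illustrative strict-mode probe.

```javascript
// Sample object frozen the way the bindings are described to freeze results.
var feature = Object.freeze({
  uuid: "f-1",
  active: true,
  properties: Object.freeze({ vendorId: "VEN-001" })
});

// Shallow copy: the top level is new and writable, but `properties` is shared.
var shallow = Object.assign({}, feature);

// Copy the nested map too when nested keys must differ.
var deep = Object.assign({}, feature, {
  properties: Object.assign({}, feature.properties, { vendorId: "VEN-002" })
});

// Strict-mode probe: reports whether writing a nested key throws a TypeError.
function nestedWriteThrows(obj) {
  "use strict";
  try {
    obj.properties.vendorId = "changed";
    return false;
  } catch (e) {
    return e instanceof TypeError;
  }
}

var shallowThrows = nestedWriteThrows(shallow); // nested map is still frozen
var deepThrows = nestedWriteThrows(deep);       // nested map is an independent copy
```

In both cases the original feature is untouched, which is the point of the read-only contract.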
Caching
Per-script-execution caches keyed by family ID make repeated calls cheap.
| Resource | First-call cost | Subsequent calls (same family) |
|---|---|---|
| Features (bare or explicit) | 1 SQL query joining kdxa_document_family_features -> kdxa_knowledge_features -> kdxa_knowledge_feature_types | map hit, no I/O |
| Items (bare or explicit) | 1 KDDB document load + 1 read of getKnowledge() | map hit, no I/O |
| featuresByType / itemsByType | filters the cached full list, no extra queries | map hit, no I/O |
Calling knowledge.featuresByType("vendor") ten times in the same script runs one query.
The KDDB load behind knowledge.items is a separate loader from loadDocument(familyId). If a script calls both, the KDDB is loaded twice (once for each cache). This is acceptable today but may be consolidated in a future revision.
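To make the caching behaviour concrete, here is an illustrative model of a per-execution cache keyed by family ID. It is a sketch of the behaviour described above, not the orchestrator's implementation: makeCachedAccessor and the loadFeatures callback are hypothetical, and this model re-filters the cached list on each typed call rather than caching filtered results.

```javascript
// Illustrative memoising accessor; `loadFeatures` is a hypothetical loader
// standing in for the single SQL query per family.
function makeCachedAccessor(loadFeatures) {
  var cache = {}; // familyId -> full feature list
  function getFeatures(familyId) {
    if (!Object.prototype.hasOwnProperty.call(cache, familyId)) {
      cache[familyId] = loadFeatures(familyId);
    }
    return cache[familyId];
  }
  function getFeaturesByType(familyId, typeSlug) {
    // Filters the cached full list; never triggers a second load.
    return getFeatures(familyId).filter(function (f) {
      return f.featureType !== null && f.featureType.slug === typeSlug;
    });
  }
  return { getFeatures: getFeatures, getFeaturesByType: getFeaturesByType };
}

var queryCount = 0;
var accessor = makeCachedAccessor(function (familyId) {
  queryCount++; // count simulated queries
  return [
    { uuid: "f-1", featureType: { slug: "vendor" } },
    { uuid: "f-2", featureType: { slug: "language" } }
  ];
});

// Ten typed lookups for the same family hit the loader once.
for (var i = 0; i < 10; i++) {
  accessor.getFeaturesByType("fam-1", "vendor");
}
```

A lookup for a different family ID would trigger its own first-call load, since the cache is keyed per family.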
Item ordering
knowledge.items and knowledge.itemsByType always return items sorted by:
- sequenceOrder ASC
- ties broken by slug ASC
This is a documented contract — scripts that index by position (e.g. items[0]) may rely on it. Without this, scripts would be at the mercy of whatever order the KDDB happens to return.
Features have no documented ordering — they come back in whatever order the SQL join returns them.
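The ordering contract can be written as a comparator, which is useful in tests or when merging item lists from several families. The sketch below also shows one way to get stable output for features, which carry no ordering guarantee. Both helpers are hypothetical, and the plain code-unit string comparison for slugs is an assumption about how ties are broken.

```javascript
// Comparator for the documented item order: sequenceOrder ascending, then
// slug ascending. Plain code-unit string comparison is an assumption.
function compareItems(a, b) {
  if (a.sequenceOrder !== b.sequenceOrder) {
    return a.sequenceOrder - b.sequenceOrder;
  }
  if (a.slug === b.slug) return 0;
  return a.slug < b.slug ? -1 : 1;
}

// Features carry no ordering guarantee, so sort a copy by uuid when a script
// needs stable output. slice() avoids sorting a frozen array in place.
function stableFeatures(features) {
  return features.slice().sort(function (a, b) {
    if (a.uuid === b.uuid) return 0;
    return a.uuid < b.uuid ? -1 : 1;
  });
}

// Illustrative data only.
var unorderedItems = [
  { slug: "b", sequenceOrder: 2 },
  { slug: "z", sequenceOrder: 1 },
  { slug: "a", sequenceOrder: 1 }
];
var orderedSlugs = unorderedItems.slice().sort(compareItems).map(function (it) {
  return it.slug;
});

var frozenFeatures = Object.freeze([{ uuid: "f-2" }, { uuid: "f-1" }]);
var sortedFeatures = stableFeatures(frozenFeatures);
```

Sorting a copy matters because an in-place sort on a frozen array throws a TypeError.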
Active-features policy
The bindings return all features and items regardless of the active flag, with the boolean exposed on each instance. If you want only-active records, filter in JS:
var activeOnly = knowledge.features.filter(function (f) { return f.active; });
var activeRoutes = knowledge.itemsByType("processing-model").filter(function (it) { return it.active; });
The rationale: scripts that audit, log, or report on inactive records need to see them too. Filtering up-front would have been a footgun.
Empty-result handling
Filtered accessors return [] when nothing matches. Indexing [0] on an empty array yields undefined, and any property access after that throws. Always guard:
var route = knowledge.itemsByType("processing-model")[0];
if (!route) {
  throw new Error("no processing-model resolved -- vendor missing or unmapped");
}
// route is defined from here on; route.properties can still be null
Null featureType / itemType
In pathological cases (e.g. the type was deleted via direct DB access despite the immutability claims) the enrichment lookup returns no row. The binding surfaces this as featureType: null (or itemType: null) on the affected instance rather than throwing. Null-check before accessing nested fields:
var feats = knowledge.features;
for (var i = 0; i < feats.length; i++) {
  var slug = feats[i].featureType ? feats[i].featureType.slug : "(orphaned)";
  log.info("feature", feats[i].uuid, "type:", slug);
}
itemType may also be null when a knowledge item references a type from a different organisation than the script’s context, since the type lookup is org-scoped.
Type-slug case sensitivity
featuresByType / itemsByType perform an exact string match against the type’s slug. Slugs are lowercase by convention — featuresByType("Vendor") will not match a type whose slug is "vendor".
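If a type slug comes from outside the script (metadata, configuration, user input), normalising it before the lookup avoids silent empty results. A minimal sketch, assuming slugs really do follow the lowercase convention: normaliseTypeSlug is a hypothetical helper, and filterByTypeSlug is a stand-in for the exact-match filter that skips orphaned features.

```javascript
// Hypothetical helper: lower-case and trim a slug that came from outside the
// script before using it as a type filter.
function normaliseTypeSlug(raw) {
  return String(raw).trim().toLowerCase();
}

// Stand-in for featuresByType: exact, case-sensitive match on featureType.slug.
function filterByTypeSlug(features, typeSlug) {
  return features.filter(function (f) {
    return f.featureType !== null && f.featureType.slug === typeSlug;
  });
}

// Illustrative data only.
var sampleFeatures = [
  { uuid: "f-1", featureType: { slug: "vendor" } },
  { uuid: "f-2", featureType: null } // orphaned feature, skipped by the filter
];

var rawMatches = filterByTypeSlug(sampleFeatures, "Vendor");
var normalisedMatches = filterByTypeSlug(sampleFeatures, normaliseTypeSlug(" Vendor "));
```

The raw lookup finds nothing while the normalised one finds the vendor feature. If an organisation defines slugs that break the lowercase convention, this normalisation would hide them, so apply it only where the convention holds.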
Kill switch
There is no runtime feature flag for the knowledge global. Emergency disable requires commenting out the registration in kodexa-orchestrator/internal/service/planner_script_adapters.go (the kb.Register(vm) call near the bottom of plannerContextBindings.Register) and redeploying.
What’s NOT here (deferred to v2)
The following are not available in v1:
- Resolution metadata (knowledge.matches, ResolutionMatch objects). The engine does not currently persist per-set resolution decisions; surfacing them requires engine changes.
- Per-item source / clauses. Which knowledge clause produced an item is not exposed.
- Attachment-fetch accessors. itemType.supportsAttachment is exposed, but there is no accessor to fetch the underlying attachment (presigned URL, content, etc.).
- Mid-script re-resolution. The bindings read what is already attached to the family; they do not re-run the knowledge engine. If you need fresh resolution, run a separate engine invocation upstream of the script.
- Feature provenance read fields (feature.attachedAt, feature.attachedBy). The schema and write-side population land in v1’s parallel track so the data accumulates; the read API ships in v2.
Worked example: vendor routing
A common cass-analysis pattern: a single document family carries a vendor feature emitted upstream, the knowledge engine resolves the family’s processing-model knowledge set, and a downstream script step dispatches based on the resolved item’s properties.
// Step 1 (upstream) emits the vendor feature from the family's metadata.
// Step 2 (this script step) reads the resolved processing-model item and
// returns a dispatch action.
const route = knowledge.itemsByType("processing-model")[0];
if (!route) {
  throw new Error("no processing-model resolved -- vendor missing or unmapped");
}
// route.properties is shaped by the processing-model item type's options:
// { domain: "utilities", pipeline: "template", hasLineItemsTemplate: false }
switch (route.properties.pipeline) {
  case "template": return { action: "template_path" };
  case "llm": return { action: "llm_path" };
  default: return { action: "unknown" };
}
The same pattern applied to multi-family steps:
var dispatches = [];
for (var i = 0; i < families.length; i++) {
  var fam = families[i];
  var route = knowledge.getItemsByType(fam.id, "processing-model")[0];
  if (!route) {
    log.warn("no processing-model for family", fam.id);
    continue;
  }
  dispatches.push({ familyId: fam.id, pipeline: route.properties.pipeline });
}
return { dispatches: dispatches };
Common patterns
Pattern 1: Two-step feature emission then read
The canonical use of knowledge is a two-step activity-plan flow: an upstream script step emits a feature using the planner’s {features: [...]} return contract; the orchestrator runs AssessAndEnrich between steps; a downstream script reads the resolved items via knowledge.itemsByType(...).
// Step 1: emit_features (script step, dependsOn: [])
// Read upstream metadata and emit a feature so the knowledge engine
// can resolve which items apply to this document.
var doc = loadDocument(families[0].id);
var vendorId = (doc.metadata || {})["CustomFields.PrimaryVendorId"];
if (!vendorId) {
  throw new Error("PrimaryVendorId missing -- intake metadata is incomplete");
}
return {
  action: "ok",
  features: [{
    documentFamilyId: families[0].id,
    featureTypeSlug: "vendor",
    properties: { vendorId: String(vendorId) }
  }]
};
// Step 2: classify_path (script step, dependsOn: [emit_features, ocr])
// AssessAndEnrich has run between the two steps. Read the resolved item.
var route = knowledge.itemsByType("processing-model")[0];
if (!route) {
  // Useful fallback: keep the legacy path active until coverage is complete.
  log.warn("No processing-model resolved; falling back to template_single");
  return { action: "template_single" };
}
switch (route.properties.pipeline) {
  case "template": return { action: "template_path" };
  case "llm": return { action: "llm_path" };
  default: return { action: "unknown" };
}
The split keeps each step focused: step 1 sources data from upstream, step 2 dispatches based on org-level configuration.
Pattern 2: Per-vendor configuration without a lookup table
Older code commonly contains hard-coded tables like:
// Before: vendor-specific knowledge baked into the script
var TEMPLATE_VENDORS = ["VEN-001", "VEN-005", "VEN-018"];
var LINE_ITEM_VENDORS = ["VEN-002", "VEN-018"];
if (TEMPLATE_VENDORS.indexOf(vendorId) >= 0) { /* ... */ }
Knowledge replaces this with a declarative item per vendor:
// After: read the vendor's processing-model item; let knowledge sets resolve it
var route = knowledge.itemsByType("processing-model")[0];
if (route && route.properties.pipeline === "template") {
  // ...
  if (route.properties.hasLineItemsTemplate) { /* ... */ }
}
Now adding a new vendor is a knowledge-set edit (UI or YAML), not a code change.
Pattern 3: Conditional behaviour gated on feature presence
Sometimes you don’t need a knowledge item at all — just the existence of a feature. Use featuresByType:
// Skip OCR if the document has already been classified as machine-readable upstream.
var classifications = knowledge.featuresByType("document-classification");
var alreadyOCRed = classifications.some(function (f) {
  // properties can be null, so guard before reading a key
  return f.properties !== null && f.properties.layer === "text-extracted";
});
if (alreadyOCRed) {
  return { action: "skip_ocr" };
}
Pattern 4: Multi-vendor / batch script steps
When the step’s families slice has more than one entry, the bare accessors throw. Loop and use the explicit forms:
var dispatches = [];
for (var i = 0; i < families.length; i++) {
  var fam = families[i];
  var route = knowledge.getItemsByType(fam.id, "processing-model")[0];
  if (!route) {
    log.warn("no processing-model for family", fam.id);
    continue;
  }
  dispatches.push({ familyId: fam.id, pipeline: route.properties.pipeline });
}
return { action: "dispatched", count: dispatches.length, items: dispatches };
Pattern 5: Audit / inspection
A read-only inspection script can summarise what the engine has done:
var feats = knowledge.features;
var items = knowledge.items;
log.info("knowledge state",
  "features", feats.length,
  "items", items.length,
  "byType", items.reduce(function (acc, it) {
    var t = it.itemType ? it.itemType.slug : "(orphaned)";
    acc[t] = (acc[t] || 0) + 1;
    return acc;
  }, {})
);
return { action: "audited" };
This is useful in test pipelines, before-merge checks, and triage activity-plans where you want visibility into what was resolved.
Choosing knowledge vs. alternatives
A short decision guide for “where should this configuration live?”:
| Source of the value | Recommended home | Notes |
|---|---|---|
| Org admin sets it once per vendor | Knowledge item (this binding) | Versioned, reviewable, no redeploy to update |
| Same for every project of a given type | Project template option | Set when a project is created from the template |
| Same for every document a module processes | Module option | Visible at module-config time |
| Computed at runtime from this specific document | Script-local variable | Don’t shoehorn into knowledge |
| Cross-step within one activity run | {features: [...]} return contract, or doc metadata | Knowledge is for stable config, not run-state |
| Upstream system pushes it on every doc | Document metadata / intake | Surface as a knowledge feature only if downstream knowledge sets need to match on it |
The recurring test: if the org admin needs to change this value tomorrow, do they have to redeploy code, redeploy the project template, or just edit a knowledge item? The last is the cheapest — favour it when the value is genuinely org-scoped.
See also