Transactions enable you to batch multiple data object and data attribute operations into a single atomic unit. All operations are queued locally and executed in one call, providing both atomicity (all succeed or all fail) and significantly better performance for bulk operations.

Overview

Transactions are especially useful when:
  • Creating many data objects and attributes at once
  • Performing extraction operations that produce multiple related records
  • Needing atomic rollback on failure
  • Optimizing performance for bulk operations
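The all-or-nothing behavior can be modeled with a small, self-contained sketch. It is hypothetical and does not reflect the kodexa_document internals: `ToyStore`, `apply_batch`, `create`, and `delete` are invented names, and the "store" is a plain dict. Queued operations are applied to a staging copy in one call, and the live state is replaced only if every operation succeeds.

```python
import copy
from typing import Callable

# Hypothetical model: an operation is a function that mutates a dict of id -> record.
Operation = Callable[[dict], None]


class ToyStore:
    """Hypothetical in-memory store standing in for a document's data."""

    def __init__(self) -> None:
        self.records: dict = {}

    def apply_batch(self, operations: list) -> None:
        """Apply every queued operation in one call, all-or-nothing."""
        staging = copy.deepcopy(self.records)  # work on a copy, never the live state
        for op in operations:
            op(staging)  # any exception aborts before the swap below
        self.records = staging  # only reached if every operation succeeded


def create(record_id: int, path: str) -> Operation:
    """Hypothetical create operation; fails on a duplicate id."""
    def op(records: dict) -> None:
        if record_id in records:
            raise KeyError(f"duplicate id {record_id}")
        records[record_id] = {"path": path}
    return op


def delete(record_id: int) -> Operation:
    """Hypothetical delete operation; fails if the id does not exist."""
    def op(records: dict) -> None:
        del records[record_id]  # raises KeyError for a missing id
    return op


store = ToyStore()
store.apply_batch([create(1, "/invoice"), create(2, "/invoice/line-item")])
print(sorted(store.records))  # [1, 2]

try:
    # The second operation fails, so the first one must not be applied either.
    store.apply_batch([create(3, "/other"), delete(99)])
except KeyError:
    pass
print(sorted(store.records))  # [1, 2]
```

Applying to a copy keeps the model simple; a real engine would more likely use a database transaction, but the observable result is the same: either the whole batch lands or none of it does.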

Python Usage

In Python, use the batch_transaction() context manager:
from kodexa_document import Document
from kodexa_document.accessors import DataObjectInput, DataAttributeInput

with Document() as doc:
    root = doc.create_node("document", "Invoice #12345")
    doc.content_node = root

    with doc.batch_transaction() as tx:
        # Create a data object (returns immediately with a temporary record)
        invoice = tx.data_objects.create(DataObjectInput(
            path="/invoice"
        ))

        # Add attributes using the temporary ID
        tx.data_attributes.create(invoice['id'], DataAttributeInput(
            tag="vendor-name",
            string_value="Acme Corp",
            confidence=0.95
        ))
        tx.data_attributes.create(invoice['id'], DataAttributeInput(
            tag="total-amount",
            decimal_value=1234.56,
            confidence=0.92
        ))

        # Create child objects
        for desc, amount in [("Widget A", 100.0), ("Widget B", 250.0)]:
            line_item = tx.data_objects.create(DataObjectInput(
                parent_id=invoice['id'],
                path="/invoice/line-item"
            ))
            tx.data_attributes.create(line_item['id'], DataAttributeInput(
                tag="description",
                string_value=desc,
                confidence=0.90
            ))
            tx.data_attributes.create(line_item['id'], DataAttributeInput(
                tag="amount",
                decimal_value=amount,
                confidence=0.90
            ))

        print(f"Queued {tx.operation_count} operations")
    # All operations are committed atomically when exiting the context

    # Verify the results
    objects = doc.data_objects.get_all()
    print(f"Created {len(objects)} data objects")

Transaction Operations

The TransactionContext provides accessors that mirror the standard data accessors:
| Accessor | Method | Description |
|---|---|---|
| tx.data_objects | create(input) | Queue a data object creation |
| tx.data_objects | update(id, updates) | Queue a data object update |
| tx.data_objects | delete(id) | Queue a data object deletion |
| tx.data_attributes | create(obj_id, input) | Queue an attribute creation |
| tx.data_attributes | update(id, updates) | Queue an attribute update |
| tx.data_attributes | delete(id) | Queue an attribute deletion |
| tx.data_attributes | set_value(id, value) | Queue a value update |
| tx.data_attributes | set_confidence(id, confidence) | Queue a confidence update |
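To illustrate what "queue" means for these accessors, here is a hypothetical sketch whose method names mirror the table but whose classes (`ToyTransactionContext`, `QueuedOp`, `_ToyAccessor`) are invented for illustration. Each call only records an entry; nothing is executed until commit. For brevity, `create` here returns nothing, whereas the real accessor returns a record with a temporary ID.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class QueuedOp:
    """One recorded call: which accessor, which method, and its arguments."""
    target: str
    method: str
    args: tuple


class ToyTransactionContext:
    """Hypothetical stand-in for a transaction context: it only records calls."""

    def __init__(self) -> None:
        self.ops = []
        self.data_objects = _ToyDataObjects(self, "data_objects")
        self.data_attributes = _ToyDataAttributes(self, "data_attributes")

    def queue(self, target: str, method: str, *args: Any) -> None:
        self.ops.append(QueuedOp(target, method, args))

    @property
    def operation_count(self) -> int:
        return len(self.ops)


class _ToyAccessor:
    """Shared plumbing: forward every call to the owning transaction's queue."""

    def __init__(self, tx: ToyTransactionContext, target: str) -> None:
        self._tx = tx
        self._target = target

    def _queue(self, method: str, *args: Any) -> None:
        self._tx.queue(self._target, method, *args)


class _ToyDataObjects(_ToyAccessor):
    def create(self, data: dict) -> None:
        self._queue("create", data)

    def update(self, obj_id: Any, updates: dict) -> None:
        self._queue("update", obj_id, updates)

    def delete(self, obj_id: Any) -> None:
        self._queue("delete", obj_id)


class _ToyDataAttributes(_ToyAccessor):
    def create(self, obj_id: Any, data: dict) -> None:
        self._queue("create", obj_id, data)

    def update(self, attr_id: Any, updates: dict) -> None:
        self._queue("update", attr_id, updates)

    def delete(self, attr_id: Any) -> None:
        self._queue("delete", attr_id)

    def set_value(self, attr_id: Any, value: Any) -> None:
        self._queue("set_value", attr_id, value)

    def set_confidence(self, attr_id: Any, confidence: float) -> None:
        self._queue("set_confidence", attr_id, confidence)


# "obj-1" and "attr-1" are placeholder IDs for illustration only.
tx = ToyTransactionContext()
tx.data_objects.create({"path": "/invoice"})
tx.data_attributes.create("obj-1", {"tag": "vendor-name", "string_value": "Acme Corp"})
tx.data_attributes.set_confidence("attr-1", 0.95)
print(tx.operation_count)  # 3
```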

ID Resolution

When you create a data object within a transaction, it returns a record with a temporary ID. You can use this temporary ID to create child objects or attributes within the same transaction. The IDs are resolved to real database IDs when the transaction is committed.
with doc.batch_transaction() as tx:
    # parent gets a temporary ID
    parent = tx.data_objects.create(DataObjectInput(path="/parent"))

    # Use the temporary ID for the child - it's resolved on commit
    child = tx.data_objects.create(DataObjectInput(
        parent_id=parent['id'],
        path="/parent/child"
    ))

    # Use the child's temporary ID for attributes
    tx.data_attributes.create(child['id'], DataAttributeInput(
        tag="name",
        string_value="Child attribute"
    ))
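The resolution step can be pictured as replaying the queue in order while keeping a map from temporary to real IDs. This is a hypothetical sketch, not the library's algorithm: `ToyTx`, `commit`, the `tmp-N` ID format, and real IDs starting at 100 are all assumptions. Because a temporary ID is only obtainable from an earlier `create` call, a parent is always queued (and mapped) before anything that references it.

```python
import itertools
from typing import Any, Optional


class ToyTx:
    """Hypothetical transaction that hands out temporary IDs and queues creates."""

    def __init__(self) -> None:
        self._temp_ids = itertools.count(1)
        self.ops = []

    def create_object(self, path: str, parent_id: Optional[str] = None) -> dict:
        temp_id = f"tmp-{next(self._temp_ids)}"  # assumed temporary-ID format
        self.ops.append(("object", {"id": temp_id, "path": path, "parent_id": parent_id}))
        return {"id": temp_id}  # returned immediately, before anything is stored

    def create_attribute(self, obj_id: str, tag: str, value: Any) -> None:
        self.ops.append(("attribute", {"obj_id": obj_id, "tag": tag, "value": value}))


def commit(tx: ToyTx, first_real_id: int = 100) -> tuple:
    """Replay the queue in order, swapping each temporary ID for a real one."""
    real_ids = itertools.count(first_real_id)
    id_map = {}

    def resolve(some_id: Any) -> Any:
        # IDs not created in this transaction (or None) pass through unchanged.
        return id_map.get(some_id, some_id)

    objects, attributes = [], []
    for kind, data in tx.ops:
        if kind == "object":
            real_id = next(real_ids)
            id_map[data["id"]] = real_id
            objects.append({"id": real_id, "path": data["path"],
                            "parent_id": resolve(data["parent_id"])})
        else:
            attributes.append({"obj_id": resolve(data["obj_id"]),
                               "tag": data["tag"], "value": data["value"]})
    return objects, attributes


tx = ToyTx()
parent = tx.create_object("/parent")
child = tx.create_object("/parent/child", parent_id=parent["id"])
tx.create_attribute(child["id"], "name", "Child attribute")
objects, attributes = commit(tx)
print(objects)
# [{'id': 100, 'path': '/parent', 'parent_id': None}, {'id': 101, 'path': '/parent/child', 'parent_id': 100}]
print(attributes)
# [{'obj_id': 101, 'tag': 'name', 'value': 'Child attribute'}]
```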

Error Handling

If an exception occurs within the transaction block, all queued operations are discarded:
try:
    with doc.batch_transaction() as tx:
        tx.data_objects.create(DataObjectInput(path="/test"))
        # Raising here means the queued create is never committed
        raise ValueError("Something went wrong")
except ValueError:
    print("Transaction rolled back")

# No data objects were created (assuming the document started with none)
assert len(doc.data_objects.get_all()) == 0
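A context manager gets this behavior from `__exit__`, which receives the exception (if any) raised in the block. The sketch below is hypothetical (`ToyBatch` and its list-based store are invented) and shows one plausible way to commit on a clean exit and drop the queue otherwise, while still letting the exception propagate to the caller's `except`.

```python
class ToyBatch:
    """Hypothetical batch: commits queued paths on a clean exit, drops them on error."""

    def __init__(self, committed: list) -> None:
        self.committed = committed
        self.pending = []

    def create(self, path: str) -> None:
        self.pending.append(path)  # queued only; nothing is committed yet

    def __enter__(self) -> "ToyBatch":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.committed.extend(self.pending)  # commit everything in one step
        self.pending.clear()  # on error, the queued work is simply discarded
        return False  # never swallow the exception; the caller still sees it


committed = []
try:
    with ToyBatch(committed) as tx:
        tx.create("/test")
        raise ValueError("Something went wrong")
except ValueError:
    print("Transaction rolled back")
print(committed)  # []

with ToyBatch(committed) as tx:
    tx.create("/ok")
print(committed)  # ['/ok']
```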

TypeScript Usage

In TypeScript, use the transaction() method with an async callback:
import { Kodexa } from '@kodexa-ai/document-wasm-ts';

async function batchCreateData() {
  await Kodexa.init();
  const doc = await Kodexa.createDocument();

  try {
    await doc.transaction(async (tx) => {
      // Create a data object
      const invoice = tx.dataObjects.create({
        path: '/invoice'
      });

      // Add attributes using the temporary ID
      tx.dataAttributes.create(invoice.id, {
        tag: 'vendor-name',
        stringValue: 'Acme Corp',
        confidence: 0.95
      });
      tx.dataAttributes.create(invoice.id, {
        tag: 'total-amount',
        decimalValue: 1234.56,
        confidence: 0.92
      });

      // Create child objects
      const items = [
        { desc: 'Widget A', amount: 100.0 },
        { desc: 'Widget B', amount: 250.0 }
      ];

      for (const { desc, amount } of items) {
        const lineItem = tx.dataObjects.create({
          parentId: invoice.id,
          path: '/invoice/line-item'
        });
        tx.dataAttributes.create(lineItem.id, {
          tag: 'description',
          stringValue: desc,
          confidence: 0.90
        });
        tx.dataAttributes.create(lineItem.id, {
          tag: 'amount',
          decimalValue: amount,
          confidence: 0.90
        });
      }

      console.log(`Queued ${tx.operationCount} operations`);
    });
    // All operations are committed atomically

    // Verify results
    const objects = await doc.dataObjects.getAll();
    console.log(`Created ${objects.length} data objects`);
  } finally {
    doc.dispose();
  }
}

Performance

Transactions provide significant performance benefits for bulk operations:
  • Without transactions: Each create/update/delete is a separate FFI/WASM call
  • With transactions: All operations are batched into a single call
For operations involving dozens or hundreds of data objects and attributes, batching replaces one boundary crossing per operation with a single crossing, so transactions can be orders of magnitude faster.
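The difference comes down to how many times the FFI/WASM boundary is crossed. The sketch below is a hypothetical model (`CountingBoundary` and both helper functions are invented) that counts crossings rather than timing them. As a rough cost model, if each crossing carries a fixed overhead c and each operation does w work, n unbatched operations cost about n(c + w) while one batch costs about c + nw, so the saving grows with n and with c relative to w.

```python
class CountingBoundary:
    """Hypothetical stand-in for the FFI/WASM boundary; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0    # number of boundary crossings
        self.applied = 0  # number of operations carried across

    def execute(self, operations: list) -> None:
        self.calls += 1
        self.applied += len(operations)


def without_transaction(boundary: CountingBoundary, operations: list) -> None:
    for op in operations:
        boundary.execute([op])  # one crossing per operation


def with_transaction(boundary: CountingBoundary, operations: list) -> None:
    boundary.execute(list(operations))  # one crossing for the whole batch


ops = [("create", i) for i in range(300)]
unbatched = CountingBoundary()
without_transaction(unbatched, ops)
batched = CountingBoundary()
with_transaction(batched, ops)
print(unbatched.calls, batched.calls)  # 300 1
```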

Best Practices

  1. Use transactions for bulk operations: Any time you’re creating more than a few data objects or attributes, wrap them in a transaction.
  2. Keep transactions focused: Don’t mix unrelated operations in the same transaction.
  3. Handle errors: Wrap transaction blocks in try/except (Python) or try/catch (TypeScript) to handle failures gracefully.
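  4. Check operation count: Use tx.operation_count (Python) or tx.operationCount (TypeScript) to verify that the expected number of operations was queued before the transaction commits (in Python, before the with block exits; in TypeScript, before the callback returns).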
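For the invoice example in Python Usage, and assuming each create call counts as one queued operation, the expected total is (1 + 2) + 2 × (1 + 2) = 9. The helpers below are hypothetical (`expected_invoice_ops` and `require_operation_count` are not part of the library); they work on any object exposing `operation_count`. Called as the last statement inside the `with doc.batch_transaction() as tx:` block, a mismatch raises, and per Error Handling the queued operations are then discarded.

```python
from types import SimpleNamespace


def expected_invoice_ops(line_items: int, invoice_attrs: int = 2, item_attrs: int = 2) -> int:
    """Hypothetical count for the invoice example: each object plus its attributes."""
    return (1 + invoice_attrs) + line_items * (1 + item_attrs)


def require_operation_count(tx, expected: int) -> None:
    """Raise if the transaction queued an unexpected number of operations."""
    actual = tx.operation_count
    if actual != expected:
        raise RuntimeError(f"expected {expected} queued operations, found {actual}")


# SimpleNamespace stands in for a real transaction context here.
fake_tx = SimpleNamespace(operation_count=9)
require_operation_count(fake_tx, expected_invoice_ops(line_items=2))  # passes silently
```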