# Query Commands

> Use kdx document grep and lines commands for Unix-style text search and line retrieval in KDDB files, with JSON Lines output for shell pipelines.

The `kdx document grep` and `kdx document lines` commands provide Unix-style text search and line retrieval for KDDB files. Output is JSON Lines format by default, ideal for piping through `jq` and building shell pipelines.

## Grep Command

Search document content for lines matching a pattern, similar to Unix `grep`.

```bash theme={null}
kdx document grep <pattern> <file.kddb> [flags]
```

### Flags

| Flag                | Short | Description                            | Default       |
| ------------------- | ----- | -------------------------------------- | ------------- |
| `--ignore-case`     | `-i`  | Case-insensitive matching              | false         |
| `--extended-regexp` | `-E`  | Use extended regex (otherwise literal) | false         |
| `--count`           | `-c`  | Print only the number of matching lines | false         |
| `--context`         | `-C`  | Print N lines before and after each match | 0           |
| `--before`          | `-B`  | Print N lines before each match        | 0             |
| `--after`           | `-A`  | Print N lines after each match         | 0             |
| `--max`             |       | Maximum number of results              | 0 (unlimited) |
| `--page`            |       | Search only on page N (1-based)        | 0 (all pages) |
| `--pretty`          |       | Pretty-print JSON output               | false         |

### Basic Search

Find all lines containing a text pattern:

```bash theme={null}
kdx document grep "risk factor" doc.kddb
```

```json theme={null}
{"uuid":"abc-123","content":"Risk factors include market volatility","page":5}
{"uuid":"def-456","content":"Additional risk factors discussed below","page":5}
```

### Case-Insensitive Search

```bash theme={null}
kdx document grep -i "revenue" doc.kddb
```

### Regex Search

Use `-E` for extended regex patterns:

```bash theme={null}
# Find section headers with numbers
kdx document grep -E 'Section\s+\d+' doc.kddb

# Find dollar amounts such as $1,250.00 (single quotes keep \$ intact)
kdx document grep -E '\$[\d,]+\.\d{2}' doc.kddb

# Find dates such as 3/14/2024 or 03/14/24
kdx document grep -E '\d{1,2}/\d{1,2}/\d{2,4}' doc.kddb
```
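Quoting matters for regex patterns. Inside double quotes, bash removes the backslash in `\$`, so the pattern kdx receives starts with a bare `$` (an end-of-line anchor in most regex flavors) instead of a literal dollar sign. Single quotes pass the pattern through unchanged. A quick way to check what the shell will actually hand to kdx is to print the argument first; `show_args` below is a hypothetical helper, not part of kdx:

```bash theme={null}
# Hypothetical helper: print each argument exactly as the shell passes it on.
show_args() {
  printf '[%s]\n' "$@"
}

show_args "\$[\d,]+"   # double quotes: the shell consumes the backslash before $
show_args '\$[\d,]+'   # single quotes: the pattern arrives unchanged
```

The first call prints `[$[\d,]+]` and the second prints `[\$[\d,]+]`, which is why single quotes are the safer default for `-E` patterns.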

### Context Lines

Show surrounding lines for context:

```bash theme={null}
# 2 lines before and after each match
kdx document grep -C 2 "total revenue" doc.kddb

# 3 lines before, 1 line after
kdx document grep -B 3 -A 1 "conclusion" doc.kddb
```

### Count Matches

Get just the count of matching lines:

```bash theme={null}
kdx document grep -c "error" doc.kddb
```

```text theme={null}
12
```

### Limit Results

Stop after finding N matches:

```bash theme={null}
kdx document grep --max 10 "warning" doc.kddb
```
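To check whether a pattern appears at all, `--max 1` lets the search stop at the first hit. The function below is a sketch: `has_match` is a hypothetical helper name, and it tests for non-empty output rather than relying on kdx's exit status, which this page does not document.

```bash theme={null}
# Hypothetical helper: succeed if PATTERN occurs at least once in FILE.
# Uses only documented behavior: --max limits results, output is JSON Lines.
has_match() {
  local pattern=$1 file=$2
  [ -n "$(kdx document grep --max 1 "$pattern" "$file")" ]
}
```

For example: `if has_match "indemnification" contract.kddb; then echo "found"; fi`.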

### Filter by Page

Search only on a specific page:

```bash theme={null}
kdx document grep --page 5 "total" doc.kddb
```

### Pretty Print

For debugging, use pretty-printed JSON:

```bash theme={null}
kdx document grep --pretty "revenue" doc.kddb
```

```json theme={null}
{
  "uuid": "abc-123",
  "content": "Total revenue for Q4 was $1.2M",
  "page": 3
}
```

## Lines Command

Retrieve specific lines by index range, page number, or UUID.

```bash theme={null}
kdx document lines <file.kddb> [flags]
```

### Flags

| Flag       | Description                                          | Default |
| ---------- | ---------------------------------------------------- | ------- |
| `--range`  | Line index range (start:end, 0-based, exclusive end) |         |
| `--page`   | Get all lines from page N (1-based)                  |         |
| `--uuid`   | Get lines by UUID (comma-separated)                  |         |
| `--pretty` | Pretty-print JSON output                             | false   |

### Lines by Index Range

Get lines by their 0-based index (exclusive end, like Python slicing):

```bash theme={null}
# Get lines 10-19 (10 lines total)
kdx document lines --range 10:20 doc.kddb
```
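Because the range is 0-based with an exclusive end, converting from 1-based line numbers (as a person would count them) means subtracting one from the start only. A sketch of two hypothetical helpers, one for that conversion and one for reading a document in fixed-size chunks:

```bash theme={null}
# Hypothetical helper: 1-based inclusive line numbers -> --range start:end.
# Lines 11 through 20 as a person counts them are indices 10..19, i.e. "10:20".
range_from_line_numbers() {
  local first=$1 last=$2
  printf '%d:%d\n' "$((first - 1))" "$last"
}

# Hypothetical helper: --range value for chunk INDEX (0-based) of SIZE lines.
range_for_chunk() {
  local index=$1 size=$2
  printf '%d:%d\n' "$((index * size))" "$(((index + 1) * size))"
}
```

For example, `kdx document lines --range "$(range_for_chunk 2 50)" doc.kddb` requests indices 100 through 149.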

### Lines by Page

Get all lines from a specific page:

```bash theme={null}
# All lines from page 5
kdx document lines --page 5 doc.kddb
```

### Lines by UUID

Get specific lines by their UUID:

```bash theme={null}
# Single UUID
kdx document lines --uuid abc-123 doc.kddb

# Multiple UUIDs
kdx document lines --uuid abc-123,def-456,ghi-789 doc.kddb
```
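`--uuid` takes a single comma-separated argument. When the UUIDs are already in a bash array, a small helper can join them without leaving a trailing comma. `join_uuids` is a hypothetical name used for illustration:

```bash theme={null}
# Hypothetical helper: join arguments with commas for --uuid.
# Setting IFS locally makes "$*" join the arguments with ','.
join_uuids() {
  local IFS=,
  printf '%s\n' "$*"
}

uuids=(abc-123 def-456 ghi-789)
join_uuids "${uuids[@]}"   # abc-123,def-456,ghi-789
```

The result plugs straight into the flag: `kdx document lines --uuid "$(join_uuids "${uuids[@]}")" doc.kddb`.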

### All Lines

Without flags, returns all lines in the document:

```bash theme={null}
kdx document lines doc.kddb
```

## Output Format

Both commands write JSON Lines (JSONL) by default: one JSON object per line.

```json theme={null}
{"uuid":"abc-123","content":"Line content here","page":3}
{"uuid":"def-456","content":"Another line of text","page":3}
{"uuid":"ghi-789","content":"Third line content","page":4}
```

### Output Fields

| Field     | Description                                                  |
| --------- | ------------------------------------------------------------ |
| `uuid`    | Unique identifier for the line (from `spatial:uuid` feature) |
| `content` | Text content of the line                                     |
| `page`    | Page number (1-based)                                        |

## Shell Pipeline Examples

### Extract UUIDs from Search Results

```bash theme={null}
kdx document grep -i "risk" doc.kddb | jq -r '.uuid'
```

### Count Matches per Page

```bash theme={null}
kdx document grep "revenue" doc.kddb | jq -s 'group_by(.page) | map({page: .[0].page, count: length})'
```

```json theme={null}
[
  {"page": 3, "count": 5},
  {"page": 7, "count": 2},
  {"page": 12, "count": 8}
]
```
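`jq -s` reads every result object into memory before grouping. For large result sets, a lighter alternative is to emit only the page numbers and count them with standard Unix tools; `sort` still reads all of its input, but it holds page numbers rather than full JSON objects. `count_pages` is a hypothetical helper that reads one page number per line on stdin:

```bash theme={null}
# Hypothetical helper: read page numbers (one per line) and print
# "page count" pairs in ascending page order.
count_pages() {
  # uniq -c prints "count page"; awk swaps the columns to "page count".
  sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `kdx document grep "revenue" doc.kddb | jq -r '.page' | count_pages`.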

### Get Content from UUIDs

```bash theme={null}
# Read UUIDs from a file (one per line), join them with commas, and fetch their content
UUIDS=$(paste -sd ',' uuids.txt)
kdx document lines --uuid "$UUIDS" doc.kddb | jq -r '.content'
```

### Extract Lines from Multiple Pages

```bash theme={null}
for p in {10..15}; do
  kdx document lines --page $p doc.kddb
done | jq -s '.'
```

### Filter and Transform

```bash theme={null}
# Find lines that contain dollar amounts and print their text
kdx document grep -E '\$[\d,]+' doc.kddb | jq -r '.content'

# Get the first 5 lines of each of pages 1-10
for p in $(seq 1 10); do
  kdx document lines --page $p doc.kddb | head -n 5
done
```

### Build Context for LLM

```bash theme={null}
# Extract specific content by UUID for LLM context
kdx document lines --uuid abc-123,def-456 doc.kddb | jq -r '.content'
```

### Search and Retrieve Full Context

```bash theme={null}
# Find pages that mention "revenue", then print the full text of each page
PAGES=$(kdx document grep "revenue" doc.kddb | jq -r '.page' | sort -nu)
for p in $PAGES; do
  echo "=== Page $p ==="
  kdx document lines --page $p doc.kddb | jq -r '.content'
done
```

## Use Cases

### Document Search and Analysis

Quickly search across document content:

```bash theme={null}
# Find all mentions of a term
kdx document grep "depreciation" financials.kddb

# Case-insensitive search for variations
kdx document grep -i "ebitda" report.kddb

# Find patterns like phone numbers
kdx document grep -E "\(\d{3}\)\s*\d{3}-\d{4}" contract.kddb
```

### Building LLM Prompts

Extract relevant content for LLM context windows:

```bash theme={null}
# Get all lines from the executive summary (page 2)
SUMMARY=$(kdx document lines --page 2 doc.kddb | jq -r '.content')

# Search for relevant lines and build context from the matches
FINDINGS=$(kdx document grep -i "key finding" doc.kddb | jq -r '.content')
```
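Context windows are finite, so it can help to cap the extracted text before sending it to a model. A sketch, assuming a byte budget is an acceptable rough proxy for the model's token limit (it is not exact, since tokenizers vary). `cap_context` is a hypothetical helper:

```bash theme={null}
# Hypothetical helper: pass stdin through, truncated to at most MAX_BYTES bytes.
# Bytes only approximate tokens, and a multi-byte character may be cut at the boundary.
cap_context() {
  local max_bytes=$1
  head -c "$max_bytes"
}
```

For example: `CONTEXT=$(kdx document lines --page 2 doc.kddb | jq -r '.content' | cap_context 8000)`.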

### Data Pipeline Integration

Use in automated workflows:

```bash theme={null}
#!/bin/bash
# Extract all lines mentioning amounts
kdx document grep -E '\$[\d,]+' "$1" | while read -r line; do
  UUID=$(echo "$line" | jq -r '.uuid')
  CONTENT=$(echo "$line" | jq -r '.content')
  PAGE=$(echo "$line" | jq -r '.page')

  # Process each match...
  echo "Found amount on page $PAGE: $CONTENT"
done
```

### Quality Assurance

Validate document content:

```bash theme={null}
# Check if required sections exist
if kdx document grep -c "Terms and Conditions" contract.kddb | grep -q "^0$"; then
  echo "ERROR: Missing Terms and Conditions section"
  exit 1
fi

# Count pages that have at least one non-empty line (-E makes "." match any character)
PAGES_WITH_CONTENT=$(kdx document grep -E '.' doc.kddb | jq -r '.page' | sort -u | wc -l)
echo "Document has content on $PAGES_WITH_CONTENT pages"
```
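The single-section check extends naturally to a list of required sections. `require_sections` is a hypothetical helper built only on the documented `-c` flag; it reports every missing section on stderr and returns non-zero if any is absent:

```bash theme={null}
# Hypothetical helper: require_sections FILE SECTION...
# Uses the documented -c flag, which prints the number of matching lines.
require_sections() {
  local file=$1 section count missing=0
  shift
  for section in "$@"; do
    count=$(kdx document grep -c "$section" "$file")
    # Treat empty output (e.g. a failed command) as zero matches.
    if [ "${count:-0}" -eq 0 ]; then
      echo "ERROR: missing section: $section" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage in a validation script: `require_sections contract.kddb "Terms and Conditions" "Governing Law" || exit 1`.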

## Tips

<Tip>
  Use `-E` (extended regex) when you need pattern matching. Without it, special characters like `.` and `*` are treated literally.
</Tip>

<Tip>
  JSON Lines output streams well: each result is a complete JSON object on its own line, so downstream tools can process results as they arrive instead of loading everything into memory.
</Tip>

<Tip>
  Use `grep --max 1` to check quickly whether a pattern exists; the search stops at the first match instead of scanning the whole document.
</Tip>

<Note>
  The `uuid` field comes from the `spatial:uuid` feature on line nodes. Documents without this feature will have empty UUIDs.
</Note>

<Note>
  Page numbers are 1-based in the output and flags, matching how users typically refer to pages.
</Note>

## Related

<CardGroup cols={2}>
  <Card title="Document Structure" icon="tree" href="/guides/kdx-cli/document/structure">
    Print and select commands for structure inspection
  </Card>

  <Card title="Document Overview" icon="file-lines" href="/guides/kdx-cli/document/overview">
    All document commands
  </Card>

  <Card title="SDK: Selectors" icon="crosshairs" href="/sdk/selectors">
    Programmatic node selection
  </Card>
</CardGroup>
