Token Management

Cub tracks token usage across AI harnesses to help you monitor costs and stay within budget. Different harnesses have varying levels of token reporting accuracy.

How Tokens Are Tracked

Each time a harness is invoked, Cub records:

| Metric | Description |
|---|---|
| `input_tokens` | Tokens sent to the AI (prompt + context) |
| `output_tokens` | Tokens generated by the AI |
| `cache_read_tokens` | Tokens served from the prompt cache |
| `cache_creation_tokens` | Tokens written to the prompt cache |
| `total_tokens` | Sum of input + output tokens |
| `cost_usd` | Estimated cost (if available) |
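
As a rough sketch of how these fields relate (values are illustrative, not real pricing or a real run), `total_tokens` is simply input plus output:

```python
# Illustrative usage record with the fields Cub records per invocation.
usage = {
    "input_tokens": 12500,
    "output_tokens": 3200,
    "cache_read_tokens": 8000,
    "cache_creation_tokens": 0,
}

# total_tokens is the sum of input and output tokens.
usage["total_tokens"] = usage["input_tokens"] + usage["output_tokens"]
print(usage["total_tokens"])  # 15700
```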

Token Tracking by Harness

Not all harnesses report tokens equally:

| Harness | Token Reporting | Accuracy |
|---|---|---|
| Claude Code | ✅ Full | Exact from API |
| OpenCode | ✅ Full | Exact from API |
| Codex | ❌ None | Not reported |
| Gemini | ⚠ Estimated | ~4 chars/token |

Claude Code

Claude Code provides the most comprehensive token data:

{
  "usage": {
    "input_tokens": 12500,
    "output_tokens": 3200,
    "cache_read_tokens": 8000,
    "cache_creation_tokens": 0,
    "cost_usd": 0.0234
  }
}

OpenCode

OpenCode reports accurate token counts:

{
  "usage": {
    "input_tokens": 15000,
    "output_tokens": 4500,
    "total_tokens": 19500
  }
}

Codex

Codex does not report token usage. Cub disables token-based budget tracking when using Codex.

Gemini

Gemini uses character-based estimation (~4 characters per token):

estimated_tokens = len(output_text) // 4
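
A minimal, runnable version of this estimator (the function name is illustrative, not part of Cub's API):

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough character-based token estimate (~4 characters per token)."""
    return len(text) // chars_per_token

print(estimate_tokens("x" * 1000))  # 250
```

Note this is a coarse heuristic: real tokenizers vary with language and content, so treat Gemini figures as approximate when comparing against budgets.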

Viewing Token Usage

Real-Time During Run

Enable streaming to see tokens per task:

cub run --stream

Output shows:

Task completed in 45.2s
Tokens: 156,892

In Run Summary

After a run completes:

Run Summary
+------------------+------------+
| Tokens Used      | 456,892    |
| Cost             | $2.34      |
+------------------+------------+

In Status File

Cub writes status to .cub/runs/{session}/status.json:

{
  "budget": {
    "tokens_used": 456892,
    "tokens_limit": 1000000,
    "cost_usd": 2.34,
    "cost_limit": 5.0,
    "tokens_percentage": 45.69
  }
}
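
Since the status file is plain JSON, external tooling can poll it. A minimal sketch (the helper name is hypothetical, not part of Cub):

```python
import json
from pathlib import Path

def budget_remaining(status_path: str) -> int:
    """Read a Cub status.json and return tokens left under the limit."""
    budget = json.loads(Path(status_path).read_text())["budget"]
    return budget["tokens_limit"] - budget["tokens_used"]
```

For the example above, this would report 543,108 tokens remaining.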

In JSONL Logs

Each task logs token usage in ~/.local/share/cub/logs/{project}/{session}.jsonl:

{"timestamp": "2026-01-17T10:30:00Z", "event_type": "task_end", "data": {"task_id": "cub-054", "tokens_used": 156892, "exit_code": 0}}

Query with jq:

# Total tokens for a session
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq -s '[.[] | select(.event_type == "task_end") | .data.tokens_used] | add'

# Tokens per task
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq 'select(.event_type == "task_end") | {task: .data.task_id, tokens: .data.tokens_used}'
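
The same session total can be computed in Python if jq isn't available. A sketch, assuming the JSONL format shown above (one JSON object per line):

```python
import json

def session_tokens(jsonl_path: str) -> int:
    """Sum tokens_used across all task_end events in a Cub JSONL log."""
    total = 0
    with open(jsonl_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("event_type") == "task_end":
                total += event["data"].get("tokens_used", 0)
    return total
```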

Budget Configuration

Per-Task Token Limit

Stop a task if it exceeds a token threshold:

{
  "budget": {
    "max_tokens_per_task": 500000
  }
}
cub run --budget-tokens 500000

Session Cost Limit

Stop the run when total cost exceeds a threshold:

{
  "budget": {
    "max_total_cost": 10.0
  }
}
cub run --budget 10.0

Tasks Per Session

Stop after completing N tasks:

{
  "budget": {
    "max_tasks_per_session": 20
  }
}

Token Efficiency Tips

Use Prompt Caching

Claude Code supports prompt caching, which reduces token consumption for repeated context:

Effective Input Tokens = Input Tokens - Cache Read Tokens

If your system prompt is 10K tokens and gets cached:

  • First task: 10K input tokens (cache creation)
  • Subsequent tasks: 0 effective tokens for system prompt
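
The arithmetic above, as a worked example (numbers are illustrative):

```python
# First task: the 10K-token system prompt is written to the cache.
first = {"input_tokens": 10_000, "cache_read_tokens": 0}
# Subsequent tasks: the same prompt is served from the cache.
later = {"input_tokens": 10_000, "cache_read_tokens": 10_000}

def effective_input(usage: dict) -> int:
    """Effective input tokens = input tokens - cache read tokens."""
    return usage["input_tokens"] - usage["cache_read_tokens"]

print(effective_input(first))  # 10000
print(effective_input(later))  # 0
```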

Select Efficient Models

Use task labels to route simple tasks to smaller models:

# Tag a task to use a faster model
bd label cub-054 model:haiku

In .cub.json, you can set defaults:

{
  "harness": {
    "model": "sonnet"
  }
}

Monitor Token Patterns

High-token tasks often indicate:

  • Large codebases being scanned
  • Repeated context in multi-turn conversations
  • Complex tasks that should be broken down

Review your logs to identify optimization opportunities:

# Find high-token tasks
cat ~/.local/share/cub/logs/myproject/*.jsonl | \
  jq -s 'sort_by(-.data.tokens_used) | .[0:5] | .[] | {task: .data.task_id, tokens: .data.tokens_used}'

Cost Estimation

Cub estimates costs using approximate per-token pricing. Actual costs may vary based on:

  • Model used (Opus > Sonnet > Haiku)
  • Prompt caching discounts
  • Batch vs. interactive pricing

The cost_usd field provides a rough estimate for budget planning, not billing.
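
The shape of such an estimate can be sketched as follows. The prices below are placeholders, not Cub's actual pricing table or any provider's current rates:

```python
# Hypothetical per-million-token prices in USD (assumed figures only).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD from token counts and per-million-token prices."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
```

Real estimates would also need per-model rates and cache discounts, which is why `cost_usd` should be treated as a planning figure.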

Handling Harnesses Without Token Reporting

When using Codex or other harnesses without token reporting:

  1. Token budget limits are ignored
  2. Cost budget limits are ignored
  3. Task and iteration limits still work
  4. Consider time-based monitoring instead

# Use iteration limits when tokens unavailable
cub run --once  # Single iteration
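
For time-based monitoring, a wall-clock limit per task is one option. A minimal sketch (the wrapper function is hypothetical, not a Cub feature):

```python
import subprocess
import sys
import time

def run_with_time_limit(cmd: list, max_seconds: float) -> float:
    """Run cmd, killing it after max_seconds; return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(cmd, timeout=max_seconds, check=True)
    return time.monotonic() - start

# Example: a trivial command that finishes well inside the limit.
elapsed = run_with_time_limit([sys.executable, "-c", "pass"], max_seconds=30)
```

`subprocess.run` raises `TimeoutExpired` when the limit is hit, which a caller can catch to record the task as over budget.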