Token Management

Cub tracks token usage across AI harnesses to help you monitor costs and stay within budget. Different harnesses have varying levels of token reporting accuracy.

How Tokens Are Tracked

Each time a harness is invoked, Cub records:

| Metric | Description |
|---|---|
| `input_tokens` | Tokens sent to the AI (prompt + context) |
| `output_tokens` | Tokens generated by the AI |
| `cache_read_tokens` | Tokens served from the prompt cache |
| `cache_creation_tokens` | Tokens written to the prompt cache |
| `total_tokens` | Sum of input + output tokens |
| `cost_usd` | Estimated cost (if available) |
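
As a rough sketch of how these fields relate (values are illustrative, not real pricing or a real run), `total_tokens` is simply input plus output:

```python
# Illustrative usage record with the fields Cub records per invocation.
usage = {
    "input_tokens": 12500,
    "output_tokens": 3200,
    "cache_read_tokens": 8000,
    "cache_creation_tokens": 0,
}

# total_tokens is the sum of input and output tokens.
usage["total_tokens"] = usage["input_tokens"] + usage["output_tokens"]
print(usage["total_tokens"])  # 15700
```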

Token Tracking by Harness

Not all harnesses report tokens equally:

| Harness | Token Reporting | Accuracy |
|---|---|---|
| Claude Code | ✅ Full | Exact from API |
| OpenCode | ✅ Full | Exact from API |
| Codex | ❌ None | Not reported |
| Gemini | ⚠ Estimated | ~4 chars/token |

Claude Code

Claude Code provides the most comprehensive token data:

{
  "usage": {
    "input_tokens": 12500,
    "output_tokens": 3200,
    "cache_read_tokens": 8000,
    "cache_creation_tokens": 0,
    "cost_usd": 0.0234
  }
}

OpenCode

OpenCode reports accurate token counts:

{
  "usage": {
    "input_tokens": 15000,
    "output_tokens": 4500,
    "total_tokens": 19500
  }
}

Codex

Codex does not report token usage. Cub disables token-based budget tracking when using Codex.

Gemini

Gemini uses character-based estimation (~4 characters per token):

estimated_tokens = len(output_text) // 4
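
A minimal, runnable version of this estimator (the function name is illustrative, not part of Cub's API):

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough character-based token estimate (~4 characters per token)."""
    return len(text) // chars_per_token

print(estimate_tokens("x" * 1000))  # 250
```

Note this is a coarse heuristic: real tokenizers vary with language and content, so treat Gemini figures as approximate when comparing against budgets.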

Viewing Token Usage

Real-Time During Run

Enable streaming to see tokens per task:

cub run --stream

Output shows:

Task completed in 45.2s
Tokens: 156,892

In Run Summary

After a run completes:

Run Summary
+------------------+------------+
| Tokens Used      | 456,892    |
| Cost             | $2.34      |
+------------------+------------+

In Status File

Cub writes status to .cub/runs/{session}/status.json:

{
  "budget": {
    "tokens_used": 456892,
    "tokens_limit": 1000000,
    "cost_usd": 2.34,
    "cost_limit": 5.0,
    "tokens_percentage": 45.69
  }
}
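
Since the status file is plain JSON, external tooling can poll it. A minimal sketch (the helper name is hypothetical, not part of Cub):

```python
import json
from pathlib import Path

def budget_remaining(status_path: str) -> int:
    """Read a Cub status.json and return tokens left under the limit."""
    budget = json.loads(Path(status_path).read_text())["budget"]
    return budget["tokens_limit"] - budget["tokens_used"]
```

For the example above, this would report 543,108 tokens remaining.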

In JSONL Logs

Each task logs token usage in ~/.local/share/cub/logs/{project}/{session}.jsonl:

{"timestamp": "2026-01-17T10:30:00Z", "event_type": "task_end", "data": {"task_id": "cub-054", "tokens_used": 156892, "exit_code": 0}}

Query with jq:

# Total tokens for a session
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq -s '[.[] | select(.event_type == "task_end") | .data.tokens_used] | add'

# Tokens per task
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq 'select(.event_type == "task_end") | {task: .data.task_id, tokens: .data.tokens_used}'
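
The same session total can be computed in Python if jq isn't available. A sketch, assuming the JSONL format shown above (one JSON object per line):

```python
import json

def session_tokens(jsonl_path: str) -> int:
    """Sum tokens_used across all task_end events in a Cub JSONL log."""
    total = 0
    with open(jsonl_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("event_type") == "task_end":
                total += event["data"].get("tokens_used", 0)
    return total
```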

Budget Configuration

Per-Task Token Limit

Stop a task if it exceeds a token threshold:

{
  "budget": {
    "max_tokens_per_task": 500000
  }
}
cub run --budget-tokens 500000

Session Cost Limit

Stop the run when total cost exceeds a threshold:

{
  "budget": {
    "max_total_cost": 10.0
  }
}
cub run --budget 10.0

Tasks Per Session

Stop after completing N tasks:

{
  "budget": {
    "max_tasks_per_session": 20
  }
}

Token Efficiency Tips

Use Prompt Caching

Claude Code supports prompt caching, which reduces token consumption for repeated context:

Effective Input Tokens = Input Tokens - Cache Read Tokens

If your system prompt is 10K tokens and gets cached:

  • First task: 10K input tokens (cache creation)
  • Subsequent tasks: 0 effective tokens for system prompt
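
The arithmetic above, as a worked example (numbers are illustrative):

```python
# First task: the 10K-token system prompt is written to the cache.
first = {"input_tokens": 10_000, "cache_read_tokens": 0}
# Subsequent tasks: the same prompt is served from the cache.
later = {"input_tokens": 10_000, "cache_read_tokens": 10_000}

def effective_input(usage: dict) -> int:
    """Effective input tokens = input tokens - cache read tokens."""
    return usage["input_tokens"] - usage["cache_read_tokens"]

print(effective_input(first))  # 10000
print(effective_input(later))  # 0
```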

Select Efficient Models

Use task labels to route simple tasks to smaller models:

# Tag a task to use a faster model
bd label cub-054 model:haiku

In .cub.json, you can set defaults:

{
  "harness": {
    "model": "sonnet"
  }
}

Monitor Token Patterns

High-token tasks often indicate:

  • Large codebases being scanned
  • Repeated context in multi-turn conversations
  • Complex tasks that should be broken down

Review your logs to identify optimization opportunities:

# Find high-token tasks
cat ~/.local/share/cub/logs/myproject/*.jsonl | \
  jq -s 'sort_by(-.data.tokens_used) | .[0:5] | .[] | {task: .data.task_id, tokens: .data.tokens_used}'

Cost Estimation

Cub estimates costs using approximate per-token pricing. Actual costs may vary based on:

  • Model used (Opus > Sonnet > Haiku)
  • Prompt caching discounts
  • Batch vs. interactive pricing

The cost_usd field provides a rough estimate for budget planning, not billing.
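
The shape of such an estimate can be sketched as follows. The prices below are placeholders, not Cub's actual pricing table or any provider's current rates:

```python
# Hypothetical per-million-token prices in USD (assumed figures only).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD from token counts and per-million-token prices."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
```

Real estimates would also need per-model rates and cache discounts, which is why `cost_usd` should be treated as a planning figure.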

Handling Harnesses Without Token Reporting

When using Codex or other harnesses without token reporting:

  1. Token budget limits are ignored
  2. Cost budget limits are ignored
  3. Task and iteration limits still work
  4. Consider time-based monitoring instead

# Use iteration limits when tokens unavailable
cub run --once  # Single iteration
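
For time-based monitoring, a wall-clock limit per task is one option. A minimal sketch (the wrapper function is hypothetical, not a Cub feature):

```python
import subprocess
import sys
import time

def run_with_time_limit(cmd: list, max_seconds: float) -> float:
    """Run cmd, killing it after max_seconds; return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(cmd, timeout=max_seconds, check=True)
    return time.monotonic() - start

# Example: a trivial command that finishes well inside the limit.
elapsed = run_with_time_limit([sys.executable, "-c", "pass"], max_seconds=30)
```

`subprocess.run` raises `TimeoutExpired` when the limit is hit, which a caller can catch to record the task as over budget.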