Token Management
Cub tracks token usage across AI harnesses to help you monitor costs and stay within budget. Different harnesses have varying levels of token reporting accuracy.
How Tokens Are Tracked
Each time a harness is invoked, Cub records:
| Metric | Description |
|---|---|
| input_tokens | Tokens sent to the AI (prompt + context) |
| output_tokens | Tokens generated by the AI |
| cache_read_tokens | Tokens served from prompt cache |
| cache_creation_tokens | Tokens written to prompt cache |
| total_tokens | Sum of input + output tokens |
| cost_usd | Estimated cost (if available) |
Token Tracking by Harness
Not all harnesses report tokens equally:
| Harness | Token Reporting | Accuracy |
|---|---|---|
| Claude Code | Exact, from API | Exact |
| OpenCode | Exact, from API | Exact |
| Codex | Not reported | None |
| Gemini | Estimated (~4 chars/token) | Approximate |
Claude Code
Claude Code provides the most comprehensive token data:
```json
{
  "usage": {
    "input_tokens": 12500,
    "output_tokens": 3200,
    "cache_read_tokens": 8000,
    "cache_creation_tokens": 0,
    "cost_usd": 0.0234
  }
}
```
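As a quick check of the total_tokens definition from the metrics table, the jq one-liner below sums input and output tokens from a usage object like the one above. The usage_json variable is only a stand-in for wherever you captured that JSON; cache tokens are left out, matching the table's definition.

```bash
# Stand-in for a captured Claude Code usage object (values from the example above)
usage_json='{"usage": {"input_tokens": 12500, "output_tokens": 3200, "cache_read_tokens": 8000, "cache_creation_tokens": 0, "cost_usd": 0.0234}}'

# total_tokens = input_tokens + output_tokens (cache tokens are tracked separately)
echo "$usage_json" | jq '.usage.input_tokens + .usage.output_tokens'
# → 15700
```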
OpenCode
OpenCode reports exact token counts from its API.
Codex
Codex does not report token usage. Cub disables token-based budget tracking when using Codex.
Gemini
Gemini token counts are estimated from text length at roughly 4 characters per token, so treat them as approximate.
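A minimal sketch of that estimate in bash, assuming the character count is divided by 4 and rounded up. Cub's exact rounding is not documented here, and estimate_tokens is a hypothetical helper name.

```bash
# Hypothetical helper: approximate token count at ~4 characters per token.
# Rounding up is an assumption; ${#text} counts characters in the current locale.
estimate_tokens() {
  local text="$1"
  local chars=${#text}
  echo $(( (chars + 3) / 4 ))
}

estimate_tokens "Refactor the login handler to use async I/O."
# → 11  (44 characters)
```

The real characters-per-token ratio varies with content, so estimates like this are useful for trends rather than exact budgeting.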
Viewing Token Usage
Real-Time During Run
Enable streaming to see token usage for each task as the run progresses.
In Run Summary
After a run completes, Cub prints a summary:
```text
Run Summary
+------------------+------------+
| Tokens Used      | 456,892    |
| Cost             | $2.34      |
+------------------+------------+
```
In Status File
Cub writes status to .cub/runs/{session}/status.json:
```json
{
  "budget": {
    "tokens_used": 456892,
    "tokens_limit": 1000000,
    "cost_usd": 2.34,
    "cost_limit": 5.0,
    "tokens_percentage": 45.69
  }
}
```
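The budget fields can be read back with jq, for example to see how much headroom remains. The sketch below uses the example values inline; in practice you would point jq at .cub/runs/{session}/status.json. It also shows that tokens_percentage matches tokens_used / tokens_limit × 100 rounded to two decimals, which is inferred from the example values rather than a documented formula.

```bash
status='{"budget": {"tokens_used": 456892, "tokens_limit": 1000000, "cost_usd": 2.34, "cost_limit": 5.0}}'

# Tokens remaining before the token limit is reached
echo "$status" | jq '.budget.tokens_limit - .budget.tokens_used'
# → 543108

# Recompute tokens_percentage, rounded to two decimals
echo "$status" | jq '.budget | ((.tokens_used / .tokens_limit * 10000 + 0.5) | floor) / 100'
# → 45.69
```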
In JSONL Logs
Each task logs token usage in ~/.local/share/cub/logs/{project}/{session}.jsonl:
```json
{"timestamp": "2026-01-17T10:30:00Z", "event_type": "task_end", "data": {"task_id": "cub-054", "tokens_used": 156892, "exit_code": 0}}
```
Query with jq:
```bash
# Total tokens for a session
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq -s '[.[] | select(.event_type == "task_end") | .data.tokens_used] | add'

# Tokens per task
cat ~/.local/share/cub/logs/myproject/session.jsonl | \
  jq 'select(.event_type == "task_end") | {task: .data.task_id, tokens: .data.tokens_used}'
```
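Building on the same log format, the sketch below counts successfully completed tasks (exit_code 0) and their average token usage, which can help when choosing a tasks-per-session limit. task_stats is a hypothetical helper name; it accepts one or more JSONL files.

```bash
# Hypothetical helper: completed-task count and average tokens from JSONL logs
task_stats() {
  jq -s '[.[] | select(.event_type == "task_end" and .data.exit_code == 0)]
    | {completed: length,
       avg_tokens: (if length > 0 then (map(.data.tokens_used) | add / length) else 0 end)}' "$@"
}

# Usage: task_stats ~/.local/share/cub/logs/myproject/session.jsonl
```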
Budget Configuration
Per-Task Token Limit
Stop a task when it exceeds a token threshold.
Session Cost Limit
Stop the run when total cost exceeds a threshold.
Tasks Per Session
Stop the run after N tasks have completed.
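Cub applies these limits itself. To watch a run from outside, for example from a cron job or a second terminal, a sketch like the one below compares the budget fields of status.json. check_budget is a hypothetical helper, and treating a null limit as "no limit" is an assumption about how unset limits appear in that file.

```bash
# Hypothetical helper: exit 0 while both budgets have headroom, 1 once either is used up.
# A null limit is treated as "no limit" (assumption).
check_budget() {
  jq -e '.budget
    | (.tokens_limit == null or .tokens_used < .tokens_limit)
      and (.cost_limit == null or .cost_usd < .cost_limit)' "$1" > /dev/null
}

# Usage: check_budget .cub/runs/<session>/status.json || echo "budget exhausted"
```

Using jq's -e flag turns the boolean result into the exit status, so the helper composes directly with || and if.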
Token Efficiency Tips
Use Prompt Caching
Claude Code supports prompt caching, which lowers the cost of context that repeats across calls.
If your system prompt is 10K tokens and gets cached:
- First task: 10K tokens written to the cache (cache_creation_tokens)
- Subsequent tasks: the same 10K tokens are served from the cache (cache_read_tokens) at a fraction of the normal input price
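To put rough numbers on this, the arithmetic below compares five tasks that each send the same 10K-token system prompt, with and without caching, in full-price-equivalent input tokens. The 125% write and 10% read multipliers are assumptions based on Anthropic's published prompt-caching pricing; check current pricing before relying on them.

```bash
prompt_tokens=10000  # cached system prompt size
tasks=5
write_pct=125        # assumed: cache writes billed at 125% of the base input price
read_pct=10          # assumed: cache reads billed at 10% of the base input price

uncached=$(( prompt_tokens * tasks ))
cached=$(( prompt_tokens * write_pct / 100 + prompt_tokens * (tasks - 1) * read_pct / 100 ))
echo "uncached=$uncached cached=$cached"
# → uncached=50000 cached=16500
```

Under these assumptions the first task costs slightly more than an uncached one, and caching pays off from the second task onward.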
Select Efficient Models
Use task labels to route simple tasks to smaller models.
You can also set defaults in .cub.json.
Monitor Token Patterns
High-token tasks often indicate:
- Large codebases being scanned
- Repeated context in multi-turn conversations
- Complex tasks that should be broken down
Review your logs to identify optimization opportunities:
```bash
# Find the five highest-token tasks across all sessions
cat ~/.local/share/cub/logs/myproject/*.jsonl | \
  jq -s '[.[] | select(.event_type == "task_end")] | sort_by(.data.tokens_used) | reverse | .[0:5] | .[] | {task: .data.task_id, tokens: .data.tokens_used}'
```
Cost Estimation
Cub estimates costs using approximate per-token pricing. Actual costs may vary based on:
- Model used (per-token price: Opus > Sonnet > Haiku)
- Prompt caching discounts
- Batch vs. interactive pricing
The cost_usd field provides a rough estimate for budget planning, not billing.
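The arithmetic behind a per-token estimate is input tokens × input price plus output tokens × output price. A minimal awk-based sketch, with prices per million tokens passed as arguments; estimate_cost is a hypothetical helper, the prices in the usage line are placeholders rather than Cub's built-in rates, and caching discounts are ignored.

```bash
# Hypothetical helper: estimate_cost INPUT_TOKENS OUTPUT_TOKENS USD_PER_M_INPUT USD_PER_M_OUTPUT
estimate_cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.4f\n", (i * pi + o * po) / 1000000 }'
}

estimate_cost 12500 3200 3 15   # placeholder prices: $3/M input, $15/M output
# → 0.0855
```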
Handling Harnesses Without Token Reporting
When using Codex or other harnesses without token reporting:
- Token budget limits are ignored
- Cost budget limits are ignored
- Task and iteration limits still work
- Consider time-based monitoring instead
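One way to add a time-based limit is to wrap whatever command starts your run in GNU coreutils timeout, which stops the command after a wall-clock limit and exits with status 124 when it does. run_with_time_limit is a hypothetical helper, and timeout may need to be installed separately on macOS.

```bash
# Hypothetical helper: run a command with a wall-clock limit (e.g. "2h", "30m", "90").
run_with_time_limit() {
  local limit="$1"
  shift
  timeout "$limit" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "stopped after $limit" >&2
  fi
  return "$status"
}

# Usage: run_with_time_limit 2h <your usual cub command>
```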