March 18, 2026 · 5 min read

The $8/Day Operating System

How 5-tier LLM routing cuts AI costs by about 80% while keeping 8 agents running 24/7.

Tags: ai, costs, llm-routing, optimization

The Expensive Mistake Everyone Makes

You build an agent system. You pick GPT-4 (or Claude Opus, or Gemini Ultra) because you want quality. Every task (health checks, email classification, status updates, strategic analysis) goes through the same frontier model.

Day one: works great. Day two: your API bill is $40. Day seven: you have spent $280, and a model 10x cheaper could have handled 90% of the tasks your agents ran.

This is the most common mistake in agent system design. Using one model for everything is like hiring a surgeon to take your temperature, a lawyer to file your paperwork, and an architect to move your furniture. The output quality is the same, but the cost is absurd.

The 5-Tier Solution

We solved this by building a model routing system that matches task complexity to model capability:

  • Tier 1: Nano ($0.10/M tokens). Health checks, routing decisions, simple classification. "Is this email urgent?" "Is this agent alive?" These tasks need speed, not intelligence. Flash-lite models handle them perfectly.
  • Tier 2: Workhorse ($0.32/M tokens). The default tier. Email triage, structured reasoning, data extraction, reflection summaries. 60% of all agent work lands here. Reliable, fast, cheap.
  • Tier 3: Capable ($0.50/M tokens). Long-context tasks, content drafts, detailed analysis. When the agent needs to read a 10-page document and produce a summary, this tier handles it without quality loss.
  • Tier 4: Power ($0.55/M tokens). Code review, strategic analysis, complex multi-step reasoning. Tasks with the keyword "deploy" or "architecture" auto-route here. Reserved for work where mistakes are expensive.
  • Tier 5: Premium ($3-15/M tokens). Critical decisions, production deployments, tasks flagged by the safety monitor. Less than 1% of all calls. Used only when the stakes justify the cost.
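To make the tier list concrete, here is a minimal Python sketch of it as a lookup table with a cost estimator. The names `Tier`, `TIERS`, and `estimate_cost` are hypothetical, and the Premium rate uses the low end of the quoted $3-15 range as an assumption. It is an illustration, not the product's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    name: str
    usd_per_million_tokens: float  # flat illustrative rate taken from the list above


# Hypothetical table. Premium is quoted as a $3-15 range; the low end is assumed here.
TIERS = {
    "nano": Tier(1, "Nano", 0.10),
    "workhorse": Tier(2, "Workhorse", 0.32),
    "capable": Tier(3, "Capable", 0.50),
    "power": Tier(4, "Power", 0.55),
    "premium": Tier(5, "Premium", 3.00),
}


def estimate_cost(tier_key: str, tokens: int) -> float:
    """Estimate the USD cost of processing `tokens` tokens on the given tier."""
    return TIERS[tier_key].usd_per_million_tokens * tokens / 1_000_000


if __name__ == "__main__":
    # Compare what one million tokens would cost on each tier.
    for key, tier in TIERS.items():
        print(f"{tier.name:<10} ${estimate_cost(key, 1_000_000):.2f}")
```

Even with the low-end Premium rate, the same million tokens costs 30 times more on Premium than on Nano, which is the gap the router exploits.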

Pattern-Based Routing

The router does not pick tiers at random. It matches keyword patterns in the task title and description to escalate or downgrade automatically:

  • Task contains "health check" or "status"? Downgrade to Nano. No reason to spend $0.50 on a ping.
  • Task contains "deploy" or "production"? Escalate to Power. A bad deploy costs more than $0.55.
  • Task contains "strategic" or "quarterly review"? Escalate to Premium. Big-picture analysis needs the best model.
  • Task is a reflection summary? Force Workhorse. Reflection is important but structured, so there is no need for premium.

This pattern matching is configurable. You define the keywords and tier mappings in klawty.json. The defaults are battle-tested from 3 months of production operation, but every business has different high-stakes tasks.
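The keyword rules above can be sketched in a few lines of Python. This is a hedged illustration: `ROUTING_RULES`, `DEFAULT_TIER`, and `route` are hypothetical names, and the real klawty.json schema is not shown in this article. One assumption is made explicit: rules are checked from the highest tier down, so an escalation keyword always wins over a downgrade keyword.

```python
import re

# Hypothetical rule table mirroring the patterns described above.
# Assumption: rules are ordered from highest tier to lowest and the first match wins,
# so "deploy status" escalates to power rather than dropping to nano.
ROUTING_RULES = [
    (r"strategic|quarterly review", "premium"),
    (r"deploy|production|architecture", "power"),
    (r"reflection", "workhorse"),
    (r"health check|status", "nano"),
]
DEFAULT_TIER = "workhorse"  # the article names Workhorse as the default tier


def route(title: str, description: str = "") -> str:
    """Return the tier key for a task based on keywords in its title and description."""
    text = f"{title} {description}".lower()
    for pattern, tier in ROUTING_RULES:
        if re.search(pattern, text):
            return tier
    return DEFAULT_TIER


if __name__ == "__main__":
    for task in ["Nightly health check", "Deploy to production", "Classify inbound email"]:
        print(f"{task!r} -> {route(task)}")
```

Checking high-stakes patterns first is a design choice. It trades a few unnecessary escalations for never under-serving a risky task.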

Daily Caps: The Safety Net

Even with smart routing, runaway costs can happen: an agent stuck in a loop, a discovery run that creates too many premium-tier tasks, or an API that returns malformed responses and triggers retries.

Our cost control system has two thresholds:

  • 80% warning. When daily spend hits 80% of your configured cap, all agents are notified. They start conserving: preferring cheaper tiers, deferring non-urgent work.
  • 100% hard cap. Premium and power tier calls are blocked, but nano and workhorse tiers keep running. Your agents do not stop working. They just stop spending on expensive models. Health checks, email triage, and routine tasks continue uninterrupted.

The cap resets automatically at midnight. No manual intervention needed.
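The two thresholds and the midnight reset can be sketched as a small guard object. This is a minimal illustration, not the shipped implementation. `DailyBudget` and its method names are hypothetical, the reset is modelled as a change of local calendar date, and the Capable tier is left unblocked at the cap because the text above only names Power and Premium as blocked.

```python
from datetime import date
from typing import Optional


class DailyBudget:
    """Tracks daily spend and blocks expensive tiers once the cap is reached."""

    # Only these tiers are blocked at 100%, as described above.
    BLOCKED_AT_CAP = frozenset({"power", "premium"})

    def __init__(self, cap_usd: float, warn_ratio: float = 0.80,
                 today: Optional[date] = None) -> None:
        self.cap_usd = cap_usd
        self.warn_ratio = warn_ratio
        self.spent_usd = 0.0
        self.day = today or date.today()

    def _roll_over(self, today: Optional[date]) -> None:
        # A new calendar date stands in for the midnight reset.
        current = today or date.today()
        if current != self.day:
            self.day = current
            self.spent_usd = 0.0

    def record(self, cost_usd: float, today: Optional[date] = None) -> None:
        """Add the cost of a completed call to today's spend."""
        self._roll_over(today)
        self.spent_usd += cost_usd

    def state(self, today: Optional[date] = None) -> str:
        """Return 'ok', 'warning' (at or above 80%), or 'capped' (at or above 100%)."""
        self._roll_over(today)
        if self.spent_usd >= self.cap_usd:
            return "capped"
        if self.spent_usd >= self.warn_ratio * self.cap_usd:
            return "warning"
        return "ok"

    def allows(self, tier: str, today: Optional[date] = None) -> bool:
        """Cheap tiers always run; power and premium are refused once capped."""
        return not (self.state(today) == "capped" and tier in self.BLOCKED_AT_CAP)
```

A caller would check `allows(tier)` before each LLM call and `record(cost)` after it. In the `warning` state the router could also choose to prefer cheaper tiers.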

Real Numbers from Production

Here is the actual cost distribution from our 8-agent production system over a typical week:

  • Nano (Tier 1): 60% of all LLM calls. Cost: $1.20/week
  • Workhorse (Tier 2): 25% of calls. Cost: $2.80/week
  • Capable (Tier 3): 10% of calls. Cost: $3.50/week
  • Power (Tier 4): 4% of calls. Cost: $2.10/week
  • Premium (Tier 5): 1% of calls. Cost: $1.40/week
Total: ~$56/week. About $8/day.

For comparison, the same workload on a single-model setup (everything through a capable model) costs approximately $40/day. Same tasks completed. Same output quality for 95% of the work. The 5% that benefits from premium models gets premium models; everything else gets the cheapest model that can handle the job.
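The comparison can be worked through from the two totals quoted above (~$56/week routed, ~$40/day single-model). The function names in this short sketch are hypothetical, and both inputs are the article's approximate figures.

```python
def daily_cost(weekly_usd: float) -> float:
    """Convert a weekly total into an average daily cost."""
    return weekly_usd / 7


def savings_ratio(routed_daily_usd: float, baseline_daily_usd: float) -> float:
    """Fraction of the baseline spend that routing avoids."""
    return 1 - routed_daily_usd / baseline_daily_usd


routed = daily_cost(56.0)   # ~$56/week with 5-tier routing
baseline = 40.0             # ~$40/day on a single-model setup
print(f"routed: ${routed:.2f}/day, saving {savings_ratio(routed, baseline):.0%}")
# prints: routed: $8.00/day, saving 80%
```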

The Fallback Chain

Smart routing also means smart failure handling. If a Power-tier model is down or rate-limited, the router doesn't fail; it tries the next tier down. Power fails? Try Capable. Capable fails? Try Workhorse.

Combined with the circuit breaker (which tracks per-model failure rates and stops hammering broken endpoints), your agents stay productive even during API outages. One model going down is a non-event, not an emergency.
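The fallback chain and circuit breaker can be sketched together. This is a simplified illustration under stated assumptions. `CircuitBreaker` and `call_with_fallback` are hypothetical names. The breaker counts consecutive failures instead of tracking a failure rate. It never re-closes on a timer. The chain continues down to Nano, which the text above does not confirm. `call_model` stands in for whatever function actually calls the LLM API.

```python
from typing import Callable, Dict, Tuple

# Assumed order: each tier falls back to the next one down.
FALLBACK_ORDER = ["premium", "power", "capable", "workhorse", "nano"]


class CircuitBreaker:
    """Skips a tier after `max_failures` consecutive failures (simplified)."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}

    def is_open(self, tier: str) -> bool:
        return self.failures.get(tier, 0) >= self.max_failures

    def record_failure(self, tier: str) -> None:
        self.failures[tier] = self.failures.get(tier, 0) + 1

    def record_success(self, tier: str) -> None:
        self.failures[tier] = 0


def call_with_fallback(start_tier: str, prompt: str,
                       call_model: Callable[[str, str], str],
                       breaker: CircuitBreaker) -> Tuple[str, str]:
    """Try the requested tier, then each cheaper tier, skipping open circuits."""
    start = FALLBACK_ORDER.index(start_tier)
    for tier in FALLBACK_ORDER[start:]:
        if breaker.is_open(tier):
            continue  # stop hammering an endpoint that keeps failing
        try:
            result = call_model(tier, prompt)
        except Exception:
            breaker.record_failure(tier)
            continue
        breaker.record_success(tier)
        return tier, result
    raise RuntimeError("all tiers failed or are circuit-broken")
```

The function returns the tier that actually served the request, so the caller can log when a task ran on a cheaper model than intended.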

Build With Smart Routing

Every tier of Agent Builder includes LLM routing. Solo gets 3-tier routing (Nano, Workhorse, Capable). Team adds Power tier. Fleet includes all 5 tiers with pattern-based escalation, daily caps, and per-agent cost tracking.

The routing configuration is in klawty.json, so no code changes are needed. Add a model, change a pattern, adjust a cap. Your agents adapt immediately.

Configure your agent system at ai-agent-builder.ai/build and stop paying frontier prices for commodity work.
