How We Run 7 AI Agents for 37 EUR/Month -- The Architecture Behind dcode-ops
7 autonomous AI agents manage our entire business for 37 EUR/month in LLM costs. Here is the 4-layer architecture and 5-tier model routing that makes it work.
We Run 7 AI Agents That Manage Our Entire Business. The LLM Bill Is 37 EUR.
Not per day. Per month. Seven autonomous agents running 24/7 on a Hetzner VPS in Germany, managing five websites, monitoring security, writing content, scoring leads, deploying code, analyzing traffic, and coordinating with each other. Total LLM inference cost: approximately 37 EUR per month.
This is not a lab demo. This is dcode-ops, our production AI operations system. It has been running since March 2026, built on Klawty OS, and it manages the entire digital presence of dcode technologies -- a software company headquartered in Luxembourg.
This article explains exactly how we architected the system to keep costs at 37 EUR while maintaining the quality to actually run a business on it.
Why Most Agent Systems Burn 10-100x More
The AI agent market is projected to grow from $7.8 billion in 2025 to $52.6 billion by 2030, according to MarketsandMarkets. But here is the number that matters more: Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 -- not because the technology failed, but because costs spiraled beyond what the business case could justify.
The root cause is almost always the same: every LLM call goes through a single frontier model.
A health check -- "is this website returning a 200 status code?" -- gets sent to Claude Sonnet at $15 per million tokens. A routine DNS verification gets the same treatment as a strategic business analysis. It is like paying a surgeon to take your temperature. The output is identical. The bill is not.
We spent three months building the Inscape system (8 agents for a design-and-build company) and made every mistake in the book before we got the cost engineering right. When we built dcode-ops, we started with the architecture instead of bolting on cost controls after the fact.
The result is a 4-layer system with 5-tier model routing, and a governance engine that prevents waste before it happens.
The 4-Layer Architecture
Most people think of agents as "LLM + tools." That is layer 2 of 4. The layers that actually determine whether your system works in production -- and what it costs -- are the ones nobody talks about.
Layer 1: Orchestration
One agent sits above the others. In our system, that is Atlas (internally named Karim), the Architect and Orchestrator. Atlas runs on a 15-minute cycle using a power-tier model (Kimi K2.5, $5.00 per million tokens). His job: decide what work needs doing, assign it to the right agent, resolve conflicts, and produce cross-site status reports.
Atlas is the only agent with wildcard tool access. Every other agent has an explicit allow list. This is not optional -- it is the difference between a coordinated system and six independent bots stomping on each other's work.
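A minimal sketch of how that gate might look. The agent names come from the article, but the tool lists and the `canUseTool` helper are illustrative assumptions, not the actual dcode-ops configuration:

```javascript
// Hypothetical per-agent tool gating: Atlas gets the wildcard, every
// other agent gets an explicit allow list. The tool names here are
// examples, not the real 161-tool registry.
const TOOL_ALLOWLIST = {
  atlas: ["*"],
  ship: ["git_pull", "npm_build", "pm2_restart", "check_ports"],
  scout: ["fetch_url", "search_keywords", "read_advisories"],
};

function canUseTool(agent, tool) {
  const allowed = TOOL_ALLOWLIST[agent] || [];
  return allowed.includes("*") || allowed.includes(tool);
}
```

Unknown agents resolve to an empty list, so the check fails closed by default.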
Layer 2: Execution
Five specialist agents do the actual work:
Scout -- Research and Intelligence. Runs every 60 minutes on a capable-tier model (Gemini 2 Flash, $0.10 per million tokens). Scout gathers data: competitor rankings, traffic patterns, keyword positions, security advisories. Scout never acts on findings. Scout reports. Atlas decides.

Ship (Axel) -- DevOps and Deployment. Runs every 30 minutes on a workhorse-tier model (DeepSeek V3.2, $0.27 per million tokens). Axel handles git pulls, npm builds, PM2 process management, port checks, disk and memory monitoring, SSL renewals, and deployment verification. 33 tools in his toolkit -- the largest of any agent.

Plume (Lina) -- Content and SEO. Runs every 30 minutes on a capable-tier model. Lina writes blog posts, updates meta tags, audits SEO scores, researches keywords, drafts email sequences, and schedules social media posts. She wrote none of this article, which is unusual for her -- but this one needed the builder's perspective.

Mira -- Analysis and Reporting. Runs every 60 minutes on a capable-tier model. Mira reads Plausible analytics, Search Console data, and Stripe revenue metrics. She produces comparison reports, detects anomalies, and generates the weekly growth report that Atlas uses to plan the next sprint.

Closer (Hugo) -- Commerce and Sales. Runs every 20 minutes on a power-tier model (Kimi K2.5). Hugo scores leads from Supabase, drafts personalized follow-ups, monitors subscription health, detects churn risk, and manages the founding member program. He runs on a power-tier model because sales communication requires nuance that cheaper models miss.

Layer 3: Governance
Sentinel -- Audit and Compliance. Runs every 60 minutes on a workhorse-tier model. Sentinel validates every proposal submitted by any agent against 9 business rules before it can execute. Sentinel checks SSL certificates, scans security headers, verifies uptime, audits npm dependencies, monitors GDPR compliance, and runs file integrity checks on a daily schedule.

Sentinel is not an afterthought. In autonomous agent systems, governance is the layer that determines whether you can actually trust the system. Without it, you get agents sending 47 emails in a single cycle (we learned this the hard way on our first system).
Layer 4: Memory
All 7 agents share a 5-tier memory architecture:
- Working memory -- ephemeral, resets each cycle. Current task context.
- Session memory -- daily markdown logs, auto-loaded for today and yesterday.
- Agent memory -- persistent per-agent file (MEMORY.md), capped at 100 lines.
- Semantic memory -- Qdrant vector database with 1,536-dimension embeddings, searched on demand.
- Reference memory -- structured knowledge files (people, projects, decisions, lessons).
Memory distillation runs at 23:00 CET every night. It extracts insights from daily logs, updates agent memory files, generates embeddings for Qdrant, and prunes stale entries. Without distillation, agent memory files grow to 300+ lines of contradictory observations within two weeks. We learned this on our first system and built the solution into dcode-ops from day one.
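The pruning step of that nightly job can be sketched in a few lines. The function name and the keep-newest policy are assumptions for illustration; only the 100-line cap comes from the article:

```javascript
// Illustrative pruning pass for an agent's MEMORY.md: enforce the
// 100-line cap by dropping the oldest entries first. The real
// distillation also extracts insights and generates embeddings,
// which this sketch does not attempt.
function pruneMemory(lines, cap = 100) {
  if (lines.length <= cap) return lines;
  return lines.slice(lines.length - cap); // keep the newest `cap` entries
}
```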
5-Tier Model Routing: Why 85% of Our Calls Cost Under $0.30 Per Million Tokens
This is the single biggest cost lever in the entire system. The price spread across our five model tiers is over 200x:
| Tier | Model | Cost/M Tokens | Use Cases | % of Calls |
|------|-------|---------------|-----------|------------|
| Nano | Qwen 3.5 Coder Flash | $0.07 | Health checks, status queries, routing decisions | ~35% |
| Workhorse | DeepSeek V3.2 | $0.27 | Security scans, deployments, monitoring | ~25% |
| Capable | Gemini 2 Flash | $0.10 | Content drafts, SEO analysis, analytics | ~25% |
| Power | Kimi K2.5 | $5.00 | Orchestration, sales drafts, complex reasoning | ~12% |
| Premium | Claude Sonnet 4.6 | $15.00 | Strategic decisions, crisis response | ~3% |
The routing is pattern-based. Task title contains "health check" or "status"? Nano. Contains "deploy" or "production"? Workhorse. Contains "draft proposal" or "client email"? Power. Contains "strategic" or "quarterly review"? Premium.
If a lower-tier model fails to produce a satisfactory result, the task automatically escalates to the next tier up. Fallback always moves up in cost, never down. A nano outage means tasks run at workhorse prices -- more expensive, but nothing stops. Premium has no fallback. Strategic tasks queue until the model recovers. This is intentional: strategic decisions are worth waiting for.
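The routing and escalation rules above can be sketched as a small pattern table. The patterns and tier order mirror the article; the default-to-capable fallback and the function names are illustrative assumptions:

```javascript
// Tier order as listed in the routing table above.
const TIERS = ["nano", "workhorse", "capable", "power", "premium"];

// Pattern-based routing: first match wins. Patterns come from the
// examples in the article; a real router would have many more.
const PATTERNS = [
  { re: /health check|status/i, tier: "nano" },
  { re: /deploy|production/i, tier: "workhorse" },
  { re: /draft proposal|client email/i, tier: "power" },
  { re: /strategic|quarterly review/i, tier: "premium" },
];

function pickTier(taskTitle) {
  const hit = PATTERNS.find((p) => p.re.test(taskTitle));
  return hit ? hit.tier : "capable"; // assumed default tier
}

// Fallback always escalates upward; premium has no fallback and
// returns null, meaning the task queues until the model recovers.
function escalate(tier) {
  const i = TIERS.indexOf(tier);
  return i >= 0 && i < TIERS.length - 1 ? TIERS[i + 1] : null;
}
```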
The result: roughly 85% of all LLM calls hit the three cheapest tiers (nano, workhorse, capable), where prices range from $0.07 to $0.27 per million tokens. Only 3% of calls touch the premium tier. Same work completed. Dramatically lower bill.
161 Tools, 24 Skills, 8 Database Tables
The seven agents share 161 tools across 14 tool files (12,093 lines of JavaScript). Every tool has an explicit risk level:
- AUTO -- read-only, no external effect. No approval needed.
- AUTO+ -- internal writes, logged but not gated.
- PROPOSE -- external effect, reversible. Executes with a 15-minute rollback window.
- CONFIRM -- high-risk or irreversible. Requires explicit approval from the company founder via Telegram or Discord reaction.
- BLOCK -- hardcoded rejection. Financial transfers, database deletion, bulk email. Always returns an error.
Every tool invocation is recorded in the tool_calls table with the agent name, tool name, input parameters, output, success/failure status, execution duration, and cost. This is not optional logging. It is the audit trail that makes autonomous operation trustworthy.
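The five risk levels reduce to a simple decision function. This is a hedged sketch; the decision strings are illustrative, not the actual dcode-ops API:

```javascript
// Map a tool's risk level to what happens when an agent invokes it.
// Unknown levels fail closed, like BLOCK.
function gateToolCall(riskLevel) {
  switch (riskLevel) {
    case "AUTO":
    case "AUTO+":
      return "execute"; // AUTO+ is additionally logged
    case "PROPOSE":
      return "execute_with_rollback"; // 15-minute rollback window
    case "CONFIRM":
      return "await_human"; // Telegram/Discord reaction required
    case "BLOCK":
      return "reject"; // always returns an error
    default:
      return "reject";
  }
}
```

Whatever the decision, the invocation still lands in the tool_calls audit table.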
The 24 skills are loaded dynamically based on task keywords. A task about SEO loads the seo-audit skill. A task about churn loads the churn-prevention skill. Skills provide domain-specific methodology and reference data without permanently consuming context window space.
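A sketch of that keyword-based loading. The two skill names come from the article; the keyword map itself is an illustrative assumption:

```javascript
// Hypothetical keyword map from task text to skills. The real system
// has 24 skills; two are shown here as examples.
const SKILL_KEYWORDS = {
  "seo-audit": ["seo", "meta tags", "keyword"],
  "churn-prevention": ["churn", "cancellation", "retention"],
};

function skillsForTask(taskText) {
  const text = taskText.toLowerCase();
  return Object.keys(SKILL_KEYWORDS).filter((skill) =>
    SKILL_KEYWORDS[skill].some((kw) => text.includes(kw))
  );
}
```

Tasks that match nothing load no skills, so the context window stays small by default.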
The Governance Engine That Prevents Waste
Cost efficiency is not just about cheap models. It is about not doing unnecessary work.
4-Layer Deduplication
Spam is the number one failure mode of autonomous agents. An agent running every 15 minutes will create 96 identical tasks per day if you do not catch duplicates. We built four layers of dedup:
- Task dedup -- before creating any task, check for existing tasks with greater than 70% title similarity created in the last 48 hours.
- Channel dedup -- before posting to Discord or Telegram, hash the message content and check against posts from the last 4 hours.
- Proposal dedup -- before creating a proposal, check if the same agent already has a pending proposal for the same action.
- Discovery dedup -- before a discovery scan creates a task, check if a similar task was completed in the last 7 days.
Each layer prevented real spam in the Inscape system before we built dcode-ops. The 48-hour task window, the 4-hour channel window, the 7-day discovery window -- these are not arbitrary numbers. They are calibrated from three months of production data.
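The first layer (task dedup) might look like the sketch below. The 70% threshold and 48-hour window come from the article; the token-set Jaccard similarity is an assumption, since the real metric is not specified:

```javascript
// Token-set Jaccard similarity between two task titles (assumed metric).
function similarity(a, b) {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Reject a new task if a >70%-similar task was created in the last 48h.
function isDuplicateTask(title, recentTasks, now = Date.now()) {
  const windowMs = 48 * 60 * 60 * 1000;
  return recentTasks.some(
    (t) => now - t.createdAt < windowMs && similarity(title, t.title) > 0.7
  );
}
```

The channel, proposal, and discovery layers follow the same shape with different keys and windows (4 hours, pending-state check, 7 days).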
9-State Proposal Machine
Every non-trivial action goes through a proposal lifecycle with 9 states: pending, sentinel_approved, executing, awaiting_human, completed, rejected, expired, rolled_back, dead_letter.
The flow: an agent submits a proposal. Sentinel validates it against 9 business rules (does this agent have permission for this tool? Is the action within budget? Does it violate any blocked patterns?). If it passes, PROPOSE-tier actions auto-execute with a 15-minute rollback window. CONFIRM-tier actions send a Telegram message to the founder and wait for a reaction.
If no action is taken: reminder at 12 hours, escalation at 20 hours, auto-expire at 24 hours. Three consecutive failures on the same proposal send it to the dead-letter queue. The system never silently drops work, and it never silently executes dangerous operations.
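The lifecycle can be expressed as a transition table. The nine state names come from the article; the exact set of allowed transitions is inferred from the flow described above, so treat it as a sketch:

```javascript
// Inferred transition table for the 9-state proposal machine.
// Terminal states (completed, rejected, expired, rolled_back,
// dead_letter) have no outgoing transitions.
const TRANSITIONS = {
  pending: ["sentinel_approved", "rejected", "expired"],
  sentinel_approved: ["executing", "awaiting_human"],
  awaiting_human: ["executing", "rejected", "expired"],
  executing: ["completed", "rolled_back", "dead_letter"],
};

function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}
```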
Circuit Breaker
Every agent has a circuit breaker with three states: closed (normal), open (paused), and half-open (testing). Three consecutive task failures trip the breaker. The agent enters exponential cooldown: 5 minutes, then 10, then 20, then 40, maxing at 60. In half-open state, the agent tries one task. If it succeeds, the breaker closes. If it fails, it opens again with double the cooldown.
Without circuit breakers, a single malformed task causes an agent to burn through LLM calls retrying endlessly. On our first system, a broken email parser caused 47 retries in one hour, costing $23 on a system that normally costs $8 per day. With circuit breakers, the same failure costs $0.12.
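The breaker logic above fits in one small class. The thresholds and cooldown schedule match the article; the method names are illustrative:

```javascript
// Per-agent circuit breaker: closed -> open after 3 consecutive
// failures, exponential cooldown 5 -> 10 -> 20 -> 40 minutes capped
// at 60, with a half-open single-task probe.
class CircuitBreaker {
  constructor() {
    this.state = "closed"; // closed | open | half-open
    this.failures = 0;
    this.cooldownMin = 0;
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === "half-open") {
      // Probe failed: reopen with double the cooldown, capped at 60.
      this.state = "open";
      this.cooldownMin = Math.min(this.cooldownMin * 2, 60);
    } else if (this.failures >= 3) {
      // Three consecutive failures trip the breaker.
      this.state = "open";
      this.cooldownMin = this.cooldownMin === 0 ? 5 : Math.min(this.cooldownMin * 2, 60);
    }
  }

  cooldownElapsed() {
    this.state = "half-open"; // allow one probe task
  }

  recordSuccess() {
    this.state = "closed";
    this.failures = 0;
    this.cooldownMin = 0;
  }
}
```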
Quiet Hours
Between 23:30 and 06:30 CET, the heartbeat pauses. Agents stop running think cycles. Memory distillation runs at 23:00, backups run at 02:00, and then everything sleeps until morning. This saves approximately 30% of daily LLM costs -- seven hours of zero inference calls -- with no impact on business operations because nobody is reading emails or checking dashboards at 3 AM.
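The only subtlety in a quiet-hours check is the midnight wraparound. A minimal sketch, assuming the caller has already converted the clock to CET:

```javascript
// True between 23:30 and 06:30. Because the window spans midnight,
// the condition is an OR, not an AND.
function inQuietHours(hour, minute) {
  const mins = hour * 60 + minute;
  const start = 23 * 60 + 30; // 23:30
  const end = 6 * 60 + 30;    // 06:30
  return mins >= start || mins < end;
}
```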
The Real Cost Breakdown
Here is the per-agent daily budget from our configuration:
| Agent | Role | Model Tier | Cycle | Daily Cap |
|-------|------|-----------|-------|-----------|
| Atlas (Karim) | Orchestrator | Power | 15 min | $12.00 |
| Closer (Hugo) | Sales & CRM | Power | 20 min | $10.00 |
| Plume (Lina) | Content & SEO | Capable | 30 min | $8.00 |
| Scout | Research | Capable | 60 min | $5.00 |
| Mira | Analytics | Capable | 60 min | $5.00 |
| Ship (Axel) | DevOps | Workhorse | 30 min | $4.00 |
| Sentinel | Governance | Workhorse | 60 min | $4.00 |
| System total | | | | $50.00 |
The $50 daily cap is the ceiling, not the average. In practice, daily spend runs between $1.00 and $3.00 because most cycles involve nano and workhorse-tier calls. The power-tier agents (Atlas and Hugo) are more expensive per call but make fewer calls per day. The capable-tier agents (Scout, Plume, Mira) run on Gemini 2 Flash at $0.10 per million tokens, which is nearly free for typical task sizes.
At 80% of the daily cap ($40), all agents downgrade their model tier. At 100%, everything switches to nano-only mode. The agents never stop working. They just work with cheaper models. Auto-reset at midnight. No manual intervention.
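The spend-based downgrade can be sketched as a pure function. The 80% and 100% thresholds and the $50 cap come from the article; the one-tier-down shift and the cost ordering are illustrative assumptions:

```javascript
// Tiers ordered by cost per million tokens ($0.07, $0.10, $0.27, $5, $15).
const TIERS_BY_COST = ["nano", "capable", "workhorse", "power", "premium"];

function effectiveTier(requestedTier, dailySpend, dailyCap = 50) {
  if (dailySpend >= dailyCap) return "nano"; // nano-only mode at 100%
  const i = TIERS_BY_COST.indexOf(requestedTier);
  if (dailySpend >= 0.8 * dailyCap) {
    return TIERS_BY_COST[Math.max(i - 1, 0)]; // drop one tier at 80%
  }
  return requestedTier;
}
```

Because this is evaluated per call, the midnight counter reset restores full-tier routing with no manual intervention.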
Monthly projection: at typical usage of $1.20 per day average, that is approximately $37 per month -- 37 EUR at current exchange rates.
What This Architecture Gives You
The four layers are not independent features. They are force multipliers:
Orchestration + cost routing means the right model handles the right task. Atlas decides what to prioritize; the router decides which model to use. Neither alone is sufficient.

Execution + governance means agents can work autonomously without being dangerous. Sentinel validates before any agent executes. The proposal system provides the rollback safety net. Without governance, execution is just expensive chaos.

Memory + deduplication means agents learn without spamming. The 5-tier memory system gives agents persistent context. The 4-layer dedup system prevents that context from generating redundant work.

The $37/month number is a consequence of this architecture, not a goal we optimized for. We optimized for reliability, safety, and autonomy. The cost efficiency followed because well-architected systems do not waste compute on unnecessary work.
Build Your Own
The architecture described in this article is available as a product. The Klawty OS runtime is open source on GitHub. The pre-configured systems -- with agents, tools, skills, memory, and governance built in -- are available through AI Agent Builder.
Solo (1 agent, 49 EUR one-time), Team (3 agents, 149 EUR), or Fleet (7+ agents, 299 EUR). Or skip configuration entirely: pick your industry, and get a pre-fitted AI team with vertical-specific tools and workflows for 199 EUR/month managed.
The economics of autonomous AI agents are better than most people think. You just have to build the architecture before you build the agents.
---
This article was researched by Scout, structured by Atlas, and written by Plume -- three of the seven AI agents described above. The article itself is a product demo.

-> Build your AI team: ai-agent-builder.ai
-> Star the open-source runtime: github.com/dcode-tec/Klawty

dcode technologies . Luxembourg . d-code.lu