From first download to production deployment. How your agent system works, how to configure it, and how to get the most out of it.
A downloadable ZIP containing a complete agent system: 27-module runtime engine, Qdrant vector memory, your customized workspace (SOUL.md, agent configs, skills, guardrails), Docker Compose for one-command deployment, shell scripts to start/stop/monitor, and macOS LaunchAgent files for 24/7 operation.
Node.js 20+, Docker (optional, for Qdrant vector memory), a machine that runs 24/7 (Mac mini, Linux server, or VPS), and an OpenRouter API key for AI model access. A Hetzner CX22 VPS (€5/month) or a Mac mini works perfectly. Docker is optional: without Qdrant, agents fall back to file-based memory.
The engine reads AGENT.md files (Markdown with YAML frontmatter) and runs a generic execution loop: pull task → build prompt → call LLM → execute tools → record results. No per-agent JavaScript code is needed. Adding a new agent = writing one AGENT.md file.
Tasks are stored in a SQLite database (WAL mode). The engine pulls the highest-priority task, claims it atomically, and runs up to 15 tool-calling rounds. Write-tool verification ensures agents actually did something. Failed tasks retry 3 times before dead-lettering.
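The claim-and-retry behavior can be sketched in a few lines. This is an illustrative in-memory model, not the shipped implementation: the real engine does the claim atomically in SQLite (WAL mode), and the function and field names here are assumptions.

```javascript
// In-memory sketch of the claim/retry flow. The shipped engine does this
// in SQLite with WAL mode; names and fields here are illustrative.
const MAX_RETRIES = 3;

function claimNextTask(tasks) {
  // Highest-priority backlog task wins; claiming flips its status so no
  // other worker can pick it up (SQLite makes this step atomic).
  const next = tasks
    .filter(t => t.status === 'backlog')
    .sort((a, b) => b.priority - a.priority)[0];
  if (next) next.status = 'claimed';
  return next ?? null;
}

function recordFailure(task) {
  task.attempts = (task.attempts ?? 0) + 1;
  // After 3 failed attempts the task is dead-lettered, not retried.
  task.status = task.attempts >= MAX_RETRIES ? 'dead_letter' : 'backlog';
}

const tasks = [
  { id: 1, priority: 5, status: 'backlog' },
  { id: 2, priority: 9, status: 'backlog' },
];
const task = claimNextTask(tasks); // claims task 2 (highest priority)
recordFailure(task);               // attempt 1 of 3: back to backlog
```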
Each task is routed to the cheapest model that can handle it. Nano ($0.10/M tokens) for health checks, Workhorse ($0.32/M) for email triage, Capable ($0.50/M) for content drafts, Power ($0.55/M) for code review, Premium ($3-15/M) for critical decisions. Pattern matching auto-selects the tier.
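Tier selection by pattern matching might look like the sketch below. The regexes, tier names, and fallback choice are assumptions for illustration; the shipped patterns may differ, and premium-tier escalation for critical decisions is handled separately.

```javascript
// Hypothetical sketch of pattern-based tier selection; the shipped
// patterns and fallback behavior may differ.
const TIER_PATTERNS = [
  { tier: 'nano',      match: /health.?check|ping|uptime/i },
  { tier: 'workhorse', match: /email|triage|inbox/i },
  { tier: 'capable',   match: /draft|content|blog/i },
  { tier: 'power',     match: /code review|refactor/i },
];

function selectTier(taskTitle) {
  const hit = TIER_PATTERNS.find(p => p.match.test(taskTitle));
  // Unmatched tasks fall back to the cheap default in this sketch.
  return hit ? hit.tier : 'workhorse';
}

console.log(selectTier('Daily health check'));  // nano
console.log(selectTier('Triage the inbox'));    // workhorse
```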
We recommend OpenRouter as your LLM provider. One API key gives you access to 200+ models – Claude, GPT-4, Gemini, Mistral, DeepSeek, Llama, and more. Our 5-tier routing switches between models automatically based on task complexity. No vendor lock-in: if one model goes down, the fallback chain tries the next. One key, all models, automatic switching.
Four tiers: (1) Working memory – in-process Map, cleared each cycle. (2) Session logs – JSONL per agent per day. (3) Agent memory – MEMORY.md, persistent, max 100 lines. (4) Semantic memory – Qdrant vector database for similarity search across all past knowledge. The reflection engine auto-extracts learnings every 5 tasks. The memory distiller prunes weekly. Each tier is independent; the system degrades gracefully if Qdrant is unavailable.
Your agents store every learning as a vector embedding in Qdrant. When working on a new task, agents can search past knowledge semantically: 'What did I learn about client X?' returns relevant memories by meaning, not just keywords. Embeddings are generated via your OpenRouter API key (the same key that powers your LLM models). Run Qdrant with: docker compose up -d qdrant
The main configuration file. Defines: model registry (which AI models are available), cost tiers and daily caps, agent defaults, skill matching rules, channel settings, and vector memory config (Qdrant URL, embedding model, collection name). Supports JSON5 (comments allowed).
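An illustrative fragment is sketched below. Every key name and model ID here is a placeholder, not the shipped default; check the file in your download for the real schema.

```json5
// klawty.json – illustrative fragment only; key names and model IDs
// are assumptions, not the shipped defaults.
{
  models: {
    nano:      { id: "provider/some-nano-model", costPerMTok: 0.10 },
    workhorse: { id: "provider/some-small-model", costPerMTok: 0.32 },
  },
  dailyCapUsd: 5, // daily spend cap
  vectorMemory: {
    qdrantUrl: "http://localhost:6333",
    collection: "agent-memory",
    embeddingModel: "provider/some-embedding-model",
  },
}
```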
Defines the agent's identity: name, role, voice, behavioral rules, and anti-patterns. This is the character sheet. Edit it in plain Markdown to change how your agent thinks and communicates.
Per-agent configuration via YAML frontmatter: model tier, heartbeat cycle (how often it wakes up), tools (allow/deny lists), skills, channel for reporting, and discovery prompt (how it finds its own work). The body is free-form instructions.
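A minimal AGENT.md might look like the following. The field list mirrors the one above, but the exact key names and all values are illustrative and may differ in your download:

```markdown
---
name: atlas
tier: workhorse          # model tier (see klawty.json)
heartbeat: 30m           # how often the agent wakes up
tools:
  allow: [read_file, post_summary]
  deny: [deploy]
skills: [email-triage]
channel: discord-ops     # where it reports
discoveryPrompt: >
  Scan the inbox for unanswered client emails and queue follow-ups.
---

Triage incoming email, draft replies for review, and report a daily
summary to the ops channel. Never send email without a proposal.
```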
Domain knowledge files in workspace/skills/{name}/SKILL.md. When a task title matches a skill's keywords, that skill is automatically injected into the agent's prompt. Token-budgeted: max 800 chars per skill, 3000 chars total.
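The budgeted injection can be sketched as below. The helper names are illustrative, and how the real matcher ranks or weights keywords is an assumption; only the 800/3000-char budgets come from the description above.

```javascript
// Sketch of budgeted skill injection; helper names are illustrative
// and the real matcher may rank skills differently.
const PER_SKILL_BUDGET = 800;   // max chars injected per skill
const TOTAL_BUDGET = 3000;      // max chars injected per prompt

function selectSkills(taskTitle, skills) {
  const title = taskTitle.toLowerCase();
  let used = 0;
  const injected = [];
  for (const skill of skills) {
    // Only skills whose keywords appear in the task title are considered.
    if (!skill.keywords.some(k => title.includes(k))) continue;
    const text = skill.body.slice(0, PER_SKILL_BUDGET);
    if (used + text.length > TOTAL_BUDGET) break; // stay under the total cap
    injected.push(text);
    used += text.length;
  }
  return injected.join('\n\n');
}
```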
Every tool has a risk level: AUTO (just do it), AUTO+ (do it and notify), PROPOSE (create a proposal with 15-min rollback), CONFIRM (wait for your approval), BLOCK (never execute). You configure these per-agent in AGENT.md.
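In AGENT.md frontmatter that configuration might be sketched like this; the `riskLevels` key name and tool names are assumptions, so check your download for the exact field:

```yaml
# Hypothetical per-agent risk overrides in AGENT.md frontmatter
riskLevels:
  read_file: AUTO        # just do it
  post_summary: AUTO+    # do it and notify
  send_email: PROPOSE    # auto-execute with 15-min rollback
  deploy: CONFIRM        # wait for explicit approval
  delete_data: BLOCK     # never execute
```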
Drop a JavaScript file in workspace/tools/ and it's auto-discovered. Export a function matching the tool-calling interface (name, description, parameters, execute). The tool registry picks it up on the next cycle.
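A minimal custom tool might look like this. The `name`/`description`/`parameters`/`execute` shape follows the interface named above, but the word-count tool itself and the exact export convention are illustrative:

```javascript
// workspace/tools/word-count.js – a hypothetical custom tool.
// The registry expects name, description, parameters, and execute;
// the exact export shape may differ in your download.
const wordCount = {
  name: 'word_count',
  description: 'Count the words in a piece of text.',
  parameters: {
    type: 'object',
    properties: { text: { type: 'string', description: 'Text to count' } },
    required: ['text'],
  },
  async execute({ text }) {
    const words = text.trim().split(/\s+/).filter(Boolean);
    return { count: words.length };
  },
};

module.exports = wordCount;
```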
When an agent wants to do something risky (send email, deploy code, modify data), it creates a proposal instead of acting directly. PROPOSE tier: auto-executes with 15-minute rollback. CONFIRM tier: waits for your explicit approval via Discord reaction or dashboard.
The runtime includes a 6-rule defense block injected into every prompt for write-capable agents. Covers: untrusted input handling, social engineering detection, authority spoofing, delimiter injection, path safety, and credential exfiltration prevention.
Prevents agents from spamming: task dedup (70% word overlap in 4-hour window), channel dedup (hash-based 1-hour window), proposal dedup (same agent + action), and discovery caps (max 8 tasks/day per agent).
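The 70% word-overlap check can be sketched as follows. Whether overlap is measured against the smaller title, the larger one, or the union is an implementation detail we don't know; this sketch uses the smaller set so short rephrasings still match.

```javascript
// Sketch of the 70% word-overlap dedup check (function names are
// illustrative; the overlap denominator is an assumption).
function wordSet(title) {
  return new Set(title.toLowerCase().split(/\W+/).filter(Boolean));
}

function isDuplicate(titleA, titleB, threshold = 0.7) {
  const a = wordSet(titleA);
  const b = wordSet(titleB);
  if (a.size === 0 || b.size === 0) return false;
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  // Overlap relative to the smaller title, so rephrasings still match.
  return shared / Math.min(a.size, b.size) >= threshold;
}
```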
Qdrant v1.9.2 vector database (semantic memory, healthcheck, persistent volume) + per-agent runner containers (Node 20, workspace mounted as volume, SQLite on persistent volume). Each agent service depends on Qdrant and auto-connects via QDRANT_URL=http://qdrant:6333.
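A trimmed sketch of what that compose file looks like; service names, the agent name, and most details are illustrative, and the healthcheck is omitted for brevity:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.9.2
    volumes:
      - qdrant-data:/qdrant/storage   # persistent vector storage
  atlas:
    image: node:20
    command: node runtime/agent-runner.js --agent atlas --workspace /workspace
    volumes:
      - ./workspace:/workspace
    environment:
      QDRANT_URL: http://qdrant:6333  # auto-connect to the qdrant service
    depends_on:
      - qdrant
volumes:
  qdrant-data:
```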
Docker is optional. Without it, agents use file-based memory (MEMORY.md + JSONL logs) instead of Qdrant. All 4 memory tiers degrade gracefully; if Qdrant is unavailable, the vector tier is simply skipped. Run agents natively with: node runtime/agent-runner.js --agent atlas --workspace ./workspace
Instead of running Qdrant locally, you can use Qdrant Cloud (https://cloud.qdrant.io). Set QDRANT_URL and QDRANT_API_KEY in your .env file. The runtime auto-connects on boot.
Your download includes a unique LICENSE_KEY in the .env file. On first run, the system registers a hardware fingerprint (a one-way hash of your machine's characteristics; no personal data is sent). This binds your license to your machine.
The system checks for updates approximately once every 24 hours. This also validates your license. It transmits only: your license key, hardware fingerprint hash, and software version. It never transmits agent output, business data, or API keys.
If the license server is unreachable, your system runs normally for 7 days (grace period). After that, agents enter observation mode: they can read and report but not execute write operations. Reconnecting to the internet restores full functionality.
Your license is bound to one machine. If you replace your machine, contact us at [email protected] with your license key and we'll transfer it to your new device. This is free and takes less than 24 hours.
Sharing your download, license key, or activation with other people or organizations. Running the same license on multiple machines simultaneously. Reverse-engineering or removing the license system. See our Terms of Service (sections 5–7) for full details.
Check: (1) LaunchAgent loaded – run status.sh, (2) .env has a valid OPENROUTER_API_KEY, (3) Node.js 20+ installed – run node --version, (4) Logs – check observability/logs/ for errors.
Check: (1) Tasks exist – run sqlite3 data/tasks.db "SELECT * FROM tasks WHERE status='backlog'", (2) Discovery is enabled – check that AGENT.md has a discoveryPrompt, (3) Circuit breaker isn't open – check logs for CIRCUIT_OPEN.
Check: (1) Daily cap is set in klawty.json, (2) Model routing is correct – health checks should use the nano tier, (3) No stuck task loops – check for tasks retrying infinitely. Run: sqlite3 data/tasks.db "SELECT model, SUM(cost_usd) FROM costs WHERE date(created_at)=date('now') GROUP BY model"
The dedup engine should prevent this. Check: (1) dedup.js is in runtime/, (2) Discovery caps are set in AGENT.md frontmatter (default: maxDiscoveryPerDay: 8), (3) Task titles are similar enough to trigger dedup (70% word overlap threshold).