AI Infrastructure · February 28, 2026 · 4 min read

Building a Multi-Channel Agent Runtime from Scratch

Why I built Hairy rather than relying on LangChain. Deep dives into channel abstraction, dynamic skill lifecycle, multi-provider LLM routing, and Rust/Go sidecars for heavy tasks.

AI Agents · TypeScript · Rust · Go · Agent Runtime · LLM

Why I Didn't Use LangChain

The short answer: I tried it. The longer answer: LangChain's abstraction layer added overhead without solving the specific constraints I faced.

My requirements were strict:

  1. An agent behaving identically across Telegram, WhatsApp, webhooks, and the CLI.
  2. A skill system where new capabilities could be tested, promoted, and retired without redeploying the core.
  3. Multi-provider LLM routing (Ollama local, OpenRouter, Anthropic, OpenAI) with automatic failover.
  4. Offloading heavy compute to compiled sidecars, not stuffed into a Python process.

LangChain manages #3 adequately. It fails to address #1, #2, or #4 within my architecture. Instead of battling the framework, I built the runtime I needed.

Channel Abstraction

The key insight: an agent shouldn't care if a message originates from Telegram, a webhook, or a terminal. The channel is purely a transport concern, not an intelligence one.

Telegram → Adapter → Normalized Message → Agent Core → Response → Adapter → Telegram
WhatsApp → Adapter → Normalized Message → Agent Core → Response → Adapter → WhatsApp
Webhook  → Adapter → Normalized Message → Agent Core → Response → Adapter → Webhook
CLI      → Adapter → Normalized Message → Agent Core → Response → Adapter → CLI

Each adapter manages platform specifics: Telegram's format quirks, WhatsApp's media handling, webhook auth, CLI output. The normalized message carries sender identity, text, optional media, conversation context, and metadata.

The agent core processes this normalized input and returns a normalized response. Adapters translate the output back into the platform's native format.

Adding a new channel, like Discord or Slack, requires only a new adapter. The agent's internal behavior remains untouched.
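The adapter contract can be sketched like this. This is an illustrative shape, not Hairy's actual API; the interface and field names are assumptions based on the description above.

```typescript
// Hypothetical sketch of the channel abstraction: every adapter converts
// platform payloads to and from one normalized message shape.

interface NormalizedMessage {
  senderId: string;
  text: string;
  media?: { url: string; mimeType: string }[];
  conversationId: string;
  metadata: Record<string, unknown>;
}

interface NormalizedResponse {
  text: string;
  media?: { url: string; mimeType: string }[];
}

interface ChannelAdapter {
  // Translate a platform-native payload into the normalized shape.
  toNormalized(raw: unknown): NormalizedMessage;
  // Translate the agent's response back into the platform's format.
  send(conversationId: string, response: NormalizedResponse): Promise<void>;
}

// The CLI adapter is the simplest possible implementation: stdin text in,
// stdout text out.
class CliAdapter implements ChannelAdapter {
  toNormalized(raw: unknown): NormalizedMessage {
    return {
      senderId: "local-user",
      text: String(raw),
      conversationId: "cli-session",
      metadata: { channel: "cli" },
    };
  }

  async send(_conversationId: string, response: NormalizedResponse): Promise<void> {
    console.log(response.text);
  }
}
```

Because the agent core only ever sees `NormalizedMessage`, a Discord or Slack adapter is purely additive: implement the two methods and register it.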

Skill Lifecycle

Static tool lists are too rigid. I wanted evolving skills: new capabilities added during operation, tested in staging, promoted to production, and retired when obsolete.

The lifecycle flows like this:

Draft ──→ Active ──→ Retired
  │
  └─→ Rejected

Draft — A skill registered by an operator or found by the agent. It's available for testing but excluded from production conversations.

Active — Promoted after validation. The agent invokes this skill during requests.

Rejected — Failed validation during testing. Never promoted; kept only as a record of what was tried.

Retired — Superseded or unnecessary. Kept in the registry for reference but excluded from active routing.

Each skill is a JSON definition:

{
  "name": "check_disk_usage",
  "description": "Check disk usage on monitored hosts",
  "trigger_patterns": ["disk", "storage", "space"],
  "handler": "infrastructure.disk_check",
  "status": "active",
  "promoted_at": "2026-02-15",
  "usage_count": 47,
  "last_used": "2026-03-25"
}

The agent relies on descriptions and trigger patterns for routing. Usage metrics reveal which skills add value and which are dead weight.
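Routing over the registry can be sketched as a filter: only active skills participate, and a skill matches when any of its trigger patterns appears in the message. The field names mirror the JSON definition above; the function itself is an illustrative assumption, not Hairy's actual router.

```typescript
// Hypothetical trigger-pattern routing over the skill registry.

interface Skill {
  name: string;
  description: string;
  trigger_patterns: string[];
  status: "draft" | "active" | "rejected" | "retired";
  usage_count: number;
}

// Draft, rejected, and retired skills are excluded from production routing;
// a skill matches when any trigger pattern occurs in the message text.
function matchSkills(skills: Skill[], message: string): Skill[] {
  const text = message.toLowerCase();
  return skills.filter(
    (s) =>
      s.status === "active" &&
      s.trigger_patterns.some((p) => text.includes(p.toLowerCase()))
  );
}
```

For example, "how much disk space is left?" matches `check_disk_usage` via the "disk" and "space" patterns, while a draft skill with the same patterns stays invisible to production traffic.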

Multi-Provider LLM Routing

Different tasks demand different models. A quick status check doesn't need Claude Opus, while a complex architecture query shouldn't run on a 7B local model.

The routing logic follows:

Estimate request complexity → Select provider → Execute → Fail over on error

Local Ollama (Qwen 3) — Default for fast, low-stakes work. Status checks, simple lookups, formatting. Delivers sub-second responses over the local network.

OpenRouter (DeepSeek R1) — For reasoning-heavy tasks. Multi-step problem solving, code generation, and analysis.

Anthropic (Claude) — High-stakes outputs. Customer-facing replies, document generation, complex tool use.

OpenAI (GPT-4) — Fallback and specific tool-calling patterns where GPT-4's function calling is preferred.

If the selected provider fails (timeout, rate limit, error), the system drops to the next tier. A local Ollama failure routes to OpenRouter; an OpenRouter failure routes to Anthropic. The user sees a response regardless; the provider is an implementation detail.
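The failover loop itself is small. This sketch assumes every provider client exposes a minimal `complete` method; the interface and names are illustrative, not the actual implementation.

```typescript
// Hypothetical tiered failover: try providers in order and fall through
// on any error (timeout, rate limit, server error).

interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Tiers are ordered per the routing above: local Ollama first, then
// OpenRouter, Anthropic, and OpenAI.
async function routeWithFailover(
  tiers: Provider[],
  prompt: string
): Promise<string> {
  let lastError: unknown;
  for (const provider of tiers) {
    try {
      return await provider.complete(prompt);
    } catch (err) {
      // Record the failure and drop to the next tier.
      lastError = err;
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}
```

The caller never learns which tier answered, which is exactly the point: the provider is an implementation detail.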

Rust and Go Sidecars

Some tasks don't belong in a TypeScript event loop:

  • PDF parsing — Extracting text and structure from large engineering documents
  • Image processing — Resizing, format conversion, OCR prep
  • Data crunching — Statistical analysis on market data, time-series aggregation

These run as sidecar processes. The agent sends work to them via JSON-RPC over Unix sockets locally or HTTP across machines.

Go sidecar — Handles network operations, concurrent HTTP clients, and WebSocket management. Go's goroutine model manages high-concurrency I/O cleanly.

Rust sidecar — CPU-bound processing. PDF parsing, data transformation, anything where raw throughput matters.

The agent doesn't wait for sidecar results synchronously. It dispatches work, processes other messages, and collects results when ready. This keeps the agent responsive even while heavy compute runs.
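The dispatch-and-collect pattern can be sketched with newline-delimited JSON-RPC 2.0 over a Unix socket. The socket path, method names, and framing here are assumptions for illustration, not Hairy's actual sidecar protocol.

```typescript
// Hypothetical fire-and-collect sidecar client: call() returns a promise
// immediately, the event loop keeps serving other messages, and the promise
// resolves whenever the sidecar's JSON-RPC response arrives.

import * as net from "node:net";

class SidecarClient {
  private nextId = 1;
  private pending = new Map<number, (result: unknown) => void>();
  private socket: net.Socket;

  constructor(socketPath: string) {
    this.socket = net.createConnection(socketPath);
    let buffer = "";
    this.socket.on("data", (chunk) => {
      buffer += chunk.toString("utf8");
      let idx: number;
      // One JSON-RPC response object per line.
      while ((idx = buffer.indexOf("\n")) >= 0) {
        const line = buffer.slice(0, idx);
        buffer = buffer.slice(idx + 1);
        const msg = JSON.parse(line);
        this.pending.get(msg.id)?.(msg.result);
        this.pending.delete(msg.id);
      }
    });
  }

  // Dispatch returns immediately; the id ties the eventual response back
  // to the awaiting caller.
  call(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++;
    this.socket.write(JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n");
    return new Promise((resolve) => this.pending.set(id, resolve));
  }
}
```

Because responses are matched by id rather than by arrival order, the agent can have several sidecar jobs in flight at once and collect each result as it lands.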

What I'd Do Differently

Start with fewer channels. I built four adapters before the agent was stable. I should have started with the CLI, validated the core, then added Telegram.

Skill definitions need versioning. When a skill's handler changes, old conversations referencing previous behavior get confused. Versioned skills with migration paths would have saved debugging time.

Centralized logging from day one. With logs distributed across multiple channels and sidecars, debugging meant checking four different log streams. A centralized aggregator should have been the first infrastructure piece, not an afterthought.


Hairy is in active development. Follow progress at github.com/maddefientist/hairy.