Infrastructure·March 10, 2026·4 min read

Running AI Infrastructure on a Home Lab: What Actually Works

I maintain 12+ production services on self-hosted hardware — Proxmox, Docker, Cloudflare Tunnels, and local LLMs. Here's the real architecture, costs, and failure points.

Home Lab · Self-Hosted · Docker · Proxmox · Ollama · Cloudflare Tunnel

Why Self-Host AI Infrastructure

Cloud bills add up quickly. For a solo technical founder who prototypes constantly, running a dozen services across AWS and GCP gets expensive fast. More critically, I needed total control over my AI inference: no rate limits, no data leaving my network during experimentation, and the freedom to test models without per-token charges.

My home lab hosts all non-customer-facing services, and the architecture has stabilized enough to be worth documenting.

Hardware

The core system runs on repurposed gear:

  • Proxmox node — Manages virtualization. Spins up multiple VMs for isolated workloads like development, monitoring, databases, and agent services.
  • GPU node — Dedicated entirely to local LLM inference via Ollama.
  • NAS — Unraid handles storage, media archival, and backups.

Total hardware cost stayed under $2,000 using used enterprise equipment. Monthly power cost runs roughly $40-60, fluctuating with GPU load.

The VM Architecture

Proxmox slices the hardware into purpose-built VMs:

proxmox
├── dockers    → Docker host for most services (portfolio site, APIs, agents)
├── agenthive  → AgentSSOT + ChromaDB (isolated for stability)
├── webvm      → Public-facing web services via Cloudflare Tunnel
├── blink      → n8n automation workflows
└── monitoring → NetWatcher + alerting

Each VM maintains its own Docker environment. Services within a VM compose together; services across VMs communicate over the internal network. This isolation ensures a runaway container in the development VM doesn't take down the monitoring stack.

Cloudflare Tunnels for Zero-Trust Ingress

No ports are exposed to the internet. Every public-facing service routes through a Cloudflare Tunnel (cloudflared), which establishes an outbound connection to Cloudflare's network and proxies traffic back.

Benefits:

  • No dynamic DNS hassle — Cloudflare manages routing regardless of ISP IP changes.
  • Free SSL — Certificates handled automatically by Cloudflare.
  • DDoS protection — Cloudflare's edge filters traffic before it hits my hardware.
  • Access control — Cloudflare Access policies secure admin panels, requiring authentication before traffic reaches the tunnel.

Each tunnel maps to a specific service. mohsenjahanshahi.com routes to the portfolio site's nginx container, while API endpoints route to their respective FastAPI containers.
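A minimal cloudflared config sketches this hostname-to-container mapping. The tunnel ID, the API hostname, and the container names and ports are placeholders, not my actual values:

```yaml
# /etc/cloudflared/config.yml — illustrative sketch
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/<TUNNEL_ID>.json

ingress:
  - hostname: mohsenjahanshahi.com
    service: http://nginx:80          # portfolio site's nginx container
  - hostname: api.example.com         # hypothetical API hostname
    service: http://fastapi-app:8000  # hypothetical FastAPI container
  - service: http_status:404          # catch-all: reject unmatched requests
```

Ingress rules are matched top to bottom, and cloudflared requires the final catch-all rule.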

Local LLM Inference with Ollama

Ollama runs on the GPU node and serves models over the internal network. Any service can call it like an OpenAI-compatible API.

What I run locally:

  • Qwen 3 — Primary model for development agent tasks, offering a strong balance of speed and capability.
  • DeepSeek R1 — Handles reasoning-heavy tasks, routed through OpenRouter when local resources aren't enough.
  • Smaller models — Used for embeddings, classification, and quick completions.

The key insight: local inference isn't a replacement for frontier models. It serves as a cost-zero development tier. I prototype and test against Ollama, then switch to Claude or GPT-4 for production workloads needing higher capability. The API interface remains identical, so switching is just a config change.
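That "just a config change" claim can be sketched with nothing but the standard library: build an OpenAI-style chat request against whatever base URL an environment variable names, so local Ollama and a hosted API are interchangeable. The env-var names, default URL, and model tag below are my own conventions, not part of any API:

```python
import json
import os
import urllib.request

# Ollama serves an OpenAI-compatible API under /v1. LLM_BASE_URL and
# LLM_MODEL are hypothetical env vars; "gpu-node" and "qwen3" are examples.
BASE_URL = os.environ.get("LLM_BASE_URL", "http://gpu-node:11434/v1")
MODEL = os.environ.get("LLM_MODEL", "qwen3")


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for whichever
    backend BASE_URL points at (local Ollama, or a hosted API)."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    # Needs a live server at BASE_URL; the request shape is backend-agnostic.
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swapping backends then means changing two environment variables, with no code changes in the callers.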

Docker Compose Patterns That Survive

After testing various deployment approaches, these patterns stuck:

One compose file per service group. Avoid one giant file or one per container. Group services that depend on each other (web + API + database) into a single compose file.

Named volumes for persistence. Bind mounts work but create permission headaches across VM rebuilds. Named volumes survive container recreation cleanly.

Health checks on everything. If a container lacks a health check, you won't know it's broken until a user reports it. Every service gets a health endpoint; every container has a Docker health check hitting it.

Restart policies aren't monitoring. restart: unless-stopped keeps things running but masks underlying problems. NetWatcher alerts on restart loops so I fix root causes instead of accepting flapping.
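Put together, a compose file for one hypothetical service group might look like this sketch. Image names, ports, and the /health endpoint are illustrative, not my actual stack:

```yaml
# docker-compose.yml sketch: one service group (web API + database)
services:
  api:
    image: my-fastapi-app:latest        # placeholder image
    restart: unless-stopped             # keeps it up; alert on restart loops
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data  # named volume, not a bind mount
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  pgdata:   # survives container recreation and VM rebuilds cleanly
```

The `service_healthy` condition also means the API container waits for a passing database health check instead of racing it at boot.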

What Breaks

DNS resolution inside Docker networks. Containers resolving external domains through Docker's embedded DNS server occasionally fail under load. Fix: Explicit DNS configuration in compose files pointing to Cloudflare's 1.1.1.1.
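The override is one stanza per service in the compose file (service name illustrative):

```yaml
# Bypass Docker's embedded DNS for this service's external lookups
services:
  api:
    dns:
      - 1.1.1.1
      - 1.0.0.1   # Cloudflare's secondary resolver as fallback
```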

Ollama memory pressure. Loading large models consumes GPU VRAM. If two services request different models simultaneously, Ollama swaps them and response times spike. Fix: Pin the primary model and queue requests needing alternatives.
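Ollama exposes environment variables for exactly this. The variables below are real Ollama settings, but the values are a starting point for a single-GPU node, not tuned recommendations:

```yaml
# Environment for a containerized Ollama; values are illustrative
services:
  ollama:
    image: ollama/ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h        # keep the primary model resident in VRAM
      - OLLAMA_MAX_LOADED_MODELS=1   # never hold two models at once
      - OLLAMA_NUM_PARALLEL=2        # queue requests beyond two in flight
```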

Proxmox storage fills up silently. VM disk images grow, Docker images accumulate, and logs pile up. Without monitoring, you discover this when a service crashes. Fix: Automated disk usage alerts at 80% threshold, plus weekly Docker image pruning.

Power outages. A UPS buys time for a graceful shutdown but doesn't help with extended outages. All services auto-recover on boot, but database corruption remains a risk. Fix: WAL mode for SQLite, regular pg_dump for PostgreSQL, and accepting that 100% uptime isn't the goal for a home lab.
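Enabling WAL is a one-line pragma at connection time; a minimal Python sketch (the helper name is mine):

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database with WAL mode enabled, so an abrupt power
    loss can lose the tail of the write-ahead log but not corrupt the
    main database file."""
    conn = sqlite3.connect(path)
    # journal_mode is persistent: once set on a file, it sticks for
    # future connections too. The pragma returns the active mode.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"WAL not available, got {mode!r}")
    # NORMAL sync is the usual pairing with WAL: still durable at
    # checkpoints, much faster than FULL.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

Note that WAL only works on real files (in-memory databases report a different mode), which suits on-disk service databases fine.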

Cost Comparison

Rough monthly comparison for my workload:

                   Home Lab             Cloud Equivalent
Compute (12 VMs)   ~$45 power           ~$400 EC2/GCE
LLM inference      $0 (local Ollama)    ~$150-300 API costs
Storage (4TB)      $0 (owned)           ~$90 S3/GCS
Monitoring         $0 (self-hosted)     ~$50 Datadog/etc.
Monthly total      ~$45                 ~$700+

The tradeoff: I spend 2-3 hours/month on maintenance. A cloud deployment would need less hands-on time but cost 15x more.

When Not to Self-Host

Customer-facing services with SLA requirements belong in the cloud. My portfolio site runs on the home lab because downtime is acceptable. VectorData.solutions client workloads run on managed infrastructure because they're not.

The split: development, experimentation, personal tools, and cost-sensitive batch processing on the home lab. Production customer workloads on cloud providers.


More about my infrastructure at mohsenjahanshahi.com.