Infrastructure Stack

Production infrastructure across Hetzner, Cloudflare, Vercel, and AWS with full observability.

Hetzner Cloud, Cloudflare Pro, Vercel, GitHub Actions, MongoDB Atlas, Tailscale, Docker, Sentry, UptimeRobot, Langfuse, Greptile, Amazon Bedrock

Overview

The Badland infrastructure spans multiple cloud providers, each chosen for its strengths: Hetzner for cost-effective compute, Cloudflare for edge security and DNS, Vercel for frontend hosting, MongoDB Atlas for managed databases, and AWS for additional AI model access. The machines are connected via a Tailscale mesh VPN, and the services are monitored with Sentry, UptimeRobot, and Langfuse.

This is not a hobby setup — it runs production workloads serving real users 24/7 with CI/CD, automated deployments, monitoring, and incident alerting.

Compute — Hetzner Cloud

Two Hetzner servers form the compute backbone:

Production Server (nebulatio-hz-prod): Runs the Badland API, staging environment, and handles deployments triggered by GitHub Actions. Ubuntu 24.04 with systemd service management.
Agent Server (badclaw): Hetzner CPX41 (8 vCPU, 16GB RAM) dedicated to multi-tenant container hosting. Runs 20+ Docker containers with gVisor sandboxing, the mux service, provisioner, and Nginx reverse proxy.

Both servers are managed remotely via Tailscale SSH, so no public SSH ports are exposed. Configuration is managed through a dotfiles repo with bootstrap scripts for reproducible server setup.

Hetzner Cloud console — badclaw (CPX41) and nebulatio-hz-prod (CPX31) servers in Ashburn, VA

CDN & Security — Cloudflare Pro

Cloudflare Pro handles DNS, WAF, rate limiting, and DDoS protection for all Badland domains. Key configurations include:

DNS Management: Full DNS for badland.ai (primary) and nebulatio.com (legacy), including wildcard records for *.badland.ai tenant sites
WAF Rules: Custom rules to protect against common attacks, with a skip rule for /trpc/ endpoints that use long query strings (maxURLLength: 2083)
Rate Limiting: Per-endpoint rate limits to prevent abuse of AI API endpoints
SSL/TLS: Full (Strict) mode with automatic certificate management
Cloudflare Tunnels: Secure ingress for agent.badland.ai without exposing public ports on the server
Bot Management: Automated bot detection and challenge pages
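
The /trpc/ skip rule can be pictured as a path check that runs before the custom rules are evaluated. The sketch below is plain TypeScript that models that decision, not Cloudflare's rule expression syntax. The function names are hypothetical, and it assumes the 2083 figure is a cap on the length of tRPC request URLs.

```typescript
// Illustrative model of a WAF skip rule. Names are hypothetical and this is
// not Cloudflare's rule language; it only mirrors the decision being made.
const TRPC_PREFIX = "/trpc/";
const MAX_URL_LENGTH = 2083; // value quoted in the WAF rule description

type WafDecision = "skip-custom-rules" | "evaluate-custom-rules";

// Requests under /trpc/ bypass the custom rules; everything else is evaluated.
function wafDecision(pathname: string): WafDecision {
  return pathname.startsWith(TRPC_PREFIX)
    ? "skip-custom-rules"
    : "evaluate-custom-rules";
}

// Assumption: URLs longer than the cap are not expected on /trpc/ routes.
function withinUrlCap(fullUrl: string): boolean {
  return fullUrl.length <= MAX_URL_LENGTH;
}
```

Matching on the path prefix keeps the exception narrow: only the endpoints that legitimately carry long query strings are exempted, while every other path still goes through the full rule set.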

Cloudflare DNS records for badland.ai — A records, Tunnels, CNAMEs, wildcard, and MX records

Cloudflare WAF — 16 custom security rules including bot challenges, attack path blocks, and rate limiting

Frontend Hosting — Vercel

All frontend applications deploy to Vercel with zero-config CI/CD:

Auto-deploy on push to master: Every merge triggers a production deployment
Preview deployments: Every PR gets a unique preview URL for testing
Multiple projects: Chat app (chat.badland.ai), landing page, portfolio, and more
Edge functions: API routes and middleware run at the edge for low latency

Vercel dashboard — chat, portfolio, landing, terminal, studio, and jamesbsolutions projects with recent deployments

CI/CD — GitHub Actions

Automated pipelines handle quality checks and deployments:

Backend Quality (PRs): Runs bun run check (ESLint + TypeScript type-checking) on every pull request. PRs cannot merge if checks fail.
Backend Deploy (master): On push to master, self-hosted GitHub Actions runners SSH into the Hetzner production server, pull the latest code, install dependencies, and restart services.
Frontend Quality: Vercel's built-in build check catches TypeScript and build errors on every PR via preview deployments.
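
The backend deploy steps (pull, install, restart) can be written down as an ordered command plan. This is a minimal sketch under assumptions: the repository path and the systemd unit name are placeholders, and the actual workflow may use different commands or flags.

```typescript
// Hypothetical deploy plan. The directory and service name are placeholders,
// not values taken from the real workflow.
interface DeployTarget {
  repoDir: string; // checkout location on the server (assumed)
  service: string; // systemd unit to restart (assumed)
}

// Returns the commands in the order the deploy description lists them:
// pull latest code, install dependencies, restart the service.
function deployCommands(target: DeployTarget): string[] {
  return [
    `git -C ${target.repoDir} pull --ff-only`,
    `cd ${target.repoDir} && bun install --frozen-lockfile`,
    `sudo systemctl restart ${target.service}`,
  ];
}
```

For example, `deployCommands({ repoDir: "/srv/api", service: "example-api" })` yields three commands that a runner could execute in sequence over SSH, stopping at the first failure.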

GitHub Actions — Deploy API (Hetzner) workflow with all steps green: lint, type-check, build, deploy in 50s

Database — MongoDB Atlas

MongoDB Atlas provides managed database hosting with four separate clusters:

badland-prod: Production data (conversations, users, settings)
badland-dev: Development and testing
badland-staging: Staging environment for pre-production validation
badclaw: Agent metadata, provisioning records, audit logs

Atlas handles automated backups, point-in-time recovery, monitoring, and alerting. Connection strings are managed via environment variables on each deployment target.
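
Reading the connection string from the environment might look like the sketch below. The variable name MONGODB_URI and the helper are assumptions for illustration; the source only states that connection strings live in environment variables on each deployment target.

```typescript
// Hypothetical helper: the variable name MONGODB_URI is an assumption.
type Env = Record<string, string | undefined>;

// Fails fast at startup if the connection string is missing or malformed,
// rather than at the first database call.
function resolveMongoUri(env: Env, key: string = "MONGODB_URI"): string {
  const uri = env[key];
  if (!uri) {
    throw new Error(`${key} is not set`);
  }
  if (!uri.startsWith("mongodb+srv://") && !uri.startsWith("mongodb://")) {
    throw new Error(`${key} must start with mongodb:// or mongodb+srv://`);
  }
  return uri;
}
```

Passing the environment in as an argument (for example `resolveMongoUri(process.env)`) keeps the helper easy to test and lets each deployment target point at its own cluster without code changes.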

MongoDB Atlas Data Explorer — badland-prod cluster with collections and document view

Networking — Tailscale

Tailscale mesh VPN connects all machines in the infrastructure, providing:

5+ connected nodes: Home dev (cole-pc), work laptop (roc-xe102101), production server (nebulatio-hz-prod), agent server (badclaw), Mac Mini (Coles-Mac-mini)
SSH via Tailscale: No public SSH ports — all remote access goes through Tailscale's encrypted WireGuard tunnels
Inter-service communication: Services communicate using Tailscale IPs (100.x.x.x) for secure internal networking
MagicDNS: Hostname-based access to all machines without managing DNS records
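
Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range, which is why internal addresses look like 100.x.x.x. A service that should only accept traffic from the tailnet could check the peer address against that range. The helper below is a hypothetical sketch, not part of any Tailscale SDK.

```typescript
// Hypothetical helper: checks whether an IPv4 address falls inside
// 100.64.0.0/10, the range Tailscale assigns addresses from.
function isTailscaleIpv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return false;

  // Accept only plain decimal octets in 0..255.
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return false;

  // A /10 on 100.64.0.0 spans 100.64.0.0 through 100.127.255.255.
  const [first = -1, second = -1] = octets;
  return first === 100 && second >= 64 && second <= 127;
}
```

Note that not every 100.x.x.x address qualifies: only second octets from 64 to 127 fall inside the range, so 100.10.0.1 would be rejected.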

Tailscale admin — 7 machines connected: nebulatio-hz-prod, badclaw, cole-pc, coles-mac-mini, iPhone, raspi

Code Quality — Greptile

Greptile provides AI-powered code review on every pull request, analyzing changes for bugs, security issues, and style violations with full codebase context. It understands the project's patterns and conventions, catching issues that static linters miss.

Monitoring — Sentry + UptimeRobot + Langfuse

Sentry — Error Tracking

Sentry captures frontend and backend errors with full stack traces, source maps, breadcrumbs, and release tracking. Each deployment creates a Sentry release tied to the git commit, making it easy to correlate errors with specific code changes.

Sentry Frontend Insights — throughput, duration, Web Vitals score 91, and project list

UptimeRobot — Uptime Monitoring

UptimeRobot monitors all public-facing services with 5-minute check intervals. Alerts are sent via email and webhook when services go down. Current monitors cover chat.badland.ai, the API health endpoint, agent.badland.ai, and tenant personal sites.
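
To see what a 5-minute interval implies, the arithmetic can be sketched as follows. This is a simplified check-count model with hypothetical function names; UptimeRobot's own uptime figures may be computed differently.

```typescript
const CHECK_INTERVAL_MINUTES = 5; // interval stated above

// Number of checks a monitor performs per day at a given interval.
function checksPerDay(intervalMinutes: number = CHECK_INTERVAL_MINUTES): number {
  return (24 * 60) / intervalMinutes;
}

// Simplified uptime: share of checks that succeeded, as a percentage.
function uptimePercent(totalChecks: number, failedChecks: number): number {
  if (totalChecks <= 0) {
    throw new Error("totalChecks must be positive");
  }
  return ((totalChecks - failedChecks) / totalChecks) * 100;
}
```

At this interval a monitor runs 288 checks per day, so a single failed check over 30 days (8,640 checks) still leaves the figure near 99.99%. It also means an outage can go undetected for up to roughly five minutes.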

UptimeRobot — 13 monitors tracking Badland services, most at 100% uptime

Langfuse — LLM Observability

Langfuse provides observability into LLM API calls, tracking:

Token usage and costs per model, per user, per conversation
Response latency (time-to-first-token, total generation time)
Trace waterfall views showing the full request lifecycle
Model comparison analytics
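
The cost and latency figures in a trace come down to simple arithmetic over token counts and timestamps. The sketch below shows that arithmetic only. The types, field names, and prices are hypothetical and do not use the Langfuse SDK.

```typescript
// Hypothetical shapes for illustration; not Langfuse's data model.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Price {
  inputPerMillionUsd: number;  // assumed pricing unit: USD per 1M tokens
  outputPerMillionUsd: number;
}

interface Timing {
  startedAtMs: number;
  firstTokenAtMs: number;
  finishedAtMs: number;
}

// Cost = tokens / 1M * price per 1M, summed over input and output.
function estimateCostUsd(usage: Usage, price: Price): number {
  return (
    (usage.inputTokens / 1_000_000) * price.inputPerMillionUsd +
    (usage.outputTokens / 1_000_000) * price.outputPerMillionUsd
  );
}

// Time-to-first-token and total generation time, both measured from start.
function latency(t: Timing): { timeToFirstTokenMs: number; totalMs: number } {
  return {
    timeToFirstTokenMs: t.firstTokenAtMs - t.startedAtMs,
    totalMs: t.finishedAtMs - t.startedAtMs,
  };
}
```

Summing these per-call values by model, user, or conversation gives the breakdowns listed above.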

Langfuse trace — council-chairman AI call showing latency, tokens, cost, and model response preview

Cloud AI — Amazon Bedrock

Amazon Bedrock provides access to additional model providers beyond direct API integrations. Integrated with the Vercel AI SDK, it enables seamless model switching between direct provider APIs and Bedrock-hosted models without changing application code.
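
One way to keep application code unchanged when switching providers is to refer to models by a prefixed string and resolve the route in one place. The sketch below is dependency-free and hypothetical: the "direct" and "bedrock" prefixes and the function name are assumptions, and in a real setup each route would map to an AI SDK provider instance.

```typescript
// Hypothetical routing scheme; the prefixes are assumptions for illustration.
type Route = "direct" | "bedrock";

interface ModelRef {
  route: Route;
  modelId: string;
}

// Splits "route:modelId" on the FIRST colon only, because Bedrock model IDs
// can themselves contain a colon (for example a trailing ":0" version suffix).
function parseModelRef(ref: string): ModelRef {
  const idx = ref.indexOf(":");
  if (idx === -1) {
    throw new Error(`expected "route:modelId", got "${ref}"`);
  }
  const route = ref.slice(0, idx);
  const modelId = ref.slice(idx + 1);
  if (modelId.length === 0) {
    throw new Error("modelId must not be empty");
  }
  if (route === "direct" || route === "bedrock") {
    return { route, modelId };
  }
  throw new Error(`unknown route "${route}"`);
}
```

With this shape, switching a feature from a direct provider API to a Bedrock-hosted model is a change to a configuration string rather than to the calling code.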

Amazon Bedrock — 59 inference profiles including Claude Opus 4.5/4.6, Nova, Cohere, and Pegasus models