Infrastructure Stack
Production infrastructure across Hetzner, Cloudflare, Vercel, MongoDB Atlas, and AWS, with full observability.
Overview
The Badland infrastructure spans multiple cloud providers, each chosen for its strengths: Hetzner for cost-effective compute, Cloudflare for edge security and DNS, Vercel for frontend hosting, MongoDB Atlas for managed databases, and AWS for additional AI model access. All machines are connected over a Tailscale mesh VPN, and the stack is monitored with Sentry, UptimeRobot, and Langfuse.
This is not a hobby setup — it runs production workloads serving real users 24/7 with CI/CD, automated deployments, monitoring, and incident alerting.
Compute — Hetzner Cloud
Two Hetzner servers form the compute backbone:
- Production Server (nebulatio-hz-prod): Runs the Badland API and the staging environment, and handles deployments triggered by GitHub Actions. Runs Ubuntu 24.04, with services managed by systemd.
- Agent Server (badclaw): Hetzner CPX41 (8 vCPU, 16 GB RAM) dedicated to multi-tenant container hosting. Runs 20+ Docker containers sandboxed with gVisor, along with the mux service, the provisioner, and an Nginx reverse proxy.
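To make "gVisor sandboxing" concrete: Docker runs a container under gVisor when it is started with the runsc runtime (docker run --runtime=runsc), assuming runsc has been registered as a Docker runtime as gVisor's installation guide describes. The sketch below shows how a provisioner might assemble such a command. The TenantSpec fields, the tenant-<id> naming scheme, the image name, the resource limits, and the in-container port 8080 are all hypothetical, not taken from the actual provisioner.

```typescript
// Hypothetical tenant description; field names are illustrative only.
interface TenantSpec {
  tenantId: string;
  image: string;
  memoryMb: number;
  cpus: number;
  hostPort: number; // loopback port Nginx would proxy to (assumption)
}

// Build the argv for `docker run` using gVisor's runsc runtime.
// Returning an array (not a shell string) avoids quoting problems when
// passing it to child_process.spawn().
function buildDockerRunArgs(spec: TenantSpec): string[] {
  // Restrict IDs so they are safe inside container names.
  if (!/^[a-z0-9-]+$/.test(spec.tenantId)) {
    throw new Error(`invalid tenant id: ${spec.tenantId}`);
  }
  return [
    "run",
    "--detach",
    "--runtime=runsc", // gVisor handles the container's syscalls in a user-space kernel
    `--name=tenant-${spec.tenantId}`,
    `--memory=${spec.memoryMb}m`,
    `--cpus=${spec.cpus}`,
    // Publish on loopback only, so outside traffic must come through Nginx.
    `--publish=127.0.0.1:${spec.hostPort}:8080`, // 8080 inside the container is assumed
    "--restart=unless-stopped",
    spec.image, // the image must come after all options
  ];
}

const args = buildDockerRunArgs({
  tenantId: "acme",
  image: "example/agent:latest",
  memoryMb: 512,
  cpus: 0.5,
  hostPort: 9101,
});
console.log(["docker", ...args].join(" "));
```

Binding each container to 127.0.0.1 is one way to keep tenants reachable only through the reverse proxy. Whether the real setup does this is not stated above.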
Both servers are managed remotely via Tailscale SSH, so no public SSH ports are exposed. Configuration lives in a dotfiles repository with bootstrap scripts for reproducible server setup.
CDN & Security — Cloudflare Pro
Cloudflare Pro handles DNS, WAF, rate limiting, and DDoS protection for all Badland domains. Key configurations include:
- DNS Management: Full DNS for badland.ai (primary) and nebulatio.com (legacy), including wildcard records for *.badland.ai tenant sites
- WAF Rules: Custom rules that block common attacks, plus a skip rule for /trpc/ endpoints, whose batched requests carry long query strings (client maxURLLength: 2083)
- Rate Limiting: Per-endpoint limits that prevent abuse of the AI API routes
- SSL/TLS: Full (Strict) mode with automatic certificate management
- Cloudflare Tunnels: Secure ingress for agent.badland.ai without exposing public ports on the server
- Bot Management: Automated bot detection and challenge pages
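The long /trpc/ query strings come from request batching. A batched tRPC GET request joins the procedure paths with commas in the URL path and puts all inputs, JSON-encoded, into the query string. A page that fires many queries at once can therefore produce URLs long enough for generic WAF rules to flag them. A maxURLLength setting on the client makes it split one batch into several requests instead. The sketch below approximates that splitting. The ?batch=1&input= format follows tRPC's batch style, but exact encoding varies by tRPC version and transformer. The helper names and the example base URL are hypothetical.

```typescript
interface Call {
  path: string;   // procedure path, e.g. "chat.list"
  input: unknown; // JSON-serializable input
}

// Build a batched GET URL in the style of tRPC's batch link:
// <base>/a,b?batch=1&input={"0":...,"1":...}
function batchUrl(base: string, calls: Call[]): string {
  const paths = calls.map((c) => c.path).join(",");
  const inputs: Record<string, unknown> = {};
  calls.forEach((c, i) => {
    inputs[String(i)] = c.input;
  });
  return `${base}/${paths}?batch=1&input=${encodeURIComponent(JSON.stringify(inputs))}`;
}

// Greedily pack calls, in order, into batches whose URL stays within maxURLLength.
// A single call that is too long on its own still gets a batch of its own.
function splitBatches(base: string, calls: Call[], maxURLLength = 2083): Call[][] {
  const batches: Call[][] = [];
  let current: Call[] = [];
  for (const call of calls) {
    const candidate = [...current, call];
    if (current.length > 0 && batchUrl(base, candidate).length > maxURLLength) {
      batches.push(current); // close the batch before it would exceed the limit
      current = [call];
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Even when batches are capped near 2083 characters, each URL can still exceed what a generic WAF rule considers normal. That is why the skip rule is scoped to /trpc/ rather than relaxing limits site-wide.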
Frontend Hosting — Vercel
All frontend applications deploy to Vercel with zero-config CI/CD:
- Auto-deploy on push to master: Every merge triggers a production deployment
- Preview deployments: Every PR gets a unique preview URL for testing
- Multiple projects: Chat app (chat.badland.ai), landing page, portfolio, and more
- Edge functions: API routes and middleware run at the edge for low latency
CI/CD — GitHub Actions
Automated pipelines handle quality checks and deployments:
- Backend Quality (PRs): Runs bun run check (ESLint + TypeScript type-checking) on every pull request. PRs cannot merge if checks fail.
- Backend Deploy (master): On push to master, self-hosted GitHub Actions runners SSH into the Hetzner production server, pull the latest code, install dependencies, and restart services.
- Frontend Quality: Vercel's built-in build check catches TypeScript and build errors on every PR via preview deployments.
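The backend deploy steps (pull, install, restart) can be sketched as a small script of the kind a job might run on the server once connected. The repository path, branch handling, and systemd unit names below are hypothetical placeholders. The real workflow may check out code or order restarts differently. By default the script only prints the commands (dry run).

```typescript
import { spawnSync } from "node:child_process";

type Command = [string, ...string[]];

// Hypothetical deploy target; paths and unit names are placeholders.
interface DeployTarget {
  repoDir: string;    // checkout location on the server
  branch: string;     // e.g. "master"
  services: string[]; // systemd units to restart after install
}

// Ordered steps: pull the latest code, install dependencies, restart services.
function deployCommands(t: DeployTarget): Command[] {
  return [
    ["git", "pull", "--ff-only", "origin", t.branch],
    ["bun", "install", "--frozen-lockfile"],
    ...t.services.map((s): Command => ["sudo", "systemctl", "restart", s]),
  ];
}

// Run each step inside the repo directory; with dryRun=true it only prints them.
function runDeploy(t: DeployTarget, dryRun = true): void {
  for (const [cmd, ...args] of deployCommands(t)) {
    console.log(`$ ${[cmd, ...args].join(" ")}`);
    if (dryRun) continue;
    const result = spawnSync(cmd, args, { cwd: t.repoDir, stdio: "inherit" });
    // Abort on the first failure so services are not restarted on a broken checkout.
    if (result.status !== 0) {
      throw new Error(`deploy step failed: ${cmd} ${args.join(" ")}`);
    }
  }
}

runDeploy({ repoDir: "/srv/badland-api", branch: "master", services: ["badland-api", "badland-staging"] });
```

Using --ff-only and --frozen-lockfile makes the deploy fail loudly if the server's checkout has diverged or the lockfile is out of date, instead of silently producing a different build.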
Database — MongoDB Atlas
MongoDB Atlas provides managed database hosting with four separate clusters:
- badland-prod: Production data (conversations, users, settings)
- badland-dev: Development and testing
- badland-staging: Staging environment for pre-production validation
- badclaw: Agent metadata, provisioning records, audit logs
Atlas handles automated backups, point-in-time recovery, monitoring, and alerting. Connection strings are managed via environment variables on each deployment target.
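Because each deployment target supplies its own connection string, a cheap safeguard is to check at startup that the string points at the cluster that target expects. The sketch below assumes hypothetical APP_ENV and MONGODB_URI variable names and a mongodb+srv:// string whose host begins with the cluster name. The document does not specify either, so treat both as assumptions.

```typescript
type AppEnv = "production" | "staging" | "development";

// Hypothetical mapping from deployment target to the Atlas cluster it should use.
// (badclaw holds agent data and is not selected by APP_ENV in this sketch.)
const EXPECTED_CLUSTER: Record<AppEnv, string> = {
  production: "badland-prod",
  staging: "badland-staging",
  development: "badland-dev",
};

interface MongoConfig {
  env: AppEnv;
  cluster: string;
  uri: string;
}

function isAppEnv(value: string): value is AppEnv {
  return Object.keys(EXPECTED_CLUSTER).includes(value);
}

// Read the connection string from the environment and fail fast at startup if it
// is missing or points at a different cluster than this target expects.
function loadMongoConfig(env: Record<string, string | undefined>): MongoConfig {
  const appEnv = env.APP_ENV ?? "development";
  if (!isAppEnv(appEnv)) throw new Error(`unknown APP_ENV: ${appEnv}`);
  const uri = env.MONGODB_URI;
  if (!uri) throw new Error("MONGODB_URI is not set");
  const cluster = EXPECTED_CLUSTER[appEnv];
  // Assumes an SRV string whose host starts with the cluster name,
  // e.g. mongodb+srv://user:pass@<cluster>.<id>.mongodb.net/
  const host = new URL(uri).hostname;
  if (!host.startsWith(`${cluster}.`)) {
    throw new Error(`APP_ENV=${appEnv} expects cluster ${cluster}, got host ${host}`);
  }
  return { env: appEnv, cluster, uri };
}
```

Failing at startup, rather than on the first query, keeps a staging deploy from ever writing to production data because of a copied environment variable.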
Networking — Tailscale
Tailscale mesh VPN connects all machines in the infrastructure, providing:
- 5+ connected nodes: Home dev (cole-pc), work laptop (roc-xe102101), production server (nebulatio-hz-prod), agent server (badclaw), Mac Mini (Coles-Mac-mini)
- SSH via Tailscale: No public SSH ports — all remote access goes through Tailscale's encrypted WireGuard tunnels
- Inter-service communication: Services communicate using Tailscale IPs (100.x.x.x) for secure internal networking
- MagicDNS: Hostname-based access to all machines without managing DNS records
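The 100.x.x.x addresses come from the 100.64.0.0/10 range (the CGNAT block) that Tailscale assigns tailnet IPv4 addresses from. A service that should only accept internal traffic can check whether a peer address falls in that block. The sketch below is a minimal, dependency-free CIDR check. The function names are hypothetical, and IPv6 tailnet addresses are not handled.

```typescript
// Parse a dotted-quad IPv4 string into an unsigned 32-bit value, or null if invalid.
function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const n = Number(p);
    if (n > 255) return null;
    value = value * 256 + n; // plain arithmetic avoids signed 32-bit bitwise pitfalls
  }
  return value;
}

// True if `ip` lies inside a CIDR block such as "100.64.0.0/10".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const addr = parseIPv4(ip);
  const net = parseIPv4(base ?? "");
  const bits = Number(bitsStr);
  if (addr === null || net === null || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    return false;
  }
  // Two addresses share a /bits prefix iff they fall in the same 2^(32-bits)-sized block.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(addr / blockSize) === Math.floor(net / blockSize);
}

const TAILSCALE_IPV4_RANGE = "100.64.0.0/10";

function isTailnetAddress(ip: string): boolean {
  return inCidr(ip, TAILSCALE_IPV4_RANGE);
}

console.log(isTailnetAddress("100.101.102.103"), isTailnetAddress("203.0.113.7"));
```

Behind a reverse proxy or Cloudflare Tunnel the socket's remote address is the proxy, not the client, so a check like this only makes sense on listeners that tailnet peers reach directly.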
Code Quality — Greptile
Greptile provides AI-powered code review on every pull request, analyzing changes for bugs, security issues, and style violations with full codebase context. It understands the project's patterns and conventions, catching issues that static linters miss.
Monitoring — Sentry + UptimeRobot + Langfuse
Sentry — Error Tracking
Sentry captures frontend and backend errors with full stack traces, source maps, breadcrumbs, and release tracking. Each deployment creates a Sentry release tied to the git commit, making it easy to correlate errors with specific code changes.
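Tying a release to the git commit mostly comes down to passing the same commit-derived string as the release option of Sentry's init call in every build and runtime. The sketch below picks the SHA from VERCEL_GIT_COMMIT_SHA (set by Vercel) or GITHUB_SHA (set by GitHub Actions). GIT_COMMIT_SHA is a hypothetical fallback for processes, such as a systemd service restarted over SSH, that would not inherit the CI environment. The app@sha naming is also an assumption.

```typescript
// Pick the commit SHA from whichever environment provides one:
// VERCEL_GIT_COMMIT_SHA is set by Vercel, GITHUB_SHA by GitHub Actions.
// GIT_COMMIT_SHA is a hypothetical fallback, e.g. written into a service's env file at deploy time.
function resolveRelease(
  env: Record<string, string | undefined>,
  app: string,
): string | undefined {
  const sha = env.VERCEL_GIT_COMMIT_SHA ?? env.GITHUB_SHA ?? env.GIT_COMMIT_SHA;
  // No SHA available: leave the release unset rather than inventing one.
  return sha ? `${app}@${sha}` : undefined;
}

// Usage with the Sentry SDK (not imported here, to keep the sketch self-contained):
// Sentry.init({ dsn: process.env.SENTRY_DSN, release: resolveRelease(process.env, "badland-api") });
console.log(resolveRelease({ GITHUB_SHA: "3f2c9e1" }, "badland-api"));
```

The same string should be used wherever releases or source maps are created for that build. Otherwise errors cannot be matched to the commit that produced them.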
UptimeRobot — Uptime Monitoring
UptimeRobot monitors all public-facing services with 5-minute check intervals. Alerts are sent via email and webhook when services go down. Current monitors cover chat.badland.ai, the API health endpoint, agent.badland.ai, and tenant personal sites.
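HTTP uptime monitors, including UptimeRobot's HTTP(s) checks, generally judge a service by its response status. A health endpoint is therefore most useful when it returns an error status if a dependency is broken, not just when the process is dead. The sketch below assumes a hypothetical /health path and caller-supplied dependency checks. The document does not describe what the real API health endpoint checks.

```typescript
import { createServer, type Server } from "node:http";

interface CheckResult {
  ok: boolean;
  latencyMs: number;
}

interface HealthResponse {
  status: number;
  body: { status: "ok" | "degraded"; checks: Record<string, CheckResult> };
}

// 200 only when every dependency check passes; 503 otherwise, so an HTTP
// uptime monitor sees a broken dependency as an error status.
function buildHealthResponse(checks: Record<string, CheckResult>): HealthResponse {
  const allOk = Object.values(checks).every((c) => c.ok);
  return {
    status: allOk ? 200 : 503,
    body: { status: allOk ? "ok" : "degraded", checks },
  };
}

// Minimal server wiring; "/health" and runChecks are placeholders.
function startHealthServer(
  port: number,
  runChecks: () => Promise<Record<string, CheckResult>>,
): Server {
  const server = createServer(async (req, res) => {
    if (req.url !== "/health") {
      res.writeHead(404).end();
      return;
    }
    try {
      const { status, body } = buildHealthResponse(await runChecks());
      res.writeHead(status, { "content-type": "application/json" }).end(JSON.stringify(body));
    } catch {
      // A check that throws counts as unhealthy.
      res.writeHead(503).end();
    }
  });
  return server.listen(port);
}
```

Keeping the checks cheap matters with a 5-minute interval less than it does under load balancers that poll every few seconds. Even so, a slow database ping should have its own timeout so the endpoint itself does not hang.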
Langfuse — LLM Observability
Langfuse provides observability into LLM API calls, tracking:
- Token usage and costs per model, per user, per conversation
- Response latency (time-to-first-token, total generation time)
- Trace waterfall views showing the full request lifecycle
- Model comparison analytics
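Langfuse computes these figures itself from traced generations. The sketch below only illustrates the arithmetic behind per-model and per-user cost and latency views: cost from token counts and per-million-token prices, time-to-first-token as first-token time minus start time, and averages per group. The model names, prices, and record fields are hypothetical placeholders, not real pricing or Langfuse's data model.

```typescript
// Hypothetical per-million-token prices in USD; real prices vary by model and over time.
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "model-a": { inputPerM: 3, outputPerM: 15 },
  "model-b": { inputPerM: 0.5, outputPerM: 1.5 },
};

interface Generation {
  model: string;
  userId: string;
  conversationId: string;
  inputTokens: number;
  outputTokens: number;
  startMs: number;      // request sent
  firstTokenMs: number; // first streamed token received
  endMs: number;        // stream finished
}

interface GroupStats {
  calls: number;
  costUsd: number;
  avgTtftMs: number;  // average time-to-first-token
  avgTotalMs: number; // average total generation time
}

function costUsd(g: Generation): number {
  const p = PRICES[g.model];
  if (!p) throw new Error(`no price configured for ${g.model}`);
  return (g.inputTokens * p.inputPerM + g.outputTokens * p.outputPerM) / 1_000_000;
}

// Group generations by any key (model, user, conversation) and aggregate.
function summarizeBy(gens: Generation[], keyOf: (g: Generation) => string): Record<string, GroupStats> {
  const acc: Record<string, { calls: number; cost: number; ttft: number; total: number }> = {};
  for (const g of gens) {
    const a = (acc[keyOf(g)] ??= { calls: 0, cost: 0, ttft: 0, total: 0 });
    a.calls += 1;
    a.cost += costUsd(g);
    a.ttft += g.firstTokenMs - g.startMs;
    a.total += g.endMs - g.startMs;
  }
  const out: Record<string, GroupStats> = {};
  for (const [key, a] of Object.entries(acc)) {
    out[key] = { calls: a.calls, costUsd: a.cost, avgTtftMs: a.ttft / a.calls, avgTotalMs: a.total / a.calls };
  }
  return out;
}
```

For example, with the placeholder prices, a model-a call with 1,000 input and 500 output tokens costs (1000 × 3 + 500 × 15) / 1,000,000 = $0.0105.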
Cloud AI — Amazon Bedrock
Amazon Bedrock provides access to additional model providers beyond direct API integrations. Integrated with the Vercel AI SDK, it enables seamless model switching between direct provider APIs and Bedrock-hosted models without changing application code.
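One common way to get "switch models without changing application code" is to have the application ask for a logical model name and resolve it through configuration to a provider plus model ID. The AI SDK's own provider management uses a similar "provider:model" string convention. The sketch below shows only that resolution step, self-contained and without the SDK. The alias names and model IDs are hypothetical placeholders, and mapping the result to an actual SDK model instance is left out.

```typescript
const PROVIDERS = ["anthropic", "openai", "bedrock"] as const;
type Provider = (typeof PROVIDERS)[number];

interface ModelRef {
  provider: Provider;
  modelId: string;
}

// Hypothetical aliases: application code asks for a logical name and this table
// decides which provider serves it. Model IDs are placeholders.
const MODEL_ALIASES: Record<string, string> = {
  "chat-default": "anthropic:example-model",
  "chat-bedrock": "bedrock:anthropic.example-model-v1:0",
};

function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

// Split on the FIRST colon only: Bedrock model IDs can themselves contain ":".
function parseModelRef(ref: string): ModelRef {
  const i = ref.indexOf(":");
  const provider = i > 0 ? ref.slice(0, i) : "";
  const modelId = i > 0 ? ref.slice(i + 1) : "";
  if (!isProvider(provider) || modelId === "") {
    throw new Error(`expected "provider:model", got "${ref}"`);
  }
  return { provider, modelId };
}

// Resolve an alias (or a literal "provider:model" string) to a provider/model pair.
function resolveModel(name: string, aliases: Record<string, string> = MODEL_ALIASES): ModelRef {
  const aliased = Object.prototype.hasOwnProperty.call(aliases, name) ? aliases[name] : undefined;
  return parseModelRef(aliased ?? name);
}

console.log(resolveModel("chat-bedrock"));
```

Moving a feature from a direct provider API to a Bedrock-hosted model then means editing one alias entry, while call sites keep asking for "chat-default" or "chat-bedrock".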