
Our Migration from Kubernetes to Nomad

Why we migrated from Kubernetes to Nomad, and how it cut our server count by 40% and deployment times by 87%.

Published on 30 Mar 2026

Last year, we made the controversial decision to migrate NeuralFlow's orchestration layer from Kubernetes to HashiCorp Nomad. Six months later, we're running 40% fewer servers, our deployment times have dropped from 12 minutes to 90 seconds, and our ops team is sleeping better.
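For reference, the 87% in the summary is the relative drop in deployment time. Expressing 12 minutes as 720 seconds:

$$
\frac{720\,\mathrm{s} - 90\,\mathrm{s}}{720\,\mathrm{s}} = \frac{630}{720} = 0.875 = 87.5\%
$$

The summary rounds this down to 87%.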

Why We Left Kubernetes

Kubernetes is an incredible platform, but for our workload profile — mixed GPU and CPU jobs with bursty scaling patterns — it introduced more complexity than it solved. Our Kubernetes cluster required 3 full-time engineers to maintain, and debugging production issues often took hours of kubectl gymnastics.

What Nomad Gave Us

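Simpler operations: Nomad ships as a single binary. There is no separate etcd, API server, or controller manager to operate; Nomad servers replicate cluster state among themselves using built-in Raft consensus.
Native GPU scheduling: GPUs are requested directly in the job spec as device resources, and Nomad's NVIDIA device plugin is managed by the client agent rather than deployed as a separate cluster workload.
Faster deployments: job file changes roll out in seconds, not minutes.
Multi-runtime: we run Docker containers, raw executables, and Java JARs in the same cluster, each through its own task driver.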
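To make the GPU and multi-runtime points concrete, here is a minimal sketch of one Nomad job that mixes three task drivers. Everything in it is hypothetical (job, image, binary, and artifact names are placeholders, not NeuralFlow's), and it assumes the NVIDIA device plugin is installed on the GPU client nodes and that the java and exec drivers are available there.

```hcl
// Hypothetical job: names, images, and paths are placeholders.
job "mixed-workloads" {
  datacenters = ["us-east-1a"]
  type        = "batch"

  // GPU task: a Docker container that asks the scheduler for one NVIDIA GPU.
  group "inference" {
    task "gpu-worker" {
      driver = "docker"
      config {
        image = "example/gpu-worker:latest"
      }
      resources {
        cpu    = 1000 // MHz
        memory = 4096 // MB
        device "nvidia/gpu" {
          count = 1 // placed only on a node with a free NVIDIA GPU
        }
      }
    }
  }

  // JVM task: the java driver runs a JAR that the artifact block
  // downloads into the task's local/ directory.
  group "reports" {
    task "report-builder" {
      driver = "java"
      artifact {
        source = "https://example.com/report-builder.jar"
      }
      config {
        jar_path    = "local/report-builder.jar"
        jvm_options = ["-Xmx512m"]
      }
      resources {
        cpu    = 500
        memory = 768
      }
    }
  }

  // Plain executable: the exec driver runs a binary inside a chroot
  // with resource isolation.
  group "etl" {
    task "extract" {
      driver = "exec"
      config {
        command = "/usr/local/bin/extract"
      }
    }
  }
}
```

Swapping driver = "exec" for "raw_exec" would run the binary with no isolation at all; Nomad leaves raw_exec disabled on clients unless an operator enables it.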

The Migration Process

We ran both systems in parallel for 3 months, gradually shifting traffic from Kubernetes to Nomad. Each service was migrated independently with a rollback plan. The entire migration was completed with zero customer-facing downtime.
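The post doesn't describe the rollback mechanics. On the Nomad side, one plausible building block is the job-level update block, which can start a canary next to the old version and revert automatically if new allocations never become healthy. A sketch, with a hypothetical job and image and illustrative values rather than NeuralFlow's actual settings:

```hcl
// Hypothetical service; update values are illustrative only.
job "example-service" {
  datacenters = ["us-east-1a", "us-east-1b"]
  type        = "service"

  update {
    max_parallel     = 1     // replace one allocation at a time
    canary           = 1     // run one new-version canary beside the old version
    min_healthy_time = "10s" // an allocation must stay healthy this long...
    healthy_deadline = "3m"  // ...and reach that state within this window
    auto_revert      = true  // on failure, go back to the last stable job version
    auto_promote     = false // an operator promotes the canary by hand
  }

  group "app" {
    count = 3

    task "app" {
      driver = "docker"
      config {
        image = "example/service:1.2.3"
      }
    }
  }
}
```

With auto_promote disabled, the canary is promoted with "nomad deployment promote", and "nomad job revert" rolls a job back to an earlier version manually.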


// Nomad job spec for our API gateway
job "api-gateway" {
  datacenters = ["us-east-1a", "us-east-1b"]
  type = "service"

  group "api" {
    count = 6

    network {
      port "http" { to = 8080 } // dynamic host port, mapped to container port 8080
    }

    task "gateway" {
      driver = "docker"
      config {
        image = "neuralflow/api-gateway:3.0.1"
        ports = ["http"]
      }
      resources {
        cpu    = 2000 // MHz
        memory = 1024 // MB
      }
    }
  }
}
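As written, the spec has no health check, so Nomad can only judge a rollout by whether the gateway tasks are running. A sketch of a service block that could be added inside group "api" follows; it assumes Nomad's built-in service discovery (provider = "nomad", which supports HTTP checks from Nomad 1.4) and a hypothetical /healthz endpoint on the gateway.

```hcl
// Hypothetical addition inside group "api" of the job above.
service {
  name     = "api-gateway"
  port     = "http"  // the port label declared in the network block
  provider = "nomad" // Nomad's built-in service discovery; "consul" also works

  check {
    type     = "http"
    path     = "/healthz" // hypothetical health endpoint
    interval = "10s"
    timeout  = "2s"
  }
}
```

Any update block on the job would then gate rollouts on this check, since its health_check setting defaults to "checks".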

Would we recommend the same move for every team? No. But if your Kubernetes cluster feels like it's running you instead of the other way around, Nomad is worth a serious look.