Our Migration from Kubernetes to Nomad
Why we migrated from Kubernetes to Nomad, and how it cut our server count by 40% and deployment times by 87%.
Last year, we made the controversial decision to migrate NeuralFlow's orchestration layer from Kubernetes to HashiCorp Nomad. Six months later, we're running 40% fewer servers, our deployment times have dropped from 12 minutes to 90 seconds, and our ops team is sleeping better.
Why We Left Kubernetes
Kubernetes is an incredible platform, but for our workload profile (mixed GPU and CPU jobs with bursty scaling patterns) it added more complexity than it removed. Our Kubernetes cluster required three full-time engineers to maintain, and debugging production issues often took hours of kubectl gymnastics.
What Nomad Gave Us
- Simpler operations: Nomad ships as a single binary, with no separate etcd, API server, or controller manager to run.
- Native GPU scheduling: GPUs are requested in the job spec as a first-class resource, alongside CPU and memory.
- Faster deployments: Job file changes deploy in seconds, not minutes.
- Multi-runtime: We run Docker containers, raw executables, and Java JARs in the same cluster.
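The GPU and multi-runtime points above can be sketched in a single job file. This is an illustrative example, not one of NeuralFlow's production specs: the job name, images, binary path, and artifact URL are placeholders, and it assumes client nodes that expose their GPUs to Nomad under the `nvidia/gpu` device name and have the `docker`, `exec`, and `java` drivers enabled.

```hcl
# Illustrative only: all names, images, paths, and URLs below are placeholders.
job "mixed-workloads" {
  datacenters = ["us-east-1a", "us-east-1b"]
  type        = "service"

  # Docker container that asks the scheduler for one GPU.
  group "inference" {
    task "model-server" {
      driver = "docker"

      config {
        image = "example/model-server:1.0.0"
      }

      resources {
        cpu    = 4000 # MHz
        memory = 8192 # MB

        # Assumes the node fingerprints its GPUs as "nvidia/gpu".
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }

  # Plain executable already present on the host, run by the exec driver.
  group "worker" {
    task "queue-worker" {
      driver = "exec"

      config {
        command = "/usr/local/bin/queue-worker"
        args    = ["--concurrency", "4"]
      }
    }
  }

  # Java JAR fetched at start time and run by the java driver.
  group "reports" {
    task "report-service" {
      driver = "java"

      # Artifacts land in the task's local/ directory by default.
      artifact {
        source = "https://example.com/artifacts/report-service.jar"
      }

      config {
        jar_path    = "local/report-service.jar"
        jvm_options = ["-Xms256m", "-Xmx512m"]
      }

      resources {
        memory = 768 # MB, leaves headroom above the JVM heap
      }
    }
  }
}
```

Swapping `exec` for `raw_exec` would run the binary without isolation. That driver is disabled by default, so treat it as an opt-in choice.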
The Migration Process
We ran both systems in parallel for 3 months, gradually shifting traffic from Kubernetes to Nomad. Each service was migrated independently with a rollback plan. The entire migration was completed with zero customer-facing downtime.
// Nomad job spec for our API gateway
job "api-gateway" {
  datacenters = ["us-east-1a", "us-east-1b"]
  type        = "service"

  group "api" {
    count = 6

    network {
      port "http" {
        to = 8080 // container port; the host port is assigned dynamically
      }
    }

    task "gateway" {
      driver = "docker"

      config {
        image = "neuralflow/api-gateway:3.0.1"
        ports = ["http"]
      }

      resources {
        cpu    = 2000 // MHz
        memory = 1024 // MB
      }
    }
  }
}
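One way to give a service a rollback path inside Nomad itself is the `update` block. The sketch below adds one to a copy of the gateway job above. The canary count, timings, and health-check mode are assumptions chosen for illustration, not the values used during the migration, and the block only covers rollbacks between job versions, not the traffic shift between the two orchestrators.

```hcl
# Illustrative variant of the gateway job; the update values are assumptions.
job "api-gateway" {
  datacenters = ["us-east-1a", "us-east-1b"]
  type        = "service"

  group "api" {
    count = 6

    update {
      max_parallel     = 2             # replace two allocations at a time
      canary           = 1             # start one new-version allocation first
      health_check     = "task_states" # healthy means all tasks are running
      min_healthy_time = "30s"
      healthy_deadline = "5m"
      auto_revert      = true          # fall back to the last stable version on failure
      auto_promote     = false         # a human promotes the canary
    }

    network {
      port "http" {
        to = 8080
      }
    }

    task "gateway" {
      driver = "docker"

      config {
        image = "neuralflow/api-gateway:3.0.1"
        ports = ["http"]
      }

      resources {
        cpu    = 2000 # MHz
        memory = 1024 # MB
      }
    }
  }
}
```

With a file like this saved under a hypothetical name such as `api-gateway.nomad.hcl`, `nomad job plan` previews the change, `nomad job run` starts the canary, and `nomad deployment promote` with the deployment ID continues the rollout once the canary looks healthy.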
Would we recommend the same move for every team? No. But if your Kubernetes cluster feels like it's running you instead of the other way around, Nomad is worth a serious look.