Scale Your Model Development on a Budget With GCP Preemptible Instances

Intro

Training deep learning models on volatile cloud instances — where the cloud provider offers its excess infrastructure capacity at a steep discount in exchange for the right to pull the rug out from under you — is often a great idea: training and tuning models can take days to months of offline computation, so delays due to instance preemption may be an acceptable price for significant cost savings. That is, a great idea in theory. Making AI applications preemptible-friendly in practice is another story for many folks.

In this post, we show how bursting deep learning training workloads on GCP preemptible instances is both easy and flexible with Determined. Spoiler alert: model developers don’t have to account for instance preemption in their model code. No automatic checkpointing to implement, no tricky restart logic, no infrastructure provisioning harness code. Determined does all of that for you.

When Do Preemptibles Make Sense?

Back up. Before we talk about when preemptible instances make sense, we first must ask when the cloud itself makes sense for AI workloads. As we’ve discussed previously, AI infrastructure in the cloud is a thorny topic. While the burstiness of model developers’ computational workloads seems a perfect match with the cloud giants’ on-demand infrastructure offerings, there’s a not-so-minor economic problem: cloud GPUs are so expensive that running AI applications in the cloud often isn’t as cost-effective as owning and operating on-premise infrastructure, at least not yet.

While we wait for cloud GPU pricing to get friendlier, there are still many scenarios where GPUs on the cloud make sense:

  1. Small-scale AI workloads
  2. Small teams willing to pay a high premium to avoid having to operate on-premise infrastructure
  3. Teams with bursty workloads: the higher the variability in capacity needs, the more economic sense it makes to run in the cloud rather than operate on-premise GPUs that sit unused
  4. Teams that normally leverage on-premise infrastructure, but that infrastructure is saturated and the team needs more GPUs now

This last example is a common reason for companies to adopt a hybrid on-premise / cloud infrastructure model: the team maintains on-premise infrastructure capacity that is highly utilized most of the time, and infrastructure “bursts into the cloud” when capacity needs spike.

Because the cloud decision hinges so heavily on the underlying economics, preemptible instances change the game if teams are willing to forfeit reliability and availability guarantees. Given the nature of deep learning training workloads — it can take weeks or even months to train models and find good hyperparameters — many deep learning teams find preemptible instances an attractive infrastructure option. If the price is right, they don’t mind if instance preemptions cause their model to converge next Wednesday rather than next Tuesday.

Just How Much Can You Save?

A lot. GCP’s discount on preemptible GPUs varies by region and GPU type, but, on average, it’s roughly 70% off of on-demand pricing. NVIDIA® Tesla® V100s in multiple regions are discounted more than 70% at the time of writing this blog:

Price per GPU per hour in the us-west1 region (USD)

| GPU type | On-demand | Preemptible | 1-year commitment |
| --- | --- | --- | --- |
| NVIDIA® Tesla® K80 | $0.45 | $0.135 | $0.283 |
| NVIDIA® Tesla® P100 | $1.46 | $0.43 | $0.919 |
| NVIDIA® Tesla® V100 | $2.48 | $0.74 | $1.562 |

It’s worth noting the sheer magnitude of GPU pricing compared to vCPUs. A cloud instance with 2 vCPUs might cost on the order of ten cents per hour; attaching 2 NVIDIA® Tesla® V100 GPUs costs an additional $4.96 per hour. Even a modest 16-GPU cluster can run you over a quarter of a million dollars a year at on-demand pricing.

Granted, not all on-demand pricing is the same — GCP offers sustained and committed use discounts, not to mention custom pricing arrangements that will kick in for GCP’s whale customers — but the ~70% preemptible discount over on-demand actually makes AI in the cloud economically sensible under many utilization scenarios. The discrepancy that we showed between cloud and on-premise investment needed for high utilization scenarios tightens substantially if we assume preemptible instance pricing. That 16 NVIDIA® Tesla® V100 GPU cluster might cost closer to $100k over a year. That’d at least be less than the cost of purchasing the same infrastructure; with preemptible instances, the budgetary numbers are starting to make sense.
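If you want to sanity-check those figures, the arithmetic is simple enough to script. The sketch below uses the us-west1 per-GPU hourly rates from the table above; your region, GPU type, and negotiated discounts will change the numbers:

```python
# Back-of-the-envelope math behind the figures above, using the us-west1
# per-GPU hourly rates from the pricing table. Placeholder cluster size;
# adjust for your own capacity plan.
V100_ON_DEMAND = 2.48    # $/GPU-hour
V100_PREEMPTIBLE = 0.74  # $/GPU-hour
GPUS = 16
HOURS_PER_YEAR = 24 * 365

on_demand = V100_ON_DEMAND * GPUS * HOURS_PER_YEAR
preemptible = V100_PREEMPTIBLE * GPUS * HOURS_PER_YEAR
print(f"on-demand:   ${on_demand:,.0f}/year")    # ~$348k
print(f"preemptible: ${preemptible:,.0f}/year")  # ~$104k
```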

What Do Preemptibles Mean for Infrastructure Teams?

Thus far I’ve been avoiding the elephant in the room: instance preemption and what it entails for both infrastructure teams and model developers. GCP offers preemptible instances with a few bitter pills:

  1. There is no guarantee that preemptible GPUs will be available
  2. GCP can terminate a preemptible instance at any time
  3. Preemptible instances terminate after 24 hours if they haven’t already been preempted

While you may be wincing at that 1-2-3 punch, bear in mind that availability and preemption rates in practice might not be so bad. In my own testing, creating Determined clusters with tens of preemptible GPUs, I always managed to provision the GPUs I wanted, and preemption consistently happened only around the 24-hour mark. Take it with a grain of salt, of course.
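If you’d like to gather your own grains of salt, GCP exposes a preemption signal through each instance’s metadata server that you can watch from inside the instance. Below is a minimal sketch: the metadata endpoint and the roughly 30-second notice are documented GCP behavior, while what you do with the signal (here, just printing) is entirely up to you:

```python
# Minimal sketch: watch GCP's documented preemption signal from inside a
# preemptible instance. Requires the third-party `requests` package.
import requests

# Hanging GET: with wait_for_change=true, the metadata server holds the
# request open until the instance's preemption status changes.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/preempted?wait_for_change=true")

def wait_for_preemption() -> bool:
    # Blocks until the metadata server reports the instance is being preempted.
    resp = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    return resp.text.strip() == "TRUE"

if __name__ == "__main__":
    if wait_for_preemption():
        # GCP gives roughly 30 seconds of notice before shutdown.
        print("Preemption notice received; clean up now.")
```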

Other research also suggests that preemptible instances exhibit friendly availability and uptime properties. In this study, the authors find that preemptions don’t occur uniformly at random, but instead follow a “bathtub distribution”: preemption risk is high at the outset, low in the middle, and rises again near the end of an instance’s lifetime. In other words, if you don’t get preempted early, you’re likely to be able to hang onto your preemptible instance for most of the maximum 24 hours. Again, add salt to taste: this study does not explicitly cover GPUs.

GCP isn’t likely to publish relevant data on the availability and uptime of preemptible instances, nor is any third party study likely to cover the region, accelerator type, and scale that apply to you. Not to mention, given how early we are with GPUs in the cloud, any study on preemptible GPU availability or preemption rates would likely become obsolete quickly. Our bottom-line recommendation is to try it out with Determined and see if GCP’s preemptible GPU availability regularly meets your capacity needs.

What Do Preemptibles Mean for Model Developers?

Assuming you manage to get your hands on preemptible GPUs, now comes the scary part for model developers. From GCP’s documentation on preemptible VM instances:

If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly.

We estimate the size of this “if” to be roughly the size of the state of Texas. Now you have to take those long-running model training and hyperparameter tuning jobs that probably weren’t fault-tolerant and refactor them to be fault-tolerant. For many teams, this refactoring is the standard price of admission for moving applications onto preemptible instances, and understandably so: nothing comes for free, and when instance termination is unlikely, it may not be worth the time, effort, and bug potential to make training jobs and hyperparameter search workloads fault-tolerant up front. However, when instance termination is a guaranteed, frequent occurrence, as it is for preemptible instances, distributed systems buzzwords like “idempotency” and “fault tolerance” become non-negotiable baseline requirements.
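To make “refactor them to be fault-tolerant” concrete, here is a minimal sketch of the checkpoint-and-resume scaffolding a model developer would have to hand-roll without platform support. It assumes PyTorch; the model, data, and durable checkpoint path are all placeholders:

```python
# Sketch of the restart logic model developers must hand-roll without a
# platform handling preemption for them. The model, data, and checkpoint
# path are placeholders.
import os
import torch
import torch.nn as nn

CKPT = "/mnt/durable/checkpoint.pt"  # must live on storage that survives preemption

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
start_epoch = 0

# On startup, resume from the last checkpoint if a previous instance was preempted.
if os.path.exists(CKPT):
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 100):
    for _ in range(1000):  # stand-in for a real data loader
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # Checkpoint frequently enough that a preemption only loses minutes of work.
    torch.save({"model": model.state_dict(),
                "opt": opt.state_dict(),
                "epoch": epoch}, CKPT)
```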

Many platforms present it as a given that model developers must refactor their applications before migrating workloads onto preemptible instances. Running Kubeflow on GKE? Pay attention to this:

To get correct results when using preemptible VMs, the steps that you identify as preemptible should either be idempotent (that is, if you run a step multiple times, it will have the same result), or should checkpoint work so that the step can pick up where it left off if it gets interrupted.

Imagine a world where model developers can train and tune their models on preemptible instances without having to refactor their code to account for instance preemptions. This is a world where model developers don’t have to implement tricky harness code that checkpoints automatically and picks up where it left off on fresh preemptible instances, even if preemption occurs at 3am. This is a world where model developers don’t even have to think about whether their cloud compute resources are preemptible or not. Preemptible instances will still impact their lives in some unavoidable ways — instances may not be available, and jobs may require more time to complete given the preemption possibility — but the buck stops there, where we at Determined believe it should.

At this point, model developers might be wondering how this is possible. It comes down to Determined’s platform design. From the model developer’s point of view, infrastructure sits beneath a friendly abstraction layer. Because we built our platform to support resource sharing, fair scheduling of training workloads, and ad hoc workload management features like manual pause and resume, our workload execution model naturally accommodates preemptible instances. To Determined, an instance preemption looks much like an experiment being paused by a user or by the fair-share scheduler; preemptible cloud instances aren’t special, just another beneficiary of Determined’s fault-tolerant execution model.
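To illustrate, here is a trimmed sketch of what a model definition looks like against Determined’s PyTorch trial interface (method and helper names reflect Determined’s API at the time of writing and vary by version; the dataset helpers are hypothetical). Note what’s absent: no checkpoint saving, no restart logic, no preemption handling of any kind:

```python
# Trimmed sketch of a Determined PyTorchTrial. API names reflect Determined's
# PyTorch interface at the time of writing; exact signatures vary by version.
# The platform owns checkpointing, resumption, and preemption handling.
import torch
import torch.nn as nn
import torch.nn.functional as F
from determined.pytorch import DataLoader, PyTorchTrial

class MNISTTrial(PyTorchTrial):
    def __init__(self, context):
        self.context = context
        self.model = context.wrap_model(nn.Linear(784, 10))
        self.optimizer = context.wrap_optimizer(
            torch.optim.SGD(self.model.parameters(), lr=0.01))

    def build_training_data_loader(self):
        return DataLoader(my_train_dataset(), batch_size=64)  # hypothetical dataset helper

    def build_validation_data_loader(self):
        return DataLoader(my_val_dataset(), batch_size=64)  # hypothetical dataset helper

    def train_batch(self, batch, epoch_idx, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.model(x), y)
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss}

    def evaluate_batch(self, batch):
        x, y = batch
        acc = (self.model(x).argmax(1) == y).float().mean()
        return {"accuracy": acc}
```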

Test Drive Determined on GCP Preemptible Instances

For the crowd that has to see it to believe it, Determined is open source with freely available cloud-native deployment options. All you need is a GCP account to try it out. Just deploy your Determined cluster with a preemptible flag and your training workloads will execute on preemptible dynamic agents. Bear in mind that workloads will only proceed while preemptible instances are available. If you’re interested in deploying Determined on GCP with a mix of preemptible and normal instances so that you always have some GPUs available (a.k.a. Operation Have Your Cake and Eat It Too), we support that too: simply spin up static agents on normal instances, while configuring the dynamic agent pool to be preemptible.
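For reference, spinning up a cluster with preemptible dynamic agents looks roughly like `det-deploy gcp up --cluster-id my-cluster --project-id my-project --preemptible`. The command and flag names here reflect Determined’s deployment tooling at the time of writing and may differ across versions (and the cluster and project IDs are placeholders), so check `det-deploy gcp up --help` against your installation.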

Once your Determined cluster is up and running, get started with one of our tutorials to see single-GPU and distributed training jobs, as well as hyperparameter tuning experiments, execute on preemptible instances. Watch experiments tolerate instance preemptions by picking up where they left off on other agents. The final step is to drop us a line and let us know how it goes!
