January 28, 2021
Determined is an open-source deep learning training platform that makes building models fast and easy, featuring seamless distributed training and efficient hyperparameter tuning. Because it focuses on the training portion of model development, Determined is designed to work with other best-of-breed tools for serving the models it trains. Algorithmia, a leading machine learning operations (MLOps) platform, integrates with Determined so that users of both platforms can train their models and easily serve them at scale, delivering value from AI for their businesses.
This blog post will show you how to use Algorithmia and Determined together in a streamlined workflow to train a deep-learning model and then put it into production at scale.
The machine learning pipeline consists of many components, in particular model development and model serving. These components typically challenge both infrastructure and model development teams, because they require balancing the management of complex hardware against giving users flexibility in how they develop and serve models. Determined and Algorithmia tackle these challenges by simplifying management of the underlying infrastructure for training and serving models, respectively, while still enabling advanced capabilities in these areas.
Determined provides a platform to manage a cluster that can train your models quickly with distributed training, tune them with advanced hyperparameter search, and manage the most performant checkpoints for your trained models. Although using these checkpoints locally is straightforward, most users will want to deploy their best models to a more scalable endpoint. This is where Algorithmia comes in.
Algorithmia is MLOps software that manages all stages of the ML lifecycle within existing operational processes. With Algorithmia, teams and enterprises can put models into production quickly, securely, and cost-effectively. Algorithmia automates ML deployment, optimizes collaboration between operations and development, leverages existing SDLC and CI/CD systems, and provides advanced security and governance, so companies can get their models out of the lab and into production. Algorithmia also enables fast deployment of serverless code, which simplifies deploying your machine learning models at scale. Because Algorithmia manages the serving hardware behind the scenes, you simply provide the code you'd like to execute, and Algorithmia scales that execution on CPU- and GPU-enabled hardware.
The Algorithmia-Determined integration facilitates the interaction between Determined and Algorithmia. Instead of only running inference locally or manually setting up an endpoint and maintaining it, the Algorithmia-Determined integration allows you to seamlessly deploy a model trained on Determined to an endpoint on Algorithmia. This provides a few key advantages:
The way the integration works is straightforward: Users continue to train and tune models in Determined as they wish. Then, once the model is trained, the final model checkpoint is pushed to Algorithmia, along with inference code, to create the serving endpoint.
Determined provides best-in-class model training—producing model artifacts that can be used in many downstream applications. Paired with Algorithmia, you can train a deep-learning model and serve it at scale in a few easy steps.
In this example, we’ll show how you can get started training an object detection model on Determined and then create an endpoint on Algorithmia to scale out serving the model.
The example first reviews how to deploy a Determined cluster and train your first model on Determined. Once you have a model trained, you can retrieve the best checkpoint for the model with Determined’s checkpoint API:
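from determined.experimental import Determined
# Fetch the best checkpoint from the experiment and register it as a model version.
checkpoint = Determined().get_experiment(experiment_id).top_checkpoint()
model = Determined().create_model(MODEL_NAME)
model.register_version(checkpoint.uuid)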
Then, it's straightforward to run a prediction locally by loading this checkpoint through Determined and passing the model to the example's predict function:
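from determined.experimental import Determined
from predict import predict  # predict.py ships with the example
# Load the latest registered version of the model and extract the underlying model object.
model = Determined().get_model(MODEL_NAME)
trial = model.get_version().load()
inference_model = trial.model
# Run inference on a local test image.
predict(inference_model, 'test.jpg', inference="local")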
However, what we really want to do is scale out our serving. To do this, we can create a new algorithm on Algorithmia:
algo_utility.create_algorithm("pytorch-1.5.x")
Then, we can clone it locally:
algo_utility.clone_algorithm_repo()
With the repo cloned, we can update the serving code with our predict function and push it to finalize the serving endpoint:
algo_utility.push_algo_script_with_dependencies(filenames=[
    f"{ALGORITHM_NAME}.py",
    "predict.py",
])
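The serving script itself is not shown above. As a rough idea of its shape, here is a minimal sketch that assumes the usual Algorithmia convention of a module-level `apply(input)` entry point. The `load_model` and `get_model` helpers and the returned fields are hypothetical placeholders rather than code from the example repo; a real script would load the trained checkpoint and call the example's predict function instead of the stand-in model.

```python
from typing import Any, Callable, Dict, Optional

# Cache the model at module level so it is loaded once, not on every request.
_MODEL: Optional[Callable[[str], Dict[str, Any]]] = None


def load_model() -> Callable[[str], Dict[str, Any]]:
    """Hypothetical loader: a real script would load the trained checkpoint here."""
    def placeholder_model(img_path: str) -> Dict[str, Any]:
        # Stand-in for real object detection output.
        return {"img_path": img_path, "boxes": [], "labels": []}
    return placeholder_model


def get_model() -> Callable[[str], Dict[str, Any]]:
    """Return the cached model, loading it on first use."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model()
    return _MODEL


def apply(input: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point invoked per request; `input` is the decoded request payload."""
    if not isinstance(input, dict) or "img_path" not in input:
        raise ValueError('Expected input like {"img_path": "<path>"}')
    return get_model()(input["img_path"])
```

Keeping the model in a module-level cache means repeated calls reuse the loaded model instead of reloading it for each request.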
Once the algorithm code has been pushed, we can easily make a prediction, with the endpoint being hosted by Algorithmia on the back end:
algo_result = algo_utility.call_latest_algo_version({
    "img_path": TEST_IMG_PATH
})
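The internals of the `call_latest_algo_version` helper are not shown here. One plausible shape for such a call, sketched under the assumption that it goes through the Algorithmia Python client's `client.algo(...).pipe(...)` interface, is below. The `call_algorithm` function name and the algorithm path in the usage comment are hypothetical.

```python
from typing import Any, Dict


def call_algorithm(client: Any, algo_path: str, payload: Dict[str, Any]) -> Any:
    """Send `payload` to the algorithm at `algo_path` and return its result.

    `client` is expected to behave like the object returned by
    Algorithmia.client(api_key): it exposes .algo(path).pipe(payload),
    and the response carries the algorithm output in .result.
    """
    response = client.algo(algo_path).pipe(payload)
    return response.result


# Usage with the real client (requires the Algorithmia package and an API key):
#   import Algorithmia
#   client = Algorithmia.client(API_KEY)
#   result = call_algorithm(client, "username/algorithm_name", {"img_path": TEST_IMG_PATH})
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object when no API key is available.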
Ready to try out Determined and Algorithmia for yourself? Get started with our most recent example, which walks you through training an object detection model on the Determined platform and serving the model on Algorithmia.
If you're interested in learning more about Determined, check us out on GitHub or join our Slack community. We're always looking to provide more examples to help users integrate Determined with other tools, so if you have any requests or suggestions, let us know!
And to learn more about Algorithmia, check out this additional step-by-step tutorial for deploying a model from a Jupyter notebook into production at scale and explore the product in greater depth.
This piece was written in collaboration with Aslı Sabancı, an Applied Machine Learning Engineer at Algorithmia.