Tianqi Chen on Efficiently Deploying Models on Specialized Hardware

Over the last couple of months, we’ve had fascinating conversations with some of the leaders in the Machine Learning Systems community. We spoke with Dave Patterson about the golden age of specialized hardware, with Joe Hellerstein on the importance of data wrangling to achieve reproducibility, with Alex Ratner on why programmatic data labeling is a vital step in machine learning, and with Determined co-founder and CTO Neil Conway on the challenges of deep learning model development. To wrap up the Determined Podcast Series, we sat down with my CMU colleague and OctoML CTO and co-founder Tianqi “TQ” Chen. TQ is the creator of several widely used machine learning systems, including XGBoost, Apache MXNet, and Apache TVM, the last of which is the focus of our conversation. You can listen to TQ’s episode on any of the platforms below, and recap our other conversations as well.

Listen on Spotify

Listen on Overcast

Listen on Apple Podcasts

Read the full transcript here.


On the end-to-end philosophy of TVM

In a lot of cases where we are building applications, there are always application boundaries. The idea is that the hardware vendors give you hardware, then people build software libraries on top of that, and then people build applications on top of those. However, more recently we have started to see a trend toward end-to-end design, because hardware designers are starting to build domain-specific hardware for, say, running image recognition or speech recognition. Meanwhile, machine learning researchers are also designing new models that try to fit those specialized accelerators. One important example is Google’s transformer model, which more recently has a really popular variant called GPT-3. That model took off mainly because it runs very well on all those GPUs and TPUs. So, in some sense, in order to get to the next mile in AI, people are starting to co-design the machine learning models, the software, and the hardware backend together.

On TVM for deployment

TVM originally was mainly focused on deployment. Because there is such a wide spectrum of hardware devices nowadays, it’s really hard for humans to manually optimize machine learning models for each type of device. So TVM aims to provide an automated solution that takes a model from TensorFlow or PyTorch and automatically generates a deployable model. More recently, we have started to optimize training workloads as well. The steps are: you want to build a machine learning model to do something, so you collect the data, you prepare the data, you build the model, you train the model, and the last step is deployment. You might deploy it on a server, on an edge device like a cell phone, or on some hardware that doesn’t even exist today. The hardware needs a set of instructions, a piece of software, to unlock its power, and today producing that software is very manually intensive. So TVM uses machine learning to automatically create the software that unlocks the power of the hardware.
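
To make that flow concrete, here is a minimal sketch using TVM’s Relay Python API, assuming a model that has been exported to ONNX. The file name, input name, and input shape are placeholders, and the exact API surface varies across TVM versions.

```python
# A minimal sketch of the deployment flow described above, assuming an
# ONNX export of the model; details vary across TVM versions.
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load a trained model exported from TensorFlow/PyTorch via ONNX.
onnx_model = onnx.load("model.onnx")          # placeholder file name
shape_dict = {"input": (1, 3, 224, 224)}      # assumed input name and shape

# Import the model into Relay, TVM's high-level IR.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a target device; "llvm" means the local CPU. Swapping the
# target string (e.g. "cuda") retargets the same model to other hardware.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Load the compiled module into a graph executor and it is ready to run.
device = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](device))
```

The key point is the target string: the same imported model can be recompiled for a different backend by changing one parameter, which is what makes the approach automated rather than per-device manual work.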

On the role of machine learning in TVM itself

We’re using machine learning itself to learn to predict how fast a program will be if we try it out. And usually that prediction runs much faster than actually running the program on the target hardware.
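
In other words, a learned cost model. The sketch below is illustrative, not TVM’s actual implementation; `featurize` and `measure` are hypothetical helpers. It trains a gradient-boosted regressor (XGBoost, another of TQ’s projects) on a small set of measured runtimes, then uses cheap predictions to rank many more candidates so only the most promising ones are benchmarked on real hardware.

```python
# An illustrative sketch of a learned cost model, not TVM's actual code.
# Assumes featurize() maps a candidate program to a fixed-length vector
# and measure() runs it on the target hardware and returns its latency.
import numpy as np
import xgboost as xgb

def train_cost_model(programs, featurize, measure, n_measured=100):
    """Fit a regressor on a small set of candidates measured on hardware."""
    sample = programs[:n_measured]
    X = np.array([featurize(p) for p in sample])
    y = np.array([measure(p) for p in sample])   # real hardware runs (slow)
    model = xgb.XGBRegressor(n_estimators=200, max_depth=6)
    model.fit(X, y)
    return model

def rank_candidates(model, programs, featurize, top_k=10):
    """Predict latency for every candidate (fast) and keep the best few."""
    X = np.array([featurize(p) for p in programs])
    predicted = model.predict(X)
    best = np.argsort(predicted)[:top_k]
    return [programs[i] for i in best]
```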

On the different types of folks using TVM

There are hardware vendors who want to build better software support for their hardware. For example, in the TVM open source community we have gotten contributions from Qualcomm, NVIDIA, Intel, and basically every hardware vendor you can think of. There are also end users who are interested in deploying their machine learning applications in different settings, like deploying their models on tiny devices or on server-class machines.

On OctoML

OctoML is a startup company that we built around TVM. The goal is to support the open source community, but also to build a product on top of it. One product we are currently building is called the Octomizer, which exposes TVM as a service: users upload their models, and the service returns a module that they can plug directly into their software environment. That way, they don’t have to worry about managing the compilation process or compiling against different hardware backends.
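
To give a feel for that workflow, here is a purely hypothetical sketch of interacting with such a service over HTTP. The endpoint URL, request fields, and target names are illustrative assumptions, not the Octomizer’s actual API.

```python
# A purely hypothetical sketch of a model-optimization-as-a-service flow;
# the endpoint URL and request/response fields are illustrative, not the
# actual Octomizer API.
import requests

SERVICE_URL = "https://api.example.com/v1"   # placeholder endpoint

# Upload a model and request builds for specific hardware targets.
with open("model.onnx", "rb") as f:
    resp = requests.post(
        f"{SERVICE_URL}/models",
        files={"model": f},
        data={"targets": "x86-avx2,arm-cortex-a72"},  # assumed target names
    )
job = resp.json()

# Later, download the optimized, ready-to-deploy module.
artifact = requests.get(f"{SERVICE_URL}/models/{job['id']}/artifact")
with open("optimized_module.tar", "wb") as out:
    out.write(artifact.content)
```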

Ameet on building an open source company

To get a community behind a project, to get contributions, to get feedback, and also to help with adoption, the machine learning community kind of demands open source. Even if they eventually want to pay for something, they want to get their hands dirty first. That’s a great way for the folks developing the software to get feedback, too. So, it makes the product a lot better, a lot quicker, and it kind of helps to get the word out.

Ameet on the importance of being hardware agnostic

One thing that we are banking on at Determined is this idea that there already is, and will continue to be, a proliferation of different sorts of hardware, both for training and deployment. For somebody developing models, it’d be nice not to be locked into one piece of hardware, or to have to worry about which one to use when, and to be able to benefit from different sorts of them…we think being hardware agnostic is more and more important moving forward, because there’s going to be more and more hardware to be agnostic to. And I think TVM and OctoML are also banking on the assumption that this proliferation is going to continue to get bigger and bigger.

---

That’s a wrap! Thanks for tuning in to the Determined Podcast Series over the last few weeks, and a special thanks to Craig Smith as well as our spectacular guests. If you missed any of the episodes, you can click on the icons at the top of the page to listen on your preferred streaming platform. Of course, if you’d like to get in touch, we suggest joining our growing Slack community, checking out Determined on GitHub, or dropping us a line at ai-open-source@hpe.com.