March 13, 2019
We are entering the golden age of artificial intelligence. Model-driven, statistical AI has already been responsible for breakthroughs in applications such as computer vision, speech recognition, and machine translation, with countless more use cases on the horizon. But if AI is the linchpin of a new era of innovation, why does the infrastructure it’s built upon feel trapped in the 20th century? Worse, why is advanced AI tooling locked within the walls of a handful of multi-billion-dollar tech companies, inaccessible to anyone else?
Today we’re formally introducing Determined AI, a company that exists to let AI engineers everywhere focus on models, not infrastructure. Determined AI is backed by GV (formerly Google Ventures), Amplify Partners, CRV, Haystack, SV Angel, The House, and Specialized Types.
AI, and specifically deep learning (DL), is becoming the most important computational workload for businesses and industries of all kinds. For example, DL has dramatically advanced the performance of autonomous vehicles at Waymo; DL powers Siri, Apple’s personal assistant that communicates via speech synthesis; and it has revolutionized Facebook’s ability to understand user sentiment. These applications, pioneered by a handful of cutting-edge technology firms, speak to the power of DL, but also the need for it to be accessible to a much wider range of businesses and developers.
These firms have a key advantage when it comes to exploiting the power of deep learning: they have built sophisticated AI-native infrastructure for internal use. Everyone else has to make do with existing tools, which are woefully inadequate for AI-driven application development, a paradigm that is radically different from conventional software development. Indeed, the vast majority of engineers today are forced to cobble together tools that speak non-standard protocols and use non-standard file formats, stitched together with ad-hoc, multi-step workflows. These point solutions create enormous complexity, and enormous amounts of time and productivity are lost to the resulting inefficiencies. As a result, organizations that depend on advances in AI – like anyone working with vision, speech, or natural language – risk being held back without a radically new approach to AI infrastructure.
At Determined AI, our goal is to power deep learning at the speed of thought. We build specialized software that directly addresses the challenges DL developers struggle with every day. Here’s how we’re going about it.
Given how important deep learning has become – and how different DL is from traditional computational tasks – it is time to rethink how we’re building AI infrastructure from the ground up.
To achieve this, we started by assembling the right people. We believe that building AI-native infrastructure requires a team with a rare combination of skills: a deep understanding of modern AI workloads, but also expertise in building large-scale data-intensive systems. We’re fortunate that our team includes world-leading experts in both domains. Creating an environment where these two groups of people can collaborate and co-design the system together was a key first step.
Next, we adopted two key design principles: build an integrated platform rather than a patchwork of point solutions, and specialize that platform for the unique challenges of deep learning.
Combining these principles – an integrated platform that is specialized for the unique challenges of deep learning – yields massive improvements in both performance and usability. For example, many companies employ cluster schedulers like Kubernetes, Mesos, or YARN, which can be used to run deep learning workloads. However, traditional cluster schedulers and leading DL frameworks have been designed independently, which hurts both performance and usability. In contrast, we have developed a specialized GPU scheduler that natively understands key deep learning workloads, including distributed training, hyperparameter tuning, and batch inference. This yields dramatically better performance: for example, our software performs hyperparameter tuning more than 50x faster than conventional methods! Moreover, DL workloads on our platform automatically gain seamless fault tolerance and dynamic elasticity, and they can scale from on-premise resources to cloud capacity on demand.
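To make that concrete, here is a minimal, purely illustrative sketch – ordinary Python, not our actual scheduler or API – of one way a DL-aware scheduler can speed up hyperparameter tuning: adaptive early stopping via successive halving, in which many trials start with a small training budget and only the most promising survivors receive more GPU time. The names (train_for, successive_halving) and the fake loss curve are hypothetical stand-ins for real training jobs.

```python
import random

def train_for(config, epochs, state=None):
    """Stand-in for a real training job: returns (validation_loss, state).

    In a real system this would run on a GPU slot managed by the scheduler;
    here we fake a loss curve that improves with more epochs and depends on
    the learning rate (purely illustrative).
    """
    lr = config["lr"]
    noise = random.uniform(0.0, 0.05)
    loss = abs(lr - 0.01) * 10 / (1 + epochs) + noise
    return loss, state

def successive_halving(num_trials=32, min_epochs=1, eta=2):
    """Toy successive-halving search: train all surviving trials for a
    growing epoch budget, then keep only the top 1/eta of them."""
    trials = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(num_trials)]
    budget = min_epochs
    while len(trials) > 1:
        scored = []
        for config in trials:
            loss, _ = train_for(config, budget)   # one "job" per surviving trial
            scored.append((loss, config))
        scored.sort(key=lambda pair: pair[0])      # best (lowest loss) first
        trials = [config for _, config in scored[: max(1, len(scored) // eta)]]
        budget *= eta                              # survivors get a larger budget
    return trials[0]

if __name__ == "__main__":
    best = successive_halving()
    print("best config found:", best)
```

Adaptive allocation of this kind is one way large speedups over grid or random search can be achieved; a production scheduler additionally has to handle fault tolerance, preemption, and elastic allocation of GPUs across many such jobs.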
Although we are announcing the company today, our software has been running on production GPUs for more than a year. Our customers tell us that we have already saved them hundreds of engineering hours per person per year and hundreds of thousands of GPU-hours across their teams. However, there is still much work to be done to reinvent the software stack for the AI-native era ahead, and we’re excited to build that future together with our customers.
There are many reasons to be optimistic about the enormous potential of AI, but to realize that potential, AI development must be broadly accessible in the same way that software development is accessible today. Anyone should be able to apply AI to problems that they’re working to solve, and we’re excited to be a part of that journey.