Recapping our first Women in Infrastructure meetup

Last week, we at Determined AI were honored to sponsor a Women in Infrastructure meetup focused on ML infrastructure. The event featured three fantastic speakers from Yelp, Microsoft, and Twitter.

Rita Zhang, a software engineer on Microsoft’s Azure Cloud Native Compute team, walked us through a demo using GPU-enabled Kubernetes clusters on Azure to train deep learning models.

Lydian Lee, lead infrastructure engineer for Yelp’s Spam and Abuse detection team, discussed Yelp’s model development framework, which enforces production-quality code at each step of the pipeline to avoid costly bugs at deployment.

Cibele Montez Halasz, who works on Twitter Cortex, the company’s machine learning platform, described Twitter’s move from Lua Torch to TensorFlow and the challenges and benefits of that transition.

Like many ML leaders I have interviewed over the last few months, the speakers described issues their teams encountered around versioning models, running hyperparameter searches, reproducing prior work, managing datasets, and visualizing results. They then described a few of the solutions they have developed to address these problems. For example:

  • At Microsoft, they use Kubernetes and Helm to sweep a range of hyperparameter values in parallel across multiple machines, with a centralized TensorBoard instance to visualize the results.
  • At Yelp, they’ve defined each step in the ML model development pipeline as an interface, which allows them to standardize inputs and outputs as well as better reuse existing functionality.
  • At Twitter, they’ve created a wrapper around TensorFlow’s Estimator API to simplify model definition, as well as an internal DataRecord format to help manage their vast amounts of sparse data.

While the ML infrastructure at these organizations is impressive, the speakers admitted that it had taken a lot of time and effort to get to where they are today. We’ve heard a similar narrative from our customers: we are still in the early days of widespread industrial adoption of ML (we’ve discussed this point at length in another blog post). Compared to modern software development, we have a long way to go in developing the appropriate tools, APIs, and processes for modern ML engineering. Simply adopting what works for traditional software development is not enough: ML is unique in a variety of ways, which means it requires its own set of engineering best practices.
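
To make the Yelp idea above a bit more concrete, here is a minimal Python sketch of what treating each pipeline step as an interface could look like. This is not Yelp’s framework: the names PipelineStep, Featurize, Score, and run_pipeline, as well as the toy "count the exclamation marks" heuristic, are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


class PipelineStep(Protocol):
    """Hypothetical interface: a step reads named inputs and returns named outputs."""

    name: str

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        ...


@dataclass
class Featurize:
    """Toy step: turns each review into [length, number of exclamation marks]."""

    name: str = "featurize"

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        reviews: List[str] = inputs["reviews"]
        return {"features": [[len(text), text.count("!")] for text in reviews]}


@dataclass
class Score:
    """Toy step: flags a review when its exclamation count reaches a threshold."""

    name: str = "score"
    threshold: int = 2

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"flags": [f[1] >= self.threshold for f in inputs["features"]]}


def run_pipeline(steps: List[PipelineStep], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Runs steps in order; each step sees everything produced so far."""
    data = dict(inputs)
    for step in steps:
        data.update(step.run(data))
    return data


result = run_pipeline(
    [Featurize(), Score()],
    {"reviews": ["Great food", "BUY NOW!!! CLICK!!!"]},
)
print(result["flags"])
```

Because every step shares the same run signature, a step can be swapped out or reused in another pipeline without touching its neighbors, which is the kind of standardization and reuse the bullet describes.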

Besides the fantastic learnings from the evening, I personally loved the opportunity to hang out with such a talented group of female software engineers. I’ve become somewhat used to regularly being the only woman in the room, so it’s refreshing to see the balance flipped. For me, just reminding myself that I’m not actually the only one out there, and moreover that there is some incredible work being done by my female colleagues, is both encouraging and motivating.

I’m proud of my company for supporting this event, not only financially but also with their time by turning out in full force. It shows that fostering diversity is a priority for the entire group and that we’re working hard to create a culture where everyone feels at home.

I’d like to say a huge thank you to my event co-organizer, Vicki Cheung, as well as all three of our speakers. If you’d like to read more about their work, I’ve included some links below:

Keep an eye on the Women in Infrastructure meetup page for announcements about upcoming events!