How to Structure your Code to Build a Scalable Deep Learning Model

Most ML code that floats around in open source repositories isn’t built to scale. These two nearly-identical chunks of code both implement the same MNIST classifier in Keras. One of the two will turn into a nightmare as it evolves, the other will scale with almost no changes.

First, some Python boilerplate:

Traditional
def main():
    pass

if __name__ == '__main__':
    main()
Determined
class MNISTTrial(TFKerasTrial):
    def __init__(self, context: TFKerasTrialContext) -> None:
        self.context = context

Now lets load the data we’ll use to train and test our model. Since we’re using a simple dataset (MNIST) this will be quick, but this gets pretty complicated with large, custom datasets.

Traditional
def load_data():
    img_rows, img_cols = 28, 28
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train /= 255
    x_test /= 255
    return (x_train, y_train), (x_test, y_test)
Determined
    def build_training_data_loader(self) -> InputData:
        (train_images, train_labels), _ = mnist.load_data()
        train_images = train_images / 255.0

        return train_images, train_labels

    def build_validation_data_loader(self) -> InputData:
        _, (test_images, test_labels) = mnist.load_data()
        test_images = test_images / 255.0

        return test_images, test_labels

Now the fun part, defining the model itself:

Traditional
def build_model(input_shape):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model
Determined
    def build_model(self):
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(3, 3),
                         activation='relu',
                         input_shape=input_shape))
        model.add(Conv2D(64, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(self.context.get_hparam("dropout1")))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dropout(self.context.get_hparam("dropout2")))
        model.add(Dense(num_classes, activation='softmax'))
        model = self.context.wrap_model(model)
        model.compile(
            optimizer=RMSprop(
                lr=self.context.get_hparam("learning_rate"),
            ),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
        )
        return model

Finally we’ll add the last few pieces we need to actually train the model locally:

Traditional
def main():
    batch_size = 128
    num_classes = 10
    epochs = 12
    input_shape = (28, 28, 1)
    (x_train, y_train), (x_test, y_test) = load_data()
    model = build_model(input_shape)
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
Determined
description: mnist_tf_keras_const
hyperparameters:
  learning_rate: 1e-4
  dropout1: .25
  dropout2: .5
searcher:
  name: single
  metric: accuracy
  smaller_is_better: true
  max_steps: 50
entrypoint: model_def:MNISTTrial

The one on the left looks like most of the ML code that exists across the internet, including throughout dozens of popular repositories (link to many repos that are like this, StyleGan, Wide ResNets). The one on the right is structured** – it’s built to conform to a specific API, in this case the Determined Trial Interface. Both are almost identical – they took the same amount of time to write, and are essentially the same length. They can both be used to train a model with a single command.

The hard part starts when you need to scale this code out for a larger problem than MNIST. You’ll be working with a large, custom dataset and you’ll need a bigger model to fit that dataset. Suddenly your local machine won’t be able to train the model and you’ll need to investigate distributed training. As you perfect your model you’ll need to perform some sort of hyperparameter search. In most cases, your code will evolve something like this:

alt text

To meet the challenges of your larger problem, you’ll need to layer a lot of complexity onto your code base. Maybe you’ll distribute training using Horovod and a GPU cluster, or maybe you’ll set up a hyperparameter search framework like Ray Tune to tune your model. Each of these steps comes with huge overhead – you’ll need to write new code to integrate these pieces, configure complex environments, and even then without a large IT team to help you you’ll likely run into resiliency issues.

And it gets worse. In reality, your code will probably look more like this:

alt text

Pictured: Smashed code together

As more complexity gets layered in, most of the time there isn’t a way to cleanly separate the different pieces. There will be infrastructure code everywhere, your hyperparameter tuning will get tangled up with your training loops, and you’ll have 6 different places where distributed training logic comes in. Imagine having to hand that off to a coworker! Working on these jumbled code bases can be a nightmare, and you’ve completely lost the clean model definition you had at the beginning. Even worse, the next time you build a model, you’ll go through all of these same challenges again!

Now let’s look at how your structured model will evolve:

alt text

You just need a config file. By conforming to a structured API, you can take advantage of a wide range of tools Determined has to offer (like automatic hyperparameter tuning, distributed training, and fault tolerance), without having to manually integrate all of these pieces into your code. By simply modifying your config file, you’ll be able to specify sophisticated experiments, like cluster wide distributed training, or adaptive hyperparameter searches.

For essentially the same initial investment, you get distributed training, hyperparameter tuning, fault tolerance, better code readability, and built-in cluster management. Determined makes it that easy. Try it out now.

Recent posts

JUN 24, 2020

How to Build an Enterprise Deep Learning Platform, Part Three

JUN 12, 2020

How to Structure your Code to Build a Scalable Deep Learning Model

MAY 27, 2020

How to Build an Enterprise Deep Learning Platform, Part Two

Stay updated