June 12, 2020
Most ML code that floats around in open source repositories isn’t built to scale. These two nearly-identical chunks of code both implement the same MNIST classifier in Keras. One of the two will turn into a nightmare as it evolves, the other will scale with almost no changes.
First, some Python boilerplate:
def main(): pass if __name__ == '__main__': main()
class MNISTTrial(TFKerasTrial): def __init__(self, context: TFKerasTrialContext) -> None: self.context = context
Now lets load the data we’ll use to train and test our model. Since we’re using a simple dataset (MNIST) this will be quick, but this gets pretty complicated with large, custom datasets.
def load_data(): img_rows, img_cols = 28, 28 (x_train, y_train), (x_test, y_test) = mnist.load_data() x_train /= 255 x_test /= 255 return (x_train, y_train), (x_test, y_test)
def build_training_data_loader(self) -> InputData: (train_images, train_labels), _ = mnist.load_data() train_images = train_images / 255.0 return train_images, train_labels def build_validation_data_loader(self) -> InputData: _, (test_images, test_labels) = mnist.load_data() test_images = test_images / 255.0 return test_images, test_labels
Now the fun part, defining the model itself:
def build_model(input_shape): model = Sequential() model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape)) model.add(Conv2D(64, (3, 3), activation='relu')) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) model.add(Flatten()) model.add(Dense(128, activation='relu')) model.add(Dropout(0.5)) model.add(Dense(num_classes, activation='softmax')) model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy']) return model
def build_model(self): model = Sequential() model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape)) model.add(Conv2D(64, (3, 3), activation='relu')) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(self.context.get_hparam("dropout1"))) model.add(Flatten()) model.add(Dense(128, activation='relu')) model.add(Dropout(self.context.get_hparam("dropout2"))) model.add(Dense(num_classes, activation='softmax')) model = self.context.wrap_model(model) model.compile( optimizer=RMSprop( lr=self.context.get_hparam("learning_rate"), ), loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")], ) return model
Finally we’ll add the last few pieces we need to actually train the model locally:
def main(): batch_size = 128 num_classes = 10 epochs = 12 input_shape = (28, 28, 1) (x_train, y_train), (x_test, y_test) = load_data() model = build_model(input_shape) model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test)) score = model.evaluate(x_test, y_test, verbose=0)
description: mnist_tf_keras_const hyperparameters: learning_rate: 1e-4 dropout1: .25 dropout2: .5 searcher: name: single metric: accuracy smaller_is_better: true max_steps: 50 entrypoint: model_def:MNISTTrial
The one on the left looks like most of the ML code that exists across the internet, including throughout dozens of popular repositories (link to many repos that are like this, StyleGan, Wide ResNets). The one on the right is structured** – it’s built to conform to a specific API, in this case the Determined Trial Interface. Both are almost identical – they took the same amount of time to write, and are essentially the same length. They can both be used to train a model with a single command.
The hard part starts when you need to scale this code out for a larger problem than MNIST. You’ll be working with a large, custom dataset and you’ll need a bigger model to fit that dataset. Suddenly your local machine won’t be able to train the model and you’ll need to investigate distributed training. As you perfect your model you’ll need to perform some sort of hyperparameter search. In most cases, your code will evolve something like this:
To meet the challenges of your larger problem, you’ll need to layer a lot of complexity onto your code base. Maybe you’ll distribute training using Horovod and a GPU cluster, or maybe you’ll set up a hyperparameter search framework like Ray Tune to tune your model. Each of these steps comes with huge overhead – you’ll need to write new code to integrate these pieces, configure complex environments, and even then without a large IT team to help you you’ll likely run into resiliency issues.
And it gets worse. In reality, your code will probably look more like this:
As more complexity gets layered in, most of the time there isn’t a way to cleanly separate the different pieces. There will be infrastructure code everywhere, your hyperparameter tuning will get tangled up with your training loops, and you’ll have 6 different places where distributed training logic comes in. Imagine having to hand that off to a coworker! Working on these jumbled code bases can be a nightmare, and you’ve completely lost the clean model definition you had at the beginning. Even worse, the next time you build a model, you’ll go through all of these same challenges again!
Now let’s look at how your structured model will evolve:
You just need a config file. By conforming to a structured API, you can take advantage of a wide range of tools Determined has to offer (like automatic hyperparameter tuning, distributed training, and fault tolerance), without having to manually integrate all of these pieces into your code. By simply modifying your config file, you’ll be able to specify sophisticated experiments, like cluster wide distributed training, or adaptive hyperparameter searches.
For essentially the same initial investment, you get distributed training, hyperparameter tuning, fault tolerance, better code readability, and built-in cluster management. Determined makes it that easy. Try it out now.