December 29, 2020
The goal of semantic segmentation is to identify groups of pixels in an image that belong together, for instance the pixels that make up an object. Algorithms that perform semantic segmentation classify each pixel of an image with a particular label. Image segmentation is necessary in a variety of applications, ranging from autonomous driving to Google’s portrait mode camera setting. FasterRCNN is a popular deep learning network architecture for performing segmentation. Today, we’ll walk through how to train FasterRCNN to perform image segmentation using Determined and PyTorch.
For the example, we’ll be training FasterRCNN on the Penn-Fudan Database for Pedestrian Detection and Segmentation. Thanks to the PyTorch FasterRCNN tutorial, it’s easy to get started. We will adapt code from this tutorial to run on Determined so that we can easily scale up training, run a hyperparameter search, and achieve a better final validation IOU.
In advance, we’ve organized the tutorial code into a Determined PyTorch Trial Interface. By organizing the model this way, we can use Determined to track our experiments, scale to distributed training, and do hyperparameter tuning. To get started, you’ll need to install Determined and configure the Determined CLI. The code for this example can be found here.
To run this example, first install Determined either locally or on the cloud. Since we will be running a hyperparameter search consisting of many training runs, we recommend running on the cloud.
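Since validation IOU (intersection over union) is the metric we optimize throughout, here is a minimal sketch of how IOU can be computed for a pair of bounding boxes. The function name box_iou and the (x1, y1, x2, y2) box layout are assumptions made for illustration; the example's own metric code may be organized differently.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes produce no intersection area.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(box_iou(predicted, ground_truth))
```

A perfect prediction scores 1.0 and a prediction that misses the object entirely scores 0.0, which is why the searcher is configured to treat larger values as better.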
Once you have Determined installed, you can train FasterRCNN and track the progress of training with:
det experiment create const.yaml .
The configuration of this experiment is defined in const.yaml:
description: fasterrcnn_coco_pytorch_const
data:
  url: https://determined-ai-public-datasets.s3-us-west-2.amazonaws.com/PennFudanPed/PennFudanPed.zip
hyperparameters:
  learning_rate: 0.005
  momentum: 0.9
  weight_decay: 0.0005
  global_batch_size: 2
searcher:
  name: single
  metric: val_avg_iou
  smaller_is_better: false
  max_length:
    batches: 800
entrypoint: model_def:ObjectDetectionTrial
For full documentation about how to configure experiments, check out the Determined experiment configuration documentation. Today, we will modify this configuration to run a hyperparameter search.
In our new configuration, called adaptive.yaml, we will add sweeps of the learning_rate and momentum hyperparameters:
hyperparameters:
  learning_rate:
    type: double
    minval: 0.0001
    maxval: 0.001
  momentum:
    type: double
    minval: 0.2
    maxval: 1.0
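To make the sweep concrete, here is a rough sketch of what drawing trial configurations from these ranges could look like. It assumes each double hyperparameter is sampled uniformly between minval and maxval. Determined does this sampling for us, so SEARCH_SPACE and sample_hyperparameters are hypothetical names used only for illustration.

```python
import random

# Ranges copied from the hyperparameters section of adaptive.yaml.
SEARCH_SPACE = {
    "learning_rate": (0.0001, 0.001),
    "momentum": (0.2, 1.0),
}


def sample_hyperparameters(rng):
    """Draw one candidate configuration (assumption: uniform sampling)."""
    return {
        name: rng.uniform(minval, maxval)
        for name, (minval, maxval) in SEARCH_SPACE.items()
    }


rng = random.Random(0)  # seeded so the draws are repeatable
for _ in range(3):
    print(sample_hyperparameters(rng))
```

Each printed dictionary corresponds to the hyperparameters one trial would train with, alongside the fixed weight_decay and global_batch_size values.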
We will then configure the searcher with the search algorithm name, the optimization metric, and the size of the hyperparameter search. We will use the state-of-the-art ASHA algorithm. We’ll start with a small experiment: 30 trials, each training for at most 8 batches.
searcher:
  name: adaptive_asha
  metric: val_avg_iou
  smaller_is_better: false
  max_length:
    batches: 8
  max_trials: 30
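For intuition about what an adaptive search does with these settings, here is a simplified, synchronous successive-halving sketch. It is not Determined's implementation: ASHA promotes trials asynchronously, and the names successive_halving, min_batches, and eta, as well as the toy scoring function, are assumptions made for illustration.

```python
def successive_halving(configs, evaluate, min_batches=1, max_batches=8, eta=2):
    """Return the best config, dropping the weaker ones at each rung.

    evaluate(config, batches) must return a score where larger is better,
    matching smaller_is_better: false in the searcher config.
    """
    survivors = list(configs)
    budget = min_batches
    while True:
        # Score every surviving config at the current training budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if budget >= max_batches or len(ranked) == 1:
            return ranked[0]
        # Keep the top 1/eta and give the survivors a larger budget.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_batches)


def toy_score(config, batches):
    """Stand-in for validation IOU: best near learning_rate = 0.0005."""
    return batches * (1.0 - abs(config["learning_rate"] - 0.0005) * 1000)


# 30 hypothetical trials spread across the learning-rate range.
candidates = [{"learning_rate": 0.0001 + i * 0.00003} for i in range(30)]
print(successive_halving(candidates, toy_score))
```

The point of the design is that most of the 30 candidates only ever train for a batch or two, so the bulk of the compute goes to the configurations that look promising early.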
The final configuration looks like:
description: fasterrcnn_coco_pytorch_adaptive_search
data:
  url: https://determined-ai-public-datasets.s3-us-west-2.amazonaws.com/PennFudanPed/PennFudanPed.zip
hyperparameters:
  learning_rate:
    type: double
    minval: 0.0001
    maxval: 0.001
  momentum:
    type: double
    minval: 0.2
    maxval: 1.0
  weight_decay: 0.0005
  global_batch_size: 2
searcher:
  name: adaptive_asha
  metric: val_avg_iou
  smaller_is_better: false
  max_length:
    batches: 8
  max_trials: 30
entrypoint: model_def:ObjectDetectionTrial
This can be run from the command line:
det experiment create adaptive.yaml .
When training has completed, your model should obtain a validation IOU score of ~52.
Next, to further improve the IOU score, we’ll increase the size of the hyperparameter search to 300 trials:
searcher:
  name: adaptive_asha
  metric: val_avg_iou
  smaller_is_better: false
  max_length:
    batches: 8
  max_trials: 300
Determined automatically parallelizes our hyperparameter search across multiple machines. Our Determined cluster is configured to spin up as many as 40 agents during training, so even though we’re running 300 trials, training only takes minutes.
When training is complete, your model should obtain a validation IOU score of ~67.
We encourage you to give Determined a spin by trying this example or any of the others available in the Determined repository. If you have any questions along the way, hop on our community Slack or reach out on GitHub. We’d love to help!