The Scaletorch Deep Learning Offload Processor is a technology that speeds up deep learning training. It is available either as a pure software appliance or as a combined software and hardware solution.
The solution works in conjunction with GPUs and other deep learning accelerators (such as TPUs and IPUs) to transparently speed up AI training by 10x to 200x, without any changes to your PyTorch script or setup.
Scaletorch can run in AWS, Google Cloud, and Azure, as well as on-premise.
Speedups achieved with the same number of GPUs using the Scaletorch Deep Learning Offload Processor (DLOP):
31x - Audio recognition workload with a ResNet-101 model
19x - Medical imaging workload with a UNet 2D model
18x - Video pose detection workload using a 3D ResNet-50 model
12x - 3D medical imaging workload using a UNet model
Developing a DL model is now a sprint, not a marathon.
Scaletorch Editions
On-Premise
For on-premise setups, the Scaletorch DLOP is available as a rack-mounted appliance with 40 to 256 offload cores. Multiple such appliances can be clustered together.
Cloud
The Scaletorch DLOP is available as a virtual appliance that runs on spot CPU instances in AWS, Azure, and Google Cloud. It automatically scales the number of these virtual appliances to optimize the cost-to-performance ratio.
Why Scaletorch DLOP?
Increase Productivity
Reduce your model training time from days to hours and from hours to minutes.
Decrease AI compute cost by up to 100x
Faster training directly reduces your cloud bill, since providers charge by the hour: a job that finishes 10x sooner incurs roughly a tenth of the hourly charges.
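As a back-of-the-envelope sketch (the hours, rate, and speedup figures below are hypothetical illustrations, not quoted prices):

```python
# Hourly cloud billing means an N-x training speedup yields an N-x
# smaller bill. All figures here are hypothetical illustrations.
baseline_hours = 100   # assumed original training time
hourly_rate = 3.0      # assumed $/GPU-hour
speedup = 10           # one of the speedup factors cited above

baseline_cost = baseline_hours * hourly_rate
accelerated_cost = (baseline_hours / speedup) * hourly_rate

print(f"Without DLOP: {baseline_hours:.0f} h -> ${baseline_cost:.2f}")
print(f"With DLOP:    {baseline_hours / speedup:.0f} h -> ${accelerated_cost:.2f}")
```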
Privacy & Security
Operate inside your own infrastructure, whether it's on-premise or in your cloud accounts.
Zero Code Change, Seriously!
Launch your PyTorch script with our CLI, API, or Web UI, and we'll take care of the rest.
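To make the zero-code-change claim concrete, here is a minimal sketch of the kind of entirely ordinary PyTorch script you would submit. It is a generic example, not Scaletorch code: note that it contains no Scaletorch imports or calls.

```python
# train.py -- a plain PyTorch training loop, submitted as-is.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random tensors stand in for a real DataLoader.
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```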
Virtual Mounts
Scaletorch DLOP connects to your filesystems, object stores, or any other remote data source.
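In practice this means your dataset code keeps reading ordinary file paths. A minimal sketch, assuming a hypothetical mount point at /mnt/scaletorch (the path and file layout are illustrative, not part of the product's documented interface):

```python
# Hypothetical example: the dataset reads plain file paths; whether
# /mnt/scaletorch/imaging is a local disk, an object store, or another
# remote source is decided by the virtual mount, not by this code.
from pathlib import Path

import torch
from torch.utils.data import Dataset

class ScanDataset(Dataset):
    def __init__(self, root: str = "/mnt/scaletorch/imaging"):  # assumed mount point
        self.files = sorted(Path(root).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int):
        # torch.load takes an ordinary path; the mount makes remote data look local.
        return torch.load(self.files[idx])
```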