The Scaletorch Deep Learning Offload Processor is a technology that speeds up deep learning training. It is available either as a pure software appliance or as a combined software and hardware solution.
The solution works in conjunction with GPUs and other deep learning accelerators (such as TPUs and IPUs) to transparently speed up AI training by 10x to 200x, without any changes to your PyTorch script or setup.
Scaletorch can run in AWS, Google Cloud, and Azure, as well as on-premise.
Speedups achieved with the same number of GPUs using the Scaletorch Deep Learning Offload Processor (DLOP):
31x - Audio recognition workload with a ResNet-101 model
19x - Medical imaging workload with a UNet 2D model
18x - Video pose detection workload using a 3D ResNet-50 model
12x - 3D medical imaging workload using a UNet model
Developing a DL model is now a sprint, not a marathon.
Scaletorch Editions
On-Premise
For on-premise setups, the Scaletorch DLOP is available as a rack-mounted appliance with 40 to 256 offload cores. Multiple such appliances can be clustered together.
Cloud
The Scaletorch DLOP is available as a virtual appliance that runs on spot CPU instances in AWS, Azure, and Google Cloud. It automatically scales the number of these virtual appliances to optimize the cost-to-performance ratio.
Why Scaletorch DLOP?
Increase Productivity
Reduce your model training time from days to hours and from hours to minutes.
Decrease AI compute cost by up to 100x
Faster training directly reduces your cloud bill, since providers charge by the hour: a job that finishes 10x sooner incurs roughly a tenth of the hourly charges.
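As a back-of-the-envelope sketch (the hours, rate, and speedup figures below are hypothetical illustrations, not quoted prices):

```python
# Hourly cloud billing means an N-x training speedup yields an N-x
# smaller bill. All figures here are hypothetical illustrations.
baseline_hours = 100   # assumed original training time
hourly_rate = 3.0      # assumed $/GPU-hour
speedup = 10           # one of the speedup factors cited above

baseline_cost = baseline_hours * hourly_rate
accelerated_cost = (baseline_hours / speedup) * hourly_rate

print(f"Without DLOP: {baseline_hours:.0f} h -> ${baseline_cost:.2f}")
print(f"With DLOP:    {baseline_hours / speedup:.0f} h -> ${accelerated_cost:.2f}")
```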
Privacy & Security
Operate inside your own infrastructure, whether it's on-premise or in your cloud accounts.
Zero Code Change, Seriously!
Launch your PyTorch script with our CLI, API, or Web UI, and we'll take care of the rest.
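To make the zero-code-change claim concrete, here is a minimal sketch of the kind of entirely ordinary PyTorch script you would submit. It is a generic example, not Scaletorch code: note that it contains no Scaletorch imports or calls.

```python
# train.py -- a plain PyTorch training loop, submitted as-is.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random tensors stand in for a real DataLoader.
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```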
Virtual Mounts
Scaletorch DLOP connects to your filesystems, object stores, or any other remote data source.
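In practice this means your dataset code keeps reading ordinary file paths. A minimal sketch, assuming a hypothetical mount point at /mnt/scaletorch (the path and file layout are illustrative, not part of the product's documented interface):

```python
# Hypothetical example: the dataset reads plain file paths; whether
# /mnt/scaletorch/imaging is a local disk, an object store, or another
# remote source is decided by the virtual mount, not by this code.
from pathlib import Path

import torch
from torch.utils.data import Dataset

class ScanDataset(Dataset):
    def __init__(self, root: str = "/mnt/scaletorch/imaging"):  # assumed mount point
        self.files = sorted(Path(root).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int):
        # torch.load takes an ordinary path; the mount makes remote data look local.
        return torch.load(self.files[idx])
```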