Developing a DL model is now a sprint,
not a marathon.
DAP - Training On Steroids
Hours will transform into minutes
DAP (Disaggregated Asynchronous Processing) is an engine that executes PyTorch training workloads asynchronously and in a disaggregated fashion. The result: training runs 4x-30x faster on the same number of GPUs.
A video AI startup used Scaletorch's DAP engine to train a pose detection model for their crowd control feature and went from 4 hours per epoch to 13 minutes.
How does it work? 4D Scaling!
A unique Scaletorch invention.
Your existing PyTorch code speeds up 4x-30x on the same number of GPUs, with zero code changes.
Automatic multi-node training that scales linearly to 128 GPUs.
Hundreds of trials of your experiment run in parallel.
Time for 100 trials equals time for 1 trial.
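The claim above is the usual argument for parallel hyperparameter search: if every trial gets its own worker, wall-clock time stays close to a single trial's duration. A minimal sketch of that idea using Python's standard library (the trial function and hyperparameter grid here are hypothetical stand-ins, not Scaletorch's API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_trial(trial_id, lr):
    # Stand-in for one training trial; a real trial would train a model.
    time.sleep(0.1)  # simulate a fixed-length trial
    return trial_id, lr

# Hypothetical hyperparameter grid: 8 trials with different learning rates.
learning_rates = [10 ** -i for i in range(1, 9)]

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    # All 8 trials start at once, one per worker.
    results = list(pool.map(run_trial, range(8), learning_rates))
elapsed = time.time() - start

# With one worker per trial, total wall-clock time is close to a single
# trial's duration (~0.1s here), not 8x that duration.
print(f"{len(results)} trials in {elapsed:.2f}s")
```

The same reasoning extends to hundreds of trials, provided there are enough GPUs to give each trial its own worker.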
Get more GPUs across clouds
Privacy & Security
Store Your Data
Anywhere You Want
Our Virtual Mount technology presents any remote data source (S3, GCS, Google Drive, HTTPS, etc.) as a local drive. We use a combination of encryption, caching and prefetching to make Virtual Mounts secure and high-performing.
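The core trick behind presenting remote storage as a local drive is transparent caching: fetch each remote object once, then serve every later read from local disk. A minimal sketch of that caching idea using only the standard library (this is an illustration of the general technique, not Scaletorch's Virtual Mount implementation; `CachingFetcher` is a hypothetical name):

```python
import os
import tempfile
import urllib.request

class CachingFetcher:
    """Illustrative sketch: download a remote object on first access,
    then serve all subsequent reads from a local cache directory."""

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir or tempfile.mkdtemp()

    def _local_path(self, url):
        # Derive a flat cache filename from the URL (simplified).
        name = url.replace("://", "_").replace("/", "_")
        return os.path.join(self.cache_dir, name)

    def open(self, url):
        path = self._local_path(url)
        if not os.path.exists(path):
            # Cache miss: fetch the remote object once.
            urllib.request.urlretrieve(url, path)
        # Cache hit (or freshly filled cache): read from local disk.
        return open(path, "rb")
```

A production virtual mount layers encryption and prefetching (downloading the next files before the training loop asks for them) on top of this basic cache.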
Stick with What
You're Used To
Use your own cloud accounts! Scaletorch connects to your account and automatically runs training jobs faster. All data flow happens inside your existing infrastructure; Scaletorch never stores any of your data inside the platform.
Privacy and Security
Operate inside your infrastructure. Use your existing cloud accounts and storage.
Decrease cloud costs by 30x
Faster training naturally means a smaller bill from cloud providers, since they charge by the hour.
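The arithmetic behind the headline number is direct: with hourly billing, cost scales with wall-clock time, so an Nx speedup is an Nx cost reduction. A worked example (the $24/hour rate and 120-hour baseline are hypothetical figures for illustration; the 30x speedup is the upper end of the claimed 4x-30x range):

```python
# Hypothetical rate for illustration: an 8-GPU instance at $24/hour.
hourly_rate = 24.0

baseline_hours = 120   # e.g. a five-day training run
speedup = 30           # upper end of the claimed 4x-30x range

baseline_cost = baseline_hours * hourly_rate
accelerated_cost = (baseline_hours / speedup) * hourly_rate

print(baseline_cost)                     # 2880.0
print(accelerated_cost)                  # 96.0
print(baseline_cost / accelerated_cost)  # 30.0
```

Under hourly billing the cost ratio equals the speedup exactly, regardless of the instance price.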
Get results faster using our DAP engine and run more experiments.
Zero code change. Seriously.
Launch your PyTorch script with our Web App or CLI, and we'll take care of the rest.
Present cloud storage as a local disk and scale experiments in one click.