Developing a DL model is a sprint,
not a marathon anymore.
DAP - Training On Steroids
Hours will transform into minutes
DAP (Disaggregated Asynchronous Processing Engine) is an engine that relies on asynchronous and disaggregated execution of PyTorch training workloads. This results in training running 4x-30x faster on the same number of GPUs.
A video AI startup used Scaletorch's DAP engine to train a pose detection model for their crowd control feature and went from 4 hours per epoch to 13 minutes.
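To put the numbers above in perspective, here is a quick back-of-the-envelope check in Python. It only uses the two figures quoted in the example (4 hours and 13 minutes per epoch); the helper name `speedup` is ours, chosen for illustration, and is not part of any Scaletorch API.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("durations must be positive")
    return before_minutes / after_minutes


epoch_before = 4 * 60  # 4 hours per epoch, as quoted above
epoch_after = 13       # 13 minutes per epoch, as quoted above

factor = speedup(epoch_before, epoch_after)
print(f"{factor:.1f}x faster per epoch")  # prints "18.5x faster per epoch"
```

That works out to roughly an 18x speedup, which sits inside the 4x-30x range quoted for DAP.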
How does it work? 4D Scaling!
Unique Scaletorch invention.
Existing PyTorch code speeds up by 4x-30x on the same number of GPUs with zero code changes.
Automatic multi-node training that scales linearly to 128 GPUs.
Hundreds of trials of an experiment run in parallel.
The time for 100 trials equals the time for 1 trial.
Get more GPUs across clouds
Privacy & Security
Scaletorch is purely software that works with your cloud accounts/VPCs as well as your existing data sources (S3, Azure Blob, HTTPS, GCS, etc).
Scaletorch isn't a cloud provider, so no data or code flows through or is stored in Scaletorch.
Use your cloud accounts
Connect clouds that you are already using (AWS, GCP, Azure) to Scaletorch.
Scaletorch uses your cloud accounts/VPCs to create GPU VMs, execute the training script, and then perform a cleanup after the training job has completed.
Data from any source is automatically streamed into the GPU VMs, with Scaletorch working purely as an orchestrator.
Use one or multiple clouds.
Connect to any of your existing data sources (S3, Google Drive, Google Storage, HTTPS, etc.).
Virtual mounts present any data source as a local folder and leverage encryption, caching, and prefetching to make the process secure and fast.
Virtual mounts are created directly inside the GPU VMs that run in your cloud infrastructure. Hence, no data passes through Scaletorch.
Privacy and Security
Operate inside your infrastructure. Use your existing cloud accounts and storage.
Decrease cloud costs by up to 30x
Faster training naturally leads to a smaller bill from cloud providers, since they charge by the hour.
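The arithmetic behind this is simple enough to sketch. The snippet below is a minimal illustration, not a pricing calculator: the hourly rate and the baseline training time are hypothetical placeholders, and it assumes the bill is simply hours multiplied by the hourly rate. Only the 4x and 30x speedup figures come from this page.

```python
def training_cost(hours: float, hourly_rate: float, num_vms: int = 1) -> float:
    """Cost of a training run when the provider bills per VM-hour."""
    return hours * hourly_rate * num_vms


HOURLY_RATE = 3.00     # hypothetical price per GPU VM-hour, in dollars
BASELINE_HOURS = 40.0  # hypothetical training time without any speedup

baseline_cost = training_cost(BASELINE_HOURS, HOURLY_RATE)
print(f"baseline: ${baseline_cost:.2f}")

# The low and high ends of the speedup range quoted for DAP.
costs = {}
for speedup in (4, 30):
    costs[speedup] = training_cost(BASELINE_HOURS / speedup, HOURLY_RATE)
    print(f"{speedup}x faster: ${costs[speedup]:.2f}")
```

Under these assumptions the cost falls in direct proportion to the speedup: a run that finishes 30x sooner is billed for one thirtieth of the VM-hours.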
Get results faster using our DAP engine and run more experiments.
Zero code change. Seriously.
Launch your PyTorch script with our web app or CLI, and we will take care of the rest.
Present cloud storage as a local disk and scale experiments in one click.