How does it work?
Scaletorch is software that connects to your existing cloud accounts in AWS, Azure and GCP and runs your existing scripts inside your own infrastructure without any code changes. Scaletorch creates VMs with the GPU types and GPU counts specified in your configuration, and virtually mounts the data from your existing cloud storage, such as S3, Azure Blob, Google Drive and many more. When training is done, the results are written back to the storage of your choice, all VMs are shut down, and cached data is cleaned up.
How do you get started?
Connect to your clouds
Use your cloud accounts: stick with what you are used to. Scaletorch connects to your account in AWS, GCP or Azure. You can use one cloud or multiple clouds simultaneously.
Specify the regions you want to operate in, or go worldwide.
Connect Data Storage
Virtual Mounts: Store the data anywhere you want. We don’t require you to save your data on a particular cloud or in Scaletorch. Our Virtual Mount technology presents any remote data source (S3, GCS, Google Drive, HTTPS, etc.) as a local drive. We use a combination of encryption, caching and prefetching to make Virtual Mounts secure and high-performing.
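To illustrate the general idea of caching and prefetching, here is a minimal sketch of a read-through cache that also loads the next few chunks ahead of a sequential reader. It is not Scaletorch's implementation: the class name CachedChunkReader, the fetch_chunk callback and the chunk-index scheme are all hypothetical, encryption is left out, and the prefetch runs synchronously for simplicity (a real system would likely do it in the background).

```python
from collections import OrderedDict
from typing import Callable


class CachedChunkReader:
    """Read-through LRU cache with sequential prefetch (illustrative only)."""

    def __init__(self, fetch_chunk: Callable[[int], bytes],
                 capacity: int = 8, prefetch: int = 2):
        if capacity <= prefetch:
            raise ValueError("capacity must be larger than prefetch")
        self._fetch = fetch_chunk      # hypothetical callback that reads from remote storage
        self._capacity = capacity      # maximum number of chunks kept locally
        self._prefetch = prefetch      # how many chunks to load ahead of the reader
        self._cache = OrderedDict()    # chunk index -> bytes, oldest entry first
        self.remote_reads = 0          # counts trips to the remote store

    def _load(self, index: int) -> bytes:
        if index in self._cache:
            # Cache hit: mark the chunk as recently used.
            self._cache.move_to_end(index)
            return self._cache[index]
        # Cache miss: go to the remote store and remember the result.
        data = self._fetch(index)
        self.remote_reads += 1
        self._cache[index] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used chunk.
            self._cache.popitem(last=False)
        return data

    def read(self, index: int) -> bytes:
        data = self._load(index)
        # Warm the cache for the chunks a sequential reader will ask for next.
        for ahead in range(index + 1, index + 1 + self._prefetch):
            self._load(ahead)
        return data


def fake_remote(index: int) -> bytes:
    """Stand-in for a remote object store."""
    return f"chunk-{index}".encode()


reader = CachedChunkReader(fake_remote, capacity=8, prefetch=2)
first = reader.read(0)    # fetches chunks 0, 1 and 2
second = reader.read(1)   # chunk 1 is already local; only chunk 3 is fetched
```

In this sketch, a sequential reader pays the remote latency up front and then mostly hits local data, which is the effect caching and prefetching aim for.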
Run experiments using Scaletorch and see a massive speedup!
Unique DAP engine: speed up your deep learning model development by 4x to 30x on the same GPU capacity. DAP optimizes the key bottlenecks of the training procedure in your PyTorch code using a combination of asynchronous execution and compilation. No code changes needed.
Distributed training: Divide the training workload of a huge deep learning model across multiple host machines and get a 120x speedup.
Trials parallelization: Run your script with multiple variations of hyperparameters in parallel and find the best combination faster! Instead of waiting 10 hours for 10 trials to run one after another on 1 GPU, run them simultaneously on 10 GPUs and finish in 1 hour.
Multi-cloud: If you keep running into cloud providers' quotas, there is a way to increase GPU availability. Register cloud accounts with multiple providers (e.g. AWS, GCP, Azure) and Scaletorch will create a hypercloud from all the available GPU capacity that you have across all three clouds. No more quota issues!
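The trials parallelization arithmetic can be checked with a short sketch. It assumes each trial occupies exactly one GPU for its whole duration and that all trials take the same time; the function names are illustrative and not part of any Scaletorch API.

```python
import math


def wall_clock_hours(num_trials: int, hours_per_trial: float, num_gpus: int) -> float:
    """Elapsed time when trials run in waves of at most num_gpus at a time."""
    if num_trials < 0 or num_gpus < 1:
        raise ValueError("need num_trials >= 0 and num_gpus >= 1")
    waves = math.ceil(num_trials / num_gpus)
    return waves * hours_per_trial


def gpu_hours(num_trials: int, hours_per_trial: float) -> float:
    """Total GPU time consumed, which does not depend on how many GPUs run at once."""
    return num_trials * hours_per_trial


sequential = wall_clock_hours(10, 1.0, num_gpus=1)   # one trial after another
parallel = wall_clock_hours(10, 1.0, num_gpus=10)    # all trials at once
print(sequential, parallel, gpu_hours(10, 1.0))
```

Under these assumptions the wall-clock time drops from 10 hours to 1 hour while the total GPU-hours stay at 10, so the results arrive sooner for the same amount of GPU time.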
1) Speeding up training by 4x to 30x on the same number of GPUs. Since cloud providers charge for GPUs by the hour, that translates directly into a 4x to 30x saving on cloud GPU costs.
2) Spot Instance Automation: Spot Instances cost 70% less. Scaletorch automates away the headaches of using Spot Instances.
3) Scaletorch can work with low-cost cloud/object storage such as AWS S3, Azure Blob and GCS. No need to use expensive file and block storage.
We charge $0.30 per GPU per hour on top of what your cloud provider charges you.
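As a rough worked example of how these numbers combine, the sketch below estimates the cost of a job with and without Scaletorch. The $0.30 fee and the 70% Spot discount come from the text above; the 100 GPU-hour baseline and the $3.00 per GPU-hour cloud rate are hypothetical inputs, and the sketch assumes the fee applies to every GPU-hour actually used, including on Spot Instances.

```python
SCALETORCH_FEE = 0.30  # USD per GPU per hour, as stated above


def baseline_cost(gpu_hours: float, cloud_rate: float) -> float:
    """Cost of running directly on the cloud provider, with no platform fee."""
    return gpu_hours * cloud_rate


def scaletorch_cost(gpu_hours: float, cloud_rate: float,
                    speedup: float, spot_discount: float = 0.0) -> float:
    """Estimated cost after a training speedup and an optional Spot discount."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    hours_used = gpu_hours / speedup                  # faster training uses fewer GPU-hours
    hourly_rate = cloud_rate * (1.0 - spot_discount) + SCALETORCH_FEE
    return hours_used * hourly_rate


# Hypothetical job: 100 GPU-hours at $3.00 per GPU-hour, sped up 4x.
base = baseline_cost(100, 3.00)
on_demand = scaletorch_cost(100, 3.00, speedup=4)
spot = scaletorch_cost(100, 3.00, speedup=4, spot_discount=0.70)
print(f"baseline ${base:.2f}, on-demand ${on_demand:.2f}, spot ${spot:.2f}")
# prints: baseline $300.00, on-demand $82.50, spot $30.00
```

With these inputs the per-GPU fee makes the net saving a little smaller than the raw speedup (about 3.6x instead of 4x on on-demand instances), and combining the speedup with Spot pricing brings the estimate down to roughly a tenth of the baseline.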