Run Jobs in AWS, GCP and Azure Using Scaletorch to Speed Up Training
Scaletorch DLOP technology runs as a virtual appliance using spot CPU or CPU+FPGA instances in conjunction with the GPU VMs in the cloud.
Scaletorch works with any data source such as filesystems, S3, GCS, Azure Blob, etc.
Multiple DLOP Instances are launched to optimize the cost to performance ratio.
Run your training scripts using Scaletorch and see massive speedups
STEP - 1
Connect To Cloud
Use your own cloud accounts: stick with the providers you already know.
Scaletorch will connect to your account in AWS, GCP or Azure. You can use one cloud or multiple clouds simultaneously.
Specify regions you want to operate in or go worldwide.
STEP - 2
Configure Data Source
Virtual Mounts: Store the data anywhere you want. We don’t require you to save your data on a particular cloud or in Scaletorch. Our Virtual Mount technology presents any remote data source as a local drive. We use a combination of encryption, caching and prefetching to make Virtual Mounts secure and high-performing.
Support: S3, GCS, Google Drive, HTTPS, SFTP, Azure
For help setting up virtual mounts and artifact storage, click here
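The caching-plus-prefetching idea behind Virtual Mounts can be illustrated with a short sketch. This is a generic, hypothetical example, not Scaletorch's actual implementation: blocks are fetched from a remote source on demand, kept in a local cache, and the next block is prefetched in the background while the caller consumes the current one.

```python
import threading

class PrefetchingReader:
    """Illustrative sketch of a virtual-mount read path: a local cache
    plus background prefetch hides remote latency for sequential reads."""

    def __init__(self, fetch_block, num_blocks):
        self._fetch = fetch_block      # callable: block index -> bytes
        self._num_blocks = num_blocks
        self._cache = {}               # local block cache
        self._lock = threading.Lock()

    def _ensure(self, idx):
        # Fetch the block from the remote source only if it is not cached.
        with self._lock:
            if idx not in self._cache:
                self._cache[idx] = self._fetch(idx)
            return self._cache[idx]

    def read_block(self, idx):
        data = self._ensure(idx)
        # Kick off a background prefetch of the next block so sequential
        # readers rarely wait on the network.
        if idx + 1 < self._num_blocks:
            threading.Thread(target=self._ensure, args=(idx + 1,),
                             daemon=True).start()
        return data

# Usage: simulate a "remote" data source with an in-memory byte store.
remote = [b"block-%d" % i for i in range(4)]
reader = PrefetchingReader(lambda i: remote[i], len(remote))
first = reader.read_block(0)  # fetched on demand; block 1 prefetched
```

A production mount would add encryption, eviction and a FUSE layer on top, but the cache-and-prefetch loop is the core of the technique.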
STEP - 3
Point Us To Your Code
To get started with training your models, we need your code, either as a zip file or as the details of the repository where it is stored.
Support: GitHub, Gitlab, Bitbucket, S3, Azure, Gsutil, Google Drive and Dropbox
STEP - 4
Run your training job and benefit from insane speedup
Trials parallelization: Run your script with multiple variations of hyperparameters in parallel and find the best combination faster!
Multi-cloud: Register cloud accounts with multiple providers (e.g. AWS, GCP, Azure) and Scaletorch will create a hypercloud based on all the GPU capacity available to you across those clouds. No more quota issues!
Pick the GPUs, specify the epoch time and run the job. Scaletorch automatically handles the speed-up and orchestration.