Video Classification - 18x faster

Dataset: Charades Dataset

The Charades dataset is composed of 9,848 videos of daily indoor activities with an average length of 30 seconds, involving interactions with 46 object classes in 15 types of indoor scenes and containing a vocabulary of 30 verbs, leading to 157 action classes. Each video in this dataset is annotated with multiple free-text descriptions, action labels, action intervals, and classes of interacting objects. 267 different users were each presented with a sentence built from a fixed vocabulary of objects and actions, and recorded a video acting out the sentence. In total, the dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos. In the standard split there are 7,986 training videos and 1,863 validation videos.

Machine config: 8 x NVIDIA A100-SXM4-40GB

Model architecture: 3D ResNet50

Remarks w/ Throughput:

  • With DLOP:

    • Configuration: {batch_size: 256, prefetch_factor: 8, num_workers: 16}

    • Average throughput: 436.89 samples/sec

  • Without DLOP:

    • Configuration: {batch_size: 256, prefetch_factor: 4, num_workers: 32}

    • Average throughput: 23.59 samples/sec

Speedup: ~18x over regular training without DLOP (436.89 / 23.59 ≈ 18.5).
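
As a rough check of the speedup figure, the sketch below recomputes the ratio from the two reported throughputs and estimates how long one pass over the training split would take at each rate. It assumes both throughputs are in the same unit, that one sample corresponds to one video, and that throughput stays constant over a pass. None of these is stated on the page, and the function names are illustrative only.

```python
# Throughputs reported above, assumed to be in the same unit (samples/sec).
WITH_DLOP = 436.89
WITHOUT_DLOP = 23.59

# Training split size quoted in the dataset description.
# Assumption: one sample corresponds to one video.
TRAIN_VIDEOS = 7_986


def speedup(fast: float, slow: float) -> float:
    """Ratio of two throughputs measured in the same unit."""
    if fast <= 0 or slow <= 0:
        raise ValueError("throughputs must be positive")
    return fast / slow


def seconds_per_pass(num_samples: int, throughput: float) -> float:
    """Time to process num_samples at a constant throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return num_samples / throughput


ratio = speedup(WITH_DLOP, WITHOUT_DLOP)
time_with = seconds_per_pass(TRAIN_VIDEOS, WITH_DLOP)
time_without = seconds_per_pass(TRAIN_VIDEOS, WITHOUT_DLOP)

print(f"speedup: {ratio:.1f}x")
print(f"one pass with DLOP:    {time_with:.1f} s")
print(f"one pass without DLOP: {time_without:.1f} s")
```

Under these assumptions the ratio comes out slightly above 18, consistent with the rounded figure above. The per-pass times are only estimates, since the page does not say how a sample is defined.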

