# MMAction2 — OpenMMLab Video Understanding Toolbox

> MMAction2 provides a modular framework for action recognition, temporal action detection, and spatial-temporal action detection, with 20+ methods and support for major video benchmarks.

## Quick Use

```bash
pip install mmaction2 mmengine mmcv
python demo/demo.py configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth demo/demo.mp4 tools/data/kinetics/label_map_k400.txt
```

Run the demo from a clone of the mmaction2 repository, since `demo/demo.py`, the config files, and the label map ship with the source tree rather than the pip package. If `pip install mmcv` falls back to a slow source build, the prebuilt wheels available through OpenMIM (`pip install openmim && mim install mmcv`) are usually faster. A Python-API equivalent of this demo is sketched after the FAQ below.

## Introduction

MMAction2 is the next-generation video understanding toolbox from OpenMMLab. It covers action recognition, temporal action localization, and spatial-temporal action detection, providing a consistent PyTorch-based framework for researchers and practitioners working with video data.

## What MMAction2 Does

- Classifies human actions in video clips using 20+ recognition models
- Localizes action segments temporally within untrimmed videos
- Detects actions in space and time with spatial-temporal models
- Supports skeleton-based action recognition via PoseC3D
- Benchmarks on Kinetics, Something-Something, AVA, and more

## Architecture Overview

MMAction2 uses MMEngine as its training backend, with a registry pattern for models, datasets, and pipelines. Recognition models process fixed-length clips through backbones such as ResNet3D, SlowFast, or Video Swin Transformer. Temporal detectors use proposal generation and classification stages. All components are configured via Python config files; a registry sketch appears after the FAQ below.

## Self-Hosting & Configuration

- Install mmaction2, mmengine, and mmcv via pip
- Download pre-trained checkpoints from the model zoo
- Prepare video datasets in the expected directory structure
- Modify config files for custom class labels and data paths (see the fine-tuning config sketch after the FAQ)
- Use torchrun for multi-GPU distributed training

## Key Features

- Comprehensive coverage of action recognition paradigms (RGB, flow, skeleton)
- UniFormerV2 and VideoMAE models achieve state-of-the-art results on Kinetics
- Modular design allows swapping backbones and temporal heads
- Pre-built data pipelines for common video dataset formats
- Integration with MMDeploy for production model conversion

## Comparison with Similar Tools

- **SlowFast (FAIR)** — reference implementation of the SlowFast network; MMAction2 includes SlowFast plus many other methods
- **PyTorchVideo** — provides video-specific transforms and models; MMAction2 offers a broader set of methods and benchmarks
- **TimeSformer** — single Transformer architecture; MMAction2 supports TimeSformer alongside CNN and hybrid approaches
- **Decord** — video decoding library; MMAction2 uses Decord internally but adds full training and evaluation pipelines

## FAQ

**Q: Can I use MMAction2 for real-time action detection?**
A: Yes. Lightweight models such as MobileNetV2-TSM can run in real time on modern GPUs.

**Q: Does it support skeleton-based recognition?**
A: Yes. PoseC3D and ST-GCN models accept skeleton sequences extracted with MMPose.

**Q: What video formats are supported?**
A: MMAction2 reads any format supported by Decord or OpenCV, including MP4, AVI, and MKV.

**Q: Can I fine-tune on my own action classes?**
A: Yes. Update the label map and annotation files, then fine-tune from a Kinetics-pretrained checkpoint (see the config sketch below).
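Complementing the command-line demo in Quick Use, the same recognition can be driven from Python. This is a minimal sketch assuming the `init_recognizer`/`inference_recognizer` helpers from `mmaction.apis` in MMAction2 1.x; the fields on the returned result (here `pred_score`) have changed between releases, so check your installed version.

```python
from mmaction.apis import inference_recognizer, init_recognizer

# Same config/checkpoint pair as the Quick Use demo above.
config_file = 'configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py'
checkpoint_file = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth'

# Build the recognizer and load pretrained weights; use device='cpu' without a GPU.
model = init_recognizer(config_file, checkpoint_file, device='cuda:0')

# Run recognition on one clip. In MMAction2 1.x the result is an
# ActionDataSample whose pred_score tensor holds per-class scores.
result = inference_recognizer(model, 'demo/demo.mp4')
top_class = int(result.pred_score.argmax())
print(f'predicted Kinetics-400 class index: {top_class}')
```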
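The registry pattern described in Architecture Overview can be seen directly: a config file names each component by `type`, and MMEngine's registries instantiate the matching class. A sketch, assuming MMAction2 1.x's `mmaction.registry` module and its `register_all_modules` helper:

```python
from mmengine.config import Config
from mmaction.registry import MODELS
from mmaction.utils import register_all_modules

# Make sure MMAction2's models, datasets, and transforms are registered.
register_all_modules()

# Parse the Python config file into a nested dict-like object.
cfg = Config.fromfile('configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py')

# MODELS.build looks up cfg.model['type'] (e.g. 'Recognizer2D') in the registry
# and instantiates it, recursively building the backbone and cls_head the same way.
model = MODELS.build(cfg.model)
print(type(model).__name__)
```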
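For the fine-tuning workflow in Self-Hosting & Configuration and the last FAQ answer, a new config can inherit a model-zoo recipe via `_base_` and override only the pieces that change. The sketch below is hypothetical: the `data/my_dataset/...` paths and `num_classes=10` are placeholders, and key names follow MMAction2 1.x conventions.

```python
# my_tsn_finetune.py -- hypothetical fine-tuning config (paths are placeholders).
# Inherit the full Kinetics-400 TSN recipe and override only what changes.
_base_ = [
    'configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py'
]

# Resize the classification head to the number of custom action classes.
model = dict(cls_head=dict(num_classes=10))

# Point the training dataloader at the custom videos and annotation list.
train_dataloader = dict(
    dataset=dict(
        ann_file='data/my_dataset/train_list.txt',
        data_prefix=dict(video='data/my_dataset/videos'),
    ))

# Initialize from the Kinetics-pretrained checkpoint used in Quick Use.
load_from = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth'
```

Training then follows the usual OpenMMLab launch pattern: `python tools/train.py my_tsn_finetune.py` for a single GPU, or `torchrun --nproc_per_node=8 tools/train.py my_tsn_finetune.py --launcher pytorch` for the multi-GPU setup mentioned above.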
## Sources

- https://github.com/open-mmlab/mmaction2
- https://mmaction2.readthedocs.io/