# MMAction2 — OpenMMLab Video Understanding Toolbox

> MMAction2 provides a modular framework for action recognition, temporal action detection, and spatial-temporal action detection, with 20+ methods and support for major video benchmarks.

## Quick Use

```bash
pip install mmaction2 mmengine mmcv
python demo/demo.py configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth demo/demo.mp4 tools/data/kinetics/label_map_k400.txt
```

Run the demo from a clone of the mmaction2 repository, since `demo/demo.py`, the config files, and the label map ship with the source tree rather than the pip package. If `pip install mmcv` falls back to a slow source build, the prebuilt wheels available through OpenMIM (`pip install openmim && mim install mmcv`) are usually faster. A Python-API equivalent of this demo is sketched after the FAQ below.

## Introduction

MMAction2 is the next-generation video understanding toolbox from OpenMMLab. It covers action recognition, temporal action localization, and spatial-temporal action detection, providing a consistent PyTorch-based framework for researchers and practitioners working with video data.

## What MMAction2 Does

- Classifies human actions in video clips using 20+ recognition models
- Localizes action segments temporally within untrimmed videos
- Detects actions in space and time with spatial-temporal models
- Supports skeleton-based action recognition via PoseC3D
- Benchmarks on Kinetics, Something-Something, AVA, and more

## Architecture Overview

MMAction2 uses MMEngine as its training backend, with a registry pattern for models, datasets, and pipelines. Recognition models process fixed-length clips through backbones such as ResNet3D, SlowFast, or Video Swin Transformer. Temporal detectors use proposal generation and classification stages. All components are configured via Python config files; a registry sketch appears after the FAQ below.

## Self-Hosting & Configuration

- Install mmaction2, mmengine, and mmcv via pip
- Download pre-trained checkpoints from the model zoo
- Prepare video datasets in the expected directory structure
- Modify config files for custom class labels and data paths (see the fine-tuning config sketch after the FAQ)
- Use torchrun for multi-GPU distributed training

## Key Features

- Comprehensive coverage of action recognition paradigms (RGB, flow, skeleton)
- UniFormerV2 and VideoMAE models achieve state-of-the-art results on Kinetics
- Modular design allows swapping backbones and temporal heads
- Pre-built data pipelines for common video dataset formats
- Integration with MMDeploy for production model conversion

## Comparison with Similar Tools

- **SlowFast (FAIR)** — reference implementation of the SlowFast network; MMAction2 includes SlowFast plus many other methods
- **PyTorchVideo** — provides video-specific transforms and models; MMAction2 offers a broader set of methods and benchmarks
- **TimeSformer** — single Transformer architecture; MMAction2 supports TimeSformer alongside CNN and hybrid approaches
- **Decord** — video decoding library; MMAction2 uses Decord internally but adds full training and evaluation pipelines

## FAQ

**Q: Can I use MMAction2 for real-time action detection?**
A: Yes. Lightweight models such as MobileNetV2-TSM can run in real time on modern GPUs.

**Q: Does it support skeleton-based recognition?**
A: Yes. PoseC3D and ST-GCN models accept skeleton sequences extracted with MMPose.

**Q: What video formats are supported?**
A: MMAction2 reads any format supported by Decord or OpenCV, including MP4, AVI, and MKV.

**Q: Can I fine-tune on my own action classes?**
A: Yes. Update the label map and annotation files, then fine-tune from a Kinetics-pretrained checkpoint (see the config sketch below).
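Complementing the command-line demo in Quick Use, the same recognition can be driven from Python. This is a minimal sketch assuming the `init_recognizer`/`inference_recognizer` helpers from `mmaction.apis` in MMAction2 1.x; the fields on the returned result (here `pred_score`) have changed between releases, so check your installed version.

```python
from mmaction.apis import inference_recognizer, init_recognizer

# Same config/checkpoint pair as the Quick Use demo above.
config_file = 'configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py'
checkpoint_file = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth'

# Build the recognizer and load pretrained weights; use device='cpu' without a GPU.
model = init_recognizer(config_file, checkpoint_file, device='cuda:0')

# Run recognition on one clip. In MMAction2 1.x the result is an
# ActionDataSample whose pred_score tensor holds per-class scores.
result = inference_recognizer(model, 'demo/demo.mp4')
top_class = int(result.pred_score.argmax())
print(f'predicted Kinetics-400 class index: {top_class}')
```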
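The registry pattern described in Architecture Overview can be seen directly: a config file names each component by `type`, and MMEngine's registries instantiate the matching class. A sketch, assuming MMAction2 1.x's `mmaction.registry` module and its `register_all_modules` helper:

```python
from mmengine.config import Config
from mmaction.registry import MODELS
from mmaction.utils import register_all_modules

# Make sure MMAction2's models, datasets, and transforms are registered.
register_all_modules()

# Parse the Python config file into a nested dict-like object.
cfg = Config.fromfile('configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py')

# MODELS.build looks up cfg.model['type'] (e.g. 'Recognizer2D') in the registry
# and instantiates it, recursively building the backbone and cls_head the same way.
model = MODELS.build(cfg.model)
print(type(model).__name__)
```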
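For the fine-tuning workflow in Self-Hosting & Configuration and the last FAQ answer, a new config can inherit a model-zoo recipe via `_base_` and override only the pieces that change. The sketch below is hypothetical: the `data/my_dataset/...` paths and `num_classes=10` are placeholders, and key names follow MMAction2 1.x conventions.

```python
# my_tsn_finetune.py -- hypothetical fine-tuning config (paths are placeholders).
# Inherit the full Kinetics-400 TSN recipe and override only what changes.
_base_ = [
    'configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py'
]

# Resize the classification head to the number of custom action classes.
model = dict(cls_head=dict(num_classes=10))

# Point the training dataloader at the custom videos and annotation list.
train_dataloader = dict(
    dataset=dict(
        ann_file='data/my_dataset/train_list.txt',
        data_prefix=dict(video='data/my_dataset/videos'),
    ))

# Initialize from the Kinetics-pretrained checkpoint used in Quick Use.
load_from = 'https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.pth'
```

Training then follows the usual OpenMMLab launch pattern: `python tools/train.py my_tsn_finetune.py` for a single GPU, or `torchrun --nproc_per_node=8 tools/train.py my_tsn_finetune.py --launcher pytorch` for the multi-GPU setup mentioned above.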
## Sources

- https://github.com/open-mmlab/mmaction2
- https://mmaction2.readthedocs.io/