Skills · May 12, 2026 · 2 min read

MMAction2 — OpenMMLab Video Understanding Toolbox

MMAction2 provides a modular framework for action recognition, temporal action detection, and spatial-temporal action detection with 20+ methods and support for major video benchmarks.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: MMAction2 Video AI
Universal CLI command: npx tokrepo install bca17f13-4ddd-11f1-9bc6-00163e2b0d79

Introduction

MMAction2 is the next-generation video understanding toolbox from OpenMMLab. It covers action recognition, temporal action localization, and spatial-temporal action detection, providing a consistent PyTorch-based framework for researchers and practitioners working with video data.

What MMAction2 Does

  • Classifies human actions in video clips using 20+ recognition models
  • Localizes action segments temporally within untrimmed videos
  • Detects actions in space and time with spatial-temporal models
  • Supports skeleton-based action recognition via PoseC3D and ST-GCN
  • Benchmarks on Kinetics, Something-Something, AVA, and more
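
To make "localizing action segments" concrete: temporal detectors are commonly scored by the temporal intersection-over-union (tIoU) between a predicted segment and a ground-truth segment. The sketch below is a generic illustration of that measure, not MMAction2's own evaluation code; the function name `temporal_iou` and the example segments are hypothetical.

```python
def temporal_iou(pred, gt):
    """Return the temporal IoU of two (start, end) segments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical segments: a prediction of 2-6 s against a ground truth of 4-8 s.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A prediction usually counts as correct only when its tIoU with a ground-truth segment exceeds a chosen threshold, so the same proposals can score differently at different thresholds.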

Architecture Overview

MMAction2 uses MMEngine as its training backend with a registry pattern for models, datasets, and pipelines. Recognition models process fixed-length clips through backbones like ResNet3D, SlowFast, or Video Swin Transformer. Temporal detectors use proposal generation and classification stages. All components are configured via Python config files.
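
The registry pattern mentioned above can be sketched in a few lines of plain Python. This is a simplified stand-in written for illustration, not MMEngine's actual implementation; `ToyRegistry`, `ToyResNet3D`, and the `depth` argument are hypothetical names chosen for the example.

```python
class ToyRegistry:
    """Minimal stand-in for a component registry (not MMEngine's implementation)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Decorator that records a class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Copy so the caller's config dict is left untouched.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = ToyRegistry("backbone")


@BACKBONES.register_module()
class ToyResNet3D:
    def __init__(self, depth):
        self.depth = depth


# A config names the component by string, so swapping backbones is a config edit.
backbone_cfg = dict(type="ToyResNet3D", depth=50)
backbone = BACKBONES.build(backbone_cfg)
print(type(backbone).__name__, backbone.depth)
```

Because the config refers to a component only by its registered name, replacing one backbone with another means changing the `type` string and its arguments rather than editing training code.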

Self-Hosting & Configuration

  • Install mmaction2, mmengine, and mmcv via pip or OpenMMLab's mim installer, which selects an mmcv build matching your PyTorch and CUDA versions
  • Download pre-trained checkpoints from the model zoo
  • Prepare video datasets in the expected directory structure
  • Modify config files for custom class labels and data paths
  • Use torchrun or the repository's tools/dist_train.sh script for multi-GPU distributed training
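
The config-editing step above might look roughly like the following. This is a sketch using plain dicts in the style of an MMAction2 config file; the field names (`cls_head.num_classes`, `ann_file`, `data_prefix`) reflect my understanding of the config layout and should be checked against the config you actually start from. All paths and the `customize` helper are hypothetical.

```python
import copy

# Hypothetical base config, written as plain dicts in the style of a config file.
base_cfg = dict(
    model=dict(
        type="Recognizer2D",
        cls_head=dict(type="TSNHead", num_classes=400),
    ),
    train_dataloader=dict(
        dataset=dict(
            type="VideoDataset",
            ann_file="data/kinetics400/train_list.txt",
            data_prefix=dict(video="data/kinetics400/videos_train"),
        ),
    ),
)


def customize(cfg, num_classes, ann_file, video_dir):
    """Return a copy of cfg pointed at a custom dataset with its own class count."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg["model"]["cls_head"]["num_classes"] = num_classes
    dataset = new_cfg["train_dataloader"]["dataset"]
    dataset["ann_file"] = ann_file
    dataset["data_prefix"]["video"] = video_dir
    return new_cfg


# Hypothetical paths for a 5-class custom dataset.
custom_cfg = customize(base_cfg, 5, "data/custom/train_list.txt", "data/custom/videos")
print(custom_cfg["model"]["cls_head"]["num_classes"])
```

Working on a deep copy keeps the base config reusable, which mirrors the common practice of inheriting from a base config and overriding only the fields that differ.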

Key Features

  • Comprehensive coverage of action recognition paradigms (RGB, flow, skeleton)
  • Includes recent Transformer-based models such as UniFormerV2 and VideoMAE, which report strong accuracy on Kinetics
  • Modular design allows swapping backbones and temporal heads
  • Pre-built data pipelines for common video dataset formats
  • Integration with MMDeploy for production model conversion

Comparison with Similar Tools

  • SlowFast (FAIR): reference implementation of the SlowFast network; MMAction2 includes SlowFast plus many other methods
  • PyTorchVideo: provides video-specific transforms and models; MMAction2 offers a broader set of methods and benchmarks
  • TimeSformer: a single Transformer architecture; MMAction2 supports TimeSformer alongside CNN and hybrid approaches
  • Decord: a video decoding library, not a training framework; MMAction2 can use Decord as one of its decoding backends and adds full training and evaluation pipelines

FAQ

Q: Can I use MMAction2 for real-time action detection? A: Yes. Lightweight models like MobileNetV2-TSM can run in real time on modern GPUs.

Q: Does it support skeleton-based recognition? A: Yes. PoseC3D and ST-GCN models accept skeleton sequences extracted with MMPose.

Q: What video formats are supported? A: MMAction2 reads any format supported by Decord or OpenCV, including MP4, AVI, and MKV.

Q: Can I fine-tune on my own action classes? A: Yes. Update the label map and annotation files, then fine-tune from a Kinetics-pretrained checkpoint.
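
As a rough sketch of the "update the label map and annotation files" step: the snippet below builds a class-name-to-id map and annotation lines of the form `relative/path label_id`. That one-pair-per-line layout is my understanding of the video annotation list format and should be verified against the dataset documentation; the helper `build_annotations`, the clip paths, and the class names are hypothetical.

```python
def build_annotations(samples):
    """Map class names to integer ids and emit 'relative/path label_id' lines."""
    class_names = sorted({label for _, label in samples})
    label_map = {name: idx for idx, name in enumerate(class_names)}
    lines = [f"{path} {label_map[label]}" for path, label in samples]
    return label_map, lines


# Hypothetical clips and class names.
samples = [
    ("clips/wave_001.mp4", "wave"),
    ("clips/clap_001.mp4", "clap"),
    ("clips/wave_002.mp4", "wave"),
]
label_map, lines = build_annotations(samples)
print(label_map)
print("\n".join(lines))
```

Sorting the class names makes the id assignment deterministic, so the training and validation lists agree on which integer stands for which action.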
