Scripts · Apr 20, 2026 · 3 min read

MediaPipe — Cross-Platform ML Solutions by Google

A framework for building multimodal applied ML pipelines, providing ready-to-use solutions for face detection, hand tracking, pose estimation, object detection, and text classification across mobile, web, and desktop.

Introduction

MediaPipe is Google's framework for building perception pipelines that process video, audio, and sensor data. It provides production-ready ML solutions for common tasks like face detection, hand tracking, and pose estimation, optimized to run in real-time on mobile devices, web browsers, and desktops.

What MediaPipe Does

  • Detects faces, hands, and full-body poses in real-time video streams
  • Classifies images, objects, and text with pretrained on-device models
  • Segments images into foreground and background or semantic categories
  • Generates face mesh landmarks and recognizes hand gestures
  • Runs ML inference on-device without requiring a server or internet connection
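As a concrete illustration of the on-device vision capabilities above, here is a minimal face-detection sketch using the Python Solutions API. It assumes `mediapipe` and `opencv-python` are installed; the function names and the helper for converting MediaPipe's normalized bounding boxes to pixels are illustrative, not part of the library.

```python
def rel_box_to_pixels(xmin, ymin, width, height, img_w, img_h):
    """MediaPipe returns bounding boxes normalized to [0, 1];
    convert them to integer pixel coordinates."""
    return (int(xmin * img_w), int(ymin * img_h),
            int(width * img_w), int(height * img_h))

def detect_faces(image_path):
    # Imported lazily so the pure helper above works without MediaPipe.
    import cv2
    import mediapipe as mp

    image = cv2.imread(image_path)
    h, w, _ = image.shape
    # model_selection=0 targets short-range (selfie-style) faces.
    with mp.solutions.face_detection.FaceDetection(
            model_selection=0, min_detection_confidence=0.5) as detector:
        # MediaPipe expects RGB; OpenCV loads BGR.
        results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    boxes = []
    for det in results.detections or []:
        box = det.location_data.relative_bounding_box
        boxes.append(rel_box_to_pixels(box.xmin, box.ymin,
                                       box.width, box.height, w, h))
    return boxes
```

All inference happens locally: the model weights ship with the `mediapipe` package, so no network call is made.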

Architecture Overview

MediaPipe uses a graph-based pipeline where processing nodes (calculators) are connected in a directed acyclic graph. Each calculator performs one operation such as image preprocessing, model inference, or post-processing. The framework handles scheduling, synchronization, and memory management across graph nodes. The Solutions API provides high-level wrappers that hide graph complexity for common tasks.
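The calculator-graph idea can be sketched in a few lines of plain Python. This is a toy analogy, not the real C++ calculator API: each "calculator" is a function, edges declare which outputs feed which inputs, and nodes fire once all their inputs are available.

```python
from collections import deque

class Graph:
    """Toy analogy of a MediaPipe calculator graph (names illustrative)."""

    def __init__(self):
        self.nodes = {}  # name -> (fn, list of input stream names)

    def add(self, name, fn, inputs=()):
        self.nodes[name] = (fn, list(inputs))

    def run(self, **sources):
        """Execute the graph; assumes it is a valid DAG with all
        source streams supplied, otherwise nodes never become ready."""
        values = dict(sources)
        pending = deque(self.nodes.items())
        while pending:
            name, (fn, inputs) = pending.popleft()
            if all(i in values for i in inputs):  # all inputs arrived: fire
                values[name] = fn(*(values[i] for i in inputs))
            else:
                pending.append((name, (fn, inputs)))  # wait for upstream
        return values

g = Graph()
g.add("preprocess", lambda frame: frame.lower(), inputs=["frame"])
g.add("inference", lambda x: f"landmarks({x})", inputs=["preprocess"])
g.add("annotate", lambda f, lm: f"{f}+{lm}", inputs=["frame", "inference"])
out = g.run(frame="FRAME")
# out["annotate"] == "FRAME+landmarks(frame)"
```

In the real framework the scheduler additionally handles per-stream timestamps, parallelism, and buffer ownership, which this sketch omits.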

Self-Hosting & Configuration

  • Install Python package: pip install mediapipe for CPU inference
  • Use the Solutions API for quick integration: mp.solutions.hands, mp.solutions.face_mesh, etc.
  • Configure detection confidence thresholds and model complexity per solution
  • Deploy on Android via the MediaPipe AAR or on iOS via the framework package
  • Run in web browsers using the MediaPipe JavaScript or WASM packages
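The configuration bullets above map directly onto constructor parameters in the Python Solutions API. The sketch below shows `mp.solutions.hands` with the confidence thresholds and model complexity set explicitly; it assumes `mediapipe` and `opencv-python` are installed, and the pinch heuristic (distance between thumb and index fingertips) is an illustrative example, not a library feature.

```python
import math

def pinch_distance(a, b):
    """Euclidean distance between two normalized (x, y) landmarks,
    e.g. thumb tip (landmark 4) and index tip (landmark 8)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def track_hands():
    # Imported lazily so the pure helper above works without MediaPipe.
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(
        max_num_hands=2,
        model_complexity=1,            # 0 = faster, 1 = more accurate
        min_detection_confidence=0.5,  # threshold for the palm detector
        min_tracking_confidence=0.5,   # threshold for landmark tracking
    )
    cap = cv2.VideoCapture(0)          # default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            thumb, index = hand.landmark[4], hand.landmark[8]
            d = pinch_distance((thumb.x, thumb.y), (index.x, index.y))
            if d < 0.05:               # heuristic pinch threshold
                print("pinch detected")
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
    cap.release()
    hands.close()
```

Lowering `min_detection_confidence` finds hands more aggressively at the cost of false positives; `model_complexity=0` trades landmark accuracy for speed on constrained devices.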

Key Features

  • Real-time performance on mobile and edge devices without GPU requirements
  • 15+ pretrained solutions covering vision, text, and audio tasks
  • Model Maker tool for fine-tuning models on custom datasets with transfer learning
  • Cross-platform support: Python, Android, iOS, web (JavaScript), and C++
  • On-device inference with no network dependency for privacy-sensitive applications

Comparison with Similar Tools

  • OpenCV — General-purpose CV library; MediaPipe provides higher-level ML solutions
  • TensorFlow Lite — Lower-level inference runtime; MediaPipe adds pipeline orchestration
  • Core ML (Apple) — Apple-only; MediaPipe runs cross-platform
  • ONNX Runtime — Model inference without pipeline management or prebuilt solutions
  • Ultralytics YOLO — Focused on detection; MediaPipe covers pose, hands, face, and more

FAQ

Q: Does MediaPipe require a GPU? A: No. MediaPipe solutions are optimized for CPU inference on mobile and desktop. GPU acceleration is optional and platform-dependent.

Q: Can I train custom models with MediaPipe? A: Yes. MediaPipe Model Maker supports fine-tuning classification, detection, and text models on your own labeled data.

Q: Does MediaPipe work offline? A: Yes. All inference runs locally on-device with bundled model weights and no network calls.

Q: Which platforms are supported? A: Python (Linux, macOS, Windows), Android, iOS, and web browsers via JavaScript and WebAssembly.
