# MediaPipe — Cross-Platform ML Solutions by Google

> A framework for building multimodal applied ML pipelines, providing ready-to-use solutions for face detection, hand tracking, pose estimation, object detection, and text classification across mobile, web, and desktop.

## Install

```bash
pip install mediapipe
```

## Quick Use

```bash
python -c "import mediapipe as mp; print(mp.__version__)"
```

## Introduction

MediaPipe is Google's framework for building perception pipelines that process video, audio, and sensor data. It provides production-ready ML solutions for common tasks such as face detection, hand tracking, and pose estimation, optimized to run in real time on mobile devices, in web browsers, and on desktops.

## What MediaPipe Does

- Detects faces, hands, and full-body poses in real-time video streams
- Classifies images, objects, and text with pretrained on-device models
- Segments images into foreground/background or semantic categories
- Generates face mesh landmarks and recognizes hand gestures
- Runs ML inference on-device without requiring a server or internet connection

## Architecture Overview

MediaPipe uses a graph-based pipeline in which processing nodes (calculators) are connected in a directed acyclic graph. Each calculator performs one operation, such as image preprocessing, model inference, or post-processing. The framework handles scheduling, synchronization, and memory management across graph nodes. The Solutions API provides high-level wrappers that hide graph complexity for common tasks.

## Self-Hosting & Configuration

- Install the Python package: `pip install mediapipe` for CPU inference
- Use the Solutions API for quick integration: `mp.solutions.hands`, `mp.solutions.face_mesh`, etc.
- Configure detection confidence thresholds and model complexity per solution
- Deploy on Android via the MediaPipe AAR or on iOS via the framework package
- Run in web browsers using the MediaPipe JavaScript or WASM packages

## Key Features

- Real-time performance on mobile and edge devices without GPU requirements
- 15+ pretrained solutions covering vision, text, and audio tasks
- Model Maker tool for fine-tuning models on custom datasets with transfer learning
- Cross-platform support: Python, Android, iOS, web (JavaScript), and C++
- On-device inference with no network dependency for privacy-sensitive applications

## Comparison with Similar Tools

- **OpenCV** — General-purpose computer vision library; MediaPipe provides higher-level ML solutions
- **TensorFlow Lite** — Lower-level inference runtime; MediaPipe adds pipeline orchestration
- **Core ML (Apple)** — Apple-only; MediaPipe runs cross-platform
- **ONNX Runtime** — Model inference without pipeline management or prebuilt solutions
- **Ultralytics YOLO** — Focused on detection; MediaPipe covers pose, hands, face, and more

## FAQ

**Q: Does MediaPipe require a GPU?**
A: No. MediaPipe solutions are optimized for CPU inference on mobile and desktop. GPU acceleration is optional and platform-dependent.

**Q: Can I train custom models with MediaPipe?**
A: Yes. MediaPipe Model Maker supports fine-tuning classification, detection, and text models on your own labeled data.

**Q: Does MediaPipe work offline?**
A: Yes. All inference runs locally on-device with bundled model weights and no network calls.

**Q: Which platforms are supported?**
A: Python (Linux, macOS, Windows), Android, iOS, and web browsers via JavaScript and WebAssembly.

## Sources

- https://github.com/google/mediapipe
- https://ai.google.dev/edge/mediapipe/solutions/guide

---

Source: https://tokrepo.com/en/workflows/b379a90f-3c92-11f1-9bc6-00163e2b0d79
Author: Script Depot