Introduction
Screenpipe continuously records your screen and microphone on your local machine, then indexes everything with OCR and speech-to-text. The resulting data is exposed via a local API that AI agents can query to understand what you have been doing, enabling context-aware automation without cloud dependency.
What Screenpipe Does
- Captures screen frames at configurable intervals with OCR extraction
- Records audio and transcribes speech to text locally
- Stores all data in a local SQLite database
- Exposes a REST API for searching and retrieving captured context
- Enables AI agents to act based on what you see and hear
Architecture Overview
Screenpipe runs as a background daemon written in Rust. A capture pipeline grabs screen frames and audio buffers, feeds them through OCR (via platform-native engines) and Whisper-based STT models, then stores structured results in a local SQLite database. A built-in HTTP server exposes search and retrieval endpoints. Plugins can subscribe to real-time events for immediate agent triggers.
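To make the search and retrieval endpoints more concrete, here is a minimal Python sketch of how a client might build a query and read the results. The base URL `http://localhost:3030`, the `/search` path, the parameter names (`q`, `content_type`, `limit`), and the response shape are all assumptions for illustration, so check the API reference for your installed version before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of the local HTTP server; adjust to your setup.
BASE_URL = "http://localhost:3030"


def build_search_url(query, content_type="ocr", limit=10, base_url=BASE_URL):
    """Build a search URL. The path and parameter names are assumptions."""
    params = urlencode({"q": query, "content_type": content_type, "limit": limit})
    return f"{base_url}/search?{params}"


def extract_texts(payload):
    """Collect text from a hypothetical payload shaped like
    {"data": [{"type": "OCR", "content": {"text": "..."}}]}."""
    return [item["content"].get("text", "") for item in payload.get("data", [])]


def search(query, **kwargs):
    """Send the request to the local server and return the decoded JSON."""
    with urlopen(build_search_url(query, **kwargs), timeout=5) as resp:
        return json.load(resp)
```

With the daemon running, `extract_texts(search("invoice total"))` would return the matching text snippets, provided the assumed response shape holds.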
Self-Hosting & Configuration
- Install via the one-line script or download binaries from Releases
- All data remains on your machine in ~/.screenpipe
- Configure capture resolution, FPS, and audio device in config.toml
- Set retention policies to auto-delete data older than N days
- Runs on macOS, Linux, and Windows
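A retention policy of the kind listed above amounts to deleting rows older than a cutoff. The sketch below shows the idea with Python's built-in `sqlite3` module against an in-memory database. The `frames` table and its `timestamp` column are hypothetical and are not Screenpipe's actual schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_older_than(conn, days, now=None):
    """Delete rows older than `days` days and return how many were removed.

    Assumes a hypothetical `frames` table whose `timestamp` column holds
    ISO-8601 strings in one consistent format, so string comparison
    matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM frames WHERE timestamp < ?", (cutoff,))
    return cur.rowcount


# Demo against an in-memory database with rows 1, 10, and 40 days old.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (id INTEGER PRIMARY KEY, timestamp TEXT, text TEXT)")
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
rows = [((now - timedelta(days=d)).isoformat(), f"frame from {d} days ago") for d in (1, 10, 40)]
conn.executemany("INSERT INTO frames (timestamp, text) VALUES (?, ?)", rows)

removed = purge_older_than(conn, days=30, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM frames").fetchone()[0]
```

Passing `now` explicitly keeps the demo deterministic. In real use, prefer the built-in retention setting over editing the database directly.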
Key Features
- Fully local processing with no data leaving your machine
- Combined screen OCR and audio transcription in one tool
- Plugin system for triggering custom actions on captured events
- REST API for integration with any AI agent or automation
- Low CPU overhead through smart frame-diffing and batched processing
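The frame-diffing idea can be illustrated with a short Python sketch: compare each new frame with the last frame that was kept, and only pass it on to OCR when enough pixels have changed. This is a generic illustration, not Screenpipe's actual algorithm. The flat grayscale frame representation and the 5% threshold are assumptions.

```python
def changed_fraction(prev, curr):
    """Fraction of pixels that differ between two equal-sized grayscale frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not prev:
        return 0.0
    differing = sum(1 for a, b in zip(prev, curr) if a != b)
    return differing / len(prev)


def select_frames(frames, threshold=0.05):
    """Return the indices of frames that differ enough to be worth processing."""
    kept = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or changed_fraction(last, frame) > threshold:
            kept.append(i)
            last = frame  # later frames are compared against the last kept one
    return kept


# Four 100-pixel frames: identical, identical, 2% changed, fully changed.
frames = [
    bytes([0] * 100),
    bytes([0] * 100),
    bytes([0] * 98 + [255] * 2),
    bytes([255] * 100),
]
kept = select_frames(frames)
```

Comparing against the last kept frame rather than the immediately preceding one means slow, gradual changes still trigger a capture once they add up past the threshold.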
Comparison with Similar Tools
- Rewind.ai: macOS-only and closed source; Screenpipe is open source and cross-platform
- Windows Recall: OS-level feature limited to Windows; Screenpipe runs on macOS, Linux, and Windows
- anarlog: meeting-focused note-taking; Screenpipe captures all screen activity continuously
- OpenRecall: similar concept but less actively maintained
- ActivityWatch: tracks app usage time; Screenpipe captures the actual on-screen content via OCR
FAQ
Q: How much disk space does continuous recording use? A: With default settings (1 FPS, compressed), roughly 2-5 GB per day depending on screen resolution and audio duration.
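For planning purposes, the 2-5 GB per day figure can be turned into a rough storage budget for a given retention window. The Python sketch below does only that arithmetic. It assumes decimal units (1 GB = 1,000,000 KB) and, for the per-second rate, that recording runs around the clock.

```python
def storage_range_gb(retention_days, low_gb_per_day=2.0, high_gb_per_day=5.0):
    """Estimated (low, high) disk usage in GB for a retention window."""
    return retention_days * low_gb_per_day, retention_days * high_gb_per_day


def avg_kb_per_second(gb_per_day, hours_recorded=24):
    """Average write rate implied by a daily total (decimal units)."""
    return gb_per_day * 1_000_000 / (hours_recorded * 3600)


low, high = storage_range_gb(30)  # 30-day retention window
rate_low = round(avg_kb_per_second(2.0), 1)
rate_high = round(avg_kb_per_second(5.0), 1)
```

Under these assumptions, a 30-day retention window needs roughly 60 to 150 GB, which corresponds to an average of about 23 to 58 KB written per second.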
Q: Does it impact system performance? A: The impact is designed to be small. Frame diffing skips frames that have not changed, and transcription runs in batches on background threads.
Q: Can I query it from Claude Code or other AI agents? A: Yes. The local REST API returns structured JSON with timestamps and content, so any agent that can make HTTP requests can consume it.
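As a sketch of how an agent might consume such JSON, the Python function below turns a result payload into a compact, chronological text block that can be placed in a prompt. The payload shape (`data`, `type`, `content.timestamp`, `content.text`) and the sample values are hypothetical.

```python
def to_context(payload, max_chars=200):
    """Render results as chronological lines: '[timestamp] (type) text'."""
    items = sorted(
        payload.get("data", []),
        key=lambda item: item["content"].get("timestamp", ""),
    )
    lines = []
    for item in items:
        content = item["content"]
        # Collapse whitespace and truncate so one capture cannot flood the prompt.
        text = " ".join(content.get("text", "").split())[:max_chars]
        lines.append(f"[{content.get('timestamp', '?')}] ({item.get('type', '?')}) {text}")
    return "\n".join(lines)


# Hypothetical sample payload, deliberately out of chronological order.
sample = {
    "data": [
        {"type": "Audio", "content": {"timestamp": "2024-06-30T10:05:00Z", "text": "let's ship it"}},
        {"type": "OCR", "content": {"timestamp": "2024-06-30T10:00:00Z", "text": "Pull request\n  ready"}},
    ]
}
context = to_context(sample)
```

Sorting by timestamp and capping each entry's length keeps the context readable and bounded, which matters when the agent has a limited prompt size.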
Q: Is the audio recording always on? A: Only if you leave it on. You choose which audio devices are recorded and can pause capture at any time.