Quick Use
- pip install open-interpreter
- interpreter --os (the OS-control flag)
- Type natural-language commands; confirm prompts before destructive ops
Intro
Open Interpreter's OS Mode extends the natural-language CLI into full computer control. The agent takes screenshots to see the screen, then drives any GUI app via clicks, keystrokes, and shell commands.
Best for: research, experimentation, and one-off automation that touches GUI apps you can't script (Photoshop, Excel macros, Zoom, third-party desktop tools).
Works with: macOS, Windows, Linux.
Setup time: 5 minutes.
Install + start
pip install open-interpreter
interpreter --os
The first run prompts for your LLM API key (OpenAI default; use --model claude-3-5-sonnet-20241022 for Claude).
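For scripted launches, the start command can be wrapped in a small Python helper. This is a minimal sketch, not part of Open Interpreter itself: `build_command` and `launch` are hypothetical names, and the only flags used are `--os` and `--model` as given in this section.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(model: Optional[str] = None) -> List[str]:
    """Return the argv for starting OS Mode, optionally with a specific model."""
    argv = ["interpreter", "--os"]
    if model:
        argv += ["--model", model]
    return argv


def launch(model: Optional[str] = None) -> int:
    """Start an interactive OS Mode session and return its exit code."""
    if shutil.which("interpreter") is None:
        raise RuntimeError("interpreter not found; run: pip install open-interpreter")
    # No redirection, so the child inherits the terminal and stays interactive.
    return subprocess.call(build_command(model))
```

Calling `launch("claude-3-5-sonnet-20241022")` would start the same session as the command line above with the Claude model selected.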
Sample session
> Open Photoshop, create a new 1200x630 document, fill it with a navy-to-orange
gradient, and add the text "Q3 Report" centered in white.
[OS Mode takes a screenshot, identifies the dock, clicks Photoshop]
[Wait for app launch...]
[Clicks File > New, types dimensions, creates document]
[Selects gradient tool, picks colors, drags from top-left to bottom-right]
[Adds text layer, types "Q3 Report", aligns center]
> Done. Want me to save the file?
Safety prompts
OS Mode asks for confirmation before destructive actions:
About to: Empty Trash (irreversible). Confirm? [y/N]
You can preset auto-approve for whitelisted tools:
interpreter --os --auto_run --safe_mode high
safe_mode high rejects file deletion, network calls to unknown hosts, and shell commands containing rm, dd, etc.
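The behaviour described in this section (reject some commands outright, ask before destructive ones) can be illustrated with a small gate. This is a sketch under stated assumptions, not Open Interpreter's actual implementation: `BLOCKED`, `DESTRUCTIVE`, `classify`, and `confirm` are hypothetical names, and the token lists are illustrative only.

```python
import shlex
from typing import Callable

# Illustrative lists only; a real policy would be far more thorough.
BLOCKED = {"rm", "dd"}            # rejected outright
DESTRUCTIVE = {"mv", "truncate"}  # allowed only after explicit confirmation


def classify(command: str) -> str:
    """Return 'reject', 'confirm', or 'allow' for a shell command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "reject"  # unparseable quoting: refuse rather than guess
    # Compare bare program names so /bin/rm is caught as well as rm.
    names = {tok.rsplit("/", 1)[-1] for tok in tokens}
    if names & BLOCKED:
        return "reject"
    if names & DESTRUCTIVE:
        return "confirm"
    return "allow"


def confirm(action: str, ask: Callable[[str], str] = input) -> bool:
    """Ask a y/N question; anything other than 'y' or 'yes' counts as No."""
    answer = ask(f"About to: {action}. Confirm? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to No on an empty answer mirrors the [y/N] prompt: the safe choice is the one that requires no typing. The token match is deliberately conservative, so a harmless command such as `echo rm` is also rejected.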
When NOT to use OS Mode
- Production automation: use Browser Use (browser only) or platform APIs instead
- Time-critical work: OS Mode latency is roughly 5-15 s per click
- Anything sensitive: the screenshots are sent off your machine to the LLM provider
OS Mode shines for one-off, exploratory, "I'd rather describe this than learn the GUI" tasks.
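To make the latency point concrete, a rough estimate multiplies the number of GUI actions by the 5-15 s per-click figure quoted in the list. A minimal sketch: `estimate_seconds` is a hypothetical helper, and the count of 10 actions is an assumed figure for a task like the Photoshop session, not a measurement.

```python
from typing import Tuple

# Per-action latency range quoted in this section, in seconds.
MIN_LATENCY_S = 5
MAX_LATENCY_S = 15


def estimate_seconds(actions: int) -> Tuple[int, int]:
    """Return (best case, worst case) wall-clock seconds for a number of GUI actions."""
    if actions < 0:
        raise ValueError("actions must be non-negative")
    return actions * MIN_LATENCY_S, actions * MAX_LATENCY_S


low, high = estimate_seconds(10)
print(f"10 actions: roughly {low}-{high} s")  # 10 actions: roughly 50-150 s
```

At that rate a ten-step task takes one to two and a half minutes, which is acceptable for a one-off job and too slow for anything run in a loop.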
FAQ
Q: Is Open Interpreter free?
A: Yes, it is Apache-2.0 open source. You bring your own LLM API key (OpenAI / Anthropic / local). Inference cost depends on the model and how visual the task is (vision-capable models are more expensive).
Q: How does this differ from Browser Use?
A: Browser Use is browser-only (clicks inside Chrome). OS Mode is whole-OS (any GUI app, terminal, dock). Use Browser Use for web scraping and OS Mode for desktop-app automation. The two also differ in latency and reliability.
Q: Will it work on a remote server?
A: Limited — OS Mode needs a screen and input devices. For headless / server contexts, use Open Interpreter's standard mode (no --os) which is shell-only and works on any Linux box.
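The choice between the two modes can be automated with a display check. This is a minimal sketch, assuming that on Linux a graphical session is signalled by the `DISPLAY` or `WAYLAND_DISPLAY` environment variable and that macOS and Windows sessions have a screen; `has_display` and `choose_command` are hypothetical helpers.

```python
import os
import sys
from typing import List, Mapping, Optional


def has_display(env: Optional[Mapping[str, str]] = None,
                platform: Optional[str] = None) -> bool:
    """Best-effort guess at whether a graphical session is available."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # X11 sets DISPLAY; Wayland sets WAYLAND_DISPLAY.
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    # Assumption: desktop macOS and Windows sessions have a screen.
    return True


def choose_command(env: Optional[Mapping[str, str]] = None,
                   platform: Optional[str] = None) -> List[str]:
    """Use OS Mode when a screen is present, else fall back to standard mode."""
    return ["interpreter", "--os"] if has_display(env, platform) else ["interpreter"]
```

On a headless Linux box with neither variable set, `choose_command()` returns the plain `interpreter` command, which matches the shell-only fallback described in the answer.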
Source & Thanks
Built by Open Interpreter. Licensed under Apache-2.0.
OpenInterpreter/open-interpreter — ⭐ 60,000+