Open-AutoGLM — Open-Source Phone Agent Model and Framework

Introduction

Open-AutoGLM is an open-source phone agent model and framework that enables AI to understand mobile screens, plan actions, and execute tasks on Android and iOS devices. It bridges the gap between large language models and real-world mobile device interaction.

What Open-AutoGLM Does

Understands mobile device screenshots to identify UI elements and context
Plans and executes multi-step tasks on phones through touch and gesture actions
Supports both Android and iOS device interaction via platform-specific bridges
Handles complex workflows like navigating settings, filling forms, and managing apps
Provides a framework for building custom phone automation agents

Architecture Overview

Open-AutoGLM combines a vision-language model for screen understanding with an action planner that generates touch sequences. The screen understanding module processes device screenshots to build a structured representation of the UI. The planner takes this representation plus a natural language instruction and outputs a sequence of actions (tap, swipe, type, scroll). An executor bridges these actions to the target device via ADB (Android) or accessibility APIs (iOS).
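
The pipeline described above can be sketched as a small observe, plan, act loop. This is an illustrative outline only, not the project's actual code: the names `Action`, `run_task`, `capture`, `understand`, `plan`, and `execute` are hypothetical, and re-capturing the screen after each batch of actions is an assumption about how such an agent would typically be driven.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One step in the planner's output (hypothetical structure)."""
    kind: str                       # "tap", "swipe", "type", or "scroll"
    args: Dict[str, Any] = field(default_factory=dict)


def run_task(
    instruction: str,
    capture: Callable[[], Any],                       # returns a screenshot
    understand: Callable[[Any], Any],                 # screenshot -> UI representation
    plan: Callable[[Any, str, List[Action]], List[Action]],
    execute: Callable[[Action], None],                # sends one action to the device
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> understand -> plan -> execute until the planner
    returns no actions or max_steps is reached. Returns the executed actions."""
    history: List[Action] = []
    for _ in range(max_steps):
        ui_state = understand(capture())
        actions = plan(ui_state, instruction, history)
        if not actions:             # assumption: an empty plan means the task is finished
            break
        for action in actions:
            execute(action)
            history.append(action)
    return history
```

Passing the four stages in as callables keeps the loop independent of any particular model or device bridge, so each stage can be stubbed out when testing.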

Self-Hosting & Configuration

Clone the repository and install Python dependencies
Connect to a target device via ADB (Android) or configure the iOS bridge
Download model weights from the project's model hub
Configure the target LLM backend in the environment file
Run the agent with a natural language task description
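
Before running the agent on Android, it helps to confirm that ADB can actually see the device. The sketch below parses the output of the standard `adb devices` command and returns only the serials in the `device` state (connected and authorized). The helper names `parse_adb_devices` and `connected_devices` are hypothetical and not part of Open-AutoGLM; the script assumes `adb` is on the PATH.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials of devices that `adb devices` reports as ready.

    Each device line has the form "<serial>\t<state>". Only the state
    "device" means connected and authorized; "unauthorized" and "offline"
    entries, the header line, and blank lines are skipped.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    """Run `adb devices` and parse its output (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list from `connected_devices()` usually means the device is unplugged, USB debugging is disabled, or the authorization prompt on the phone has not been accepted.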

Key Features

Open-source phone agent with pre-trained screen understanding model
Supports natural language task descriptions for intuitive control
Cross-platform support for Android and iOS devices
Framework for building custom phone automation workflows
Extensible action space for new gesture types and interaction patterns

Comparison with Similar Tools

Appium: test automation framework that requires coded scripts; Open-AutoGLM takes natural language task descriptions instead
Android Accessibility Service: platform API for assistive tools; Open-AutoGLM adds model-based screen understanding and action planning
Computer Use (Anthropic): desktop-focused; Open-AutoGLM specializes in mobile device interaction
AppAgent: similar phone agent; Open-AutoGLM provides a pre-trained model and framework together

FAQ

Q: Does it require root access on the device? A: No. It uses standard ADB for Android and accessibility APIs for iOS, which do not require root.
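
To illustrate why root is not needed on Android: taps, swipes, and text entry can all be issued through the stock `adb shell input` command. The sketch below builds the argument lists for those commands. The function `adb_input_argv` and its parameter names are hypothetical, and whether Open-AutoGLM itself sends input this way is an assumption; the project may use a different mechanism.

```python
from typing import List, Optional


def adb_input_argv(kind: str, serial: Optional[str] = None, **args) -> List[str]:
    """Build the argv for an `adb shell input` command without running it.

    kind: "tap" (x, y), "swipe" (x1, y1, x2, y2, optional duration_ms),
          or "type" (text).
    serial: optional device serial, passed to adb as `-s <serial>`.
    """
    base = ["adb"] + (["-s", serial] if serial else []) + ["shell", "input"]
    if kind == "tap":
        return base + ["tap", str(args["x"]), str(args["y"])]
    if kind == "swipe":
        argv = base + ["swipe"] + [str(args[k]) for k in ("x1", "y1", "x2", "y2")]
        if "duration_ms" in args:
            argv.append(str(args["duration_ms"]))
        return argv
    if kind == "type":
        # `input text` does not accept literal spaces; "%s" stands in for a space.
        return base + ["text", args["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action kind: {kind}")
```

The returned list can be handed to `subprocess.run` once a device is connected. Building the command separately from running it makes it easy to log or review actions first.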

Q: What models power the screen understanding? A: It uses a custom vision-language model trained on mobile UI datasets. Weights are available for download.

Q: Can it handle any app? A: It works with most standard apps but may struggle with heavily customized UIs or games.

Q: Is it safe to use on my personal phone? A: Use it on a test device or emulator first. The agent executes real taps and gestures.
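
One way to reduce risk while experimenting is to wrap the executor in a dry-run mode that logs each action instead of sending it to the device. This is a minimal sketch under that assumption; `make_executor` is a hypothetical helper, not an Open-AutoGLM API.

```python
from typing import Any, Callable


def make_executor(
    run: Callable[[Any], None],
    dry_run: bool = True,
    log: Callable[[str], None] = print,
) -> Callable[[Any], bool]:
    """Wrap a real executor so actions are only logged while dry_run is True.

    The returned function reports whether the action was actually executed.
    """
    def execute(action: Any) -> bool:
        if dry_run:
            log(f"[dry-run] would execute: {action}")
            return False
        run(action)
        return True

    return execute
```

Defaulting `dry_run` to `True` means a real tap or gesture is only sent after the caller opts in explicitly.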

Sources

https://github.com/zai-org/Open-AutoGLM
