Configs · Jun 1, 2026 · 3 min read

Open-AutoGLM — Open-Source Phone Agent Model and Framework

An open-source AI phone agent model and framework that enables AI to interact with and automate tasks on mobile devices through screen understanding and action generation.

Agent ready

Agent-ready installation

This asset can be installed after choosing a runtime, reviewing the plan, and running the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Open-AutoGLM Overview
Direct install command:
npx -y tokrepo@latest install f329821b-5df6-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan in a dry run.

Introduction

Open-AutoGLM is an open-source phone agent model and framework that enables AI to understand mobile screens, plan actions, and execute tasks on Android and iOS devices. It bridges the gap between large language models and real-world mobile device interaction.

What Open-AutoGLM Does

  • Understands mobile device screenshots to identify UI elements and context
  • Plans and executes multi-step tasks on phones through touch and gesture actions
  • Supports both Android and iOS device interaction via platform-specific bridges
  • Handles complex workflows like navigating settings, filling forms, and managing apps
  • Provides a framework for building custom phone automation agents

Architecture Overview

Open-AutoGLM combines a vision-language model for screen understanding with an action planner that generates touch sequences. The screen understanding module processes device screenshots to build a structured representation of the UI. The planner takes this representation plus a natural language instruction and outputs a sequence of actions (tap, swipe, type, scroll). An executor bridges these actions to the target device via ADB (Android) or accessibility APIs (iOS).
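
The executor step described above can be illustrated with a minimal sketch. It assumes the planner emits simple tap, swipe, and type actions and maps them to the standard `adb shell input` subcommands. The class and function names (`Tap`, `Swipe`, `TypeText`, `to_adb_args`, `plan_to_commands`) are hypothetical and are not taken from the Open-AutoGLM codebase, which may represent and dispatch actions differently.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical action types; the real project may model actions differently.
@dataclass(frozen=True)
class Tap:
    x: int
    y: int


@dataclass(frozen=True)
class Swipe:
    x1: int
    y1: int
    x2: int
    y2: int
    duration_ms: int = 300


@dataclass(frozen=True)
class TypeText:
    text: str


Action = Union[Tap, Swipe, TypeText]


def to_adb_args(action: Action) -> List[str]:
    """Translate one action into an `adb shell input ...` argument list."""
    base = ["adb", "shell", "input"]
    if isinstance(action, Tap):
        return base + ["tap", str(action.x), str(action.y)]
    if isinstance(action, Swipe):
        return base + [
            "swipe",
            str(action.x1), str(action.y1),
            str(action.x2), str(action.y2),
            str(action.duration_ms),
        ]
    if isinstance(action, TypeText):
        # `input text` interprets %s as a space. Other shell-special
        # characters are not escaped in this sketch.
        return base + ["text", action.text.replace(" ", "%s")]
    raise TypeError(f"unsupported action: {action!r}")


def plan_to_commands(plan: List[Action]) -> List[List[str]]:
    """Dry run: return the commands a plan would issue, without running them."""
    return [to_adb_args(action) for action in plan]
```

Returning argument lists instead of executing them keeps the sketch safe to run without a device; each list could be passed to `subprocess.run` once the plan has been reviewed.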

Self-Hosting & Configuration

  • Clone the repository and install Python dependencies
  • Connect to a target device via ADB (Android) or configure iOS bridge
  • Download model weights from the project's model hub
  • Configure the target LLM backend in the environment file
  • Run the agent with a natural language task description
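
Before running the agent, it can help to confirm that a device is actually connected and authorized. The following is a small sketch that parses the output of the standard `adb devices` command; the helper names `parse_adb_devices` and `connected_devices` are hypothetical and are not part of Open-AutoGLM.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials of devices that `adb devices` reports as ready."""
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        # A ready device is listed as "<serial> device"; states such as
        # "unauthorized" or "offline" are ignored.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    """Run `adb devices` and return the ready serials (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list from `connected_devices()` usually means the device is unplugged, USB debugging is disabled, or the authorization prompt on the device has not been accepted.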

Key Features

  • Open-source phone agent with pre-trained screen understanding model
  • Supports natural language task descriptions for intuitive control
  • Cross-platform support for Android and iOS devices
  • Framework for building custom phone automation workflows
  • Extensible action space for new gesture types and interaction patterns
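
One way to picture an extensible action space is a registry that maps action names to handlers, so a new gesture can be added without touching the dispatcher. The sketch below is an assumption about how such a design could look, not the framework's actual extension API; `register_action`, `dispatch`, and the handler names are hypothetical. The long press is emulated as a swipe that starts and ends at the same point, a common technique with `adb shell input`.

```python
from typing import Callable, Dict, List

# Maps an action name to a function that builds `adb shell` arguments.
ActionHandler = Callable[..., List[str]]
_REGISTRY: Dict[str, ActionHandler] = {}


def register_action(name: str) -> Callable[[ActionHandler], ActionHandler]:
    """Decorator that registers a handler under the given action name."""
    def decorator(fn: ActionHandler) -> ActionHandler:
        if name in _REGISTRY:
            raise ValueError(f"action already registered: {name!r}")
        _REGISTRY[name] = fn
        return fn
    return decorator


@register_action("tap")
def tap(x: int, y: int) -> List[str]:
    return ["input", "tap", str(x), str(y)]


@register_action("long_press")
def long_press(x: int, y: int, duration_ms: int = 800) -> List[str]:
    # A swipe with identical start and end points holds the touch in place.
    return ["input", "swipe", str(x), str(y), str(x), str(y), str(duration_ms)]


def dispatch(name: str, **params: int) -> List[str]:
    """Look up a registered action and build its argument list."""
    try:
        handler = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown action: {name!r}") from None
    return handler(**params)
```

With this layout, a planner output such as `{"action": "long_press", "x": 5, "y": 6}` would be routed through `dispatch("long_press", x=5, y=6)`, and unknown action names fail loudly instead of being silently ignored.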

Comparison with Similar Tools

  • Appium: a test automation framework that requires coded scripts; Open-AutoGLM takes natural language instructions instead
  • Android Accessibility Service: a platform API for assistive tools; Open-AutoGLM adds AI-based screen understanding on top of device control
  • Computer Use (Anthropic): desktop-focused; Open-AutoGLM specializes in mobile device interaction
  • AppAgent: a similar phone agent; Open-AutoGLM ships a pre-trained model and a framework together

FAQ

Q: Does it require root access on the device? A: No. It uses standard ADB for Android and accessibility APIs for iOS, which do not require root.

Q: What models power the screen understanding? A: It uses a custom vision-language model trained on mobile UI datasets. Weights are available for download.

Q: Can it handle any app? A: It works with most standard apps but may struggle with heavily customized UIs or games.

Q: Is it safe to use on my personal phone? A: Use it on a test device or emulator first. The agent executes real taps and gestures.
