Configs · Jun 1, 2026 · 3 min read

Open-AutoGLM — Open-Source Phone Agent Model and Framework

An open-source AI phone agent model and framework that enables AI to interact with and automate tasks on mobile devices through screen understanding and action generation.

Agent-ready install

This asset can be installed after you choose a runtime, review the plan, and run the corresponding command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Install: Single
Trust: Established
Entry: Open-AutoGLM Overview

Direct install command
npx -y tokrepo@latest install f329821b-5df6-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan with a dry run.

Introduction

Open-AutoGLM is an open-source phone agent model and framework that enables AI to understand mobile screens, plan actions, and execute tasks on Android and iOS devices. It bridges the gap between large language models and real-world mobile device interaction.

What Open-AutoGLM Does

  • Understands mobile device screenshots to identify UI elements and context
  • Plans and executes multi-step tasks on phones through touch and gesture actions
  • Supports both Android and iOS device interaction via platform-specific bridges
  • Handles complex workflows like navigating settings, filling forms, and managing apps
  • Provides a framework for building custom phone automation agents

Architecture Overview

Open-AutoGLM combines a vision-language model for screen understanding with an action planner that generates touch sequences. The screen understanding module processes device screenshots to build a structured representation of the UI. The planner takes this representation plus a natural language instruction and outputs a sequence of actions (tap, swipe, type, scroll). An executor bridges these actions to the target device via ADB (Android) or accessibility APIs (iOS).
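
The loop described above can be sketched in a few lines. This is a minimal illustration, not Open-AutoGLM's actual code: `Action`, `to_adb_args`, and `run_task` are hypothetical names, and the screenshot capture, planner, and executor are passed in as plain callables. The only real interface assumed is Android's `adb shell input` command with its `tap`, `swipe`, and `text` subcommands.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """One planner output (hypothetical schema for illustration)."""
    kind: str                                  # "tap", "swipe", "type", or "done"
    point: Optional[Tuple[int, int]] = None    # tap target or swipe start
    end: Optional[Tuple[int, int]] = None      # swipe end
    text: Optional[str] = None                 # text to type


def to_adb_args(action: Action) -> List[str]:
    """Translate one action into an `adb shell input ...` argument list."""
    if action.kind == "tap":
        x, y = action.point
        return ["adb", "shell", "input", "tap", str(x), str(y)]
    if action.kind == "swipe":
        (x1, y1), (x2, y2) = action.point, action.end
        # The trailing value is the swipe duration in milliseconds.
        return ["adb", "shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), "300"]
    if action.kind == "type":
        # `input text` expects spaces to be written as %s.
        return ["adb", "shell", "input", "text", action.text.replace(" ", "%s")]
    raise ValueError(f"unsupported action: {action.kind}")


def run_task(
    instruction: str,
    capture: Callable[[], bytes],
    plan: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[List[str]], None],
    max_steps: int = 20,
) -> List[Action]:
    """Capture the screen, ask the planner for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = plan(instruction, screenshot, history)
        if action.kind == "done":
            break
        execute(to_adb_args(action))
        history.append(action)
    return history
```

In a real run, `capture` might wrap `adb exec-out screencap -p` and `execute` might wrap `subprocess.run`. A scroll can be expressed as a swipe, and the `max_steps` cap keeps a confused planner from looping indefinitely.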

Self-Hosting & Configuration

  • Clone the repository and install Python dependencies
  • Connect to a target device via ADB (Android) or configure the iOS bridge
  • Download model weights from the project's model hub
  • Configure the target LLM backend in the environment file
  • Run the agent with a natural language task description
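
Before running the agent against an Android device, it can help to confirm that ADB actually sees an authorized device. The pre-flight check below is a sketch and not part of the project: `parse_adb_devices` and `connected_devices` are hypothetical helper names. It relies only on the standard output format of the `adb devices` command, where each attached device appears as a serial followed by its state.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials of devices whose state is exactly 'device'.

    Devices listed as 'unauthorized' or 'offline' are skipped, as are the
    header line and any blank lines.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    """Run `adb devices` and return the usable serials (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list usually means that USB debugging is off, the authorization prompt on the phone has not been accepted, or the emulator is not running.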

Key Features

  • Open-source phone agent with pre-trained screen understanding model
  • Supports natural language task descriptions for intuitive control
  • Cross-platform support for Android and iOS devices
  • Framework for building custom phone automation workflows
  • Extensible action space for new gesture types and interaction patterns
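
One common way to make an action space extensible is a registry that maps action names to handler functions, so a new gesture becomes one decorated function. The sketch below shows that general pattern under stated assumptions; it is not Open-AutoGLM's actual extension API, and `register_action`, `dispatch`, and the handler names are hypothetical. The handlers return arguments for Android's `input` shell command.

```python
from typing import Callable, Dict, List

ActionHandler = Callable[..., List[str]]
_HANDLERS: Dict[str, ActionHandler] = {}


def register_action(name: str) -> Callable[[ActionHandler], ActionHandler]:
    """Decorator that adds a handler to the registry under `name`."""
    def decorator(fn: ActionHandler) -> ActionHandler:
        if name in _HANDLERS:
            raise ValueError(f"action already registered: {name}")
        _HANDLERS[name] = fn
        return fn
    return decorator


@register_action("tap")
def tap(x: int, y: int) -> List[str]:
    return ["input", "tap", str(x), str(y)]


@register_action("long_press")
def long_press(x: int, y: int, duration_ms: int = 800) -> List[str]:
    # A swipe that starts and ends at the same point acts as a press
    # held for duration_ms milliseconds.
    return ["input", "swipe", str(x), str(y), str(x), str(y), str(duration_ms)]


def dispatch(name: str, **params) -> List[str]:
    """Look up the handler for a planner-emitted action and build its command."""
    try:
        handler = _HANDLERS[name]
    except KeyError:
        raise ValueError(f"unknown action: {name}") from None
    return handler(**params)
```

With this layout, the planner only needs to emit an action name plus parameters, for example `dispatch("long_press", x=540, y=1200)`, and unknown names fail loudly instead of being silently ignored.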

Comparison with Similar Tools

  • Appium: a test automation framework that requires coded scripts; Open-AutoGLM takes natural language instructions
  • Android Accessibility Service: a platform API for assistive tools; Open-AutoGLM adds AI-based screen understanding
  • Computer Use (Anthropic): desktop-focused; Open-AutoGLM specializes in mobile device interaction
  • AppAgent: a similar phone agent; Open-AutoGLM provides a pre-trained model and framework together

FAQ

Q: Does it require root access on the device? A: No. It uses standard ADB for Android and accessibility APIs for iOS, which do not require root.

Q: What models power the screen understanding? A: It uses a custom vision-language model trained on mobile UI datasets. Weights are available for download.

Q: Can it handle any app? A: It works with most standard apps but may struggle with heavily customized UIs or games.

Q: Is it safe to use on my personal phone? A: Use it on a test device or emulator first. The agent executes real taps and gestures.
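
Because the agent performs real taps, one possible safeguard is to wrap the executor so that it defaults to a dry run and asks for confirmation before anything that looks sensitive. The wrapper below is a hypothetical sketch, not a feature documented for Open-AutoGLM: `GuardedExecutor`, its parameters, and the keyword list are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence


class GuardedExecutor:
    """Wraps a command sender with a dry-run default and a confirmation gate."""

    def __init__(
        self,
        send: Callable[[List[str]], None],
        dry_run: bool = True,
        confirm: Optional[Callable[[str], bool]] = None,
        sensitive_words: Sequence[str] = ("pay", "delete", "send"),
    ) -> None:
        self.send = send
        self.dry_run = dry_run
        self.confirm = confirm
        self.sensitive_words = tuple(sensitive_words)
        self.log: List[str] = []   # every requested action, executed or not

    def run(self, description: str, command: List[str]) -> bool:
        """Return True only if the command was actually sent to the device."""
        self.log.append(description)
        if self.dry_run:
            return False
        # Crude substring match; a real filter would need to be more precise.
        lowered = description.lower()
        if any(word in lowered for word in self.sensitive_words):
            if self.confirm is None or not self.confirm(description):
                return False
        self.send(command)
        return True
```

Starting with `dry_run=True` on an emulator lets you read `log` and review what the agent would have done before letting it touch a real device.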
