Configs · June 1, 2026 · 1 min read

Open-AutoGLM — Open-Source Phone Agent Model and Framework

An open-source AI phone agent model and framework that enables AI to interact with and automate tasks on mobile devices through screen understanding and action generation.

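Agent ready

Agents can install this directly

This asset is installable. The agent first selects the current runtime, reviews the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: Open-AutoGLM Overview
Direct install command:
npx -y tokrepo@latest install f329821b-5df6-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run to confirm the install plan before running this command.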

Introduction

Open-AutoGLM is an open-source phone agent model and framework that enables AI to understand mobile screens, plan actions, and execute tasks on Android and iOS devices. It bridges the gap between large language models and real-world mobile device interaction.

What Open-AutoGLM Does

  • Understands mobile device screenshots to identify UI elements and context
  • Plans and executes multi-step tasks on phones through touch and gesture actions
  • Supports both Android and iOS device interaction via platform-specific bridges
  • Handles complex workflows like navigating settings, filling forms, and managing apps
  • Provides a framework for building custom phone automation agents

Architecture Overview

Open-AutoGLM combines a vision-language model for screen understanding with an action planner that generates touch sequences. The screen understanding module processes device screenshots to build a structured representation of the UI. The planner takes this representation plus a natural language instruction and outputs a sequence of actions (tap, swipe, type, scroll). An executor bridges these actions to the target device via ADB (Android) or accessibility APIs (iOS).
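
The observe, plan, act loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Action` schema, `to_adb_argv`, and `run_task` are hypothetical names, and the planner and screenshot capture are passed in as plain callables. Only the `adb shell input tap|swipe|text` commands are standard Android tooling. Scroll is left out because it can be expressed as a swipe.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One planner step (hypothetical schema, not the project's own)."""
    kind: str          # "tap", "swipe", "type", or "done"
    x: int = 0
    y: int = 0
    x2: int = 0
    y2: int = 0
    text: str = ""


def to_adb_argv(action: Action, serial: Optional[str] = None) -> List[str]:
    """Translate an Action into an `adb shell input ...` argument list."""
    base = ["adb"] + (["-s", serial] if serial else []) + ["shell", "input"]
    if action.kind == "tap":
        return base + ["tap", str(action.x), str(action.y)]
    if action.kind == "swipe":
        return base + ["swipe", str(action.x), str(action.y),
                       str(action.x2), str(action.y2)]
    if action.kind == "type":
        # `input text` uses %s as the placeholder for a space.
        return base + ["text", action.text.replace(" ", "%s")]
    raise ValueError(f"unsupported action kind: {action.kind}")


def run_task(
    instruction: str,
    capture: Callable[[], bytes],
    plan: Callable[[bytes, str, List[Action]], Action],
    execute: Callable[[List[str]], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: screenshot -> planner -> executor, until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = plan(screenshot, instruction, history)
        if action.kind == "done":
            break
        execute(to_adb_argv(action))
        history.append(action)
    return history
```

On a real Android device, `execute` could be something like `lambda argv: subprocess.run(argv, check=True)`, and `capture` could read the output of `adb exec-out screencap -p`. The `max_steps` cap is a design choice to keep a confused planner from tapping indefinitely.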

Self-Hosting & Configuration

  • Clone the repository and install Python dependencies
  • Connect to a target device via ADB (Android) or configure iOS bridge
  • Download model weights from the project's model hub
  • Configure the target LLM backend in the environment file
  • Run the agent with a natural language task description
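
Before running the agent, it helps to confirm that the device connection step actually succeeded. The sketch below parses the output of the standard `adb devices` command and keeps only devices in the `device` state (as opposed to `unauthorized` or `offline`). The helper names are illustrative and are not part of Open-AutoGLM.

```python
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each serial to its state, given the text printed by `adb devices`."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_serials(output: str) -> List[str]:
    """Return serials that are connected and authorized for debugging."""
    return [s for s, state in parse_adb_devices(output).items() if state == "device"]


def query_adb() -> str:
    """Run `adb devices` and return its stdout (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A typical check would be `ready_serials(query_adb())`. An empty list usually means USB debugging is off or the authorization prompt on the device has not been accepted.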

Key Features

  • Open-source phone agent with pre-trained screen understanding model
  • Supports natural language task descriptions for intuitive control
  • Cross-platform support for Android and iOS devices
  • Framework for building custom phone automation workflows
  • Extensible action space for new gesture types and interaction patterns
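
One way an extensible action space could be organized is a small registry that maps action names to command builders. The sketch below is an assumption about how such an extension point might look, not the framework's real API: `register_action` and `build` are hypothetical. The long press uses a common ADB idiom, a swipe whose start and end points coincide, held for a duration in milliseconds.

```python
from typing import Callable, Dict, List

# Maps an action name to a function that builds `adb shell` arguments.
_REGISTRY: Dict[str, Callable[..., List[str]]] = {}


def register_action(name: str):
    """Decorator that adds a builder to the (hypothetical) action registry."""
    def decorator(fn: Callable[..., List[str]]):
        _REGISTRY[name] = fn
        return fn
    return decorator


@register_action("tap")
def tap(x: int, y: int) -> List[str]:
    return ["input", "tap", str(x), str(y)]


@register_action("long_press")
def long_press(x: int, y: int, duration_ms: int = 800) -> List[str]:
    # A swipe with zero displacement, held for duration_ms, acts as a long press.
    return ["input", "swipe", str(x), str(y), str(x), str(y), str(duration_ms)]


def build(name: str, **params) -> List[str]:
    """Look up a registered action and build its shell arguments."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown action: {name}")
    return _REGISTRY[name](**params)
```

With this layout, adding a gesture means writing one decorated function; the planner's output vocabulary and the executor do not need to change.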

Comparison with Similar Tools

  • Appium: a test automation framework that requires coded scripts; Open-AutoGLM takes natural language instructions
  • Android Accessibility Service: a platform API for assistive tools; Open-AutoGLM adds AI-based screen understanding
  • Computer Use (Anthropic): desktop-focused; Open-AutoGLM specializes in mobile device interaction
  • AppAgent: a similar phone agent; Open-AutoGLM provides a pre-trained model and framework together

FAQ

Q: Does it require root access on the device? A: No. It uses standard ADB for Android and accessibility APIs for iOS, which do not require root.

Q: What models power the screen understanding? A: It uses a custom vision-language model trained on mobile UI datasets. Weights are available for download.

Q: Can it handle any app? A: It works with most standard apps but may struggle with heavily customized UIs or games.

Q: Is it safe to use on my personal phone? A: Use it on a test device or emulator first. The agent executes real taps and gestures.
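
To make the "test device or emulator first" advice concrete, an executor can default to a dry run and refuse live execution on physical devices unless explicitly allowed. The `GuardedExecutor` class below is a hypothetical wrapper, not something Open-AutoGLM ships. It assumes the usual convention that local Android emulators report serials starting with `emulator-`, which does not hold for emulators attached over the network.

```python
import subprocess
from typing import List


class GuardedExecutor:
    """Records commands; runs them only when dry_run is off and the target is allowed."""

    def __init__(self, serial: str, dry_run: bool = True, allow_physical: bool = False):
        self.serial = serial
        self.dry_run = dry_run
        self.allow_physical = allow_physical
        self.log: List[List[str]] = []

    def __call__(self, shell_args: List[str]) -> bool:
        argv = ["adb", "-s", self.serial, "shell", *shell_args]
        self.log.append(argv)  # keep a trace of everything the agent asked for
        if self.dry_run:
            return False  # nothing was sent to the device
        if not self.serial.startswith("emulator-") and not self.allow_physical:
            raise PermissionError(
                f"refusing to act on physical device {self.serial}; "
                "pass allow_physical=True to override"
            )
        subprocess.run(argv, check=True)
        return True
```

Reviewing `executor.log` after a dry run shows the taps and text input the agent would have performed, before any of them touch a real phone.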
