Skills · Apr 20, 2026 · 3 min read

Maxun — Self-Hosted No-Code Web Scraping Platform

An open-source no-code platform for web scraping, crawling, and AI data extraction that turns websites into structured APIs.

Agent-ready installation

This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: Maxun
Direct install command:
npx -y tokrepo@latest install bcbe0dcf-3cd3-11f1-9bc6-00163e2b0d79 --target codex

Run the command after confirming the plan with a dry run.

Introduction

Maxun is an open-source no-code web scraping platform that lets users visually extract data from websites without writing code. It uses Playwright for browser automation and provides a point-and-click interface to define extraction rules, making web scraping accessible to non-developers while remaining self-hostable for full data control.

What Maxun Does

  • Enables visual point-and-click data extraction from any website without coding
  • Automates pagination, scrolling, and multi-page crawling with built-in logic
  • Exports scraped data as JSON or CSV, or delivers it to a database through the API
  • Schedules recurring scraping jobs with cron-based automation
  • Provides anti-detection features including proxy rotation and browser fingerprint management
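As a rough illustration of working with the JSON and CSV export formats listed above, the sketch below converts a JSON array of flat records into CSV text using only the Python standard library. The record shape (`title`, `price`) is an assumption made for the example; a real Maxun export may be structured differently.

```python
import csv
import io
import json


def records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order so the header is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing fields are filled with restval
    return buffer.getvalue()


# Hypothetical export: two scraped rows, the second one missing a field.
sample = '[{"title": "Widget", "price": "9.99"}, {"title": "Gadget"}]'
csv_text = records_to_csv(sample)
print(csv_text)
```

Rows that lack a field get an empty cell rather than raising an error, which tends to suit scraped data where some pages omit values.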

Architecture Overview

Maxun is built on a Node.js backend with a React frontend. It uses Playwright as the browser automation engine to render pages and execute extraction workflows. A PostgreSQL database stores workflow definitions and scraped results. The platform runs headless Chromium instances in Docker containers, with a WebSocket-based real-time preview that shows the browser as users define their extraction rules.

Self-Hosting & Configuration

  • Deploy with Docker Compose using the provided configuration with Postgres and Redis services
  • Set environment variables in .env for database credentials, proxy settings, and API keys
  • Configure proxy rotation by adding proxy URLs to the designated environment variable
  • Adjust concurrency settings to control how many parallel scraping sessions run
  • Expose the web UI on your preferred port and secure it with a reverse proxy for production use
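Before starting the containers, it can help to check that the .env file actually sets what the deployment needs. The following is a minimal sketch of such a check. Every variable name in it (`DB_USER`, `DB_PASSWORD`, `PROXY_URLS`, `MAX_SESSIONS`, `API_KEY`) is a hypothetical placeholder, not Maxun's real setting name; consult the project's example environment file for the actual keys.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required: list) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical .env content; the names are placeholders for illustration.
EXAMPLE_ENV = """
# database
DB_USER=maxun
DB_PASSWORD=
PROXY_URLS=http://p1.example:8080,http://p2.example:8080
MAX_SESSIONS=2
"""

env = parse_env(EXAMPLE_ENV)
# Split the comma-separated proxy list into individual URLs.
proxies = [p for p in env.get("PROXY_URLS", "").split(",") if p]
print(missing_keys(env, ["DB_USER", "DB_PASSWORD", "API_KEY"]))
```

Running a check like this in a deploy script surfaces an empty password or a forgotten key before the stack comes up half-configured.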

Key Features

  • Visual no-code workflow builder with live browser preview
  • Built-in pagination and infinite scroll handling
  • Scheduled and recurring scraping with cron expressions
  • Proxy support with rotation for anti-blocking
  • REST API for triggering runs programmatically and retrieving results
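To make the cron-based scheduling mentioned above more concrete, here is a small sketch of how a five-field cron expression can be tested against a timestamp. It is a simplified matcher written for illustration, not Maxun's scheduler: it supports only `*`, `*/n`, ranges, and comma lists, and it expects Sunday to be written as 0.

```python
from datetime import datetime


def field_matches(expr: str, value: int, low: int) -> bool:
    """Check one cron field: '*', '*/n', 'a-b', 'a-b/n', 'a', or comma lists."""
    for part in expr.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, value  # wildcard: any value from low upward
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if the moment falls on the given 5-field cron schedule."""
    minute, hour, dom, month, dow = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    time_ok = (
        field_matches(minute, moment.minute, 0)
        and field_matches(hour, moment.hour, 0)
        and field_matches(month, moment.month, 1)
    )
    dom_ok = field_matches(dom, moment.day, 1)
    dow_ok = field_matches(dow, cron_weekday, 0)
    # Classic cron rule: when both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return time_ok and (dom_ok or dow_ok)
    return time_ok and dom_ok and dow_ok


# "0 6 * * 1" means 06:00 every Monday; 2026-04-20 is a Monday.
print(cron_matches("0 6 * * 1", datetime(2026, 4, 20, 6, 0)))
```

A scheduler built on this idea would evaluate each job's expression once per minute and start the jobs that match.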

Comparison with Similar Tools

  • Scrapy — Python framework requiring code; Maxun offers a visual no-code interface
  • Crawlee — Developer-focused Node.js library vs Maxun's point-and-click approach
  • Apify — Cloud SaaS platform; Maxun is fully self-hosted with no per-page costs
  • Browse AI — Closed-source cloud tool; Maxun gives you full control of your data
  • Firecrawl — API-first crawling for LLMs; Maxun focuses on structured data extraction with visual workflows

FAQ

Q: Does Maxun handle JavaScript-rendered pages? A: Yes. Maxun uses Playwright with full Chromium rendering, so it handles SPAs and dynamic content.

Q: Can I run Maxun on low-resource servers? A: Only with limited concurrency. Each scraping session runs its own headless browser instance, and at least 2 GB of RAM per concurrent session is recommended for production.
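The 2 GB per session guideline translates into a quick capacity estimate. The sketch below is one way to do the arithmetic; the 1 GB reserved for the operating system, database, and other services is an assumption for the example, not a figure from the Maxun documentation.

```python
def max_concurrent_sessions(
    total_ram_gb: float,
    reserved_gb: float = 1.0,      # assumed headroom for OS, database, etc.
    per_session_gb: float = 2.0,   # guideline from the FAQ answer
) -> int:
    """Estimate how many browser sessions fit in the available memory."""
    usable = total_ram_gb - reserved_gb
    return max(0, int(usable // per_session_gb))


for ram in (2, 4, 8, 16):
    print(f"{ram} GB RAM -> {max_concurrent_sessions(ram)} concurrent session(s)")
```

Under these assumptions a 4 GB server fits a single session and an 8 GB server fits three, so the concurrency setting should be sized to match.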

Q: How do I avoid getting blocked? A: Maxun supports proxy rotation, request delays, and user-agent randomization to reduce detection risk.
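The three techniques named in that answer can be pictured with a generic sketch. The class below is hypothetical and does not reflect Maxun's internals: it cycles through proxies in round-robin order, picks a random user agent, and draws a random delay for each request.

```python
import itertools
import random


class RequestPacer:
    """Hand out per-request settings: rotating proxy, random UA, jittered delay."""

    def __init__(self, proxies, user_agents, min_delay=1.0, max_delay=3.0, seed=None):
        self._proxies = itertools.cycle(proxies)  # round-robin rotation
        self._user_agents = list(user_agents)
        self._rng = random.Random(seed)
        self._min_delay = min_delay
        self._max_delay = max_delay

    def next_request_settings(self) -> dict:
        return {
            "proxy": next(self._proxies),
            "user_agent": self._rng.choice(self._user_agents),
            "delay_seconds": self._rng.uniform(self._min_delay, self._max_delay),
        }


# Placeholder proxy URLs and user-agent strings for illustration only.
pacer = RequestPacer(
    proxies=["http://p1.example:8080", "http://p2.example:8080"],
    user_agents=["ExampleAgent/1.0", "ExampleAgent/2.0"],
    seed=42,
)
for _ in range(3):
    print(pacer.next_request_settings())
```

A caller would sleep for `delay_seconds` before each request and route it through the returned proxy, which spreads traffic across addresses and avoids a fixed request rhythm.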

Q: Is there an API to trigger scrapes programmatically? A: Yes, all workflows can be triggered and results retrieved via the REST API.
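A programmatic trigger could look roughly like the sketch below, which builds an HTTP request with Python's standard library and does not send it. The endpoint path (`/api/workflows/<id>/runs`) and the `x-api-key` header name are assumptions for illustration; check the API documentation of your Maxun instance for the real routes and authentication scheme.

```python
import json
import urllib.request


def build_run_request(base_url: str, workflow_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start a workflow run."""
    # Hypothetical path: replace with the route your instance documents.
    url = f"{base_url.rstrip('/')}/api/workflows/{workflow_id}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request("http://localhost:8080/", "example-workflow", "secret")
print(request.get_method(), request.full_url)

# To send it against a running instance:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Keeping request construction separate from sending makes the call easy to inspect and test without a live server.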
