Configs · Apr 20, 2026 · 3 min read

Maxun — Self-Hosted No-Code Web Scraping Platform

An open-source no-code platform for web scraping, crawling, and AI data extraction that turns websites into structured APIs.

Introduction

Maxun is an open-source no-code web scraping platform that lets users visually extract data from websites without writing code. It uses Playwright for browser automation and provides a point-and-click interface to define extraction rules, making web scraping accessible to non-developers while remaining self-hostable for full data control.
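To make "extraction rules" concrete, here is a minimal sketch using only Python's standard library. It assumes a rule maps a field name to a (tag, CSS class) pair and is applied to static HTML. The rule format, `RuleExtractor`, and `extract_fields` are hypothetical illustrations of the concept, not how Maxun represents or applies the rules you record by clicking in its live browser.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical rule format: field name -> (tag, class) identifying the element.
Rule = dict[str, tuple[str, str]]


class RuleExtractor(HTMLParser):
    """Collects the text of every element that matches one of the rules."""

    def __init__(self, rules: Rule) -> None:
        super().__init__()
        self.rules = rules
        self.results: dict[str, list[str]] = {name: [] for name in rules}
        self._active: str | None = None      # field currently being captured
        self._active_tag: str | None = None  # tag that will end the capture
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            # Inside a match: nested tags are skipped, but their text is still captured.
            # Simplification: a nested tag with the same name would end the match early.
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, (rule_tag, rule_class) in self.rules.items():
            if tag == rule_tag and rule_class in classes:
                self._active, self._active_tag, self._buffer = name, tag, []
                break

    def handle_endtag(self, tag):
        if self._active is not None and tag == self._active_tag:
            self.results[self._active].append("".join(self._buffer).strip())
            self._active = self._active_tag = None

    def handle_data(self, data):
        if self._active is not None:
            self._buffer.append(data)


def extract_fields(html: str, rules: Rule) -> dict[str, list[str]]:
    """Apply all rules to an HTML string and return one list of values per field."""
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, `extract_fields(page_html, {"title": ("h2", "title"), "price": ("span", "price")})` returns parallel lists of titles and prices, which is the kind of structured output a point-and-click selection ultimately produces.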

What Maxun Does

  • Enables visual point-and-click data extraction from websites without coding
  • Automates pagination, scrolling, and multi-page crawling with built-in logic
  • Exports scraped data as JSON, CSV, or directly into databases via API
  • Schedules recurring scraping jobs with cron-based automation
  • Provides anti-detection features including proxy rotation and browser fingerprint management
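The JSON and CSV export mentioned above is ordinary serialization once the data is extracted. A minimal sketch with Python's standard library, assuming scraped rows arrive as a list of dictionaries (an assumption about the data shape, not Maxun's export code):

```python
import csv
import io
import json


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize scraped rows as a JSON array of objects."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize scraped rows as CSV, using the union of all keys as the header."""
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval="" fills in fields that a given row is missing.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Building the header from the union of keys matters because scraped pages are often inconsistent: a listing without a price should still produce a row, just with an empty cell.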

Architecture Overview

Maxun is built on a Node.js backend with a React frontend. It uses Playwright as the browser automation engine to render pages and execute extraction workflows. A PostgreSQL database stores workflow definitions and scraped results. The platform runs headless Chromium instances in Docker containers, with a WebSocket-based real-time preview that shows the browser as users define their extraction rules.
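As an illustration of what a stored "workflow definition" might hold, here is a hypothetical data model in Python that serializes to JSON, the kind of payload that could live in a PostgreSQL JSON/JSONB column. The class names, fields, and pagination modes are assumptions made for this sketch and do not reflect Maxun's actual database schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Pagination:
    # Hypothetical strategies: "none", "next_button", "infinite_scroll".
    mode: str = "none"
    next_selector: str | None = None
    max_pages: int = 1


@dataclass
class Workflow:
    name: str
    start_url: str
    fields: dict[str, str]                  # field name -> CSS selector
    pagination: Pagination = field(default_factory=Pagination)
    schedule: str | None = None             # cron expression, e.g. "0 6 * * *"

    def to_json(self) -> str:
        """Serialize the whole definition (nested dataclasses included)."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> Workflow:
        """Rebuild a Workflow, restoring the nested Pagination object."""
        data = json.loads(raw)
        data["pagination"] = Pagination(**data["pagination"])
        return cls(**data)
```

Keeping the definition declarative (selectors plus pagination and schedule settings) is what lets a browser worker replay it later without the user present.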

Self-Hosting & Configuration

  • Deploy with Docker Compose using the provided configuration, which includes Postgres and Redis services
  • Set environment variables in .env for database credentials, proxy settings, and API keys
  • Configure proxy rotation by adding proxy URLs to the designated environment variable
  • Adjust concurrency settings to control how many parallel scraping sessions run
  • Expose the web UI on your preferred port and, for production use, secure it behind a reverse proxy
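A concurrency setting typically works as a cap on how many browser sessions may run at once. The sketch below shows that pattern with `asyncio.Semaphore`; the `MAX_CONCURRENT_SESSIONS` variable name is hypothetical (check the project's `.env` template for the real one), and `asyncio.sleep` stands in for launching a browser and extracting a page.

```python
import asyncio
import os

# Hypothetical variable name, used here only for illustration.
MAX_SESSIONS = int(os.environ.get("MAX_CONCURRENT_SESSIONS", "2"))


async def scrape(url: str, limit: asyncio.Semaphore, stats: dict[str, int]) -> str:
    async with limit:  # at most `max_sessions` coroutines run this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for browser launch + extraction
        stats["active"] -= 1
        return f"scraped {url}"


async def run_all(urls: list[str], max_sessions: int) -> tuple[list[str], int]:
    """Scrape every URL, never exceeding `max_sessions` parallel sessions."""
    limit = asyncio.Semaphore(max_sessions)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(scrape(u, limit, stats) for u in urls))
    return list(results), stats["peak"]


if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(10)]
    results, peak = asyncio.run(run_all(pages, MAX_SESSIONS))
    print(len(results), "pages, peak concurrency", peak)
```

All URLs are queued immediately, but the semaphore keeps memory use bounded by the cap rather than by the length of the job list.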

Key Features

  • Visual no-code workflow builder with live browser preview
  • Built-in pagination and infinite scroll handling
  • Scheduled and recurring scraping with cron expressions
  • Proxy support with rotation for anti-blocking
  • REST API for programmatic triggering and data retrieval

Comparison with Similar Tools

  • Scrapy — Python framework requiring code; Maxun offers a visual no-code interface
  • Crawlee — Developer-focused Node.js library vs Maxun's point-and-click approach
  • Apify — Cloud SaaS platform; Maxun is fully self-hosted with no per-page costs
  • Browse AI — Closed-source cloud tool; Maxun gives you full control of your data
  • Firecrawl — API-first crawling for LLMs; Maxun focuses on structured data extraction with visual workflows

FAQ

Q: Does Maxun handle JavaScript-rendered pages? A: Yes. Maxun uses Playwright with full Chromium rendering, so it handles SPAs and dynamic content.

Q: Can I run Maxun on low-resource servers? A: Each scraping session runs its own headless browser instance, so memory use grows with concurrency. For production, at least 2 GB of RAM per concurrent session is recommended.
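Turning the 2 GB-per-session guideline into a capacity estimate is simple arithmetic. In this sketch, `reserved_gb` is a hypothetical allowance for the operating system, PostgreSQL, Redis, and the backend itself; tune both figures to what you observe on your own server.

```python
def max_concurrent_sessions(total_ram_gb: float,
                            reserved_gb: float = 2.0,
                            per_session_gb: float = 2.0) -> int:
    """Estimate how many headless browser sessions fit in memory.

    reserved_gb: hypothetical headroom for the OS and supporting services.
    per_session_gb: memory budget per concurrent browser session.
    """
    usable = total_ram_gb - reserved_gb
    if usable < per_session_gb:
        return 0  # not even one session fits safely
    return int(usable // per_session_gb)
```

Under these assumptions an 8 GB server supports (8 - 2) / 2 = 3 concurrent sessions, while a 3 GB server cannot safely run even one.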

Q: How do I avoid getting blocked? A: Maxun supports proxy rotation, request delays, and user-agent randomization to reduce detection risk.
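The three techniques in that answer combine naturally: rotate through proxies, vary the user agent, and wait a randomized interval between requests. A minimal sketch follows; the comma-separated `PROXY_URLS` format, the `Rotator` class, and the delay defaults are hypothetical choices for illustration, not Maxun's configuration or code.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class RequestPlan:
    proxy: str
    user_agent: str
    delay_s: float  # seconds to wait before sending this request


def parse_proxy_list(raw: str) -> list[str]:
    """Split a comma-separated proxy list (e.g. from a hypothetical PROXY_URLS variable)."""
    return [p.strip() for p in raw.split(",") if p.strip()]


class Rotator:
    """Round-robin proxies, random user agents, and jittered delays."""

    def __init__(self, proxies, user_agents, base_delay_s=1.0, jitter_s=0.5, seed=None):
        if not proxies or not user_agents:
            raise ValueError("need at least one proxy and one user agent")
        self._proxies = itertools.cycle(proxies)   # endless round-robin
        self._user_agents = list(user_agents)
        self._base = base_delay_s
        self._jitter = jitter_s
        self._rng = random.Random(seed)            # seedable for reproducible tests

    def next_plan(self) -> RequestPlan:
        return RequestPlan(
            proxy=next(self._proxies),
            user_agent=self._rng.choice(self._user_agents),
            delay_s=self._base + self._rng.uniform(0, self._jitter),
        )
```

A caller would sleep for `plan.delay_s` and then open the page through `plan.proxy` with `plan.user_agent`. Random jitter avoids the perfectly regular request timing that rate limiters easily spot.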

Q: Is there an API to trigger scrapes programmatically? A: Yes, all workflows can be triggered and results retrieved via the REST API.
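A programmatic trigger is an authenticated HTTP request. The sketch below uses only `urllib`; the base URL, the `/api/robots/{id}/runs` path, and the `x-api-key` header are placeholders assumed for illustration, so consult your instance's API documentation for the actual endpoint and authentication scheme.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical; use your deployment's API address


def build_run_request(workflow_id: str, api_key: str,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that triggers one run of a saved workflow.

    The path and header name are placeholders, not a documented API.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/robots/{workflow_id}/runs",
        data=json.dumps({}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def trigger_run(workflow_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_run_request(workflow_id, api_key), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or log exactly what will be sent before pointing it at a live server.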

