Introduction
Maxun is an open-source no-code web scraping platform that lets users visually extract data from websites without writing code. It uses Playwright for browser automation and provides a point-and-click interface to define extraction rules, making web scraping accessible to non-developers while remaining self-hostable for full data control.
What Maxun Does
- Enables visual point-and-click data extraction from any website without coding
- Automates pagination, scrolling, and multi-page crawling with built-in logic
- Exports scraped data as JSON or CSV, or delivers it to a database through the API
- Schedules recurring scraping jobs with cron-based automation
- Provides anti-detection features including proxy rotation and browser fingerprint management
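To make the export step concrete, here is a minimal sketch of turning extracted rows into JSON and CSV. It is not Maxun's own exporter: the function names `to_json` and `to_csv` and the sample rows are assumptions for illustration, and only the Python standard library is used.

```python
import csv
import io
import json


def to_json(rows):
    """Serialize extracted rows (a list of dicts) as a JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Serialize extracted rows as CSV, using the union of keys as the header."""
    # Keep first-seen column order; rows may be missing some fields.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, shaped like the output of a product-listing extraction.
rows = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "price": "19.50", "rating": "4.5"},
]
```

Calling `to_csv(rows)` yields a header of `title,price,rating`, with an empty `rating` cell for the first row because that field was never extracted for it.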
Architecture Overview
Maxun is built on a Node.js backend with a React frontend. It uses Playwright as the browser automation engine to render pages and execute extraction workflows. A PostgreSQL database stores workflow definitions and scraped results. The platform runs headless Chromium instances in Docker containers, with a WebSocket-based real-time preview that shows the browser as users define their extraction rules.
Self-Hosting & Configuration
- Deploy with Docker Compose using the provided configuration, which includes Postgres and Redis services
- Set environment variables in .env for database credentials, proxy settings, and API keys
- Configure proxy rotation by adding proxy URLs to the designated environment variable
- Adjust concurrency settings to control how many parallel scraping sessions run
- Expose the web UI on your preferred port and put it behind a reverse proxy for production use
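The proxy rotation idea from the list above can be sketched as a simple round-robin over a list read from the environment. This is an illustration only: the variable name `PROXY_URLS` and the comma-separated format are assumptions, not Maxun's documented configuration, so check the project's own .env example for the real name and syntax.

```python
import itertools
import os


def load_proxies(env_var="PROXY_URLS"):
    """Read a comma-separated proxy list from the environment.

    The variable name and the format are hypothetical placeholders.
    """
    raw = os.environ.get(env_var, "")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the configured proxies."""
    if not proxies:
        raise ValueError("no proxies configured")
    return itertools.cycle(proxies)
```

Each new scraping session would then take `next(rotation)` as its proxy, so consecutive sessions leave from different addresses and the list wraps around once it is exhausted.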
Key Features
- Visual no-code workflow builder with live browser preview
- Built-in pagination and infinite scroll handling
- Scheduled and recurring scraping with cron expressions
- Proxy support with rotation for anti-blocking
- REST API for triggering runs programmatically and retrieving their data
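For readers curious what pagination handling involves, the following is a minimal sketch of a "follow the next link" loop. It does not reflect Maxun's internal implementation: `fetch_page` is a hypothetical callable standing in for the browser step, assumed to return a dict with an `items` list and an optional `next` URL.

```python
def crawl_paginated(fetch_page, start_url, max_pages=50):
    """Collect items across pages by following each page's 'next' link.

    Stops when there is no next link, when a URL repeats (loop guard),
    or when max_pages pages have been visited.
    """
    results = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)  # hypothetical: {"items": [...], "next": url or None}
        results.extend(page["items"])
        url = page.get("next")
    return results


# In-memory stand-in for a site whose last page links back to the first.
FAKE_SITE = {
    "/p1": {"items": ["a", "b"], "next": "/p2"},
    "/p2": {"items": ["c"], "next": "/p3"},
    "/p3": {"items": ["d"], "next": "/p1"},
}
```

The `seen` set and the `max_pages` cap matter in practice: sites sometimes link the last page back to the first, and without a guard the crawl would never terminate.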
Comparison with Similar Tools
- Scrapy — Python framework requiring code; Maxun offers a visual no-code interface
- Crawlee — Developer-focused Node.js library vs Maxun's point-and-click approach
- Apify — Cloud SaaS platform; Maxun is fully self-hosted with no per-page costs
- Browse AI — Closed-source cloud tool; Maxun gives you full control of your data
- Firecrawl — API-first crawling for LLMs; Maxun focuses on structured data extraction with visual workflows
FAQ
Q: Does Maxun handle JavaScript-rendered pages? A: Yes. Maxun uses Playwright with full Chromium rendering, so it handles SPAs and dynamic content.
Q: Can I run Maxun on low-resource servers? A: Only with limited concurrency. Each scraping session runs its own headless browser instance, and at least 2 GB of RAM per concurrent session is recommended for production.
Q: How do I avoid getting blocked? A: Maxun supports proxy rotation, request delays, and user-agent randomization to reduce detection risk.
Q: Is there an API to trigger scrapes programmatically? A: Yes, all workflows can be triggered and results retrieved via the REST API.
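As a rough sizing aid for the memory guideline in the FAQ, the arithmetic can be sketched as follows. The 2 GB per session figure comes from the text above; the 1 GB of headroom reserved for the operating system, database, and web UI is an assumption and should be adjusted to your own deployment.

```python
def max_sessions(total_ram_gb, per_session_gb=2.0, reserved_gb=1.0):
    """Estimate how many concurrent scraping sessions a server can hold.

    per_session_gb follows the 2 GB guideline; reserved_gb is an assumed
    allowance for everything that is not a browser session.
    """
    usable = total_ram_gb - reserved_gb
    if usable < per_session_gb:
        return 0
    return int(usable // per_session_gb)
```

Under these assumptions an 8 GB server leaves 7 GB usable, which fits 3 concurrent sessions, while a 2 GB server does not fit even one session at the recommended level.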