# Maxun — Self-Hosted No-Code Web Scraping Platform

> An open-source no-code platform for web scraping, crawling, and AI data extraction that turns websites into structured APIs.

## Quick Use
```bash
git clone https://github.com/getmaxun/maxun.git
cd maxun
cp .env.example .env
# Edit .env to set database credentials and other secrets before starting
docker compose up -d
# Open http://localhost:3000
```
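`docker compose up -d` returns as soon as the containers start, not when the web UI is ready to serve requests. The polling helper below waits until a URL answers. It is a minimal sketch that assumes `curl` is installed; the function name `wait_for_url` and the 120-second default are illustrative choices, not part of Maxun.

```bash
# Poll a URL until it answers with a non-error HTTP status, or give up.
# Usage: wait_for_url URL [TIMEOUT_SECONDS]
# wait_for_url and its default timeout are hypothetical, not a Maxun tool.
wait_for_url() {
  local url="$1"
  local timeout="${2:-120}"   # assumed default; first start can be slow while images initialize
  local waited=0
  # --fail: non-zero exit on HTTP >= 400; --max-time: bound each single attempt
  until curl --silent --fail --output /dev/null --max-time 5 "$url"; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "ready: $url"
}
```

For example: `docker compose up -d && wait_for_url http://localhost:3000`.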

## Introduction
Maxun is an open-source web scraping platform that lets users extract data from websites visually, without writing code. It uses Playwright for browser automation and a point-and-click interface for defining extraction rules, so non-developers can build scrapers, and because it is self-hosted you keep full control of your data.

## What Maxun Does
- Enables visual point-and-click data extraction from websites without coding
- Automates pagination, scrolling, and multi-page crawling with built-in logic
- Exports scraped data as JSON, CSV, or directly into databases via API
- Schedules recurring scraping jobs with cron-based automation
- Provides anti-detection features including proxy rotation and browser fingerprint management
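Scheduling is driven by cron expressions. The helper below labels the five fields of the standard cron syntax (minute, hour, day of month, month, day of week). It assumes Maxun accepts this common five-field form, which you should confirm in its scheduling settings; `explain_cron` is a hypothetical name used only for illustration.

```bash
# Label the five fields of a standard cron expression.
# Assumes the common 5-field syntax; whether Maxun's scheduler uses exactly
# this form is an assumption. explain_cron is a hypothetical helper.
explain_cron() {
  local -a fields
  read -r -a fields <<< "$1"   # split on whitespace; read does not glob-expand '*'
  if (( ${#fields[@]} != 5 )); then
    echo "expected 5 fields, got ${#fields[@]}" >&2
    return 1
  fi
  local -a names=("minute" "hour" "day-of-month" "month" "day-of-week")
  local i
  for i in "${!names[@]}"; do
    printf '%-13s %s\n' "${names[$i]}" "${fields[$i]}"
  done
}
```

For example, `explain_cron '0 */6 * * *'` shows `0` in the minute field and `*/6` in the hour field, meaning a run at minute 0 of every sixth hour.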

## Architecture Overview
Maxun is built on a Node.js backend with a React frontend. It uses Playwright as the browser automation engine to render pages and execute extraction workflows. A PostgreSQL database stores workflow definitions and scraped results. The platform runs headless Chromium instances in Docker containers, with a WebSocket-based real-time preview that shows the browser as users define their extraction rules.

## Self-Hosting & Configuration
- Deploy with Docker Compose using the provided Compose file, which defines the Postgres and Redis services alongside the application
- Set environment variables in `.env` for database credentials, proxy settings, and API keys
- Configure proxy rotation by adding proxy URLs to the designated environment variable
- Adjust concurrency settings to control how many parallel scraping sessions run
- Expose the web UI on your preferred port and, for production use, put it behind a reverse proxy that terminates TLS
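Since `.env` is edited repeatedly (credentials, proxy settings, API keys), an idempotent helper that replaces a key if it exists and appends it otherwise avoids duplicate entries. This is a generic sketch: `set_env_var` is a hypothetical helper, and `DB_PASSWORD` in the usage line is a placeholder, so take the real variable names from `.env.example`.

```bash
# Set KEY=VALUE in an env file: replace the line if KEY exists, append otherwise.
# Usage: set_env_var KEY VALUE [FILE]   (FILE defaults to .env)
# set_env_var is a hypothetical helper, not part of Maxun.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}"
  touch "$file"
  local tmp
  tmp="$(mktemp "${file}.XXXXXX")"   # temp file in the same directory as the target
  # Pass key/value via the environment so awk does not interpret backslashes
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: `set_env_var DB_PASSWORD "$(openssl rand -hex 16)"` generates a random value and writes it once, however many times the command is rerun.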

## Key Features
- Visual no-code workflow builder with live browser preview
- Built-in pagination and infinite scroll handling
- Scheduled and recurring scraping with cron expressions
- Proxy support with rotation for anti-blocking
- REST API for programmatically triggering runs and retrieving data

## Comparison with Similar Tools
- **Scrapy** — Python framework requiring code; Maxun offers a visual no-code interface
- **Crawlee** — Developer-focused Node.js library vs Maxun's point-and-click approach
- **Apify** — Cloud SaaS platform; Maxun is fully self-hosted with no per-page costs
- **Browse AI** — Closed-source cloud tool; Maxun gives you full control of your data
- **Firecrawl** — API-first crawling for LLMs; Maxun focuses on structured data extraction with visual workflows

## FAQ
**Q: Does Maxun handle JavaScript-rendered pages?**
A: Yes. Maxun uses Playwright with full Chromium rendering, so it handles SPAs and dynamic content.

**Q: Can I run Maxun on low-resource servers?**
A: Each scraping session runs its own headless browser instance, so memory use grows with the number of concurrent sessions. For production, at least 2 GB of RAM per concurrent session is recommended.
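The 2 GB-per-session guideline translates directly into a concurrency ceiling. The arithmetic below is a rough sketch: the 2 GB held back for Postgres, Redis and the backend is an assumed default rather than a documented figure, `max_sessions` is a hypothetical name, and only whole gigabytes are handled.

```bash
# Estimate how many concurrent scraping sessions fit in a host's RAM.
# Usage: max_sessions TOTAL_GB [RESERVED_GB]
max_sessions() {
  local total_gb="$1"
  local reserved_gb="${2:-2}"   # assumed headroom for Postgres, Redis and the backend
  local per_session_gb=2        # FAQ guideline: at least 2 GB per concurrent session
  local usable=$(( total_gb - reserved_gb ))
  if (( usable < 0 )); then
    usable=0
  fi
  echo $(( usable / per_session_gb ))   # integer division rounds down
}
```

On a 16 GB host, `max_sessions 16` gives (16 - 2) / 2 = 7 sessions.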

**Q: How do I avoid getting blocked?**
A: Maxun supports proxy rotation, request delays, and user-agent randomization to reduce detection risk.
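Proxy rotation and request delays are simple ideas in isolation. The sketch below shows round-robin rotation (request n uses proxy n mod the list length) and a random delay within a range. It illustrates the general techniques only and is not Maxun's implementation; the proxy URLs and the names `pick_proxy` and `jitter_seconds` are placeholders.

```bash
# Placeholder proxy list; replace with real proxy URLs.
PROXIES=(
  "http://proxy1.example:8080"
  "http://proxy2.example:8080"
  "http://proxy3.example:8080"
)

# Round-robin rotation: request number n uses proxy n mod len(PROXIES).
pick_proxy() {
  local n="$1"
  echo "${PROXIES[$(( n % ${#PROXIES[@]} ))]}"
}

# Random delay between MIN and MAX seconds (inclusive), avoiding a fixed request rhythm.
jitter_seconds() {
  local min="$1" max="$2"
  echo $(( min + RANDOM % (max - min + 1) ))
}
```

For example, `curl --proxy "$(pick_proxy 4)" https://example.com` goes through the second proxy, and `sleep "$(jitter_seconds 1 3)"` pauses for one to three seconds between requests.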

**Q: Is there an API to trigger scrapes programmatically?**
A: Yes, all workflows can be triggered and results retrieved via the REST API.
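Triggering a run from a script comes down to one authenticated HTTP request. The sketch below wraps it in a `curl` call. The `/api/robots/<id>/runs` path, the `x-api-key` header, and the `MAXUN_URL`/`MAXUN_API_KEY` variables are assumptions for illustration, so check them against your deployment's API documentation before relying on them.

```bash
# Trigger a workflow run over HTTP and print the response body.
# ASSUMPTIONS: the endpoint path (/api/robots/<id>/runs) and the x-api-key
# header are placeholders; verify both against your instance's API docs.
run_workflow() {
  local base_url="$1" api_key="$2" workflow_id="$3"
  # --fail: non-zero exit on HTTP errors; --show-error: still report failures
  curl --silent --show-error --fail --max-time 30 \
    --request POST \
    --header "x-api-key: ${api_key}" \
    "${base_url%/}/api/robots/${workflow_id}/runs"
}
```

For example: `run_workflow "$MAXUN_URL" "$MAXUN_API_KEY" my-workflow-id`, where all three values are hypothetical placeholders.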

## Sources
- https://github.com/getmaxun/maxun
- https://maxun.dev

---
Source: https://tokrepo.com/en/workflows/bcbe0dcf-3cd3-11f1-9bc6-00163e2b0d79
Author: AI Open Source