# shot-scraper: Headless Chrome Screenshots from the CLI

> Simon Willison's CLI for screenshots + HTML capture from any URL via headless Chrome. Selector cropping, JS injection, YAML batch, cron snapshots.

## Quick Use

1. `pip install shot-scraper && shot-scraper install`
2. `shot-scraper https://url --selector .target -o out.png`
3. Batch via YAML: `shot-scraper multi shots.yml`

---

## Intro

shot-scraper is Simon Willison's CLI wrapper around Playwright. It captures screenshots of any URL via headless Chrome: the full page, a crop of the element matching a CSS selector, the page after injected JavaScript has run, a batch of pages described in a YAML config, or scheduled snapshots from a GitHub Actions cron job that track page changes over time.

- Best for: documentation screenshots, visual regression tests, OG image generation, scheduled scraping of public pages.
- Works with: any OS, Python 3.10+.
- Setup time: about 3 minutes.

---

### Install + first shot

```bash
pip install shot-scraper
shot-scraper install   # downloads Chromium

# Full-page screenshot
shot-scraper https://tokrepo.com -o tokrepo.png
```

### Element + selector

```bash
# Just the hero section
shot-scraper https://tokrepo.com --selector ".hero" -o hero.png

# Wait until a JavaScript expression is true before capturing
shot-scraper https://tokrepo.com --wait-for "document.querySelector('.assets').children.length > 5"
```

### Inject JS before screenshot

```bash
# Hide the cookie banner, then shoot.
# The ?. keeps the script from throwing when the banner is absent.
shot-scraper https://example.com \
  --javascript "document.querySelector('.cookie-banner')?.remove()" \
  -o cleaned.png
```

### Batch YAML config

```yaml
# shots.yml
- url: https://tokrepo.com
  output: home.png
  width: 1280
  height: 800
- url: https://tokrepo.com/en/packs
  output: packs.png
  selector: ".arsenal"
  width: 1200
# No height is set here, so the full page is captured
- url: https://tokrepo.com/en/authors
  output: authors.png
```

```bash
shot-scraper multi shots.yml
```

### GitHub Actions cron for change tracking

```yaml
# .github/workflows/snapshot.yml
on:
  schedule: [{ cron: "0 8 * * *" }]
permissions:
  contents: write  # lets the job push the snapshot commit
jobs:
  snap:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install shot-scraper && shot-scraper install
      - run: shot-scraper multi shots.yml
      - run: |
          # git commit needs an identity on a fresh runner
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          git diff --cached --stat
          # Commit and push only when something changed, so quiet days do not fail the job
          git diff --cached --quiet || (git commit -m "Daily snapshot $(date +%F)" && git push)
```

---

### FAQ

**Q: shot-scraper vs raw Playwright?**
A: shot-scraper covers roughly the 80% case in one line. Use Playwright directly when you need scripted authentication flows, complex form fills, or multi-step navigation. Because shot-scraper wraps Playwright, dropping down to it is a small step.

**Q: Can it capture HTML too, not just images?**
A: Yes. `shot-scraper html https://example.com -o page.html` saves the rendered HTML after JavaScript has run. Combine it with `--javascript` to run scrape logic before extraction. This is useful for scraping SPAs.

**Q: Cookies / auth?**
A: Generate a Playwright storage state file once with `shot-scraper auth https://example.com auth.json`, which opens a real browser for you to log in. After that, pass `--auth auth.json` and automated shots run authenticated.

---

## Source & Thanks

> Built by [Simon Willison](https://github.com/simonw). Licensed under Apache-2.0.
>
> [simonw/shot-scraper](https://github.com/simonw/shot-scraper) (⭐ 1,700+)

---

Source: https://tokrepo.com/en/workflows/shot-scraper-headless-chrome-screenshots-from-cli
Author: Simon Willison
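### Generating shots.yml from a URL list

The batch YAML config shown earlier lists each page by hand. When the list of pages grows, the file can be generated instead. The following is a minimal sketch, not part of shot-scraper itself: the helper names (`output_name`, `build_shots`, `to_yaml`) and the filename scheme are assumptions made for illustration, and only the `url`, `output`, and `width` keys from the example config are used. Strings are emitted in double quotes, on the assumption that the config is read by a standard YAML parser.

```python
import json
from urllib.parse import urlparse


def output_name(url: str) -> str:
    """Derive a PNG filename from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    path = parsed.path.strip("/").replace("/", "-")
    stem = f"{parsed.netloc}-{path}" if path else parsed.netloc
    return f"{stem}.png"


def build_shots(urls: list[str], width: int = 1280) -> list[dict]:
    """Build one entry per URL, using keys from the shots.yml example."""
    return [{"url": u, "output": output_name(u), "width": width} for u in urls]


def to_yaml(shots: list[dict]) -> str:
    """Render the entries as a block-style YAML list without a YAML library."""
    lines = []
    for shot in shots:
        for i, (key, value) in enumerate(shot.items()):
            prefix = "- " if i == 0 else "  "
            # json.dumps double-quotes strings and leaves integers bare
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pages = ["https://tokrepo.com", "https://tokrepo.com/en/packs"]
    print(to_yaml(build_shots(pages)), end="")
```

Saved under a name of your choosing (say `make_shots.py`, a hypothetical filename), it could be used as `python make_shots.py > shots.yml` followed by `shot-scraper multi shots.yml`.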