# shot-scraper — Headless Chrome Screenshots from CLI

> Simon Willison's CLI for screenshots + HTML capture from any URL via headless Chrome. Selector cropping, JS injection, YAML batch, cron snapshots.

## Quick Use

1. `pip install shot-scraper && shot-scraper install`
2. `shot-scraper https://url --selector .target -o out.png`
3. Batch via YAML: `shot-scraper multi shots.yml`

---

## Intro

shot-scraper is Simon Willison's CLI wrapper around Playwright. It captures screenshots of any URL in headless Chromium: full pages, elements cropped by CSS selector, pages modified by injected JavaScript, and batches described in a YAML file. Run on a GitHub Actions cron schedule, it can also track how a page changes over time.

Best for: documentation screenshots, visual regression tests, OG image generation, scheduled scraping of public pages. Works with: any OS, Python 3.10+. Setup time: about 3 minutes.

---

### Install + first shot

```bash
pip install shot-scraper
shot-scraper install   # downloads Chromium

# Full-page screenshot
shot-scraper https://tokrepo.com -o tokrepo.png
```

### Element + selector

```bash
# Just the hero section
shot-scraper https://tokrepo.com --selector ".hero" -o hero.png

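# Wait until the .assets container has more than 5 children before capturing
# (?. keeps the expression from throwing while .assets does not exist yet)
shot-scraper https://tokrepo.com --wait-for "document.querySelector('.assets')?.children.length > 5"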
```

### Inject JS before screenshot

```bash
# Remove the cookie banner (if present), then shoot
shot-scraper https://example.com \
  --javascript "document.querySelector('.cookie-banner')?.remove()" \
  -o cleaned.png
```
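
When the same flags are reused across many pages, the command line can be assembled in a script. The sketch below shows one way to do that in Python. `shot_command` is a hypothetical helper, not part of shot-scraper; it only builds an argument list from the `--selector`, `--javascript`, and `-o` options used above.

```python
import shlex


def shot_command(url, output, selector=None, javascript=None):
    """Build a shot-scraper argument list (hypothetical helper, not part of the tool)."""
    args = ["shot-scraper", url]
    if selector:
        args += ["--selector", selector]
    if javascript:
        args += ["--javascript", javascript]
    args += ["-o", output]
    return args


cmd = shot_command(
    "https://example.com",
    "cleaned.png",
    javascript="document.querySelector('.cookie-banner')?.remove()",
)
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building a list rather than a string leaves quoting to `shlex.join` or `subprocess.run`, which avoids escaping the JavaScript by hand.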

### Batch YAML config

```yaml
# shots.yml
- url: https://tokrepo.com
  output: home.png
  width: 1280
  height: 800

- url: https://tokrepo.com/en/packs
  output: packs.png
  selector: ".arsenal"
  width: 1200

# No height set, so the full page is captured (the default behaviour)
- url: https://tokrepo.com/en/authors
  output: authors.png
```

```bash
shot-scraper multi shots.yml
```
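
For a long list of pages, the YAML file can be generated rather than written by hand. The following is a minimal sketch, assuming the flat `url` / `output` / `width` / `height` / `selector` keys shown above. `build_shots_yaml` is a hypothetical helper; it JSON-encodes each value, which is also valid YAML for plain strings and numbers.

```python
import json


def build_shots_yaml(pages):
    """Render a list of shot dicts as YAML text in the style of shots.yml."""
    lines = []
    for page in pages:
        first = True
        for key, value in page.items():
            # The first key of each entry starts a new YAML list item.
            prefix = "- " if first else "  "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
            first = False
        lines.append("")  # blank line between entries
    return "\n".join(lines)


pages = [
    {"url": "https://tokrepo.com", "output": "home.png", "width": 1280, "height": 800},
    {"url": "https://tokrepo.com/en/packs", "output": "packs.png", "selector": ".arsenal"},
]
print(build_shots_yaml(pages))
```

Redirecting the printed text to `shots.yml` gives a file that can be passed to `shot-scraper multi`.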

### GitHub Actions cron for change tracking

```yaml
# .github/workflows/snapshot.yml
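on:
  schedule: [{ cron: "0 8 * * *" }]  # daily at 08:00 UTC
permissions:
  contents: write  # lets the job push the snapshot commit
jobs:
  snap:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install shot-scraper && shot-scraper install
      - run: shot-scraper multi shots.yml
      - run: |
          # A commit needs an identity on a fresh runner
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          # Commit and push only when a screenshot actually changed
          git diff --cached --quiet || (git commit -m "Daily snapshot $(date +%F)" && git push)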
```
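
The workflow relies on git to notice which files changed. For a quick local check outside git, one option is to compare file hashes between two snapshot directories. This is a sketch under stated assumptions: `changed_shots` and the directory layout are hypothetical, and a byte-level difference does not always mean a visible difference in the image.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_shots(old_dir, new_dir):
    """List PNG names in new_dir that are missing from old_dir or differ byte-for-byte."""
    changed = []
    for new_file in sorted(Path(new_dir).glob("*.png")):
        old_file = Path(old_dir) / new_file.name
        if not old_file.exists() or file_digest(old_file) != file_digest(new_file):
            changed.append(new_file.name)
    return changed
```

Calling `changed_shots("snapshots/old", "snapshots/new")` (both paths are placeholders) returns the sorted names of screenshots that are new or whose bytes differ.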

---

### FAQ

**Q: shot-scraper vs raw Playwright?**
A: shot-scraper covers the common case in one line. Use Playwright directly when you need scripted login flows, complex form fills, or multi-step navigation. Because shot-scraper is built on Playwright, moving to the underlying library is a small step.

**Q: Can it capture HTML too, not just images?**
A: Yes. `shot-scraper html https://example.com -o page.html` saves the HTML as it stands after the page's JavaScript has run. Add `--javascript` to run your own code in the page before the HTML is extracted. Useful for scraping SPAs.
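
Once the rendered HTML is saved, it can be processed with ordinary tooling. As a rough illustration, the sketch below pulls link targets out of HTML text using only the Python standard library. `LinkCollector` and `extract_links` are hypothetical names, and the inline sample stands in for the contents of `page.html`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_text):
    """Return every non-empty href found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<ul><li><a href="/en/packs">Packs</a></li><li><a href="/en/authors">Authors</a></li></ul>'
print(extract_links(sample))  # ['/en/packs', '/en/authors']
```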

**Q: Cookies / auth?**
A: Generate a Playwright storage state file once with `shot-scraper auth https://example.com auth.json`, which opens a real browser window for you to log in. Then pass `--auth auth.json` to later commands so automated shots run authenticated.
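
Saved logins eventually expire, so it can help to check the state file before a scheduled run. The sketch below assumes the Playwright storage state layout: a top-level `cookies` list whose entries carry `name` and `expires` (Unix seconds, with `-1` for session cookies). `expired_cookies` is a hypothetical helper, and the inline sample stands in for the contents of `auth.json`.

```python
import json
import time


def expired_cookies(state, now=None):
    """Return names of cookies whose expiry time has passed.

    Session cookies (expires == -1) are never reported as expired.
    """
    now = time.time() if now is None else now
    return [
        cookie["name"]
        for cookie in state.get("cookies", [])
        if 0 <= cookie.get("expires", -1) < now
    ]


# Inline sample; in practice, load the file with json.load instead.
state = json.loads(
    '{"cookies": [{"name": "session", "expires": 100},'
    ' {"name": "prefs", "expires": -1}], "origins": []}'
)
print(expired_cookies(state, now=200))  # ['session']
```

If the list is non-empty, rerunning `shot-scraper auth` to refresh the file is the likely fix.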

---

## Source & Thanks

> Built by [Simon Willison](https://github.com/simonw). Licensed under Apache-2.0.
>
> [simonw/shot-scraper](https://github.com/simonw/shot-scraper) — ⭐ 1,700+

---
Source: https://tokrepo.com/en/workflows/shot-scraper-headless-chrome-screenshots-from-cli
Author: Simon Willison