# Katana — Fast and Configurable Web Crawler by ProjectDiscovery

> Katana is a command-line web crawler written in Go by ProjectDiscovery, designed for security researchers and developers who need fast, configurable crawling with JavaScript rendering support.

## Quick Use

```bash
# Install with Go
go install github.com/projectdiscovery/katana/cmd/katana@latest

# Basic crawl
katana -u https://example.com

# Crawl with headless browser rendering
katana -u https://example.com -headless

# Crawl and output only JavaScript files
katana -u https://example.com -ef png,jpg,gif -f url | grep "\.js$"
```

## Introduction

Katana fills the gap between simple link extractors and heavyweight browser automation tools. It provides configurable crawling with both standard HTTP and headless browser modes, outputs clean structured data, and integrates well with other command-line security tools. It is built in Go for speed and portability.

## What Katana Does

- Crawls websites using standard HTTP mode or headless Chromium for JavaScript-rendered pages
- Extracts URLs from HTML, JavaScript files, inline scripts, CSS, robots.txt, and sitemap.xml
- Supports scope control with domain, subdomain, and regex-based filters to stay within target boundaries
- Outputs results in plain text, JSON, or JSONL format for easy pipeline integration
- Handles authentication via custom headers, cookies, and form-based login

## Architecture Overview

Katana uses a concurrent crawler engine with configurable parallelism and rate limiting. In standard mode, it makes HTTP requests and parses responses with a custom HTML parser optimized for link extraction. In headless mode, it launches a Chromium instance via the Rod library and captures network requests, DOM mutations, and dynamically generated URLs. A deduplication layer prevents re-crawling the same endpoints.
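The engine knobs described above map to CLI flags. A minimal sketch (flag names as documented in recent katana releases; the target URL and values are illustrative), followed by the same deduplication idea applied by hand when merging URL lists from separate runs:

```shell
# Tune the crawler engine (illustrative values):
#   -d        maximum crawl depth
#   -c        concurrent fetchers
#   -rl       max requests per second
#   -timeout  per-request timeout in seconds
# katana -u https://example.com -d 3 -c 10 -rl 150 -timeout 10 -o run1.txt

# Katana deduplicates endpoints internally during a crawl; the same idea
# applies when combining the output of several runs afterwards:
printf 'https://example.com/a\nhttps://example.com/b\nhttps://example.com/a\n' \
  | sort -u
```

The `sort -u` step is not part of Katana itself; it is the standard way to collapse duplicate URLs when aggregating multiple output files in a pipeline.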
## Self-Hosting & Configuration

- Install via `go install`, download pre-built binaries from GitHub releases, or use the Docker image
- Configure crawl depth, concurrency, rate limit, and timeout via CLI flags or a YAML config file
- Set scope rules with `-cs` (crawl scope) and `-fs` (field scope) to control what gets crawled and extracted
- Use `-H` for custom headers and `-proxy` for routing through an HTTP or SOCKS5 proxy
- Pipe output directly into other tools like httpx, nuclei, or grep for security workflows

## Key Features

- Dual crawling modes: fast HTTP parsing and a full headless browser with JavaScript execution
- Automatic form filling and submission for discovering authenticated endpoints
- Passive extraction from JavaScript files, detecting API endpoints and hardcoded URLs
- Built-in field extraction for URLs, paths, query parameters, emails, and custom regex patterns
- Seamless integration with the ProjectDiscovery ecosystem (subfinder, httpx, nuclei)

## Comparison with Similar Tools

- **Scrapy** — Python-based framework focused on data extraction; Katana is a Go CLI focused on URL discovery and security reconnaissance
- **Crawlee** — Node.js crawling library for scraping at scale; Katana is lighter and designed for security workflows
- **gospider** — similar Go-based crawler; Katana has headless support and better scope control
- **Burp Spider** — built into Burp Suite; commercial and GUI-based, while Katana is free and CLI-first
- **wget --spider** — basic link checker; Katana extracts from JavaScript and supports headless rendering

## FAQ

**Q: When should I use headless mode?**
A: Use headless mode (`-headless`) for JavaScript-heavy single-page applications where content is rendered client-side. Standard mode is faster and sufficient for server-rendered sites.

**Q: Can Katana handle authentication?**
A: Yes.
Pass cookies via `-H "Cookie: ..."`, use custom headers for token-based auth, or enable automatic form detection with `-aff` for form-based login.

**Q: How do I limit the crawl scope?**
A: Use `-cs` with a regex pattern to restrict crawling to specific domains or paths. The `-d` flag controls maximum crawl depth.

**Q: Does Katana respect robots.txt?**
A: By default, Katana does not enforce robots.txt restrictions, as it is designed for security testing where full coverage is important. Use scope filters to restrict targets manually.

## Sources

- https://github.com/projectdiscovery/katana
- https://docs.projectdiscovery.io/tools/katana/overview

---

Source: https://tokrepo.com/en/workflows/asset-adb11ee8
Author: Script Depot