Semgrep — Lightweight Static Analysis for Any Language
Semgrep is a fast, open-source static analysis tool that finds bugs and security issues using patterns that look like source code. Write rules in a syntax similar to the code you are searching — no complex AST queries or regex needed.
What it is
Semgrep is a fast, open-source static analysis tool that finds bugs and security vulnerabilities using patterns that look like the source code you are searching. Write rules in a syntax similar to the code itself, without needing compiler expertise or AST manipulation.
Semgrep targets security engineers, developers, and DevSecOps teams who want custom code analysis rules without the complexity of traditional SAST tools. It supports over 30 programming languages with a single rule syntax.
The project is actively maintained and suitable for both individual developers and teams looking to integrate it into their existing toolchain. Documentation and community support are available for onboarding.
How it saves time or tokens
Semgrep rules are readable by any developer, not just security specialists. A rule that catches SQL injection looks like the vulnerable code pattern itself. The Semgrep Registry provides thousands of pre-written rules for OWASP Top 10, framework-specific bugs, and code quality issues, so common checks do not have to be written from scratch. Running Semgrep in CI typically takes seconds, where heavier analyzers can take minutes.
How to use
- Install Semgrep via pip (pip install semgrep) or Homebrew.
- Run semgrep --config auto to scan with community-recommended rules.
- Write custom rules in YAML targeting patterns specific to your codebase.
- Add Semgrep to your CI pipeline to block PRs with security findings.
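The CI step can be sketched as a workflow file. This is a minimal sketch that assumes GitHub Actions as the CI system, which the steps above do not specify; the workflow path, workflow name, and job name are hypothetical. It installs the CLI with pip and runs a custom rule file with --error so that any finding fails the job.

```yaml
# .github/workflows/semgrep.yml (hypothetical path; GitHub Actions is an assumption)
name: semgrep
on:
  pull_request: {}
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Install the CLI the same way as in the steps above
      - run: pip install semgrep
      # --error makes Semgrep exit non-zero when there are findings,
      # which fails the job and lets branch protection block the PR
      - run: semgrep --config .semgrep/sql-injection.yaml --error src/
```

Pointing --config at a curated rule file rather than auto keeps the blocking check limited to rules the team has already reviewed.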
Example
# .semgrep/sql-injection.yaml
rules:
  - id: sql-injection-string-concat
    patterns:
      - pattern: |
          $QUERY = "..." + $INPUT + "..."
          cursor.execute($QUERY)
    message: >-
      SQL injection via string concatenation.
      Use parameterized queries instead.
    severity: ERROR
    languages: [python]

# Run the custom rule
semgrep --config .semgrep/sql-injection.yaml src/
# Run with the full community ruleset
semgrep --config auto --error
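A multi-line pattern like the one in the rule above generally matches only when the two statements appear back to back. The variant below is a sketch, not part of the original example: it adds the ellipsis operator between the statements so that unrelated code may sit between the assignment and the execute call. The rule id is hypothetical.

```yaml
rules:
  - id: sql-injection-string-concat-nonadjacent  # hypothetical id
    patterns:
      - pattern: |
          $QUERY = "..." + $INPUT + "..."
          ...
          cursor.execute($QUERY)
    message: >-
      SQL injection via string concatenation.
      Use parameterized queries instead.
    severity: ERROR
    languages: [python]
```

Inside quotes, "..." matches any string literal; on a line of its own, ... matches any sequence of statements.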
Related on TokRepo
- AI Tools for Security — Security scanning and vulnerability detection tools.
- AI Tools for Testing — Code quality and testing tools that complement static analysis.
Common pitfalls
- Running with --config auto in CI without reviewing findings first. Some rules may produce false positives for your codebase. Curate your rule set before enforcing.
- Writing rules that are too broad. A pattern like $X + $Y matches every addition expression. Be specific about the dangerous pattern you want to catch.
- Not using metavariable-regex to constrain matches. Without constraints, rules match safe code patterns alongside vulnerable ones, creating noise.
- Not reading the changelog before upgrading. Breaking changes between versions can cause unexpected failures in your pipeline. Pin your version and review release notes.
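To illustrate the metavariable-regex point, here is a minimal sketch of a constrained rule. The rule id, message, and the choice of hashlib as the target are illustrative assumptions, not taken from the text above. Without the regex clause, the pattern would also flag safe calls such as hashlib.sha256(...).

```yaml
rules:
  - id: weak-hash-algorithm  # hypothetical id
    patterns:
      # Matches any hashlib constructor call and binds its name to $ALGO
      - pattern: hashlib.$ALGO(...)
      # Keep only matches where the bound name is md5 or sha1
      - metavariable-regex:
          metavariable: $ALGO
          regex: '^(md5|sha1)$'
    message: Weak hash algorithm. Prefer a SHA-2 family function.
    severity: WARNING
    languages: [python]
```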
Frequently Asked Questions
How does Semgrep differ from linters like ESLint or Pylint?
ESLint and Pylint are language-specific linters focused on style and common errors. Semgrep is a multi-language analysis engine focused on security and custom code patterns. Semgrep rules work across 30+ languages with one syntax.
Is Semgrep free?
Semgrep OSS (the CLI tool) is free and open-source under LGPL-2.1. Semgrep Cloud (SaaS dashboard with team features) has paid tiers. The community rule registry is free to use.
How fast is Semgrep?
Semgrep scans most codebases in under 30 seconds. It uses parallel execution and only parses files matching the rule's language filter. It is significantly faster than tools that build full program dependency graphs.
Do Semgrep patterns look like the code they match?
Yes. Semgrep patterns use the syntax of the target language. A Python pattern looks like Python. A JavaScript pattern looks like JavaScript. The main additions are metavariables ($X) for matching arbitrary expressions and the ellipsis (...) for matching arbitrary sequences.
Can Semgrep fix findings automatically?
Yes. Rules can include a fix field that specifies the corrected code. When run with --autofix, Semgrep applies the fix automatically. Always review autofixes before committing.
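A rule with a fix field might look like the sketch below. The rule id and the exit-to-sys.exit rewrite are illustrative choices, not taken from the text above. The metavariable bound in the pattern is reused in the fix, so the original argument is carried over.

```yaml
rules:
  - id: use-sys-exit  # hypothetical id
    pattern: exit($CODE)
    # Replacement text; $CODE is substituted with whatever the pattern bound
    fix: sys.exit($CODE)
    message: Use sys.exit() instead of the interactive exit() helper.
    severity: WARNING
    languages: [python]
```

This fix only rewrites the call itself. It does not add an import sys line, which is one reason to review autofixes before committing.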