Introduction
CodeQL is a semantic code analysis engine originally developed by Semmle and maintained by GitHub since it acquired Semmle in 2019. It builds a relational database from your source code, then lets you query that database to find vulnerabilities, anti-patterns, and compliance issues. It is the analysis engine behind code scanning in GitHub Advanced Security.
What CodeQL Does
- Builds queryable databases from source code in 10+ languages
- Provides thousands of pre-written queries for common vulnerability classes
- Supports custom query authoring in the CodeQL query language (QL)
- Outputs results in SARIF format for integration with CI/CD pipelines
- Powers code scanning alerts directly in GitHub pull requests
Architecture Overview
CodeQL works in two phases. First, a language-specific extractor converts source code into a relational database representing the program's abstract syntax tree, data flow, and control flow. For compiled languages the extractor observes the build; for interpreted languages it reads the source files directly. Second, the query engine evaluates QL queries against this database using Datalog-style recursive evaluation. QL is a declarative, object-oriented query language designed for code analysis. The query libraries implement data flow analysis and taint tracking, which let queries detect security vulnerabilities whose source and sink sit in different functions.
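The recursive evaluation described above can be illustrated with a toy fixpoint computation. This is a minimal sketch of the general Datalog idea, not CodeQL's engine or its data flow library; the edge facts, value names, and the `tainted_values` helper are all invented for illustration.

```python
# Toy illustration of Datalog-style recursive evaluation (NOT CodeQL's engine).
# Each fact says "data can flow from the first value to the second".
# The value names are hypothetical; some edges cross function boundaries.
EDGES = {
    ("request.args['q']", "handler.q"),
    ("handler.q", "build_query.term"),        # argument -> parameter
    ("build_query.term", "build_query.sql"),  # string concatenation
    ("build_query.sql", "handler.sql"),       # return value -> caller
    ("handler.sql", "cursor.execute(arg0)"),
    ("config.timeout", "handler.timeout"),    # unrelated, untainted flow
}
SOURCES = {"request.args['q']"}
SINKS = {"cursor.execute(arg0)"}


def tainted_values(sources, edges):
    """Compute the least fixpoint of the two rules:

        tainted(x) :- source(x).
        tainted(y) :- tainted(x), edge(x, y).
    """
    tainted = set(sources)
    frontier = set(sources)
    while frontier:
        # Semi-naive step: only extend from facts derived in the last round.
        new = {dst for (src, dst) in edges if src in frontier} - tainted
        tainted |= new
        frontier = new
    return tainted


tainted = tainted_values(SOURCES, EDGES)
alerts = sorted(SINKS & tainted)
print(alerts)
```

The loop stops once a round derives no new facts, which is what makes a recursive rule like `tainted` well defined. A real analysis also has to model sanitizers, calling contexts, and field accesses, which this sketch ignores.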
Self-Hosting & Configuration
- Install the CodeQL CLI via GitHub CLI extension or direct download
- Create databases for your target language (JavaScript/TypeScript, Python, Java/Kotlin, C/C++, C#, Go, Ruby, Swift)
- Run built-in query packs or write custom queries in .ql files
- Integrate with GitHub Actions using the code-scanning workflow template
- Configure CI to block PRs that introduce new vulnerabilities, for example by requiring the code scanning check to pass before merging
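The database and analysis steps above can be sketched as two CLI invocations. This is a minimal sketch that assumes the `codeql` binary is on your PATH; the helper functions, the `codeql-db` directory, and the `results.sarif` filename are placeholders invented here. The script only builds and prints the commands, so it is safe to run without CodeQL installed.

```python
import shlex


def database_create_cmd(db_dir, language, source_root="."):
    """Build the argv for `codeql database create` (helper name is hypothetical)."""
    return [
        "codeql", "database", "create", db_dir,
        f"--language={language}",
        f"--source-root={source_root}",
    ]


def database_analyze_cmd(db_dir, sarif_path, query_spec=None):
    """Build the argv for `codeql database analyze`, writing SARIF output.

    query_spec is optional; when it is omitted the CLI picks a default
    set of queries for the database's language.
    """
    cmd = [
        "codeql", "database", "analyze", db_dir,
        "--format=sarif-latest",
        f"--output={sarif_path}",
    ]
    if query_spec:
        cmd.append(query_spec)
    return cmd


# Placeholder paths; adjust to your repository layout.
create = database_create_cmd("codeql-db", "python")
analyze = database_analyze_cmd("codeql-db", "results.sarif")

for cmd in (create, analyze):
    print(shlex.join(cmd))
```

To execute the commands rather than print them, pass each list to `subprocess.run(cmd, check=True)`. Compiled languages usually also need a build command so the extractor can observe compilation; check the CLI manual for the exact option before relying on it.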
Key Features
- Semantic analysis goes beyond pattern matching to track data flow across functions
- Pre-built query packs cover OWASP Top 10, CWE, and language-specific vulnerability classes
- Custom QL queries let security teams encode organization-specific rules
- SARIF output integrates with GitHub, VS Code, and other SARIF-compatible tools
- Variant analysis helps find all instances of a vulnerability pattern across a codebase
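Because results are emitted as SARIF, which is plain JSON, they are straightforward to post-process. The sketch below walks the `runs[].results[]` structure defined by SARIF 2.1.0 and counts findings per rule. The sample document, file paths, and the "block on error level" policy are invented for illustration and are not taken from a real CodeQL run.

```python
from collections import Counter

# Hand-written SARIF-shaped sample (invented data, trimmed to the fields used).
SAMPLE_SARIF = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "CodeQL"}},
        "results": [
            {"ruleId": "py/sql-injection", "level": "error",
             "message": {"text": "Query built from user input."},
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "app/views.py"},
                 "region": {"startLine": 42}}}]},
            {"ruleId": "py/unused-import", "level": "note",
             "message": {"text": "Import is never used."},
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "app/utils.py"},
                 "region": {"startLine": 3}}}]},
            {"ruleId": "py/sql-injection", "level": "error",
             "message": {"text": "Query built from user input."},
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "app/search.py"},
                 "region": {"startLine": 17}}}]},
        ],
    }],
}


def iter_findings(sarif):
    """Yield one flat dict per SARIF result, tolerating missing fields."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            yield {
                "rule": result.get("ruleId", "<unknown>"),
                # Simplification: fall back to "warning" instead of looking
                # up the rule's default level in the tool metadata.
                "level": result.get("level", "warning"),
                "file": physical.get("artifactLocation", {}).get("uri", "<unknown>"),
                "line": physical.get("region", {}).get("startLine"),
                "message": result.get("message", {}).get("text", ""),
            }


findings = list(iter_findings(SAMPLE_SARIF))
counts = Counter(f["rule"] for f in findings)
blocking = [f for f in findings if f["level"] == "error"]

for rule, n in counts.most_common():
    print(f"{rule}: {n}")
print(f"{len(blocking)} finding(s) at error level")
```

To use it on real output, replace `SAMPLE_SARIF` with the result of `json.load` on the SARIF file produced by the analysis. A CI script could exit non-zero when `blocking` is non-empty, although GitHub's own code scanning checks are usually the simpler way to gate pull requests.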
Comparison with Similar Tools
- Semgrep: Lightweight pattern-based scanner; CodeQL provides deeper semantic and data-flow analysis
- SonarQube: Broad code quality platform; CodeQL focuses on security and on queryable data-flow analysis
- Snyk Code: Proprietary SAST; CodeQL's queries and libraries are open source and its query language is customizable, although the CodeQL engine itself is not open source
- Bandit: Python-only security linter; CodeQL covers 10+ languages with one query language and a shared data-flow framework
- ESLint security plugins: Syntactic checks; CodeQL tracks data flow across function boundaries
FAQ
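Q: Is CodeQL free to use? A: CodeQL is free for open-source projects on GitHub. For private repositories, it requires a GitHub Advanced Security license. The queries and libraries are open source (in the github/codeql repository), but the CLI and its analysis engine are closed-source binaries distributed under GitHub's CodeQL terms and conditions.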
Q: What languages does CodeQL support? A: CodeQL supports JavaScript/TypeScript, Python, Java/Kotlin, C/C++, C#, Go, Ruby, and Swift, with community packs for additional languages.
Q: Can I write my own queries? A: Yes. QL is a purpose-built query language. GitHub provides documentation, tutorials, and a VS Code extension with IntelliSense for authoring custom queries.
Q: How does CodeQL compare to running a linter? A: Linters check syntax and style. CodeQL performs semantic analysis including inter-procedural data flow and taint tracking, catching vulnerabilities that linters cannot detect.