Introduction
git-filter-repo is a Python script for rewriting git repository history. It is the tool that the Git project's own filter-branch documentation recommends in place of git filter-branch, which that documentation warns against using. Compared with filter-branch, git-filter-repo is much faster, safer, and simpler to use for tasks like removing sensitive data, splitting repositories, or renaming paths.
What git-filter-repo Does
- Removes files, directories, or patterns from the entire commit history
- Renames or moves paths across all historical commits in a single pass
- Strips large blobs to reduce repository size for migration or cleanup
- Replaces text strings (such as leaked credentials) across all commits
- Splits a subdirectory into its own standalone repository with full history
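As a rough illustration, each capability above corresponds to a built-in command-line flag. The sketch below only assembles the argument lists and does not run git. The paths, the size threshold, and the helper name `filter_repo_command` are placeholders chosen for this example.

```python
import shlex
from typing import Dict, List


def filter_repo_command(*options: str) -> List[str]:
    """Return an argv list for `git filter-repo` with the given options."""
    return ["git", "filter-repo", *options]


# One command per capability; every path and size here is a placeholder.
EXAMPLES: Dict[str, List[str]] = {
    "remove a directory": filter_repo_command("--path", "secrets/", "--invert-paths"),
    "rename a path": filter_repo_command("--path-rename", "old_dir/:new_dir/"),
    "strip large blobs": filter_repo_command("--strip-blobs-bigger-than", "10M"),
    "replace text": filter_repo_command("--replace-text", "replacements.txt"),
    "split a subdirectory": filter_repo_command("--subdirectory-filter", "services/api"),
}

if __name__ == "__main__":
    for task, argv in EXAMPLES.items():
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(f"{task}: {shlex.join(argv)}")
```

Building the commands as lists rather than strings keeps them safe to hand to `subprocess.run` later without shell quoting problems.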
Architecture Overview
git-filter-repo works by reading the output of git fast-export, applying transformations via user-specified callbacks or built-in flags, and then feeding the result back through git fast-import. This approach avoids checking out each commit individually, which is why it runs orders of magnitude faster than filter-branch. The entire tool is a single Python file with no external dependencies beyond git.
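To make the export, transform, import idea concrete, here is a toy filter over a fast-export style byte stream. It is a simplified sketch and not how git-filter-repo is actually implemented: it only renames a path prefix on file-modify (`M`) lines, assumes unquoted paths, and copies length-prefixed `data` payloads through untouched. The function name `rename_prefix` and the sample stream are invented for this example.

```python
import io


def rename_prefix(stream: bytes, old: bytes, new: bytes) -> bytes:
    """Rename a path prefix on 'M' lines of a fast-export style stream."""
    src, out = io.BytesIO(stream), io.BytesIO()
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # 'data <n>' is followed by exactly n raw bytes (blob contents or
            # a commit message). Copy them verbatim and never parse them.
            size = int(line[5:].strip().decode("ascii"))
            out.write(line)
            out.write(src.read(size))
            continue
        if line.startswith(b"M "):
            # File-modify command: M <mode> <dataref> <path>
            fields = line.rstrip(b"\n").split(b" ", 3)
            if len(fields) == 4 and fields[3].startswith(old):
                fields[3] = new + fields[3][len(old):]
                line = b" ".join(fields) + b"\n"
        out.write(line)
    return out.getvalue()


# A blob whose contents happen to look like a command, to show it stays intact.
PAYLOAD = b"M 100644 :9 old/fake\n"
SAMPLE = (
    b"blob\nmark :1\n" + b"data %d\n" % len(PAYLOAD) + PAYLOAD
    + b"commit refs/heads/main\nmark :2\n"
    + b"committer A <a@example.com> 0 +0000\n"
    + b"data 4\ninit\n"
    + b"M 100644 :1 old/readme.txt\n\n"
)
RESULT = rename_prefix(SAMPLE, b"old/", b"new/")
```

No working tree is touched at any point: the history is treated as a stream of commands, which is the property the paragraph above credits for the speed difference.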
Self-Hosting & Configuration
- Install via pip, Homebrew, or your Linux distribution package manager
- Requires Python 3.5+ and git 2.24+
- Run directly inside the repository you want to rewrite, ideally a fresh clone, since the tool refuses to run on anything else unless --force is given
- No configuration files needed; all options are passed via command-line flags
- Use --force to re-run on a previously filtered repository
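A small preflight check along these lines can confirm the version requirements before a rewrite. This is a sketch under stated assumptions: the minimums are copied from the list above and may differ between git-filter-repo releases, and the helper names `parse_git_version` and `meets_minimums` are made up for this example.

```python
import re
import subprocess
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 5)  # copied from the list above; may vary by release
MIN_GIT = (2, 24)    # copied from the list above; may vary by release


def parse_git_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from `git --version` output."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimums(
    git_output: str,
    python_version: Tuple[int, ...] = tuple(sys.version_info[:3]),
) -> bool:
    """Return True when both git and Python satisfy the assumed minimums."""
    git_version = parse_git_version(git_output)
    return (
        git_version is not None
        and git_version >= MIN_GIT
        and python_version[:2] >= MIN_PYTHON
    )


if __name__ == "__main__":
    # Requires git on PATH; prints True or False.
    reported = subprocess.check_output(["git", "--version"], universal_newlines=True)
    print(meets_minimums(reported))
```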
Key Features
- Single-file Python script with zero dependencies beyond git and Python
- Runs 10-100x faster than git-filter-branch on real-world repositories
- Officially recommended by the Git project documentation
- Supports custom Python callbacks for complex rewriting logic
- Produces a detailed report of what changed after each run
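For the custom callback feature, a filename callback might look like the sketch below. It is written as a full function so it can be tested in isolation; on the command line only the function body is passed to `--filename-callback`, where the variable `filename` holds the path as bytes. The policy itself (drop `.log` files, move `src/` under `lib/`) is hypothetical.

```python
from typing import Optional


def filename_callback(filename: bytes) -> Optional[bytes]:
    """Hypothetical policy: drop *.log files and move src/ under lib/."""
    if filename.endswith(b".log"):
        # Returning None removes the file from every commit.
        return None
    if filename.startswith(b"src/"):
        return b"lib/" + filename[len(b"src/"):]
    # Any other path is kept unchanged.
    return filename


if __name__ == "__main__":
    for path in (b"src/app.py", b"debug.log", b"README.md"):
        print(path, "->", filename_callback(path))
```

Paths are bytes rather than strings because git does not guarantee any particular filename encoding.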
Comparison with Similar Tools
- git-filter-branch — the legacy built-in tool; slow and error-prone, and its own documentation warns against using it and recommends git-filter-repo instead
- BFG Repo-Cleaner — fast Java-based tool focused on removing large files and secrets; git-filter-repo offers broader rewriting capabilities
- git rebase -i — works for recent commits but impractical for large-scale history rewrites
- GitHub support — GitHub can remove cached sensitive data on its servers, but the commit history itself still has to be rewritten with a tool such as git-filter-repo
FAQ
Q: Will this break existing clones?
A: Yes. Rewriting history changes commit hashes. All collaborators must re-clone or reset after a rewrite.
Q: Can I preview changes before applying?
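A: Use --dry-run. It runs the export and filter steps without modifying the repository and saves the original and filtered fast-export streams under .git/filter-repo/ so you can compare them.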
Q: Is it safe to use on shared repositories?
A: It is safe to run locally. Coordinate with your team before force-pushing rewritten history.
Q: How do I remove a leaked secret from history?
A: Create a file with old==>new replacement pairs and run git filter-repo --replace-text replacements.txt, then rotate the secret.
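The replacements file from the last answer can be generated with a few lines of Python. This is a minimal sketch: the secrets shown are fake, the helper name `format_replacements` is invented for this example, and it assumes the values contain neither newlines nor the `==>` separator.

```python
from pathlib import Path
from typing import Dict, Optional


def format_replacements(pairs: Dict[str, Optional[str]]) -> str:
    """Render one replacement rule per line for --replace-text."""
    lines = []
    for old, new in pairs.items():
        # "old==>new" substitutes new for old. A bare "old" line falls back
        # to the tool's default replacement text, ***REMOVED***.
        lines.append(old if new is None else f"{old}==>{new}")
    return "\n".join(lines) + "\n"


# Fake credentials, for illustration only.
LEAKED = {
    "hunter2-db-password": "REDACTED_DB_PASSWORD",
    "AKIAFAKEEXAMPLEKEY00": None,
}

if __name__ == "__main__":
    Path("replacements.txt").write_text(format_replacements(LEAKED), encoding="utf-8")
    print("Now run: git filter-repo --replace-text replacements.txt")
```

Rewriting history does not undo the exposure, so the secret still has to be rotated afterwards.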