What is Magika?
Magika is Google's AI-powered file type identification tool. Instead of relying on file extensions or magic bytes (as the Unix `file` command does), Magika uses a trained deep learning model to identify 200+ file types with 99%+ accuracy. It is especially good at distinguishing similar types (JavaScript vs TypeScript, JSON vs JSONL) and at detecting misnamed or obfuscated files, which is critical for security scanning.
Answer-Ready: Magika is Google's AI file type detector. A deep learning model identifies 200+ file types with 99%+ accuracy and outperforms magic-byte detection on similar types and obfuscated files. Used in Gmail and Google Drive security. Available as a Python library and CLI. 8k+ GitHub stars.
Best for: Security scanning, content processing pipelines, and file validation. Works with: Python, CLI, any pipeline. Setup time: Under 1 minute.
Core Features
1. 200+ File Types
| Category | Types |
|---|---|
| Code | Python, JavaScript, TypeScript, Rust, Go, Java, C++, ... |
| Documents | PDF, DOCX, XLSX, PPTX, Markdown, LaTeX |
| Data | JSON, CSV, XML, YAML, TOML, Parquet |
| Media | PNG, JPEG, GIF, WebP, MP3, MP4, WebM |
| Archives | ZIP, TAR, GZIP, RAR, 7Z |
| Executable | ELF, PE, Mach-O, Shell scripts |
| Web | HTML, CSS, SVG, WASM |
2. Batch Processing
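```python
from pathlib import Path

from magika import Magika

m = Magika()  # creating the instance loads the model; reuse it for every call

paths = [Path("file1.txt"), Path("file2.dat"), Path("file3.bin")]
results = m.identify_paths(paths)

# identify_paths returns one result per input path, in the same order
for path, result in zip(paths, results):
    # magika >= 0.6 attributes; 0.5.x used result.output.ct_label / result.output.score
    print(f"{path}: {result.output.label} ({result.score:.0%})")
```

3. Bytes Detection (No File Needed)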
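Because `identify_bytes` needs no file on disk, it fits request handlers that hold an upload body in memory. Model loading is the slow step, so a server would normally build one `Magika` instance and reuse it across requests. A minimal sketch of that pattern, assuming the magika >= 0.6 result attribute `output.label`; `detect_upload_label` and its `identify` parameter are hypothetical names, the latter only a seam so the logic can be tested without loading the model:

```python
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=1)
def _shared_magika():
    # Imported lazily so this module loads (and can be tested) without the model.
    from magika import Magika

    return Magika()  # the model loads here, once; later calls reuse the instance


def _magika_label(payload: bytes) -> str:
    # Assumes the magika >= 0.6 result shape (result.output.label).
    return _shared_magika().identify_bytes(payload).output.label


def detect_upload_label(data: bytes, identify: Callable[[bytes], str] = _magika_label) -> str:
    """Return the content-type label for an in-memory upload body."""
    return identify(data)
```

In a handler, `detect_upload_label(request_body)` would use the shared model; tests pass a stub such as `identify=lambda b: "json"`.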
```python
from magika import Magika

m = Magika()
content = b'{"name": "test", "value": 42}'
result = m.identify_bytes(content)
print(result.output.label)  # "json"
```

4. Security Use Cases
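The core of this use case is comparing the type a filename claims with the type its content actually has. As a minimal sketch, that comparison can be factored into a small policy function instead of a hard-coded JPEG check. The extension-to-label table and the `check_upload` name are hypothetical, and the label strings are assumptions to verify against your Magika release:

```python
from pathlib import PurePath
from typing import Dict, FrozenSet, Optional

# Hypothetical allowlist: extension -> content-type labels accepted for it.
# Label strings are assumptions; confirm them against your Magika release.
EXPECTED_LABELS: Dict[str, FrozenSet[str]] = {
    ".jpg": frozenset({"jpeg"}),
    ".jpeg": frozenset({"jpeg"}),
    ".png": frozenset({"png"}),
    ".pdf": frozenset({"pdf"}),
    ".json": frozenset({"json"}),
}


def check_upload(filename: str, detected_label: str) -> Optional[str]:
    """Return a warning if the detected type contradicts the filename, else None."""
    ext = PurePath(filename).suffix.lower()
    expected = EXPECTED_LABELS.get(ext)
    if expected is None:
        # Unknown or missing extension: reject rather than guess.
        return f"{filename}: extension {ext or '(none)'} is not allowed"
    if detected_label not in expected:
        return f"{filename}: claims {ext} but content looks like {detected_label}"
    return None
```

In practice `detected_label` would be the label Magika returns for the file; a non-`None` result is a reason to quarantine or reject the upload rather than trust its extension.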
```python
from pathlib import Path

from magika import Magika

m = Magika()

# Detect file type mismatches
uploaded_file = Path("profile_photo.jpg")
result = m.identify_path(uploaded_file)
if result.output.label != "jpeg":
    print(f"WARNING: File claims to be JPEG but is actually {result.output.label}")
    # Could be a disguised executable or script
```

Magika vs Traditional Tools
| Feature | Magika | `file` (libmagic) | python-magic |
|---|---|---|---|
| Method | Deep learning | Magic bytes | Magic bytes |
| Accuracy | 99%+ | ~90% | ~90% |
| Similar types | Excellent | Poor | Poor |
| Obfuscated files | Good | Poor | Poor |
| Speed | Fast (~1 ms/file) | Very fast | Very fast |
| Supported types | 200+ | 1000+ | 1000+ |
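To see why signature matching scores poorly on similar types, here is a minimal sketch of the magic-bytes approach. It is an illustration in the spirit of libmagic, not its actual rule set: it checks a few well-known leading signatures. Binary formats with fixed headers are easy, but JavaScript and TypeScript source carry no signature at all, and DOCX, XLSX, and PPTX all start with the same ZIP header, so a signature-only detector cannot tell them apart.

```python
from typing import Optional

# A few well-known leading signatures ("magic bytes").
# Real libmagic uses a far larger rule database; this is only an illustration.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for DOCX/XLSX/PPTX
    (b"\x7fELF", "elf"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_magic(data: bytes) -> Optional[str]:
    """Return a label if `data` starts with a known signature, else None."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return None  # plain-text formats (JS, TS, JSON, ...) have no signature


if __name__ == "__main__":
    print(sniff_magic(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
    print(sniff_magic(b"const x = 1;"))  # None (JavaScript)
    print(sniff_magic(b"const x: number = 1;"))  # None (TypeScript)
```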
FAQ
Q: Is it fast enough for production? A: Yes, ~1ms per file after model loading. Batch mode processes thousands of files per second.
Q: Does it work on binary files? A: Yes. It identifies binary formats such as executables, archives, and media files.
Q: How is it used at Google? A: Magika powers file type detection in Gmail (attachment scanning) and Google Drive (content safety).
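To check the production-speed figure on your own hardware, note that the ~1 ms is per call after model loading, so the `Magika` instance should be built outside the timed region. A minimal timing sketch; `mean_latency_ms` is a hypothetical helper, and `identify` is any callable that takes bytes (for example a bound `identify_bytes` method):

```python
import time
from typing import Callable, Iterable


def mean_latency_ms(identify: Callable[[bytes], object], samples: Iterable[bytes], warmup: int = 1) -> float:
    """Average wall-clock milliseconds per identify() call.

    The first `warmup` samples are run once untimed so one-off costs
    (lazy initialization, caches) do not skew the average.
    """
    items = list(samples)
    if not items:
        raise ValueError("need at least one sample")
    for data in items[:warmup]:
        identify(data)  # untimed warm-up calls
    start = time.perf_counter()
    for data in items:
        identify(data)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(items)
```

Usage: construct `m = Magika()` first, then call `mean_latency_ms(m.identify_bytes, samples)` with a list of representative file contents.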