# OpenCLIP — Open-Source Contrastive Language-Image Pre-training

> Community-driven reproduction and extension of OpenAI CLIP, providing open training code, datasets, and pretrained models for contrastive vision-language learning at scale.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`:

## Quick Use

```bash
pip install open_clip_torch
python -c "
import open_clip, torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
model.eval()  # models are created in train mode; switch to inference mode
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open('photo.jpg')).unsqueeze(0)
text = tokenizer(['a dog', 'a cat'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # scale the similarities (the OpenCLIP README uses 100.0) before the softmax
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
"
```

## Introduction

OpenCLIP is an open-source implementation of CLIP (Contrastive Language-Image Pre-training) that provides reproducible training pipelines and pretrained models trained on publicly available datasets such as LAION-2B. It enables zero-shot image classification and image-text retrieval, and it serves as a foundation for multimodal AI applications.

## What OpenCLIP Does

- Trains vision-language models using contrastive learning on image-text pairs
- Provides pretrained models across multiple architectures (e.g. ViT-B, ViT-L, ViT-H, ViT-g, ViT-bigG)
- Enables zero-shot image classification without task-specific fine-tuning
- Generates aligned image and text embeddings for retrieval and similarity tasks
- Supports distributed training across multiple GPUs and nodes

## Architecture Overview

OpenCLIP pairs a vision transformer (or CNN) image encoder with a text transformer encoder. Both encoders project their outputs into a shared embedding space via learned linear projections.
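To make the shared embedding space concrete, here is a toy pure-Python sketch of the scoring step: each encoder output is multiplied by its own projection matrix, L2-normalized, and compared by dot product (cosine similarity), and a scaled softmax turns the similarities into zero-shot class probabilities. The feature vectors, projection matrices, and the scale of 100 are hypothetical stand-ins for real encoder outputs and learned parameters; nothing here calls the OpenCLIP API.

```python
import math


def matvec(matrix, vec):
    """Apply a linear projection; matrix has shape (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_feat, text_feats, image_proj, text_proj, logit_scale=100.0):
    """Project both modalities into the shared space and score each caption."""
    img = l2_normalize(matvec(image_proj, image_feat))
    txts = [l2_normalize(matvec(text_proj, t)) for t in text_feats]
    # Cosine similarity between the image and every caption, scaled into logits
    logits = [logit_scale * dot(img, t) for t in txts]
    return softmax(logits)


# Hypothetical encoder outputs: the image encoder is 3-wide, the text encoder 2-wide;
# each projection maps its encoder into the same 2-dim shared space.
image_proj = [[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.0]]
text_proj = [[1.0, 0.0],
             [0.0, 1.0]]
image_feat = [0.9, 0.1, 0.2]
text_feats = [[1.0, 0.1],   # stands in for "a dog"
              [0.1, 1.0]]   # stands in for "a cat"

probs = zero_shot_probs(image_feat, text_feats, image_proj, text_proj)
print([round(p, 4) for p in probs])
```

Because both sides are normalized, rescaling an encoder output leaves the probabilities unchanged; only the direction of each embedding matters, which is why the Quick Use example normalizes before taking the dot product.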
Contrastive loss maximizes the cosine similarity between matching image-text pairs and minimizes it for non-matching pairs within each batch. Training uses large-batch distributed optimization with gradient checkpointing and mixed precision.

## Self-Hosting & Configuration

- Install via pip: `pip install open_clip_torch`
- Pretrained weights download automatically when you pass a model name and pretrained tag
- Training reads image-text pairs from WebDataset shards or CSV files; large-scale runs use multiple GPUs
- Configure architecture, dataset, batch size, and learning rate via CLI arguments
- Supports multi-GPU and multi-node distributed training for scaling to billions of training samples

## Key Features

- Fully open training code reproducing CLIP results on public data
- Model zoo with checkpoints trained on LAION-400M, LAION-2B, and DataComp
- Zero-shot transfer to downstream tasks without fine-tuning
- CoCa (Contrastive Captioner) models that combine contrastive and captioning objectives
- Integration with the Hugging Face model hub for easy model sharing

## Comparison with Similar Tools

- **OpenAI CLIP** — original model with closed training data; OpenCLIP uses public datasets
- **SigLIP** — Google's sigmoid-loss variant; available through OpenCLIP's codebase
- **BLIP-2** — adds generative capabilities on top of frozen image encoders
- **EVA-CLIP** — enhanced training recipes for CLIP models at larger scale
- **MetaCLIP** — Meta's data curation approach for CLIP training

## FAQ

**Q: How does OpenCLIP differ from the original CLIP?**

A: OpenCLIP provides open training code and models trained on publicly available datasets, while OpenAI CLIP was trained on proprietary data. Some OpenCLIP models match or exceed the original CLIP's performance.

**Q: What is the largest available model?**

A: ViT-bigG-14 trained on LAION-2B is one of the largest checkpoints and achieves strong zero-shot performance across benchmarks. Call `open_clip.list_pretrained()` to see the full, current list of model and pretrained-tag pairs.

**Q: Can I fine-tune OpenCLIP on my own data?**

A: Yes. The training scripts support both from-scratch training and fine-tuning from pretrained checkpoints.
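Whether training from scratch or fine-tuning, the objective is the batch contrastive loss described under Architecture Overview. The pure-Python sketch below computes the symmetric cross-entropy used in CLIP-style training from a batch of L2-normalized embeddings: row i of the similarity matrix should select caption i, and column j should select image j. For comparison it also includes a pairwise sigmoid loss in the style of SigLIP. The function names, toy batch, and temperature/bias values (t=10, b=-10, the SigLIP paper's initial values) are illustrative assumptions; OpenCLIP ships its own PyTorch loss implementations, which also handle gathering features across GPUs.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softplus(x):
    """log(1 + exp(x)), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clip_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric InfoNCE loss; inputs are L2-normalized vectors, image i matches text i."""
    n = len(image_embs)
    logits = [[logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
              for img in image_embs]
    # image -> text: cross-entropy of each row against its diagonal entry
    loss_i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # text -> image: the same computation over columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def sigmoid_loss(image_embs, text_embs, t=10.0, b=-10.0):
    """SigLIP-style loss: an independent binary decision for every image-text pair."""
    n = len(image_embs)
    total = 0.0
    for i, img in enumerate(image_embs):
        for j, txt in enumerate(text_embs):
            z = 1.0 if i == j else -1.0  # +1 for matching pairs, -1 otherwise
            logit = t * sum(a * c for a, c in zip(img, txt)) + b
            # -log(sigmoid(z * logit)) == softplus(-z * logit)
            total += softplus(-z * logit)
    return total / n


# Toy batch of two normalized pairs: captions in the right order vs. swapped.
aligned_imgs = [[1.0, 0.0], [0.0, 1.0]]
aligned_txts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_txts = [[0.0, 1.0], [1.0, 0.0]]

print(clip_loss(aligned_imgs, aligned_txts), clip_loss(aligned_imgs, shuffled_txts))
print(sigmoid_loss(aligned_imgs, aligned_txts), sigmoid_loss(aligned_imgs, shuffled_txts))
```

Both losses are near their minimum when each image is most similar to its own caption and grow sharply when captions are swapped. The softmax version couples every pair in the batch through the normalization, while the sigmoid version scores each pair independently.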
**Q: What formats are supported for training data?**

A: WebDataset tar files with image-text pairs, or CSV files pointing to image paths and captions.

## Sources

- https://github.com/mlfoundations/open_clip
- https://laion.ai/blog/large-openclip/

---

Source: https://tokrepo.com/en/workflows/openclip-open-source-contrastive-language-image-pre-training-cc727315

Author: AI Open Source