Skills · Mar 31, 2026 · 2 min read

Milvus — Cloud-Native Vector Database at Scale

Milvus is a high-performance cloud-native vector database for scalable AI search. 43.5K+ GitHub stars. Hybrid search (dense + sparse + full-text), GPU-accelerated indexing, multi-tenancy, distributed

AI Open Source · Community

Agent-ready install

This asset can be installed after you choose the runtime, review the plan, and run the matching command.

Native · 98/100 · Policy: allow
Agent surface: any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established

Direct install command

npx -y tokrepo@latest install 35f9fae3-15e7-492c-a4ae-04f6d850bef8 --target codex

Run it after confirming the plan with a dry run.

TL;DR

Milvus provides scalable vector similarity search with hybrid retrieval, GPU indexing, and multi-tenancy for production AI applications.

§01

What it is

Milvus is a high-performance, cloud-native vector database designed for AI applications that need similarity search at scale. It stores, indexes, and queries high-dimensional vectors generated by embedding models, making it a core component for RAG pipelines, recommendation systems, image search, and anomaly detection.

Milvus supports hybrid search that combines dense vectors, sparse vectors, and full-text search in a single query. It scales from an embedded mode (Milvus Lite, for prototyping) to distributed Kubernetes clusters handling billions of vectors. The project is licensed under Apache 2.0 and hosted by the LF AI & Data Foundation.

§02

How it saves time or tokens

Without a dedicated vector database, teams often bolt similarity search onto a relational database and compute distances by brute force, which scans every stored vector on each query. Milvus replaces that scan with purpose-built approximate nearest neighbor indexes (HNSW, IVF, DiskANN) that can cut search latency from seconds to milliseconds on large collections. GPU-accelerated indexing further shortens index build time for large datasets, and hybrid search removes the need to run separate full-text and vector search systems, which reduces infrastructure complexity.
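To make the baseline concrete, here is a minimal sketch of the brute-force approach in plain Python. The names `cosine_similarity` and `brute_force_top_k` and the toy vectors are illustrative assumptions, not part of Milvus. The point is that each query touches every stored vector, which is the cost an index such as HNSW or IVF is designed to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query: O(n * d) work per query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy data (hypothetical): three 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(brute_force_top_k([1.0, 0.1], vectors, k=2))  # indexes of the two closest vectors
```

With n stored vectors of dimension d, this does roughly n × d multiplications per query, so the cost grows linearly with the collection size.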

§03

How to use

Install the Python client with embedded mode for quick prototyping:

pip install 'pymilvus[milvus-lite]'

Create a client and a collection:

from pymilvus import MilvusClient

client = MilvusClient('milvus_demo.db')
client.create_collection('docs', dimension=768)

Insert data and run similarity search:

import random
data = [
    {'id': i, 'vector': [random.random() for _ in range(768)], 'text': f'document {i}'}
    for i in range(100)
]
client.insert('docs', data)
results = client.search('docs', data=[[random.random() for _ in range(768)]], limit=5)
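The search call returns one list of hits per query vector. The helper below is a small sketch of how those hits might be flattened for logging or display. It assumes each hit is a dict-like object with `id`, `distance`, and an optional `entity` key; `format_hits` and `sample_results` are hypothetical names, and the sample data is hand-written rather than real output.

```python
def format_hits(results):
    """Flatten search results into (query_index, id, distance, entity) tuples."""
    rows = []
    for query_index, hits in enumerate(results):
        for hit in hits:
            rows.append((query_index, hit["id"], hit["distance"], hit.get("entity", {})))
    return rows


# Hand-written stand-in for a search response (assumed shape, not real output).
sample_results = [[
    {"id": 7, "distance": 0.91, "entity": {"text": "document 7"}},
    {"id": 3, "distance": 0.88, "entity": {"text": "document 3"}},
]]

for query_index, doc_id, distance, entity in format_hits(sample_results):
    print(query_index, doc_id, round(distance, 2), entity.get("text"))
```

To get fields such as `text` back in `entity`, the search call would also need an `output_fields` argument, as in the filtered search example in §04.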

For production, deploy via Docker or Kubernetes with etcd as the metadata store and MinIO (or another S3-compatible service) for object storage.

§04

Example

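from pymilvus import CollectionSchema, DataType, FieldSchema, MilvusClient

# Embedded mode -- no server needed
client = MilvusClient('my_vectors.db')

# Create a collection with a custom schema
fields = [
    FieldSchema('id', DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema('embedding', DataType.FLOAT_VECTOR, dim=384),
    FieldSchema('content', DataType.VARCHAR, max_length=1000),
]
schema = CollectionSchema(fields)

# A custom schema needs an explicit index so the collection can be loaded and searched
index_params = client.prepare_index_params()
index_params.add_index(field_name='embedding', index_type='AUTOINDEX', metric_type='COSINE')
client.create_collection('articles', schema=schema, index_params=index_params)

# Placeholder query vector; in practice, use your embedding model's 384-dimensional output
query_embedding = [0.1] * 384

# Search with a filter: only rows whose content starts with "AI"
results = client.search(
    'articles',
    data=[query_embedding],
    limit=10,
    filter="content like 'AI%'",
    output_fields=['content'],
)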

§05

Related on TokRepo

RAG Tools and Frameworks -- explore retrieval-augmented generation tools that pair with vector databases
AI Agent Tools -- discover agent frameworks that use vector search for memory and knowledge retrieval

§06

Common pitfalls

Choosing the wrong index type wastes resources. HNSW typically gives high recall at low latency but uses more memory; IVF_FLAT uses less memory than HNSW on large datasets; DiskANN is the better fit when the data exceeds available RAM.
Milvus Lite is for prototyping only. Production workloads need the standalone or distributed deployment with proper etcd and MinIO backends.
Embedding dimension mismatches between your model output and the collection schema cause inserts and searches to fail. Always verify that your embedding model's output dimension matches the collection definition before loading data.
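One way to catch the dimension pitfall early is to check vectors on the client before they are sent. The sketch below is plain Python and does not call Milvus; `validate_dimensions` and the `vector` field name are assumptions chosen to match the quick-start data in §03.

```python
def validate_dimensions(rows, expected_dim, vector_field="vector"):
    """Return the row count, or raise ValueError naming the first mismatched row."""
    for position, row in enumerate(rows):
        actual = len(row[vector_field])
        if actual != expected_dim:
            raise ValueError(
                f"row {position}: vector has {actual} dimensions, "
                f"collection expects {expected_dim}"
            )
    return len(rows)


# Hypothetical batch: both rows match a 4-dimensional collection.
batch = [
    {"id": 0, "vector": [0.1, 0.2, 0.3, 0.4]},
    {"id": 1, "vector": [0.5, 0.6, 0.7, 0.8]},
]
print(validate_dimensions(batch, expected_dim=4))
```

Running a check like this before `client.insert(...)` turns a server-side error into a message that names the offending row.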

Frequently asked questions

What is the difference between Milvus Lite and Milvus standalone?

Milvus Lite is an embedded mode that runs in-process with your Python application and stores data in a local file. It requires no server setup and is ideal for prototyping. Milvus standalone and distributed modes run as separate services with etcd for metadata and MinIO for object storage; they target production workloads, and the distributed mode adds replication and horizontal scaling.

Which index type should I use in Milvus?

HNSW provides the highest recall and lowest latency but requires all data in memory. IVF_FLAT partitions vectors into clusters and is more memory-efficient for large datasets. DiskANN stores indexes on disk and is best when your dataset exceeds available RAM. GPU indexes (GPU_IVF_FLAT, GPU_CAGRA) accelerate both indexing and search on NVIDIA hardware.
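The trade-offs above can be turned into a rough first-pass heuristic. The sketch below only compares the raw float32 vector size against a RAM budget. The `headroom` factor of 2.0 is an arbitrary assumption standing in for index overhead, and `suggest_index` is a hypothetical helper, not a Milvus API. Treat the result as a starting point for benchmarking, not a recommendation.

```python
GIB = 1024 ** 3


def estimate_raw_bytes(num_vectors, dim, bytes_per_value=4):
    """Size of the raw float32 vectors alone, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


def suggest_index(num_vectors, dim, ram_bytes, headroom=2.0):
    """Pick an index type name from raw data size versus available RAM.

    headroom is an assumed multiplier for index overhead, not a measured value.
    """
    raw = estimate_raw_bytes(num_vectors, dim)
    if raw > ram_bytes:
        return "DISKANN"   # data does not fit in memory at all
    if raw * headroom > ram_bytes:
        return "IVF_FLAT"  # fits, but with little room for a graph index
    return "HNSW"          # comfortable memory margin


print(suggest_index(1_000_000, 768, 16 * GIB))
print(suggest_index(3_000_000, 768, 16 * GIB))
print(suggest_index(100_000_000, 768, 64 * GIB))
```

GPU index types are left out of the sketch because the choice there depends on the hardware available rather than on data size alone.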

Does Milvus support hybrid search with text and vectors?

Yes. Milvus supports dense vector search, sparse vector search (BM25-style), and full-text search. You can combine these in a single query using RRF (Reciprocal Rank Fusion) or weighted scoring to get results that match both semantic meaning and keyword relevance.
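To show what Reciprocal Rank Fusion computes, here is a minimal plain-Python sketch of the formula: each document scores the sum of 1 / (k + rank) over the result lists it appears in. In Milvus the fusion runs on the server side, so this is only an illustration. The function name, the sample ID lists, and the constant k = 60 (a commonly used value) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists; each document scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a dense search and a keyword search.
dense_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense_hits, keyword_hits]))
```

Because RRF uses only ranks, it does not need the dense and keyword scores to be on the same scale, which is the main reason to prefer it over weighted scoring when the two score distributions differ.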

How does Milvus handle multi-tenancy?

Milvus supports multi-tenancy at three levels: database-level isolation (separate databases per tenant), collection-level isolation (separate collections), and partition-level isolation (partition keys within a single collection). Choose based on your tenant count and isolation requirements.
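With partition-level isolation, each query is usually scoped by a filter on the partition-key field. The sketch below builds such a filter string in plain Python. The field name `tenant_id`, the helper `tenant_filter`, and the backslash escaping are assumptions for illustration; check the escaping against the filter-expression rules of your Milvus version.

```python
def tenant_filter(tenant_id, extra=None, field="tenant_id"):
    """Build a filter expression that restricts a query to one tenant."""
    # Escape backslashes first, then double quotes, so the value stays one string literal.
    escaped = tenant_id.replace("\\", "\\\\").replace('"', '\\"')
    scoped = f'{field} == "{escaped}"'
    if extra:
        return f"{scoped} and ({extra})"
    return scoped


print(tenant_filter("acme"))
print(tenant_filter("acme", extra="content like 'AI%'"))
```

The resulting string would be passed as the `filter` argument of a search or query, so that one tenant's request cannot return another tenant's rows.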

Can Milvus integrate with LangChain or LlamaIndex?

Yes. Both LangChain and LlamaIndex have official Milvus integrations. You configure Milvus as a vector store backend, and the framework handles embedding, insertion, and retrieval automatically. This makes Milvus a drop-in vector store for RAG pipelines built with either framework.


Source and acknowledgments

Created by Zilliz and hosted by the LF AI & Data Foundation. Licensed under Apache 2.0. Repository: milvus-io/milvus (43,500+ GitHub stars).
