Guardrails AI — Validate LLM Outputs in Production
Add validation and guardrails to any LLM output. Guardrails AI checks for hallucination, toxicity, PII leakage, and format compliance with 50+ built-in validators.
Install with prior review
This asset requires review. The copied prompt requests a dry run, shows the writes it would make, and continues only after confirmation.
npx -y tokrepo@latest install ffbad589-cd32-4eca-9518-fdcf9167ca21 --target codex
Do a dry run first, confirm the writes, and then run this command.
What it is
Guardrails AI is a Python library that adds validation and safety checks to LLM outputs. It provides 50+ built-in validators that check for hallucination, toxicity, PII leakage, format compliance, and factual accuracy. You wrap your LLM calls with a Guard object, and Guardrails automatically validates and optionally retries when outputs fail validation.
Guardrails AI is designed for teams deploying LLMs in production who need to ensure outputs meet quality and safety standards. It works with any LLM provider including OpenAI, Anthropic, and local models.
How it saves time or tokens
Without Guardrails, you write custom validation logic for every LLM output format. Guardrails provides a declarative approach: define your validators once, and the library handles validation, error reporting, and automatic retries. The hub of pre-built validators (guardrails hub) saves development time by providing tested implementations for common checks.
Automatic retries with corrective prompts reduce manual intervention. When an output fails validation, Guardrails re-prompts the LLM with the error details, often producing a valid output on the second attempt.
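The retry behavior described above can be sketched in plain Python. This is an illustration of the general validate-and-re-prompt pattern, not Guardrails' internal implementation; `generate_with_reask`, `validate_ssn_format`, and `fake_llm` are hypothetical names, and the fake model stands in for a real LLM call.

```python
import re
from typing import Callable, Optional, Tuple

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def validate_ssn_format(text: str) -> Optional[str]:
    """Return None when the text is valid, otherwise an error message."""
    if SSN_PATTERN.fullmatch(text.strip()):
        return None
    return f"Output {text!r} does not match the pattern NNN-NN-NNNN."


def generate_with_reask(
    llm: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], Optional[str]],
    max_reasks: int = 1,
) -> Tuple[str, int]:
    """Call the model, re-prompting with the validation error until valid.

    Returns the valid output and the number of model calls made.
    Raises ValueError if every attempt fails.
    """
    current_prompt = prompt
    error: Optional[str] = "no attempt was made"
    for attempt in range(1, max_reasks + 2):
        output = llm(current_prompt)
        error = validate(output)
        if error is None:
            return output.strip(), attempt
        # Append the error details so the model can see what went wrong.
        current_prompt = (
            f"{prompt}\n\nPrevious answer was rejected: {error}\n"
            "Please correct it."
        )
    raise ValueError(f"No valid output after {max_reasks + 1} calls: {error}")


def fake_llm(prompt: str) -> str:
    """Stand-in model: wrong at first, correct once it sees an error."""
    return "123-45-6789" if "rejected" in prompt else "SSN: 123456789"


result, calls = generate_with_reask(
    fake_llm, "Generate a sample SSN format", validate_ssn_format
)
print(result, calls)
```

Each re-prompt is a full extra model call, which is why the retry budget (`max_reasks` here) is worth keeping small.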
How to use
- Install Guardrails and a validator:
pip install guardrails-ai
guardrails hub install hub://guardrails/regex_match
- Create a Guard with validators:
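from guardrails import Guard
from guardrails.hub import RegexMatch

# Attach a validator that accepts only strings shaped like NNN-NN-NNNN.
guard = Guard().use(RegexMatch(regex=r'^\d{3}-\d{2}-\d{4}$'))

# Calling the guard sends the request to the model and validates the reply.
result = guard(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Generate a sample SSN format'}]
)
print(result.validated_output)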
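- The Guard validates the LLM output against the regex pattern. Re-prompting the model on a mismatch requires on_fail='reask' on the validator; the snippet above leaves on_fail unset, so the validator's default failure action applies.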
Example
# Assumes both validators were installed first with `guardrails hub install`.
from guardrails import Guard
from guardrails.hub import DetectPII, ToxicLanguage
from pydantic import BaseModel, Field


class SafeResponse(BaseModel):
    answer: str = Field(description='The response text')
    confidence: float = Field(ge=0.0, le=1.0)


# Build the guard from the schema, then attach the content validators.
guard = Guard.from_pydantic(SafeResponse)
# Entity names follow Presidio's naming, where email is 'EMAIL_ADDRESS'.
guard.use(DetectPII(pii_entities=['EMAIL_ADDRESS', 'PHONE_NUMBER'], on_fail='fix'))
guard.use(ToxicLanguage(threshold=0.5, on_fail='reask'))

result = guard(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Summarize this customer feedback'}]
)
print(result.validated_output)
print(f'Validation passed: {result.validation_passed}')
This ensures the LLM output matches the Pydantic schema, contains no PII, and is not toxic. Failed PII checks are auto-fixed; toxic outputs trigger a retry.
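To make the two kinds of check concrete, here is a standard-library sketch of what a schema check and a deterministic PII "fix" amount to. It is not how Guardrails, Pydantic, or DetectPII are implemented; `check_schema`, `redact_emails`, and the simple email regex are assumptions made for illustration only.

```python
import json
import re

# Deliberately simple email pattern; real PII detection is more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def check_schema(raw: str) -> dict:
    """Parse JSON and enforce: answer is a str, confidence is in [0, 1]."""
    data = json.loads(raw)
    if not isinstance(data.get("answer"), str):
        raise ValueError("'answer' must be a string")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0.0 and 1.0")
    return data


def redact_emails(text: str) -> str:
    """Deterministic 'fix': replace each email address with a placeholder."""
    return EMAIL_RE.sub("<EMAIL_ADDRESS>", text)


raw_output = (
    '{"answer": "Contact jane.doe@example.com for a refund.", '
    '"confidence": 0.8}'
)
checked = check_schema(raw_output)
checked["answer"] = redact_emails(checked["answer"])
print(checked)
```

The redaction step needs no further model call, which is the practical difference between a fix-style correction and a retry.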
Related on TokRepo
- AI agent tools -- Explore tools for building reliable AI agent pipelines
- AI monitoring tools -- Browse tools for monitoring AI systems in production
Common pitfalls
- Validators from the hub need separate installation. Running guard.use(SomeValidator()) without first installing via guardrails hub install produces import errors.
- Stacking many validators increases latency. Each validator runs sequentially by default. For latency-sensitive applications, limit validators to the most critical checks.
- The on_fail='reask' strategy sends additional LLM calls, increasing token costs. Use on_fail='fix' for deterministic corrections or on_fail='exception' when retries are not acceptable.
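The cost difference between the three on_fail strategies can be seen in a small sketch. This is a simplified model of sequential validation with per-validator failure policies, not Guardrails' actual dispatch code; `run_validators`, `ValidationFailed`, and the toy checks are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[str], Optional[str]]   # returns an error message or None
Fixer = Optional[Callable[[str], str]]   # deterministic repair, if any


class ValidationFailed(Exception):
    """Raised when a validator with the 'exception' policy fails."""


def run_validators(
    text: str,
    validators: List[Tuple[Check, Fixer, str]],
    llm: Callable[[str], str],
) -> Tuple[str, int]:
    """Run validators in order; return the final text and extra LLM calls."""
    extra_calls = 0
    for check, fixer, on_fail in validators:
        error = check(text)
        if error is None:
            continue
        if on_fail == "fix" and fixer is not None:
            text = fixer(text)           # local repair, no model call
        elif on_fail == "reask":
            text = llm(f"Rewrite the text. Problem: {error}")
            extra_calls += 1             # each reask costs one more call
            # This sketch does not re-validate the rewritten text.
        else:
            raise ValidationFailed(error)
    return text, extra_calls


def too_long(text: str) -> Optional[str]:
    return "longer than 20 characters" if len(text) > 20 else None


def truncate(text: str) -> str:
    return text[:20]


def is_shouting(text: str) -> Optional[str]:
    return "text is all upper case" if text.isupper() else None


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "all good here"


final, extra = run_validators(
    "THIS OUTPUT IS FAR TOO LONG AND LOUD",
    [(too_long, truncate, "fix"), (is_shouting, None, "reask")],
    fake_llm,
)
print(final, extra)
```

Swapping "reask" for "exception" on the second validator would raise ValidationFailed instead and keep the extra call count at zero.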
Frequently asked questions
The hub includes 50+ validators covering PII detection, toxicity filtering, hallucination checking, regex matching, JSON schema validation, SQL injection prevention, prompt injection detection, and more. You can browse available validators at hub.guardrailsai.com.
Guardrails works with any LLM that returns text, including local models. For local models served via Ollama or vLLM, you can use the LiteLLM integration or pass raw text to the Guard for validation only. The validation logic is model-agnostic.
Custom validators are supported. Guardrails provides a Validator base class that you extend with your own validation logic. Custom validators integrate with the same retry and error handling mechanisms as built-in validators. You can also publish custom validators to the hub.
When a validator fails with on_fail='reask', Guardrails re-sends the prompt to the LLM with the validation error appended. The LLM sees what went wrong and generates a corrected output. The maximum number of retries is configurable. Each retry consumes additional tokens.
Streaming is supported. Guardrails offers streaming validation, where outputs are checked incrementally as tokens arrive. This enables early detection of validation failures without waiting for the complete response, reducing wasted tokens on invalid outputs.
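The idea behind incremental checking can be sketched with a plain generator. This is not Guardrails' streaming API; `validated_stream`, the banned-term list, and the fake token stream are hypothetical and only show why stopping early saves tokens.

```python
from typing import Iterable, Iterator, List, Tuple

BANNED_TERMS: Tuple[str, ...] = ("password", "ssn")


def validated_stream(
    chunks: Iterable[str], banned: Tuple[str, ...] = BANNED_TERMS
) -> Iterator[str]:
    """Yield chunks until the accumulated text contains a banned term."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        lowered = seen.lower()
        for term in banned:
            # Checking the accumulated text also catches terms that are
            # split across two chunks.
            if term in lowered:
                raise ValueError(f"banned term {term!r} detected mid-stream")
        yield chunk


produced: List[str] = []


def fake_token_stream() -> Iterator[str]:
    """Stand-in for a model's streaming response; records what was generated."""
    for piece in ["Your order ", "shipped. The pass", "word is hunter2", " and more"]:
        produced.append(piece)
        yield piece


delivered: List[str] = []
failure = ""
try:
    for chunk in validated_stream(fake_token_stream()):
        delivered.append(chunk)
except ValueError as exc:
    failure = str(exc)

print(delivered)
print(len(produced), failure)
```

Because the check fails on the third chunk, the fourth is never requested from the stream, which is the token saving the paragraph above refers to.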
Source and credits
Created by Guardrails AI. Licensed under Apache 2.0.
guardrails-ai/guardrails — 4k+ stars