AI Compliance Vendors

Galileo

Don't just monitor AI failures. Stop them.

Last verified April 26, 2026

About Galileo

Galileo is an AI observability and evaluation engineering platform that helps enterprise teams evaluate, monitor, and protect generative AI applications and agents at scale. Its Agent Reliability platform enables teams to build ground-truth datasets, auto-tune evaluation metrics from live feedback, distill LLM-as-a-Judge evaluators into compact Luna models for production guardrails, and run the full eval engineering lifecycle from offline testing to production monitoring. The platform is trusted by teams at HP, Comcast, and Twilio, and is deployable as SaaS, VPC, or on-premises.

Framework coverage

Not yet catalogued. We only list frameworks when Galileo publicly documents coverage in their own materials. If you work at Galileo and want to add citations, use the correction link at the bottom of this page.

Capabilities

Features Galileo markets publicly. Inclusion means the capability is documented — not that it's best-in-class.

LLM Observability

Logging, tracing, and metrics for LLM applications in production — latency, cost, quality, and error rates across traces and users.
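
For a concrete sense of what such telemetry looks like, here is a minimal sketch of a per-request trace record and the aggregate metrics a dashboard would chart. The field names, prices, and schema are illustrative assumptions, not Galileo's SDK.

```python
# Minimal sketch of the kind of per-request record an LLM observability
# pipeline collects; field names and prices are illustrative, not Galileo's schema.
import uuid
from dataclasses import dataclass, field

@dataclass
class LLMTrace:
    user_id: str
    model: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_ms: float = 0.0
    error: str | None = None

    def cost_usd(self, price_per_1k_in: float, price_per_1k_out: float) -> float:
        # Cost is derived from token counts and per-model prices.
        return (self.prompt_tokens / 1000) * price_per_1k_in + \
               (self.completion_tokens / 1000) * price_per_1k_out

def summarize(traces: list[LLMTrace]) -> dict:
    """Aggregate the metrics a dashboard would chart: volume, errors, latency, spend."""
    n = len(traces)
    return {
        "requests": n,
        "error_rate": sum(t.error is not None for t in traces) / n,
        "p50_latency_ms": sorted(t.latency_ms for t in traces)[n // 2],
        "total_cost_usd": sum(t.cost_usd(0.0005, 0.0015) for t in traces),
    }
```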

LLM Evaluation

Systematic testing of LLM outputs for correctness, relevance, safety, and consistency using automated scorers, rubrics, or human review.
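
As a rough illustration of automated scoring, the sketch below runs a model over a fixed test set and averages a couple of simple scorers. The dataset format and scorer names are assumptions; real evaluation suites add rubric-based and LLM-as-a-Judge scorers on top of checks like these.

```python
# Illustrative evaluation loop over a fixed test set; the scorers and dataset
# format are assumptions, not Galileo's evaluation API.
from typing import Callable

Scorer = Callable[[str, str], float]  # (output, reference) -> score in [0, 1]

def exact_match(output: str, reference: str) -> float:
    return float(output.strip().lower() == reference.strip().lower())

def contains_reference(output: str, reference: str) -> float:
    return float(reference.strip().lower() in output.lower())

def evaluate(dataset: list[dict], generate: Callable[[str], str],
             scorers: dict[str, Scorer]) -> dict[str, float]:
    """Run the model over every test case and average each scorer's result."""
    totals = {name: 0.0 for name in scorers}
    for case in dataset:
        output = generate(case["input"])
        for name, scorer in scorers.items():
            totals[name] += scorer(output, case["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}

# Usage: evaluate(cases, my_model_fn, {"exact": exact_match, "contains": contains_reference})
```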

Agent Tracing

End-to-end visibility into multi-step LLM agent runs: tool calls, intermediate reasoning, token usage, latency, and errors at each step.
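
A minimal sketch of step-level tracing, assuming a generic Python agent loop: each tool call or LLM step gets its own timed, error-aware record. The structure is illustrative only, not Galileo's trace format.

```python
# Sketch of step-level spans inside a single agent run: every tool call or
# LLM step is timed and recorded, including errors. Illustrative structure only.
import time
from contextlib import contextmanager

class AgentRun:
    def __init__(self, run_id: str):
        self.run_id = run_id
        self.steps: list[dict] = []

    @contextmanager
    def step(self, name: str, kind: str):
        """Time one step (e.g. 'llm_call', 'tool_call') and record its outcome."""
        record = {"name": name, "kind": kind, "error": None}
        start = time.perf_counter()
        try:
            yield record          # the step can attach tokens, inputs, outputs
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_ms"] = (time.perf_counter() - start) * 1000
            self.steps.append(record)

run = AgentRun("demo-1")
with run.step("search_flights", kind="tool_call") as s:
    s["output"] = ["AA100", "UA200"]   # stand-in for a real tool result
with run.step("draft_answer", kind="llm_call") as s:
    s["tokens"] = 412                  # stand-in for real token usage
print(run.steps)
```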

Hallucination Detection

Automated scoring of LLM outputs for factual accuracy and groundedness, typically via retrieval-backed verification or reference comparison.
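
As a simplified stand-in for retrieval-backed verification, the sketch below flags answer sentences that share little vocabulary with the retrieved context. Production systems use entailment models or LLM judges rather than this token-overlap heuristic.

```python
# Very rough groundedness check: flag answer sentences with little lexical
# overlap with the retrieved context. Real detectors use entailment models or
# LLM judges; this token-overlap heuristic is only a stand-in.
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def ungrounded_sentences(answer: str, context: str, threshold: float = 0.5) -> list[str]:
    """Return sentences whose content words are mostly absent from the context."""
    context_tokens = _tokens(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        support = len(words & context_tokens) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged

context = "The invoice was issued on 3 May and totals 1,200 euros."
answer = "The invoice totals 1,200 euros. It was paid in full last week."
print(ungrounded_sentences(answer, context))  # flags the unsupported second sentence
```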

LLM Guardrails

Runtime input/output filtering for LLMs — PII redaction, toxicity blocking, prompt injection defense, policy enforcement.
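
A minimal sketch of runtime input/output filtering, with illustrative patterns and policies: redact obvious PII in outputs and block inputs that match simple prompt-injection phrases. Real guardrails also rely on trained classifiers, not just patterns.

```python
# Sketch of a runtime guardrail: redact obvious PII patterns in outputs and
# block inputs matching simple prompt-injection phrases. Patterns and policy
# are illustrative, not a production rule set.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
INJECTION_PHRASES = ("ignore previous instructions", "disregard your system prompt")

def check_input(text: str) -> None:
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        raise ValueError("blocked: possible prompt injection")

def redact_output(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text

check_input("Summarize this ticket for me.")          # passes the policy check
print(redact_output("Contact jane.doe@example.com"))  # email is masked
```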

Model Monitoring

Production monitoring for performance, drift, data quality, and fairness regressions.
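
For illustration, a small check that compares a live window of traffic against fixed baselines for accuracy and data quality; the thresholds and record fields are assumptions, not Galileo's defaults.

```python
# Illustrative production check: compare a live window of records against
# fixed baselines for accuracy and data quality. Thresholds and field names
# are assumptions, not Galileo's defaults.
def monitor_window(records: list[dict], baseline_accuracy: float = 0.90,
                   max_null_rate: float = 0.02) -> list[str]:
    """records hold 'prediction', 'label' (None until ground truth arrives),
    and 'feature_nulls' (count of missing feature values)."""
    alerts = []
    labeled = [r for r in records if r["label"] is not None]
    if labeled:
        accuracy = sum(r["prediction"] == r["label"] for r in labeled) / len(labeled)
        if accuracy < baseline_accuracy - 0.05:  # tolerate small fluctuations
            alerts.append(f"performance regression: accuracy={accuracy:.2f}")
    null_rate = sum(r["feature_nulls"] > 0 for r in records) / len(records)
    if null_rate > max_null_rate:
        alerts.append(f"data quality: null rate={null_rate:.1%}")
    return alerts

print(monitor_window([
    {"prediction": 1, "label": 0, "feature_nulls": 0},
    {"prediction": 1, "label": 1, "feature_nulls": 1},
]))  # both an accuracy alert and a null-rate alert fire on this tiny window
```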

Prompt Management

Versioning, templating, A/B testing, and deployment workflows for LLM prompts treated as production artifacts.
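
A minimal sketch of prompts treated as versioned artifacts: register immutable versions, pin a deployment to one of them, and render with variables. The registry API here is a generic illustration, not Galileo's.

```python
# Sketch of a versioned prompt registry: prompts are registered as immutable
# versions, deployments pin a version, and rendering fills in variables.
from string import Template

class PromptRegistry:
    def __init__(self):
        self._versions: dict[str, list[str]] = {}
        self._deployed: dict[str, int] = {}

    def register(self, name: str, template: str) -> int:
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)  # 1-based version number

    def deploy(self, name: str, version: int) -> None:
        self._deployed[name] = version

    def render(self, name: str, **variables) -> str:
        version = self._deployed[name]
        return Template(self._versions[name][version - 1]).substitute(**variables)

registry = PromptRegistry()
v1 = registry.register("summarize", "Summarize the ticket:\n$ticket")
registry.deploy("summarize", v1)
print(registry.render("summarize", ticket="Customer cannot log in."))
```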

Drift Detection

Automated detection of distribution shift, feature drift, prediction drift, and performance degradation in deployed ML/AI models.
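
One common drift signal is the population stability index (PSI) between a reference window and a live window; a sketch follows, with illustrative bin counts and an illustrative alert threshold.

```python
# Sketch of one common drift signal: the population stability index (PSI)
# between a reference window and a live window of a numeric feature or score.
# Bin edges, window sizes, and the alert threshold are illustrative choices.
import math

def psi(reference: list[float], live: list[float], bins: int = 10) -> float:
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)  # which bin v falls into
            counts[idx] += 1
        return [max(c / len(values), 1e-6) for c in counts]  # avoid log(0)

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))

# A common rule of thumb treats PSI above roughly 0.2 as meaningful drift.
reference_scores = [0.1 * i for i in range(100)]
live_scores = [0.1 * i + 3.0 for i in range(100)]   # shifted distribution
print(psi(reference_scores, live_scores) > 0.2)     # True, so raise a drift alert
```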

Audit Logging

Tamper-evident logging of governance events (approvals, model changes, policy decisions) required by EU AI Act Article 12 and similar regulations.
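
As a generic illustration of tamper evidence (not a claim about how Galileo implements Article 12), the sketch below chains each governance event to the hash of the previous entry, so any later edit or reordering fails verification.

```python
# Generic hash-chained audit log: each governance event embeds the previous
# entry's hash, so modifying or reordering any stored entry breaks the chain.
import hashlib
import json
import time

class AuditLog:
    def __init__(self):
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, detail: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"ts": time.time(), "actor": actor, "action": action,
                "detail": detail, "prev_hash": prev_hash}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry fails the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

log = AuditLog()
log.append("alice", "model_approved", {"model": "risk-scorer-v4"})
log.append("bob", "policy_updated", {"policy": "pii-redaction"})
print(log.verify())  # True until any stored entry is modified
```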

Pricing

Contact for pricing

Free: $0/month; 5,000 traces/month; unlimited users and custom evals.
Pro: $100/month (annual billing saves 33%); 50,000 traces/month; standard RBAC; Slack support.
Enterprise: contact for pricing; unlimited traces; VPC or on-premises deployment; real-time guardrails; 24/7 support.



See an error or outdated detail?

Profiles carry a last-verified date. If something is out of date or wrong, send a correction and we will review it.

Submit a correction

Work at Galileo?

Claim this listing to propose edits to the tagline, description, pricing notes, and headquarters details. Every change is still reviewed by our editorial team.

Claim this listing