The First Observability Data Lake Built for Agents

Full-fidelity telemetry in your Object Storage, wrapped in an Agent Runtime with natural-language investigation, deterministic troubleshooting skills, and evidence-backed answers.

Join the Community on Slack | Free to start
Founded by ex-Netflix engineers
Backed by CRV
Architecture diagram: logs, metrics, traces, and events land as full-fidelity telemetry in the Observability Data Lake on Object Storage. The Agent Runtime (Troubleshooting Skills, Investigation Engine) sits on top and serves agents: Claude, OpenAI, Cursor, Gemini, and custom agents.

Trusted by teams at

Aviatrix
Ridecell
Moxxi
Readyset
Olto
Fleak
BMO

Outcome Attribution for Agentic Engineering

Tokens are not outcomes.

Coding agents are the fastest-growing line item in engineering — and the least accounted for. Cardinal reads the OpenTelemetry your agents already emit and ties every session to the PR it produced, the merge that landed, and the dollars it took to get there.

Agent Outcomes dashboard: total, merged, in-flight, lost, and ad-hoc spend across 151 sessions, with a by-type breakdown of feature, bugfix, refactor, infra, and research work
where the agent's spend is going, and which work it's actually landing
Initiatives panel: agent spend clustered by initiative, each with its pull requests, engineers, achieved and pending tokens, cost, and session count, alongside spend donuts by initiative and by type

Spend, organized by what you're building

Sessions cluster into initiatives — the unit a manager actually thinks in — not a wall of UUIDs. Each initiative carries its PRs, its engineers, tokens achieved versus still pending, and what it cost. Three clicks from total spend to the single session that caused it.

Every dollar lands somewhere

Each session ends up merged, in flight, exploratory, or lost — there's no fifth bucket where waste hides. Surface cost per durable merged PR and unsuccessful-spend percentage, so the answer to "what are we getting for the agent bill?" is a number, not a shrug.

shipped beats declared intent — a research branch that merged counts as merged
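
As a rough illustration of the four-bucket idea above, the sketch below classifies sessions and derives the two headline numbers. It is not Cardinal's implementation: the `Session` fields, the rule for what counts as exploratory, and the exact formulas (total spend divided by merged PRs, lost spend as the unsuccessful share) are all assumptions made for the example, and "durable" (for instance, not reverted) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    session_id: str
    cost_usd: float
    declared_type: str        # hypothetical field, e.g. "feature" or "research"
    pr_state: Optional[str]   # "merged", "open", "closed", or None if no PR exists


def classify(session: Session) -> str:
    """Place a session in exactly one of the four outcome buckets."""
    if session.pr_state == "merged":
        return "merged"       # shipped beats declared intent
    if session.pr_state == "open":
        return "in_flight"
    if session.pr_state is None and session.declared_type == "research":
        return "exploratory"  # assumed rule: research work with no PR
    return "lost"             # closed PR, or non-research work with no PR


def summarize(sessions):
    """Aggregate spend per bucket and compute the two headline metrics."""
    spend = {"merged": 0.0, "in_flight": 0.0, "exploratory": 0.0, "lost": 0.0}
    merged_prs = 0
    for s in sessions:
        bucket = classify(s)
        spend[bucket] += s.cost_usd
        if bucket == "merged":
            merged_prs += 1   # assumes one PR per session
    total = sum(spend.values())
    return {
        "spend_by_outcome": spend,
        "cost_per_merged_pr": total / merged_prs if merged_prs else None,
        "unsuccessful_spend_pct": 100.0 * spend["lost"] / total if total else 0.0,
    }
```

Because every session falls through to exactly one return statement in `classify`, the bucket totals always add up to total spend, which is the property the copy above describes.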
Spend limits panel, enforced in-session:
session website @ main: headroom, $11.02 of $100
engineer jdoe (weekly): warning sent, $615 of $750
initiative auth-token-rotation: hard stop, $300 of $300
budget standing streams into the session — the agent works leaner as the ceiling nears

Limits where the spend happens

Cap spend per initiative, per engineer, per session — the levels where the money actually moves. Soft warnings as a budget tightens, hard stops before the surprise invoice. And a limit isn't an after-the-fact report: the agent sees its own budget standing mid-session and works leaner as it approaches the ceiling.
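
One way to picture the soft-warning and hard-stop behavior is a small function that maps spend against a limit to a standing, evaluated at each of the three levels. This is a sketch under assumptions: the `Budget` type and the 80% warning threshold are hypothetical, since the page does not state where the warning fires.

```python
from dataclasses import dataclass

WARN_FRACTION = 0.8  # assumed soft-warning threshold; not stated in the source

STANDINGS = ["headroom", "warning sent", "hard stop"]  # loosest to tightest


@dataclass
class Budget:
    scope: str        # "session", "engineer", or "initiative"
    name: str
    spent_usd: float
    limit_usd: float


def standing(budget: Budget) -> str:
    """Map one budget's spend to headroom, warning, or hard stop."""
    if budget.spent_usd >= budget.limit_usd:
        return "hard stop"
    if budget.spent_usd >= WARN_FRACTION * budget.limit_usd:
        return "warning sent"
    return "headroom"


def session_standing(budgets) -> str:
    """Return the tightest standing among all budgets that apply to a session."""
    return max(
        (standing(b) for b in budgets),
        key=STANDINGS.index,
        default="headroom",  # no budgets configured means nothing constrains the session
    )
```

Taking the tightest standing across levels means a session with plenty of its own headroom still stops when its initiative has hit the ceiling.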

Every session, classified — with the cost next to it.

Sessions panel: each agent session classified as merged, in flight, lost, or ad hoc, with its model, effort level, tool count, duration, linked PR, and cost
Get started free

No credit card required — your first session lands in ~10 minutes

OpenTelemetry native. Works with Claude Code today — Cursor and Codex next

Data quality is agent quality

Own your data. Unleash your agents.

Stop sending your telemetry to vendor black boxes. Own it, and let agents do real work on it.

Your data in your Object Storage · No Cost Driven Sampling · No query rate limits · OpenTelemetry-native

Your data never leaves your cloud

OTEL-native ingestion straight to Object Storage. Cardinal Lakerunner indexes it in place. Agents query through the runtime.

1. Instrument with OTEL

Standard OpenTelemetry collectors ship logs, metrics, and traces to Object Storage via the OTEL exporter.

2. Index in Place

Cardinal Lakerunner indexes your data in Object Storage: full fidelity, zero egress, no vendor lock-in.

3. Agents Query

AI agents connect to the Agent Runtime with pre-built troubleshooting skills and investigation tools.

Customer VPC diagram: App(s) emit logs, metrics, and traces to the OTEL Collector; the Object Storage Exporter writes full-fidelity data to Object Storage, where Cardinal Lakerunner indexes it; agents query through the Agent Runtime.

Ultra High Cardinality

Lets agents quantify which slice actually moved the metric.

Unbounded Lookback

Agents trace root causes across months of data with sub-second query latencies.

Agent Skills, Not API Wrappers

Composite skills that reason, not just fetch.

Each skill is a re-runnable investigation that fires hundreds of queries against your data lake in seconds. No query limits, no per-seat costs. When you own the data, agents can dig as deep as they need to.

Outlier Detector

Automatically decomposes metric spikes across every tag combination and returns impact-ranked attributions agents can act on immediately — no exploratory querying required. Instead of iterating through dimensions one by one, agents receive a single structured answer like "region=us-east, endpoint=/checkout caused 63% of the spike", backed by evidence.
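
The decomposition described above can be sketched as a roll-up over tag combinations, comparing a baseline window with the spike window. This is a minimal illustration, not the Outlier Detector itself: the function name, the row format, and the cap of two tags per combination are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def attribute_spike(baseline, spike, max_tags=2):
    """Rank tag combinations by their share of the total metric increase.

    baseline, spike: lists of (tags: dict, value: number) rows, one list per window.
    Returns [(tags, share)] sorted by share, largest first.
    """
    def rollup(rows):
        totals = Counter()
        for tags, value in rows:
            items = sorted(tags.items())
            for size in range(1, max_tags + 1):
                for combo in combinations(items, size):
                    totals[combo] += value
        return totals

    before, after = rollup(baseline), rollup(spike)
    total_delta = sum(v for _, v in spike) - sum(v for _, v in baseline)
    if total_delta <= 0:
        return []  # no increase to attribute

    shares = {
        combo: (after[combo] - before[combo]) / total_delta
        for combo in set(before) | set(after)
    }
    # Sort by share; on ties prefer the more specific combination, then name order.
    ranked = sorted(
        (c for c in shares if shares[c] > 0),
        key=lambda c: (-shares[c], -len(c), c),
    )
    return [(dict(c), shares[c]) for c in ranked]
```

A coarse slice always ranks at or above any of its sub-slices, so a consumer of this output would typically pick the most specific combination that still explains a large share, rather than simply taking the first row.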

Change-event timeline: 🚀 deploy v2.14.3 (root cause), scale-up +3 pods

Correlation Finder

Automatically aligns change events — deploys, config updates, scaling actions — against metric timelines so agents get pre-correlated cause-and-effect pairs instead of searching through changelogs themselves. Agents jump straight to "deploy v2.14.3 correlates with the latency spike" without spending tokens on timeline reconstruction.
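
A simple way to picture the alignment step: find where the metric first departs from its usual level, then list the change events that landed shortly before that point. The sketch below assumes a median-based onset rule and a 30-minute lookback, both chosen for illustration; the function names are hypothetical, and proximity in time only suggests a candidate cause.

```python
from datetime import timedelta
from statistics import median


def spike_onset(series, factor=2.0):
    """First timestamp whose value exceeds factor x the series median, else None.

    series: list of (timestamp, value) in time order.
    """
    base = median(value for _, value in series)
    for ts, value in series:
        if value > factor * base:
            return ts
    return None


def correlate(events, onset, lookback=timedelta(minutes=30)):
    """Change events that precede the onset within the lookback, nearest first.

    events: list of (timestamp, name). Returns [(name, lag)].
    """
    candidates = [
        (onset - ts, name)
        for ts, name in events
        if timedelta(0) <= onset - ts <= lookback
    ]
    return [(name, lag) for lag, name in sorted(candidates)]
```

Events that happen after the onset, such as an autoscaler reacting to the spike, are excluded because they cannot have caused it.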

Error Summarizer

Fingerprint-based clustering reduces thousands of raw errors into deduplicated, ranked error groups agents can reason over without exhausting their context window. Each group comes with volume timelines, first-seen timestamps, and per-service breakdowns — giving agents a structured summary instead of forcing them to parse noisy logs line by line.
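
Fingerprint-based clustering generally works by masking the variable parts of a message and hashing what remains. The sketch below shows that general technique under assumptions: the masking regex, the record shape, and the output fields are invented for the example, and volume timelines are left out for brevity.

```python
import hashlib
import re
from collections import defaultdict

# Mask hex literals, long hex ids, and numbers so variants share one template.
_VARIABLE = re.compile(r"0x[0-9a-f]+|\b[0-9a-f]{8,}\b|\d+", re.IGNORECASE)


def fingerprint(message: str) -> str:
    """Hash the message template left after variable parts are masked."""
    template = _VARIABLE.sub("<*>", message).strip().lower()
    return hashlib.sha1(template.encode("utf-8")).hexdigest()[:12]


def summarize_errors(records):
    """Group (timestamp, service, message) records by fingerprint.

    Returns groups ranked by count, each with first-seen time,
    per-service counts, and one example message.
    """
    groups = defaultdict(
        lambda: {"count": 0, "first_seen": None, "by_service": defaultdict(int), "example": None}
    )
    for ts, service, message in records:
        group = groups[fingerprint(message)]
        group["count"] += 1
        group["by_service"][service] += 1
        if group["first_seen"] is None or ts < group["first_seen"]:
            group["first_seen"] = ts
            group["example"] = message  # keep the earliest message as the example
    return sorted(groups.values(), key=lambda g: g["count"], reverse=True)
```

The agent then reads a handful of ranked groups instead of every raw line, which is the context-window saving the paragraph above refers to.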

p50: 120ms → 340ms (2.8×)
p99: 430ms → 1.8s (4.2×)
count: 12K → 12.1K (1.0×)

Anomaly Detector

Pre-computed statistical baselines let agents run cheap distributional comparisons across massive metric volumes without burning tokens on raw data. Lightweight statistical tests surface exactly which percentiles shifted and by how much — giving agents a fast, reliable way to separate real regressions from normal variance at scale.
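
To make "which percentiles shifted and by how much" concrete, the sketch below compares selected percentiles of a baseline sample against a current sample. It substitutes a plain ratio threshold for whatever statistical test the product uses; the function name, the 1.5x threshold, and the report shape are assumptions.

```python
from statistics import quantiles


def percentile_shift(baseline, current, percentiles=(50, 99), threshold=1.5):
    """Compare percentiles between two samples and flag large shifts.

    baseline, current: lists of at least two numeric observations.
    Returns {"p50": {"before", "after", "ratio", "shifted"}, ...}.
    """
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts_before = quantiles(baseline, n=100, method="inclusive")
    cuts_after = quantiles(current, n=100, method="inclusive")
    report = {}
    for p in percentiles:
        before, after = cuts_before[p - 1], cuts_after[p - 1]
        ratio = after / before if before else float("inf")
        report[f"p{p}"] = {
            "before": before,
            "after": after,
            "ratio": ratio,
            "shifted": ratio >= threshold,
        }
    return report
```

In practice the baseline cut points would be computed once and stored, so each comparison only needs the percentiles of the current window rather than the raw baseline data.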

Traffic Monitor

Condenses tracing data into request-flow graphs that let agents do effective attribution — who calls whom, how often, and at what error rate. Instead of sifting through raw spans, agents work from compact representations with real volume numbers and per-edge error percentages to pinpoint exactly which dependency is responsible.
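
Condensing spans into a request-flow graph amounts to resolving each span's parent to a service and counting calls and errors per caller-callee pair. The sketch below assumes a simplified span record with `span_id`, `parent_id`, `service`, and `error` keys; real OpenTelemetry spans carry more fields, and this is not the Traffic Monitor's actual code.

```python
from collections import defaultdict


def build_flow_graph(spans):
    """Aggregate spans into service-to-service edges with error rates.

    spans: list of dicts with span_id, parent_id (or None), service, error (bool).
    Returns {(caller, callee): {"calls", "errors", "error_pct"}}.
    """
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges = defaultdict(lambda: {"calls": 0, "errors": 0})
    for span in spans:
        caller = service_of.get(span["parent_id"])
        if caller is None or caller == span["service"]:
            continue  # root span, unknown parent, or in-process child: no edge
        edge = edges[(caller, span["service"])]
        edge["calls"] += 1
        if span["error"]:
            edge["errors"] += 1
    return {
        pair: {**edge, "error_pct": 100.0 * edge["errors"] / edge["calls"]}
        for pair, edge in edges.items()
    }
```

The result is a few rows per dependency rather than one row per span, which is what lets an agent compare edges by volume and error percentage directly.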

Service traffic flow, Sankey diagram built from tracing data
Available via MCP. Works with Claude, ChatGPT, Cursor, and any MCP-compatible agent

Shared Knowledge

The agent gets smarter with every investigation.

When your SRE troubleshoots a database issue at 2am, the knowledge doesn't disappear with the incident. It becomes a guardrail, a cached pattern, a boosted tool preference. Next time anyone on the team hits something similar, the agent already knows what to do.

Entity graph · Guardrails · Cached patterns · Tool preferences

Ontology builds itself

Every tool call extracts entities, relationships, and schemas into a shared knowledge graph. Agents know your infrastructure without being told.

Mistakes become guardrails

When a query fails or a user corrects an argument, that lesson is stored as a decision trace. Future runs get injected with "don't do X" hints before they even start.

Patterns get cached

A successful investigation gets fingerprinted and stored. When a similar prompt comes in, the agent skips the entire planning phase and goes straight to execution.
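
The caching idea can be pictured as a lookup keyed by a normalized form of the prompt. The sketch below is deliberately naive and entirely hypothetical: it treats two prompts as similar when they share the same set of keywords after stopword removal, whereas a production system would more likely use a softer similarity measure. The `PlanCache` class, the stopword list, and the plan format are assumptions.

```python
import hashlib
import re

# Hypothetical stopword list; words here are ignored when fingerprinting.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "in", "on", "for", "of", "to", "why", "what", "did"}
)


def prompt_fingerprint(prompt: str) -> str:
    """Hash the sorted set of non-stopword tokens in the prompt."""
    words = re.findall(r"[a-z0-9_-]+", prompt.lower())
    keywords = sorted(set(words) - STOPWORDS)
    return hashlib.sha256(" ".join(keywords).encode("utf-8")).hexdigest()[:16]


class PlanCache:
    """Stores the skill sequence of a successful investigation by fingerprint."""

    def __init__(self):
        self._plans = {}

    def store(self, prompt, plan):
        self._plans[prompt_fingerprint(prompt)] = list(plan)

    def lookup(self, prompt):
        """Return the cached plan, or None so the caller falls back to planning."""
        return self._plans.get(prompt_fingerprint(prompt))
```

A hit returns the stored sequence of skills so execution can start immediately; a miss returns `None`, and the agent plans as usual.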

No tribal knowledge loss

Your best engineer's debugging instincts don't leave when they go on vacation. Every correction, every preference, every shortcut lives in the shared stores.

Join Our Team!

Open Roles

Founding Engineer – Frontend

📍San Francisco Bay Area
🕓Full Time

Founding Engineer – Backend

📍San Francisco Bay Area
🕓Full Time

Founding GTM Leader

📍San Francisco Bay Area
🕓Full Time