
A New Chapter for Data Quality Starts Now

Mon, 09 June
Hakim Elakhrass
Maarten Masschelein

Today we’re kicking off Soda Launch Week with a major announcement: Soda has acquired NannyML.

Together, we’re building the most intelligent, context-aware data quality platform on the market: one that helps you prevent issues before they become business problems, detect anomalies that actually matter, and trace root causes across the entire stack, from data ingestion to automated decision-making.

This move brings together two teams with a shared goal: helping data and AI teams ship reliable, production-grade systems they can trust, whether those systems power dashboards, models, or autonomous agents.

Let’s get into what this means, why we’re doing it, and what’s coming next.

The Gap in Data Quality Is Getting Worse

If you’ve worked on data or AI infrastructure, you’ve lived this:

  • A pipeline silently drops a column. No schema check fails, but a downstream metric flatlines.
  • A dashboard suddenly shows revenue down 30%, and nobody knows why.
  • A model in production starts drifting due to subtle shifts in user behavior.
  • An agent retrains or reacts based on corrupted inputs, and no one catches it until decisions are made.

Most data quality tooling today can’t handle this. It was built for a different era of batch jobs, static schemas, and predictable data flows. It flags too much noise, misses critical context, and rarely shows downstream impact.

At the same time, the systems we’re building today are more dynamic than ever:

  • Agents
  • LLM-powered decisioning
  • Real-time personalization
  • Hybrid batch-streaming pipelines
  • Continuous retraining loops

In this world, traditional checks and anomaly detection aren’t enough. Data quality isn’t just about correctness anymore; it’s about consequence.

Why NannyML

NannyML tackled one of the hardest problems in modern AI systems:

How do you monitor model performance in production, when there’s no ground truth yet?

Their open-source library introduced estimation-based performance monitoring, robust drift detection, and alerting designed for real-world ML pipelines. It became the go-to toolkit for teams running models where labels are delayed, sparse, or unavailable.
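To make the idea of estimating performance without ground truth concrete, here is a minimal sketch in plain Python. It is not NannyML's API or implementation. It only illustrates one common intuition: if a binary classifier's probabilities are well calibrated (an assumption), each prediction's own confidence tells you how likely it is to be correct, so averaging those confidences gives an estimate of accuracy before any labels arrive. The function name `estimate_accuracy` and the sample batches are hypothetical.

```python
from statistics import fmean


def estimate_accuracy(positive_probs, threshold=0.5):
    """Estimate binary-classifier accuracy without labels.

    Assumes `positive_probs` are well-calibrated probabilities of the
    positive class. This is an illustrative sketch, not a library API.
    """
    if not positive_probs:
        raise ValueError("need at least one prediction")
    # A positive prediction (p >= threshold) is correct with probability p;
    # a negative prediction is correct with probability 1 - p.
    per_row = [p if p >= threshold else 1.0 - p for p in positive_probs]
    return fmean(per_row)


# Hypothetical batches of predicted probabilities.
confident_batch = [0.95, 0.05, 0.90, 0.10]
uncertain_batch = [0.60, 0.45, 0.55, 0.40]

print(estimate_accuracy(confident_batch))
print(estimate_accuracy(uncertain_batch))
```

A drop in the estimate from one batch to the next is a signal worth investigating even though no labels have arrived. The estimate is only as good as the calibration assumption, which is itself something upstream data issues can break.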

But more importantly, they saw what was coming:

That models don’t fail in isolation. They fail when data pipelines degrade, when user behavior shifts, when upstream assumptions break. And they believed the only way to solve this was to close the loop between data quality and AI behavior.

We’ve believed the same from day one.

By bringing our teams and platforms together, we’re unifying those layers and delivering a product that can monitor your entire system, not just pieces of it.

What We’re Building Together

With NannyML’s team and tech now integrated into Soda, here’s what this unlocks:

  • Smarter detection at the data quality layer
    NannyML’s algorithms will power a more intelligent core in Soda’s checks and observability, reducing noise, surfacing real issues faster, and adapting to change.
  • Context-aware alerting across the stack
    Trace anomalies across systems: from a column drift in your warehouse, to a prediction shift in your model, to a behavior change in your agents.
  • End-to-end observability: from data to decision
    Monitor the full lifecycle, not just tables or checks. See how upstream issues ripple into downstream systems. Know what changed, why it matters, and what to fix.
  • AI-native quality infrastructure
    Whether you’re running batch analytics, near-real-time features, or LLM orchestration, we’re building foundational infrastructure that keeps data and behavior aligned.

And yes, NannyML’s open-source project will remain open, maintained, and fully supported. We’re not sunsetting it. We’re expanding it.

Why Now

Because the cost of bad data is rising, and fast.

The systems data powers today are higher-stakes, faster-moving, and harder to debug.

If your tooling doesn’t understand impact, it’s not helping. If it can’t handle emergence and drift, it’s irrelevant. And if it’s not built for AI-native environments, it’s already behind.

We’re not here to slap “AI” on legacy checks. We’re here to make data quality actually intelligent:

  • Impact-aware
  • Context-rich
  • Lifecycle-connected
  • And ready for systems that learn, adapt, and act

This acquisition accelerates that mission.

What’s Coming This Week

This is Day 1 of Launch Week. All week long, we’ll be announcing new capabilities and product drops that show what intelligent, AI-first data quality looks like in practice.

Here’s a preview of what’s coming:

  • The fastest and most accurate metrics observability
  • Collaborative data contracts
  • A free forever tier and transparent pricing

We’re just getting started, and we’re building fast.

Where To Go Next

  • Watch the full announcement webinar
    Hear directly from Maarten and Hakim about what’s changing, and what’s coming next.
  • Try Soda
    See how our platform is evolving to support AI-native teams. No fluff, just the signals that matter.

This is the next chapter for data quality.

Smarter. Faster. AI-ready.

And built for teams like yours.
