We've been building Disco for two years. It started as a research tool — a harness allowing us to apply deep learning interpretability to scientific datasets, to borrow machine learning's superhuman pattern recognition for scientific discovery. We used it internally on published data, then with collaborators at research institutes and universities. We made a bunch of novel discoveries, and co-authored papers about them. And now, Disco is publicly available.
You can use it today at disco.leap-labs.com. The free tier gives you 10 credits per month for private analyses, and unlimited public analyses.
What Disco does
You upload a tabular dataset and pick a target variable. Disco fits neural networks to your data, applies interpretability methods to extract the patterns those models learned, validates every finding on hold-out data, and contextualises the results against existing literature. You get back a ranked list of statistically significant patterns — with p-values, effect sizes, visualisations, and citations.
The whole thing is automated. A dataset goes in; a structured report comes out. Most of the time it completes in a few minutes.
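Disco's internals aren't shown here, but the hold-out validation step is worth making concrete. The sketch below is a generic illustration, not Disco's actual procedure: a candidate effect is re-measured on held-out data, and a permutation test estimates how often an effect that large would arise by chance. All names and thresholds are illustrative.

```python
import random
import statistics

random.seed(0)

# Toy dataset: one feature weakly drives the target, plus Gaussian noise.
rows = [(x, 0.5 * x + random.gauss(0, 1))
        for x in (random.uniform(0, 10) for _ in range(400))]

# Split into a discovery half and a hold-out half.
random.shuffle(rows)
discovery, holdout = rows[:200], rows[200:]

def effect(data):
    """Difference in mean target between high- and low-feature halves."""
    med = statistics.median(x for x, _ in data)
    hi = [y for x, y in data if x > med]
    lo = [y for x, y in data if x <= med]
    return statistics.mean(hi) - statistics.mean(lo)

# A pattern "found" on the discovery split is re-measured on held-out data.
observed = effect(holdout)

# Permutation test: shuffle targets to build a null distribution.
xs = [x for x, _ in holdout]
ys = [y for _, y in holdout]
null = []
for _ in range(1000):
    random.shuffle(ys)
    null.append(effect(list(zip(xs, ys))))

p_value = sum(abs(n) >= abs(observed) for n in null) / len(null)
print(f"hold-out effect = {observed:.2f}, p = {p_value:.3f}")
```

A finding that survives this kind of check on data the model never saw is far less likely to be an artefact of overfitting.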
This sounds simple, and we tried really hard to make using it feel simple. But the machinery underneath is doing something that's been genuinely difficult until recently: finding non-linear, combinatorial interaction effects in data — the kind of patterns that standard statistical methods miss — and rendering those patterns interpretable enough for a domain expert to evaluate.
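A toy example of the kind of pattern linear methods miss (again, an illustration of the problem, not Disco's algorithm): an XOR-style interaction, where each feature on its own is uncorrelated with the target, yet the pair determines it exactly. Any method that scans features one at a time sees nothing.

```python
import random

random.seed(1)

# Two binary features; the target is their XOR — a pure interaction effect.
data = [(a, b, a ^ b) for a, b in
        ((random.randint(0, 1), random.randint(0, 1)) for _ in range(2000))]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

a_vals = [a for a, _, _ in data]
b_vals = [b for _, b, _ in data]
y_vals = [y for _, _, y in data]

# Each marginal correlation is near zero...
print(f"corr(a, y) = {corr(a_vals, y_vals):.3f}")
print(f"corr(b, y) = {corr(b_vals, y_vals):.3f}")

# ...yet together the two features predict the target perfectly.
accuracy = sum((a ^ b) == y for a, b, y in data) / len(data)
print(f"joint accuracy = {accuracy:.0%}")
```

Neural networks pick up this kind of joint structure naturally; the hard part, and the part interpretability solves, is making the learned pattern legible afterwards.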
Why we built it
The short version: hypothesis-driven science has structural problems, and we think data-first discovery is the fix.
When you start from a hypothesis, every subsequent decision is shaped by it — which variables to measure, which subgroups to analyse, which results to highlight. Confirmation bias, publication bias, and path dependence in the literature compound to produce a scientific record where most published findings don't replicate and research productivity is declining despite ever-increasing investment.
LLMs trained on that literature inherit all of it. They're excellent at synthesis, but they can't escape the biased corpus they were trained on. They generate hypotheses from the same path-dependent idea space as the researchers themselves.
Disco inverts this. Instead of starting with what you expect to find, you start with what's in the data. Deep learning finds patterns without being told what to look for. Interpretability makes those patterns legible, and hold-out validation and contextualisation make them defensible. This allows us to more completely explore the space of possible discoveries in any given dataset, without bias or assumption.
I've written about this philosophy in more depth here.
What we've found so far
We've been running Disco on real scientific data for over a year, analysing hundreds of datasets and identifying over a thousand potentially novel patterns: from clinical trial data and agricultural field trials to manufacturing quality datasets, pharmacovigilance records, and insurance claims. You can see some of them here.
My favourite result to date is from one of our earliest case studies. Analysing ocean-atmosphere boundary layer data from the National Center for Atmospheric Research, we identified conditions under which the Surface Layer Hypothesis — a foundational assumption in global weather models, unchallenged for 60 years — breaks down. An NCAR scientist estimated manual analysis of the same dataset would take "one postdoc year" and our finding "could be worth billions".
Who it's for
Disco is general-purpose by design. If you have tabular data and a target variable, it works. We've seen it used on clinical trial data, agricultural field trials, manufacturing quality datasets, pharmacovigilance records, and insurance claims.
For agents and developers
Disco is available as a Python SDK, an MCP server, and a REST API. AI agents can sign up, run analyses, and receive structured results entirely programmatically — no human in the loop required.
This matters because agents can't do what Disco does. They can certainly automate what a human would do: coming up with hypotheses, writing analysis code, searching the literature. But they share all our biases and assumptions, and they miss things. (Prompting an agent to do exhaustive data analysis is also expensive and slow.) Direct your agents to use Disco instead!
Our SDK returns structured results that agents can reason over directly, plus a shareable report URL for their human. Documentation is at github.com/leap-laboratories/discovery-engine.
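The exact response schema lives in the SDK docs; as a purely hypothetical sketch of what "results agents can reason over directly" means in practice, here's how an agent might filter and rank a findings payload. The field names (`pattern`, `p_value`, `effect_size`) are made up for illustration and are not Disco's actual schema.

```python
# Hypothetical findings payload — field names are illustrative only,
# not Disco's actual response schema (see the SDK docs for that).
findings = [
    {"pattern": "A x B interaction", "p_value": 0.001, "effect_size": 0.42},
    {"pattern": "C threshold", "p_value": 0.20, "effect_size": 0.10},
    {"pattern": "D main effect", "p_value": 0.01, "effect_size": 0.25},
]

# An agent might keep only significant findings, ranked by effect size,
# before deciding which ones merit follow-up analysis.
significant = sorted(
    (f for f in findings if f["p_value"] < 0.05),
    key=lambda f: f["effect_size"],
    reverse=True,
)

for f in significant:
    print(f'{f["pattern"]}: effect={f["effect_size"]}, p={f["p_value"]}')
```

Because the output is plain structured data rather than prose, this kind of post-processing needs no LLM call at all.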
Pricing
We wanted the barrier to entry to be zero. The Explorer tier is free — 10 credits per month for private analyses, unlimited public analyses queued on a shared worker. Public means your data and results are visible to other users, which is fine for open science and academic work. Credits scale with dataset size; a typical 10K-row dataset uses 1–3 credits.
Researcher ($49/month) and Team ($199/month) tiers add more credits, priority processing (typically zero wait time), and deeper analysis — surfacing more patterns and usually more novel results. Enterprise pricing is available for dedicated compute, custom integrations, and on-prem deployments.
What's next
Research at Leap continues apace. Timeseries and image data support are coming, as is support for really large datasets. The ultimate goal is to be fully multimodal.
If you have a dataset and want to see what's in it, try Disco. If you're building agents that need scientific analysis capabilities, the SDK docs are the place to start. And if you want to talk about custom deployment, integration, or a specific use case, book a call.