Learning across battery fleets without sharing private data

A battery model gets sharper with every cell it sees. But the failures everyone most wants to predict are rare in any one fleet — and operators will not hand over their raw data to build a shared model. Federated learning offers a way out: let every fleet learn from every other fleet, while each operator's data never leaves home.

The data is locked in separate vaults

Predictive battery models follow a simple rule: more data, better predictions. An algorithm that has watched a thousand packs degrade will spot the early signs of trouble far more reliably than one that has watched ten. The obvious move, then, is to pool operating data from many fleets into one large training set.

It almost never happens. A fleet's operating data — charging logs, temperature histories, fault records — is commercially sensitive, often contractually restricted, and entangled with customer privacy. An EV maker will not expose how its packs really behave to a competitor. A grid-storage owner cannot share data it does not legally own. So the data stays in separate vaults, and every operator trains in isolation on a partial view.

That isolation hurts most exactly where it matters. The dangerous failure modes — thermal runaway precursors, sudden capacity collapse, a defect that surfaces years after assembly — are statistically rare in any single fleet. One operator may see only a handful of examples, far too few to model with confidence. Across the whole industry, those same events number in the thousands.

Move the model, not the data

Federated learning inverts the usual arrangement: instead of moving the data to the model, it moves the model to the data. Each participating fleet keeps its raw records exactly where they are. The model trains locally inside each operator's own environment, and only the learning, not the records, comes back out.

What comes back is a set of model updates: the adjusted parameters, or the gradients, that describe how the model changed once it had seen that fleet's data. Those updates are mathematical summaries. They carry the statistical lessons of a fleet's experience without carrying the underlying logs, the timestamps, or the customer identities.
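
To make "model update" concrete, here is a minimal sketch. It assumes the shared model is just a short list of weights for a linear predictor and that a fleet's private records are (features, target) pairs. The names `predict`, `local_update`, `lr` and `epochs` are illustrative and the data is a made-up toy set; no particular framework is implied.

```python
def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def local_update(global_weights, records, lr=0.01, epochs=50):
    """Train on private records and return only the weight delta.

    `records` is a list of (features, target) pairs that stays on the
    operator's side; the returned delta is the only thing that leaves.
    """
    weights = list(global_weights)
    n = len(records)
    for _ in range(epochs):
        # Gradient of mean squared error over the local records.
        grad = [0.0] * len(weights)
        for features, target in records:
            error = predict(weights, features) - target
            for i, x in enumerate(features):
                grad[i] += 2 * error * x / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    # The update is "how the model changed", not the data that changed it.
    return [w - g for w, g in zip(weights, global_weights)]


# Hypothetical toy data: five private records with two features each.
private_records = [([1.0, c], 0.5 * c + 0.1) for c in (0.0, 0.5, 1.0, 1.5, 2.0)]
update = local_update([0.0, 0.0], private_records)
print(update)  # two numbers leave the fleet, not the five records
```

However many records the fleet holds, the update has the same size as the model itself.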

The raw data never leaves the operator — only the lesson learned from it does.

How the shared model is built, round by round

The process runs as a repeating cycle of coordination rounds. It is deliberately simple, because simplicity is what makes it auditable.

Distribute — a coordinator sends the current shared model out to every participating fleet.
Train locally — each fleet trains that model on its own private data, inside its own infrastructure.
Return updates — each fleet sends back only its model updates, never its records.
Aggregate — the updates are combined, typically by federated averaging, into one improved model that reflects all of them.
Repeat — the improved model is distributed again, and the cycle continues until performance stabilises.

After enough rounds, every operator holds a model shaped by the whole group's experience — including the rare events it has never personally seen.
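
The cycle above can be sketched in a few lines. This is a simplified illustration, not a production protocol: it assumes a one-number model, a toy local trainer (`step_toward_local_mean`, a hypothetical stand-in for real training), and a fixed number of rounds in place of a real stopping test. The aggregation step is federated averaging in its usual form, where each fleet's update is weighted by how many samples it trained on.

```python
def federated_average(updates, sample_counts):
    """Combine per-fleet updates, weighting each by its number of samples."""
    total = sum(sample_counts)
    size = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, sample_counts)) / total
        for i in range(size)
    ]


def run_round(global_weights, fleets, train_locally):
    """One coordination round: distribute, train locally, return, aggregate."""
    # Distribute + train locally: each fleet sees the model, not the others' data.
    updates = [train_locally(global_weights, fleet) for fleet in fleets]
    counts = [len(fleet) for fleet in fleets]
    # Return updates + aggregate: only the updates are combined.
    mean_update = federated_average(updates, counts)
    return [w + d for w, d in zip(global_weights, mean_update)]


def step_toward_local_mean(global_weights, fleet, lr=0.5):
    """Toy local trainer: move the single weight toward the fleet's own mean."""
    local_mean = sum(fleet) / len(fleet)
    return [lr * (local_mean - global_weights[0])]


# Hypothetical private readings held by two fleets; they are never pooled.
fleets = [[1.0, 1.0, 1.0], [4.0]]
weights = [0.0]
for _ in range(20):  # Repeat: fixed round count here, for simplicity
    weights = run_round(weights, fleets, step_toward_local_mean)
print(weights)  # approaches 1.75, the mean over all four readings
```

The shared weight settles near the value that pooling all four readings would have given, even though neither fleet ever saw the other's numbers.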

Closing the privacy gaps

Model updates are far safer to share than raw data, but they are not automatically private. Known attacks, such as gradient inversion and membership inference, can sometimes recover details about a training set from the updates alone. Two techniques harden the system against that.

Secure aggregation ensures the coordinator only ever sees the combined update from all fleets, never any single fleet's contribution. Each operator's update is cryptographically masked so that the masks cancel out only when the updates are summed together — the total is visible, the parts are not.
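
The cancelling-mask idea can be shown with a toy example. This sketch makes several simplifying assumptions: updates are already quantised to small non-negative integers, all arithmetic is done modulo a fixed `MODULUS`, and a single seeded random generator stands in for the pairwise key agreement a real protocol would use. It also ignores fleets dropping out mid-round. The function names are illustrative.

```python
import random

MODULUS = 2**32  # all arithmetic wraps around at this value


def pairwise_masks(num_fleets, size, seed=0):
    """Return masks[i], the net mask that fleet i adds to its update.

    For every pair (i, j) with i < j, a shared random vector is added by
    fleet i and subtracted by fleet j, so all masks together sum to zero.
    Assumption: one seeded generator replaces per-pair key agreement.
    """
    rng = random.Random(seed)
    masks = [[0] * size for _ in range(num_fleets)]
    for i in range(num_fleets):
        for j in range(i + 1, num_fleets):
            shared = [rng.randrange(MODULUS) for _ in range(size)]
            for k in range(size):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """What a fleet actually sends: its update hidden behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """What the coordinator computes: the sum, in which the masks cancel."""
    size = len(masked_updates[0])
    return [sum(m[k] for m in masked_updates) % MODULUS for k in range(size)]


# Hypothetical integer-quantised updates from three fleets.
updates = [[5, 10], [7, 1], [2, 2]]
masks = pairwise_masks(len(updates), size=2)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [14, 13]
```

Each entry of `masked` looks like random noise on its own; only the sum of all three reveals anything, and what it reveals is the total.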

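Differential privacy adds a second layer: each update is clipped to a bounded size, and calibrated statistical noise is mixed in before it leaves the fleet. The noise is small enough that the aggregate model still learns the real pattern, but large enough to put a provable limit on how much any individual record can influence what was shared, and therefore on how much can be inferred about it. The tighter that limit, the more noise is needed, so privacy is traded against accuracy.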
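
A common way to add that noise is to first clip each update to a maximum size, so that no single fleet's contribution can dominate, and then add Gaussian noise scaled to that maximum. The sketch below shows only this mechanical step. The parameter names `max_norm` and `noise_multiplier` are illustrative, the values are arbitrary, and the sketch does not compute the formal privacy guarantee, which also depends on the number of rounds and participants.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]


def privatise(update, max_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


rng = random.Random(42)  # seeded only so the sketch is repeatable
raw_update = [3.0, 4.0]  # hypothetical update with L2 norm 5.0
print(clip_update(raw_update, max_norm=1.0))  # roughly [0.6, 0.8]
noisy = privatise(raw_update, max_norm=1.0, noise_multiplier=0.5, rng=rng)
print(noisy)  # the clipped update plus random noise
```

Any single noisy update is a poor guide to the fleet's true update, but the noise averages out across many fleets and rounds, which is why the aggregate can still learn.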

The honest difficulties

Federated learning is not a free lunch, and the hardest problem is the data itself. Fleets are not interchangeable. They run different chemistries, sit in different climates, follow different charging habits and serve different duty cycles. In statistical terms the data is non-IID — not independent and identically distributed — and naive averaging can blur genuinely useful signal into a vague industry average that fits no one well.
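
A deliberately extreme toy case shows the blurring. Assume two fleets whose cells fade at very different rates, each fitted with a one-parameter model (a slope through the origin). The data and the names `fit_slope` and `mean_squared_error` are invented for illustration.

```python
def fit_slope(records):
    """Least-squares slope through the origin for (x, y) pairs."""
    return sum(x * y for x, y in records) / sum(x * x for x, _ in records)


def mean_squared_error(slope, records):
    """Average squared prediction error of a slope on a set of records."""
    return sum((slope * x - y) ** 2 for x, y in records) / len(records)


# Hypothetical fleets: same usage levels, very different fade rates.
fleet_a = [(x, 1.0 * x) for x in (1.0, 2.0, 3.0)]  # slow fade
fleet_b = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]  # fast fade

slope_a = fit_slope(fleet_a)
slope_b = fit_slope(fleet_b)
averaged = (slope_a + slope_b) / 2  # naive average of the two local models

for name, fleet, own in (("A", fleet_a, slope_a), ("B", fleet_b, slope_b)):
    print(
        name,
        "own model error:", mean_squared_error(own, fleet),
        "averaged model error:", mean_squared_error(averaged, fleet),
    )
```

Each fleet's own model fits its data exactly, while the averaged model is wrong for both. Real fleets overlap far more than this, but the same pull toward a mean that fits no one is what non-IID data produces.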

There is overhead, too. Coordination rounds require communication and synchronisation across many parties, which is slower and more fragile than training in one place. And because the coordinator never inspects the raw data, it must find other ways to verify that each fleet's contribution is honest and high quality rather than noisy or skewed.
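
One possible safeguard, offered here only as an example of the kind of check available, is to aggregate with a statistic that a single skewed contribution cannot drag far, such as the coordinate-wise median. The sketch assumes the aggregator can see individual updates, which is not the case when secure aggregation hides them, so the two protections need careful combining in practice. The updates below are invented.

```python
from statistics import median


def coordinate_mean(updates):
    """Plain average of each coordinate across all fleets."""
    return [sum(col) / len(col) for col in zip(*updates)]


def coordinate_median(updates):
    """Median of each coordinate across all fleets; resists a few outliers."""
    return [median(col) for col in zip(*updates)]


# Hypothetical updates: three fleets broadly agree, one is badly skewed.
updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.11, -0.22],
    [9.00, 9.00],  # noisy or dishonest contribution
]

print(coordinate_mean(updates))    # pulled far off by the outlier
print(coordinate_median(updates))  # stays close to the three that agree
```

The median gives up some statistical efficiency compared with the mean, which is one reason it is a design choice rather than a default.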

Why a genomic layer makes the difference

This is where reading the battery genome changes the outcome. A genomic fingerprint describes what a cell actually is — its structure, composition, purity and electrochemical signature — independent of which fleet happens to operate it. With that fingerprint attached, the learning system can align like with like across fleets: comparing cells of similar genome to one another, rather than averaging unrelated cells into a muddy mean.

That alignment is the antidote to the non-IID problem. Cross-fleet learning becomes meaningful because the model is generalising over batteries that are genuinely comparable, not over an accident of who owns them.
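
One simple way to picture "aligning like with like" is to aggregate updates per fingerprint group rather than across the whole federation. The sketch below assumes each update arrives tagged with a fingerprint label; the labels `genome-A` and `genome-B`, the function name and the numbers are all hypothetical, and a real fingerprint would be far richer than a string, likely compared by similarity rather than exact match.

```python
from collections import defaultdict


def aggregate_by_fingerprint(contributions):
    """Average updates separately within each fingerprint group.

    `contributions` is a list of (fingerprint, update) pairs. Updates from
    different fleets are combined only when their cells share a fingerprint.
    """
    groups = defaultdict(list)
    for fingerprint, update in contributions:
        groups[fingerprint].append(update)
    return {
        fingerprint: [sum(col) / len(col) for col in zip(*group)]
        for fingerprint, group in groups.items()
    }


# Hypothetical contributions; one fleet can hold cells of more than one genome.
contributions = [
    ("genome-A", [1.0, 0.0]),  # fleet 1
    ("genome-A", [3.0, 0.0]),  # fleet 2
    ("genome-B", [0.0, 4.0]),  # fleet 2, different cells
    ("genome-B", [0.0, 6.0]),  # fleet 3
]

per_genome = aggregate_by_fingerprint(contributions)
print(per_genome)  # {'genome-A': [2.0, 0.0], 'genome-B': [0.0, 5.0]}
```

Averaging all four updates together would give a single vector that describes neither kind of cell; grouping first keeps each lesson attached to the batteries it actually applies to.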

Everyone learns, nobody loses control

The payoff is a different kind of industry relationship. Every operator's model becomes fluent in the rare, dangerous events that no single fleet sees enough of, because somewhere in the federation those events did occur and the lesson was shared. Yet each operator keeps full ownership and control of its own data; no raw records ever cross a boundary. The whole industry's hard-won experience of failure becomes a shared safety asset, without anyone having to surrender the data that made them cautious in the first place.