I Asked GPT-4, Claude & Gemini to Design a Battery. All Violated Physics.

The scaling hypothesis just hit a wall. And it’s not a software problem.

May 17, 2026

I just ran an experiment.

I asked three leading LLMs (GPT-4, Claude, Gemini) to design a thermal management system for an EV battery pack. Operating conditions: 45°C ambient temperature, 150 kW fast charging, 500-mile range requirement.

All three gave detailed, convincing answers. Liquid cooling loops. Thermal interface materials. Heat exchanger specs. Performance projections. Real citations.

Everything sounded perfect.

Except all three violated the second law of thermodynamics.

The systems they proposed would need to extract more heat than the battery generates under those conditions. With 45°C air outside the pack, that only works if you spend extra energy on active refrigeration, and that energy draw would drain the battery faster than it charges. The math was internally consistent. The citations were real. The physics was fantasy.
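A rough sketch of the bookkeeping behind that kind of check, in Python. Everything specific here is an assumption for illustration: the 95% charging efficiency, the function names, and the default of zero heat leaking in from ambient. A real pack model would use measured values.

```python
def heat_generated_kw(charge_power_kw: float, charge_efficiency: float) -> float:
    """Heat dissipated in the pack while charging: the power that is not stored."""
    if not 0.0 < charge_efficiency <= 1.0:
        raise ValueError("charge_efficiency must be in (0, 1]")
    return charge_power_kw * (1.0 - charge_efficiency)


def is_steady_state_consistent(
    claimed_removal_kw: float,
    charge_power_kw: float,
    charge_efficiency: float,
    ambient_gain_kw: float = 0.0,  # assumed: heat leaking in from hotter surroundings
    tol_kw: float = 0.1,           # assumed tolerance for rounding in a spec sheet
) -> bool:
    """At steady state, heat removed cannot exceed heat generated plus heat gained."""
    available_kw = heat_generated_kw(charge_power_kw, charge_efficiency) + ambient_gain_kw
    return claimed_removal_kw <= available_kw + tol_kw


if __name__ == "__main__":
    # 150 kW fast charging at an assumed 95% efficiency leaves roughly 7.5 kW as heat.
    print(heat_generated_kw(150.0, 0.95))
    # A spec claiming 12 kW of steady-state heat removal does not balance.
    print(is_steady_state_consistent(12.0, 150.0, 0.95))
```

The point is not the specific numbers. It is that a one-line balance like this is enough to reject a spec sheet that reads perfectly.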

Cost to discover this with a prototype? $8–12M and 18 months.

The Pattern Matching Problem

Here’s what actually happened: these models learned what battery thermal management sounds like from thousands of engineering papers, patents, and technical documents.

They learned the vocabulary. The structure. The citation patterns. The confident tone engineers use when they know what they’re talking about.

What they didn’t learn? That energy is conserved.

Because you can’t learn thermodynamics from text. You can learn the language of thermodynamics — entropy, enthalpy, heat transfer coefficients. You can learn to arrange those words in convincing patterns.

But the actual constraints? The hard limits of physical reality?

Those aren’t statistical patterns. They’re laws.

And LLMs don’t know the difference.

This isn’t hallucination in the traditional sense. The model isn’t making things up randomly. It’s doing exactly what it was trained to do: predict the next token that sounds most like what appears in technical documents about battery cooling.

The problem is that sounding right and being right are completely different things when physical laws are involved.

Why Scaling Won’t Fix This

The standard response to any AI limitation is: just scale. More parameters will solve it.

No. It won’t.

More parameters make the hallucinations more convincing, not more correct. A 10-trillion-parameter model will give you thermal management specs that cite real papers, reference actual materials, include proper units, and sound like they came from a PhD thesis.

It will still violate conservation of energy.

The architecture is fundamentally wrong for physical reasoning. Physical laws aren’t emergent properties of text prediction — they’re hard constraints that exist independent of human language. You can’t learn that entropy increases from reading Reddit. You can’t infer conservation of momentum from GitHub code.

Dyson learned this the hard way. They spent $600M developing an electric vehicle — brilliant engineers applying known solutions from adjacent domains. The result was a flawlessly engineered car that was economically impossible because the battery physics imposed a $150,000 price floor. They discovered the physics constraint after building the thing.

The new version of this failure? Let an LLM design it, build what it recommends, discover the physics problem $50M later when your prototype fails during charging tests.

Same outcome. More convincing failure mode. Faster.

What Physics-Grounded AI Actually Requires

If you want AI that can reason about physical reality, you need a different architecture. Not bigger models. Different ones.

Three layers working together:

Layer 1 — Pattern recognition (LLMs): Cross-domain discovery. Find every industry that solved rapid heat extraction from constrained geometry. This is where LLMs genuinely shine — they’re extraordinary at finding analogies across domains that no human engineer would connect.

Layer 2 — Physics validation (symbolic AI): Hard limits on what’s physically possible. Conservation laws encoded as validation gates. Not “87% confident this is thermodynamically feasible.” Binary. Yes or no.

Layer 3 — Causal knowledge graphs: Structured representation of physical relationships. Not correlation — causation. If you increase cooling power, battery weight increases AND available energy decreases. The system needs to reason about trade-offs, not just patterns.
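To show what a binary gate in Layer 2 could look like, here is a minimal Python sketch. It is not KRAFT's implementation. The `CoolingDesign` fields, the function names, the tolerance, and the example numbers are all hypothetical. It checks two things: that the energy flows balance (first law), and that any cooling below ambient does not beat the Carnot limit (second law).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingDesign:
    """Hypothetical summary of a proposed cooling loop (all fields are assumptions)."""
    heat_extracted_kw: float  # heat pulled out of the pack
    work_input_kw: float      # compressor and pump power
    heat_rejected_kw: float   # heat dumped to ambient
    cold_side_c: float        # coolant temperature at the pack, in Celsius
    ambient_c: float          # ambient temperature, in Celsius


def passes_first_law(d: CoolingDesign, tol_kw: float = 0.05) -> bool:
    """Energy balance: heat rejected must equal heat extracted plus work input."""
    return abs(d.heat_rejected_kw - (d.heat_extracted_kw + d.work_input_kw)) <= tol_kw


def passes_second_law(d: CoolingDesign) -> bool:
    """Cooling below ambient cannot exceed the Carnot coefficient of performance."""
    cold_k = d.cold_side_c + 273.15
    hot_k = d.ambient_c + 273.15
    if cold_k <= 0.0 or d.work_input_kw < 0.0:
        return False
    if cold_k >= hot_k:
        return True  # heat flows to the cooler ambient on its own, no minimum work
    if d.work_input_kw == 0.0:
        return d.heat_extracted_kw <= 0.0
    carnot_cop = cold_k / (hot_k - cold_k)
    return d.heat_extracted_kw / d.work_input_kw <= carnot_cop


def validate(d: CoolingDesign) -> bool:
    """Binary gate: a design passes only if every check passes."""
    return passes_first_law(d) and passes_second_law(d)


if __name__ == "__main__":
    plausible = CoolingDesign(7.5, 2.5, 10.0, cold_side_c=25.0, ambient_c=45.0)
    too_good = CoolingDesign(7.5, 0.2, 7.7, cold_side_c=25.0, ambient_c=45.0)
    print(validate(plausible), validate(too_good))
```

The second design balances its energy books and still fails: moving 7.5 kW against a 20-degree gradient with 0.2 kW of work implies a coefficient of performance above the Carnot bound. That is the kind of error that reads fine in prose and only shows up when a hard limit is applied.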

This is the architecture KRAFT runs in production — right now, with a large automotive OEM. An LLM found a battery cooling solution by recognising a pattern match between automotive thermal constraints and medical cryotherapy techniques. Then Physics-AI validated it against thermodynamic limits, materials constraints, and cost models before anyone built anything.

Result: better cooling performance. Zero prototypes wasted.

Why This Matters Beyond Batteries

This isn’t an R&D optimisation problem. It’s the grounding problem that will determine whether we get artificial general intelligence or just really convincing artificial confidence.

Robotics? Can’t violate physics when manipulating objects. Autonomous systems? Can’t approximate stopping distance at 70 mph. Drug discovery? Molecules don’t care about statistical confidence intervals. Materials science? Can’t invent room-temperature superconductors by predicting plausible-sounding words.

Every real-world application of AGI eventually collides with physical reality. And physical reality doesn’t negotiate.

The companies building AGI like it’s purely a scaling problem — more data, more parameters, more compute — are optimising for a world that only exists in text.

The actual world has physics.

The Race That Actually Matters

While everyone’s focused on who can build the biggest language model, there’s a quieter race happening.

Who can build AI that understands physical reality? Not simulates it. Not approximates it. Understands the hard constraints.

The $300B+ that Fortune 500 companies spend annually on R&D is the testbed. Every failed prototype is a validation dataset. Every Dyson-style writeoff is proof that pattern matching alone doesn’t work when physics is involved.

The companies that solve physics-grounded AI don’t just win the R&D tools market. They win robotics. Autonomous systems. Manufacturing. Drug discovery. Materials science.

Every market where AI has to interact with reality instead of just predicting text.

Originally published on LinkedIn. If you’re working on physics-constrained AI or the grounding problem, I’d like to compare notes: kreat.ai/kraft
