🌐 Field Note: Why Coherent “Shapes” Collapse into Weights (And Messy Ones Don’t)
A useful way to understand modern AI systems is this:
They don’t remember facts.
They remember shapes.
Not visual shapes — statistical ones.
Patterns of language, meaning, cause-and-effect, and resolution that recur often enough, cleanly enough, to be compressed into weights.
Once you see this, a lot of AI behaviour stops looking mysterious.
Models don’t store content. They store compression.
Large language models are not databases.
They are lossy compression engines.
During training:
trillions of tokens are observed
correlations are detected
regularities are compressed
inconsistencies are averaged away
What survives this process are patterns that compress well.
What doesn’t survive are:
contradictions
fragmented explanations
unstable identities
entities that mean different things in different contexts
This is not opinion.
It’s math.
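A toy way to feel it (a sketch, not a claim about any specific model): use a general-purpose compressor as a stand-in for training. The entity "AcmeDB" and its descriptions below are invented purely for illustration.

```python
import zlib

# Toy stand-in for learned compression: zlib instead of gradient descent.
# "AcmeDB" is a made-up entity; both corpora contain 40 sentences.
coherent = " ".join(
    ["AcmeDB is an open-source time-series database for metrics."] * 40
)
fragmented = " ".join(
    [
        "AcmeDB is an open-source time-series database for metrics.",
        "AcmeDB is a closed-source CRM platform for sales teams.",
        "AcmeDB is a consulting agency for cloud migrations.",
        "AcmeDB is a hardware vendor selling edge gateways.",
    ]
    * 10
)

print(len(zlib.compress(coherent.encode())))    # small: one regularity, reused 40 times
print(len(zlib.compress(fragmented.encode())))  # larger: four conflicting stories to encode
```

Same raw length. Different cost to keep.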
Coherence creates low-dimensional shapes
In high-dimensional parameter space, a coherent entity collapses into something like a smooth, stable basin.
Why?
Because it:
appears in similar contexts
explains itself consistently
resolves the same intent the same way
produces predictable outcomes
That regularity allows the model to represent the entity with fewer degrees of freedom.
In information-theoretic terms:
It has low description length.
In geometric terms:
It forms a tight, reusable shape in weight space.
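A hedged illustration of "fewer degrees of freedom", using synthetic vectors rather than real model activations:

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_dim(points):
    # Participation ratio of covariance eigenvalues: a rough proxy for how
    # many independent directions a point cloud actually occupies.
    eig = np.linalg.eigvalsh(np.cov(points, rowvar=False))
    return eig.sum() ** 2 / (eig ** 2).sum()

d = 64
# Coherent entity: its contexts vary along only a few latent factors.
coherent = rng.normal(size=(500, 3)) @ rng.normal(size=(3, d))
# Fragmented entity: its contexts scatter across unrelated directions.
fragmented = rng.normal(size=(500, d))

print(effective_dim(coherent))    # low (at most ~3): a tight, reusable shape
print(effective_dim(fragmented))  # high (close to 64): no cheap summary exists
```

The numbers are synthetic, but the contrast is the point: regularity occupies few directions.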
Incoherence explodes dimensionality
Now consider a fragmented entity.
One that:
changes its positioning
appears in unrelated contexts
contradicts itself across sources
solves parts of the problem but not the whole
To represent this accurately, the model would need:
more parameters
more variance
more conditional logic
more uncertainty during reconstruction
That’s expensive.
So the model does what all compression systems do:
It averages it away.
Or worse — it never forms a stable representation at all.
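A deliberately tiny sketch of that averaging, under a squared-error assumption (real training objectives differ, but the pull toward the mean is the same):

```python
import numpy as np

# When one context maps to conflicting targets, a least-squares fit
# settles on their mean rather than on any single claim.
X = np.ones((4, 1))                    # the same context, seen four times
y = np.array([1.0, -1.0, 1.0, -1.0])   # four contradictory claims about it
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # ≈ [0.]: the contradictions cancel into a washed-out representation
```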
Why this produces “defaults”
At inference time, the model isn’t searching the web.
It’s sampling from probability distributions over learned shapes.
When an intent appears, the model asks (implicitly):
Which shape reconstructs this intent
with the least uncertainty?
Coherent shapes win because:
they require fewer assumptions
they introduce less entropy
they produce confident completions
This is why probability distributions peak around defaults.
Not because the model “likes” them.
But because they are cheap to reconstruct.
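A minimal sketch of that dynamic, with made-up candidate names and logits (nothing here reflects any real model's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Hypothetical "shapes" an intent could resolve to.
candidates = ["coherent_default", "fragmented_a", "fragmented_b"]
logits = np.array([4.0, 2.0, 1.5])   # the coherent shape is only somewhat cheaper
p = softmax(logits)

picks = rng.choice(len(candidates), size=10_000, p=p)
print({name: int((picks == i).sum()) for i, name in enumerate(candidates)})
# A modest gap in reconstruction cost becomes a dominant share of completions.
```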
Why this is architecture-agnostic
This behaviour is not a quirk of:
GPT
Gemini
Claude
DeepSeek
It emerges from fundamentals shared by all LLMs:
compression of large corpora into parameters
probabilistic decoding
entropy minimisation
reuse over exploration
Different training data.
Different tuning.
Same physics.
That’s why coherence generalises across systems.
Strategic implication: you can’t fake the shape
You can’t bolt coherence on.
You can’t “optimize” your way into a clean shape.
Because the model doesn’t evaluate coherence explicitly.
It gravitates toward it naturally, the same way water flows downhill — because reconstruction is cheaper.
Coherence emerges from:
consistent self-description
end-to-end resolution
minimal contradiction across contexts
stable causal stories
That’s not a tactic.
That’s a property.
Compression selects for truth (and usefulness)
The Minimum Description Length principle says:
The best explanation is the one that compresses the data most while remaining accurate.
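Written in its standard two-part form (notation added here for clarity, not quoted from the principle's authors):

```latex
% Two-part MDL: the preferred hypothesis H minimises model cost
% plus the cost of the data once the model is known.
H^{*} = \arg\min_{H} \bigl[ L(H) + L(D \mid H) \bigr]
% L(H):        bits to describe the hypothesis (the "shape")
% L(D \mid H): bits to describe the data given that shape
```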
For AI systems, this becomes:
The best answer is the one that collapses into the simplest reusable shape.
This is why:
clever but fragmented brands disappear
noisy content gets ignored
defaults persist once formed
And why:
Coherence isn’t marketing.
It’s what the math selects for.
🧱 Technical Note: Why This Explains Everything Downstream
The compression dynamic described above is not isolated. It explains the entire cascade of AI discovery behaviours:
Why defaults persist once formed: Compressed shapes require minimal compute to reconstruct. The system reuses them automatically, not strategically.
Why GEO tactics show diminishing returns: Marginal signals (citations, mentions, structured data) cannot reshape probability distributions already peaked around stable compressed patterns. GEO optimizes for the exploration phase. Compression determines the equilibrium.
Why networks scale without fracturing: Structured relationships provide architectural constraints that preserve compressibility even as organizational complexity grows. Monolithic brands explode dimensionality. Networks contain it.
Why cross-model convergence is inevitable: GPT, Claude, Gemini, and DeepSeek all compress similar training corpora into parameters. Coherent patterns survive compression regardless of architecture. Fragmented ones don't.
The implication: Most current AI discovery advice focuses on tactics that matter pre-compression (visibility, mentions, optimization). But compression is the filter. What survives it determines everything that follows.
GEO is what you do before the weights settle.
Coherence is what determines how they settle.