A system trained to predict the next token in a sequence, a statistical pattern matcher with no explicit knowledge of logic, writes a function that compiles, runs, and returns the correct answer.

This shouldn't work.

You don't learn to write correct code by observing what code "looks like." You learn rules, semantics, type systems. You reason about correctness. Pattern matching feels like the wrong tool for formal reasoning.

And yet LLMs write working code. They solve logic puzzles. They prove theorems. They do things that look a lot like reasoning, using a mechanism that seems fundamentally different from reasoning.

What's happening?

During training, an LLM processes trillions of tokens of human text. That text contains countless examples of logical reasoning, mathematical derivations, code with consistent behavior, arguments with valid structure. The model isn't explicitly taught logic. It's shown examples of logic in action, over and over, in massive volume and variation.

And from pure pattern recognition at scale, something emerges that can reproduce logical structures reliably.

The model learns that a line like "if x > 5:" is followed by indented code. But it also learns deeper patterns: that variables need to be defined before use, that functions should return consistent types, that logical contradictions don't appear in valid reasoning. These aren't rules the model follows. They're statistical regularities it extracted from observing billions of examples of human reasoning.
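To make that concrete, here is the kind of structure the model sees endlessly repeated. The function below is just an illustration, not anything pulled from a training set:

```python
def clamp(value, low, high):
    # The variable is defined before it's used.
    result = value
    # A line ending in "if ...:" is always followed by an indented block.
    if result > high:
        result = high
    if result < low:
        result = low
    # Every path returns the same kind of thing: a number.
    return result
```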

What gets generated is genuinely deterministic code. Once written, that code executes the same way every time. The artifact is formal and precise, even though it emerged from a process that has no explicit understanding of formal logic.

We're not using pattern matching instead of logic. Pattern matching is discovering logic.

But that framing assumes we know what logic is. The real question is deeper.

What if reasoning is just patterns that generalize?

When you solve a math problem, you feel like you're following logical rules. It feels like deduction, like pure thought. But what if your brain is actually doing something else? What if you're pattern matching against problems you've seen before, applying transformations that worked previously, predicting what comes next based on learned structure?

The feeling of "understanding" might just be what it feels like to have a good compressed model that makes accurate predictions.

Here's the radical version of this idea: there's no "reasoning" beyond sophisticated pattern matching. What we call logic is the subset of patterns that generalize unusually well.

This would explain something otherwise mysterious. LLMs can mimic reasoning without "understanding" it in any deep sense. But maybe there's nothing to understand beyond the patterns. Maybe when we reason, we're doing the same thing the LLM is doing, just on a different substrate and with better grounding.

The philosophical move here is subtle but important. Instead of asking "how does pattern matching approximate reasoning?" we should ask "what if reasoning has always been pattern matching, and we've just been fooled by introspection?"

But this raises a harder question: why does pattern matching generalize at all?

An LLM sees Python functions that sort lists. Then it can write a function to merge dictionaries, even if that exact task wasn't in the training data. It's doing something more than memorization. It has extracted patterns at a level of abstraction that transfers to new situations.
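For concreteness, here is roughly what such a generated function might look like; a plausible sketch, not the output of any particular model:

```python
def merge_dicts(*dicts):
    """Merge any number of dictionaries; later values win on key conflicts."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged

# The pattern transfers: iterate, accumulate, return, just like the sorting examples.
print(merge_dicts({"a": 1, "b": 2}, {"b": 3}))  # {'a': 1, 'b': 3}
```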

Not all statistical regularities produce generalizable capabilities. Some patterns are shallow (like "code blocks are often indented") and some capture deeper structure, like "variables must be defined before use" or "this function signature implies this return type."

What makes some patterns "deeper" than others?

I think the answer has to do with compression. Though I should flag upfront: I'm going to use "compression" expansively here, as a metaphor linking several related but distinct processes. Kolmogorov complexity, neural network weight optimization, lossy perceptual encoding, evolutionary selection. These aren't the same mechanism. But they share something I think is real: they all involve finding compact representations that preserve the structure that matters while discarding what doesn't.

Compression, prediction, understanding

Here's the thesis: compression, prediction, and understanding are deeply related, possibly the same thing viewed from different angles.

A good compression algorithm finds the deep regularities in data. It identifies which patterns are fundamental and which are surface variation. To compress text effectively, you need to capture its underlying structure.

Being able to predict accurately means you've captured that structure. If you can predict what comes next, you've learned something about the generative process behind the data.
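Information theory makes this link exact: an event you predict with probability p can be encoded in about -log2(p) bits, so a better predictor is, by definition, a better compressor. A toy sketch, where the corpus and the character-frequency model are placeholders just to show the arithmetic:

```python
import math
from collections import Counter

text = "the cat sat on the mat. the dog sat on the log."

# Baseline: treat every character as equally likely (no structure captured).
alphabet = set(text)
uniform_bits = len(text) * math.log2(len(alphabet))

# A slightly better predictor: learn character frequencies from the text.
counts = Counter(text)
total = len(text)
frequency_bits = sum(-math.log2(counts[c] / total) for c in text)

# The model that predicts better encodes the same text in fewer bits.
print(f"uniform model:   {uniform_bits:.0f} bits")
print(f"frequency model: {frequency_bits:.0f} bits")
```

A model that also captured which characters tend to follow which would compress further still; a language model is this same move pushed as far as current hardware allows.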

"Understanding" might just be having a compressed representation that enables accurate prediction across contexts. Not a separate magical property, but what we call it when a compressed model generalizes well.

From this view, LLMs don't "understand" code the way we do, but they're performing the same fundamental operation: building compressed representations that enable prediction. The representations differ (neural network weights versus whatever substrate implements human thought) but the underlying process shares a common logic.

This helps explain why certain logical structures appear universal. Boolean logic, arithmetic, causality, functional composition: these could be the maximally compressed ways to represent patterns that appear everywhere. If-then relationships recur because causality is a feature of the universe. Arithmetic works because of regularities in how quantities combine. Code patterns repeat because they're efficient ways to instruct deterministic state machines.

Any system trying to compress observations of this universe would converge on similar structures. Not because logic is "out there" waiting to be discovered as some Platonic ideal, but because it's the most efficient encoding of patterns that actually exist.

Logic is what emerges when you optimally compress observations of a universe with regularities.

This also explains why training on natural language alone produces logical capability. Language isn't a neutral medium that happens to contain logical patterns. It's a compression technology that evolved specifically to communicate patterns efficiently. Words like "if," "because," "all," "none" are verbal compressions of logical relationships. Grammar encodes dependencies and constraints. The subjunctive mood compresses counterfactual reasoning. An LLM trained on language inherits millennia of human effort to compress logic into words. The token stream already encodes compressed logic. The model is learning to decompress and reapply those patterns.

Where pattern matching breaks down, and what that reveals

But LLMs do fail, and this is where the thesis has to earn its keep.

The failures are systematic. LLMs struggle with multi-step planning where each step depends on the result of the previous one. They lose track of negations across long passages. They confabulate when patterns are ambiguous. They can apply learned transformations but struggle to notice when their output contradicts itself. Most tellingly, they fail at novel compositional reasoning, combining known operations in configurations they haven't seen before.

These aren't minor gaps. Critics like Gary Marcus argue they reveal something categorical: that pattern matching and reasoning are fundamentally different cognitive operations, and no amount of scaling will bridge the gap. The systematic nature of the failures, always at the same joints, always involving the same kinds of compositional and self-monitoring demands, suggests a missing mechanism, not just insufficient data.

This is the strongest objection to the compression thesis, and I don't think it can be dismissed.

But I don't think it's fatal either. Here's why: human reasoning fails at the same joints, just further out. We lose track of negations in sufficiently complex sentences. We fail at multi-step planning when the steps get numerous enough. We confabulate constantly and build elaborate post-hoc rationalizations. We struggle with novel compositional reasoning in unfamiliar domains. The history of mathematics is a history of humans making exactly these kinds of errors until better tools, notations, and formalisms extended our reach.

The difference looks quantitative, not qualitative. Human pattern matching has advantages LLMs lack: physical grounding that anchors patterns to real-world interaction, better error detection through metacognition, richer compositionality built from multimodal experience, and working memory that allows iterative self-correction. These are genuine architectural advantages. But they're advantages in the quality and structure of pattern matching, not evidence of a fundamentally different operation.

If human reasoning were truly categorically different from pattern matching, if it involved something like direct access to logical truth, we wouldn't expect it to degrade gracefully under complexity. We'd expect it to either work or not. Instead, it degrades exactly the way a compression-based system would: patterns that are too complex to compress within our cognitive limits become unreliable. We extend those limits with formal notation, computers, and collaborative verification. But the underlying process remains the same.

The failure modes don't disprove the compression thesis, but they do help map its boundaries.

The deeper pattern

Let me pull back, because the same pattern keeps recurring across very different domains.

Human brains are biological, noisy, probabilistic systems. Neural firing is inconsistent. Memory is reconstructive and error-prone. Your brain isn't executing formal logic when you solve a problem. It's running messy electrochemical processes. And yet you can write code that compiles. You can prove theorems. You can construct valid arguments.

You learned mathematics the same way an LLM did: by seeing examples, doing exercises, absorbing patterns, building intuition. No one gave you axioms and asked you to derive everything from first principles. You internalized structure through pattern exposure.

Evolution is even more striking. Natural selection has no foresight, no planning, no understanding. It's blind filtering of random mutations. And yet it produces DNA, a deterministic instruction set that codes for the same proteins with extraordinary precision. Evolution discovers molecular machines that operate with engineering-grade accuracy, all through statistical accumulation of what works.

Scientific discovery follows the same arc. Einstein didn't derive general relativity through pure deductive logic. The path involved intuition, physical insight, thought experiments, and years of groping toward the right structure. But once the field equations were written down, they became a deterministic formalism that makes identical predictions regardless of who uses them.

The pattern repeats: messy, grounded, pattern-based learning processes generate formal structures that then operate with logical precision.

The logical artifacts (code, proofs, mathematical laws, DNA sequences) look like compression layers. They're how discoveries move between pattern-learning systems across time and substrate.

Evolution can't transmit its accumulated adaptations to me directly, but DNA carries what it learned in a precise, executable format. I can't copy my neural patterns into your brain, but I can write code that captures the solution I found. You can run that code and get the same result, even though our underlying processes are completely different.

Logic might be the universal format for patterns that work. The maximally compressed representation that enables transfer across different kinds of intelligence.

Why does this universe compress?

This leads to the deepest question, one I don't have an answer to.

Why is this universe compressible at all?

Not everything can be compressed into simple rules. Some processes are computationally irreducible. The only way to know what happens is to run the full computation. No shortcut exists.

But many things in our universe are compressible. Physical laws compress vast amounts of observed behavior into compact equations. Mathematical theorems compress infinite cases into finite proofs. Code compresses complex behavior into readable algorithms.
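You can watch the difference with an off-the-shelf compressor; a minimal sketch using Python's standard zlib module:

```python
import os
import zlib

structured = b"0123456789" * 1000   # 10,000 bytes with an obvious regularity
patternless = os.urandom(10_000)    # 10,000 bytes with no structure to exploit

print(len(zlib.compress(structured)))   # a few dozen bytes: the rule replaces the data
print(len(zlib.compress(patternless)))  # slightly over 10,000: nothing generalizes
```

The regular stream collapses to the rule that generates it; the random one can only be restated in full.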

Maybe logic emerges specifically in the domains where compression is possible. Where patterns are compressible into rules, we call it "logical" or "lawful." Where they resist compression, we call it "complex" or "chaotic" or "random."

If true, logic isn't universal. It's domain-specific. It works where the universe happens to be compressible, where regularities exist that can be captured in compact rules.

The mystery isn't just that pattern-learning systems discover logic. The mystery is that the universe contains enough compressible regularity for logic to exist at all.

Why should causality be consistent? Why should physical laws be the same everywhere? Why should mathematical relationships hold universally? Why should any pattern generalize beyond the specific instances where we observe it?

We're so used to living in a universe with regularities that we forget how strange this is. A universe of pure chaos would resist all compression. Nothing would generalize. No patterns would transfer. Logic couldn't exist because there would be no stable relationships to capture.

Our universe isn't like that. It has deep structure. Patterns recur. Regularities persist.

And because compression is possible, pattern-learning systems can discover those regularities, encode them in compressed forms, and apply them in new contexts.

Logic is what happens when pattern-learning systems encounter a universe structured enough to be compressible.

Living with uncertainty

We're living through a moment where we can watch compression happen externally.

For most of human history, the process of building compressed models was sealed inside human skulls. It just felt like thinking. Now we see statistical models compress patterns from training data. We see them apply those compressions to generate new outputs. We watch it happen token by token.

Once you see it, it's hard to unsee.

Large language models aren't doing something fundamentally new. They're doing what brains have always done: compressing observed patterns and using those compressions to predict and generate. We've just externalized it, automated it, and made it visible.

What changes, practically, is that this view reframes how we should work with these systems. They're compression engines, powerful at finding plausible candidates in enormous search spaces, unreliable at guaranteeing correctness. The right approach isn't to trust them or to dismiss them, but to pair them with formal verification: let the pattern generator explore, let the logical checks filter. This is already how the best AI-assisted coding works, and it's probably how the best human thinking has always worked too. Intuition proposes, rigor disposes.
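A minimal sketch of that loop, assuming a hypothetical generate_candidates stand-in for whatever model you call, with a hand-written test set doing the filtering:

```python
def generate_candidates(prompt):
    """Stand-in for an LLM call. Here: hard-coded candidates, one subtly wrong, one right."""
    return [
        "def solve(xs): return sorted(xs)[0]",   # plausible-looking, but returns the minimum
        "def solve(xs): return sorted(xs)[-1]",  # correct: returns the maximum
    ]

def passes_tests(source, tests):
    """Execute a candidate and check it against known input/output pairs."""
    namespace = {}
    try:
        exec(source, namespace)          # the pattern generator proposes...
        solve = namespace["solve"]
        return all(solve(x) == expected for x, expected in tests)
    except Exception:
        return False                     # ...and the logical checks filter

def first_verified(prompt, tests):
    for candidate in generate_candidates(prompt):
        if passes_tests(candidate, tests):
            return candidate
    return None

tests = [([3, 1, 2], 3), ([7], 7), ([-5, -2], -2)]
print(first_verified("write solve(xs) returning the largest element", tests))
```

Everything trustworthy in the result comes from the tests, not the generator; that asymmetry is the whole point.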

What we call logic might be nothing more than the patterns that compress best, the regularities that generalize farthest, the structures that capture the deepest truths about how this universe works.

And the deepest mystery remains: why does this universe have compressible structure at all? Why do patterns exist that can be captured in simple rules? Why does anything generalize?

I don't know. But I'm grateful that it does.

Because in a universe without compressible regularities, learning would be impossible. Prediction would be impossible. Intelligence would be impossible. Logic would be impossible.

We exist because this universe compresses.

And everything we call understanding, reasoning, and intelligence is a version of the same fundamental process: finding the patterns that compress, and using them to navigate a universe structured enough for patterns to exist.