Physics as Optimal Compression: What If Laws Are Not Unique?
Author: David Björling
David Björling holds a Master’s degree in Physics from KTH Royal Institute of Technology. He is a first-principles thinker who built the predictive algorithms now used for demand-based parking prices at major airports across Scandinavia. He is currently engaged in a proof of concept for introducing epistemic structure into base LLMs, experimenting on Microsoft’s small Phi-2 model.
Is our way of representing the laws of nature actually the truth, or are physical laws better understood as one possible set of approximate compressions we happened to stumble upon? I would argue for the latter. I would also argue that we may be at a unique moment in history: a point in time where it is becoming possible, for the first time, to systematically explore many alternative, cohesive systems for formulating natural laws. But let’s start with the first point.
Consider this: the human brain is a strange substrate on which to conduct science.
In raw capacity to perform operations, our brain is comparable to a supercomputer.
Our working memory is quite mediocre. We can hold references to highly complex concepts in mind (like World War 2), but we can never consciously attend to more than a tiny fraction of the complete whole at once.
Our capacity for arithmetic is comically poor relative to our overall brain capacity. At one point, a few hundred years ago, the ability to perform calculations that a birthday card could now handle was used as a proxy for human intelligence. How good are you at long division?
Clearly the first trait is beneficial for conducting science. Clearly the second and third are not. For the most part, scientific progress only accelerated once we could externalize working memory and computation. Writing was one early mechanism for this. What can be achieved through writing emulates what could be achieved without it, if our memory and compute were far better. Modern computers and machine learning represent our latest steps in this externalization.
I would argue that the way we describe and handle natural laws is inevitably shaped by both our cognitive strengths and weaknesses. The purpose of natural laws is to enable prediction. We find ways to isolate and compress certain aspects of reality into forms that are tractable for us to reason about. And most often, we are specifically interested in performing concrete predictive computations.
If we view physics as a search for optimal data compression, we must ask: optimal for whom? Clearly the same underlying truth can be expressed in different ways. In the US, fuel economy is tracked in miles per gallon. In Europe, we use litres per 100 km. Neither is inherently worse, but each guides intuition in subtly different directions. The same is true of Newton versus Einstein.
Any good compression has a scope — a domain of validity — but also a certain ease of computability. Every measurement contains uncertainty; every prediction carries acceptable error. Rather than viewing physical laws as discoveries of metaphysical truth, I propose we treat them as solutions to an optimization problem.
Given experimental data, find a set of representations that:
Minimises prediction error E
Minimises description length L
Minimises computational cost C
Maximises scope S
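Read as code, the objective might look like the following minimal sketch. The scalarised form, the symbols, and the weights are my own illustrative framing, not a standard formulation:

```python
# A toy scalarisation of the four axes. The weights are free parameters:
# different choices of w reward different kinds of representation.
def representation_score(E, L, C, S, w=(1.0, 1.0, 1.0, 1.0)):
    """Lower is better. Scope S enters with a minus sign because we
    want to maximise it while minimising E, L, and C."""
    wE, wL, wC, wS = w
    return wE * E + wL * L + wC * C - wS * S

# An engineer under compute pressure weights C heavily; a theorist
# hunting unifications weights S heavily. Same candidate, different value:
cheap_lens = representation_score(E=0.1, L=5.0, C=1.0, S=10.0, w=(1, 1, 10, 1))
broad_lens = representation_score(E=0.1, L=5.0, C=1.0, S=10.0, w=(1, 1, 1, 10))
```

A fixed weighting collapses the trade-off to a single number, so sweeping over many weightings is one way to recover multiple distinct optima rather than a single "best" law.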
We can, and will, have many overlapping representations. Some might minimize computational cost; others might maximize scope. With infinite compute and time, we could keep track of individual particles, and emergent properties like temperature would just introduce errors. This, however, would be utterly infeasible in reality.
Physics then becomes a search over a Pareto surface defined by simplicity, accuracy, and generality, with many valid and overlapping representations.
Newton sits on one region of that surface: tiny E, moderate C, large S.
Einstein occupies another: minimal E, large C, very large S.
Quantum mechanics yet another: minimal E, very large C, very large S.
But why would these be unique optima? They are optima that happened to be accessible to humans, under human constraints. They also followed from prior definitions and compressions, dependent on the haphazard path our scientific history took. This is why I suspect that an unconstrained optimizer might settle on entirely different primitives. Newton and Einstein did not find truth in a vacuum. They located regions on the Pareto surface that were reachable given what had come before, shaped by our sensory priors, our evolutionarily crafted intuitions, and our computational limitations.
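The Pareto picture above can be made concrete with a dominance check. The (E, C, S) numbers below are invented placeholders standing in for the three theories, chosen only to illustrate that none of the three dominates another:

```python
# Illustrative placeholder scores (error E, cost C, scope S) for the three
# regions named above. The numbers are made up, not measurements. Scope is
# negated so that "lower is better" holds on every axis.
candidates = {
    "newton":   (1e-3,   1.0,  -10.0),
    "einstein": (1e-9,  50.0, -100.0),
    "quantum":  (1e-9, 500.0, -150.0),
}

def dominates(a, b):
    """True if a is at least as good on every axis and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Each theory survives the filter: each is a distinct optimum on the surface.
pareto_front = sorted(
    name for name, s in candidates.items()
    if not any(dominates(t, s) for n, t in candidates.items() if n != name)
)
```

With these placeholder scores, all three candidates sit on the front: each buys something (accuracy, cheapness, or reach) that the others give up.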
The Problem of Time
There is a deep suspicion in physics that one of our most fundamental variables may be something of an illusion: time.
As humans, we experience time as a sequence of distinct, fleeting moments. This is consistent with our weak working memory. We may be forced to process events sequentially simply because we lack the buffer to handle them globally. We stitch snapshots together and call it flow. This is likely also tied to the timescales at which human intervention needs to happen.
Now consider a tree. If a tree possessed intelligence, it would almost certainly treat time differently from us. Our concept of individual snapshots would not be the most evolutionarily advantageous architecture for a rooted organism responding to slow, sweeping arcs of environmental change. The tree might perceive time as broad mean gradients held in a vast working memory spanning years. What tangible, statistically significant changes, emerging over months and years, ought the tree to optimize for? This is obviously speculative.
What is not speculative is the kind of artificial intelligences we have begun creating — Large Language Models in particular. An LLM possesses a context window vastly larger than human working memory, and no concept of time shaped by physical survival. An LLM might not represent a novel — or a physical process — as a sequence of words to be read one by one. Instead, it can hold the entire narrative arc as a single, simultaneous, richly structured representation.
To a mind with a massive context window, the concept of time as a linear progression of snapshots might seem terribly inefficient, a lossy compression. Rather than t as a variable that ticks forward, such a mind might compress reality into arcs or events-as-whole-objects, perceiving the full trajectory of a ball as a single static shape rather than a moving point. For most humans, a parabola plotted on a graph is an abstraction. Not so for a mind that naturally thinks in whole trajectories.
Would the laws we have formulated not be influenced by our particular vantage on time? I find it unlikely that they would be immune to it. Our intuition for time is also deeply non-relativistic. We evolved at slow speeds in a weak gravitational field. A mind evolved for survival in a regime where special relativity is decisive, perhaps navigating near a black hole or moving at a significant fraction of the speed of light, would have a radically different intuitive prior for time. Concepts like simultaneity, as we currently conceive of them, would be almost meaningless.
We can speculate on what those alternative architectures might look like:
The Deep-Time Mind: a mind working at longer timescales with a vastly larger working memory, which views events as four-dimensional structures in spacetime rather than a sequence of nows. Just as we could view an apple as a thousand microscopic slices, but have no reason to, such a mind might have no reason to slice time at all.
The Relativistic Mind: a mind fine-tuned for survival in high-velocity environments, for whom space and time are intuitively stretchy, codependent substrates, a lived experience rather than a complex mathematical derivation.
Neither of these minds would be likely to settle on the same priors for t that we have.
The question then becomes almost inevitable: if we fed a sufficiently advanced machine learning algorithm a large amount of physical data and tasked it with finding patterns and compressions — redundancies in reality — what would it find? How many overlapping, yet distinct and valid, compressions would we find, if we ran and re-ran the experiment?
The Contingency of Our Representations
There are many examples where we already know that different representations overlap. Newton’s laws and relativity cover much of the same ground. Within a certain domain and accuracy threshold, they say the same thing — but one is far more computationally efficient. There is a reason we have not shelved Newton since discovering Einstein, even though relativity’s domain of validity entirely contains Newton’s.
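The Newton versus Einstein trade-off can be seen numerically. Below is a minimal sketch, with function names and numbers of my own choosing, comparing Newtonian and relativistic kinetic energy at roughly Earth’s orbital speed:

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def ke_newton(m, v):
    """Newtonian kinetic energy: a couple of multiplications."""
    return 0.5 * m * v * v

def ke_einstein(m, v):
    """Relativistic kinetic energy: costlier, but valid at any speed."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C_LIGHT) ** 2)
    return (gamma - 1.0) * m * C_LIGHT ** 2

# At ~30 km/s (roughly Earth's orbital speed) the two compressions agree
# to better than one part in a million; Newton's is simply cheaper.
m, v = 1.0, 30_000.0
rel_diff = abs(ke_newton(m, v) - ke_einstein(m, v)) / ke_einstein(m, v)
```

Scaling v up toward a noticeable fraction of c makes the disagreement measurable, which is precisely where the cheaper compression’s scope ends.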
Lagrangian and Newtonian mechanics are largely equivalent in scope, yet in practice one can be computationally superior in certain contexts while the other dominates elsewhere.
There is also one more human constraint worth noting here: high-fidelity long-term memory and retrieval. If this were orders of magnitude better in humans, perhaps we would not only seek to generalise our laws, but also to fragment them into a myriad of highly specialised compressions — limited in scope, but computationally optimal within their domain of validity. Allow me to direct your attention to Quake III Arena.
The Quake Constant Problem
There was a time when compute was a severe bottleneck in computer games. Any kind of motion through a 3D representation of space depends heavily on square roots and inverse square roots. We know how to compute these quite exactly, but what if there is not enough compute available to do so in a timely manner, as was the case when Quake III Arena was being developed? The necessary solution was to find a computationally optimal compression that worked well enough for the game’s required scope and accuracy:
The fast inverse square root constant in Quake III (0x5f3759df).
A human physicist might call it numerology. Ugly. Ad hoc. Not a law of nature. But within its narrow band of validity, it is exactly that. Under strict compute constraints, it may even be locally optimal.
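For illustration, here is the trick ported to Python (the original is C). The port assumes IEEE 754 single-precision floats, which the `struct` module gives us:

```python
import struct

# The fast inverse square root from Quake III, ported for illustration.
def fast_inv_sqrt(x: float) -> float:
    i = struct.unpack('<I', struct.pack('<f', x))[0]  # float bits as uint32
    i = 0x5f3759df - (i >> 1)                         # the "magic" first guess
    y = struct.unpack('<I'.replace('I', 'f'), struct.pack('<I', i))[0] if False else \
        struct.unpack('<f', struct.pack('<I', i))[0]  # bits back to float
    return y * (1.5 - 0.5 * x * y * y)                # one Newton-Raphson step

approx = fast_inv_sqrt(4.0)  # close to the exact 0.5, error well under 0.5 %
```

One Newton-Raphson step brings the worst-case relative error to roughly 0.2 %, comfortably inside the accuracy scope a game renderer required.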
How many such local optima exist in physics? Ugly, hyper-local compressions that look meaningless to us but are computationally superior within bounded regimes? Human taste filters these out (they seem devoid of deeper meaning) but perhaps that reflects only our inability to simultaneously manage generalisations and local optimisations in a structured way. An optimizer without aesthetic prejudice would not discard these valid and useful compressions.
And how many Newtonian, Lagrangian, or Einsteinian mechanics might there be — slightly different in scope, accuracy, and efficiency — that a system starting without priors could discover? How many broad-scope unifications, necessary in the most complex, integrated cases, might it find? Finding different optimal compressions on a Pareto surface might give us some insight.
True Mathematics and Representational Contingency
This ultimately links to a deeper suspicion I have about mathematics: I doubt integers are fundamental.
They feel fundamental because we are organisms that individuate. I am one, you are two. We deal with singular objects in a world of discrete things. But according to one of our best theories, quantum mechanics, this discreteness may be illusory. Position, velocity, and time are not sharply definable in the way we commonly think about them. Reality is smeared out, without sharp borders. At the fundamental level, discreteness dissolves into wavefunctions; identity dissolves into process.
In this regard, intervals seem more fundamental than counts. And for many processes, tracking logarithms is more natural than tracking counts. This is true of our perception of sound and light, to name two examples. Might there exist cognitive architectures for which logarithms are primitive and addition is derived? No five fingers acting as the starting point, so to speak. The foundation of our mathematics is rooted in whole numbers and Euclidean geometry.
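To make the “logarithms as primitive” idea tangible, here is a minimal sketch of an arithmetic in which quantities are stored as logs. Multiplication and powers become the cheap native operations, while ordinary addition becomes the derived, expensive one. The function names are mine:

```python
import math

# A sketch of an arithmetic where the logarithm is the primitive
# representation: a quantity x is stored as log(x).
def mul(la, lb):      # x * y   ->  la + lb
    return la + lb

def power(la, k):     # x ** k  ->  k * la
    return k * la

def add(la, lb):      # x + y   ->  log(exp(la) + exp(lb)), stabilised
    m = max(la, lb)
    return m + math.log(math.exp(la - m) + math.exp(lb - m))

x, y = math.log(8.0), math.log(2.0)
product = math.exp(mul(x, y))   # ~16.0
total = math.exp(add(x, y))     # ~10.0
```

This is, incidentally, how probabilities are routinely handled in machine learning: log-space is the native representation, and addition is the operation that costs effort.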
I highly doubt this is the only valid starting point. And perhaps there are even useful compressions that cannot be expressed or proven within our system? Gödel tells us that no sufficiently rich, consistent formal system is complete. Maybe any such system tends to emerge through specific modes of interaction with reality? The “unreasonable effectiveness of mathematics” in describing physical reality has long puzzled scientists and philosophers. But perhaps this is not such a mystery. Human mathematics may have evolved precisely to compress certain physical observations.
Different axioms and starting points might be equivalent. There might be axioms seemingly unrelated to Euclidean geometry that, by necessity, lead to Euclid’s axioms being true, and vice versa. Yet probably no single system can contain the full truth. Or, as Plato might have put it: perhaps each mathematical system is but one shadow cast by a deeper mathematical reality, if such a thing even exists.
Regardless: For two beings occupying the same underlying reality, what is deducible within their respective mathematical systems would likely overlap substantially, but not completely. Not if their modes of interaction with reality differ enough. Einsteinian spacetime, for instance, necessitates moving beyond Euclidean geometry. In curved spacetime, initially parallel lines can converge and cross.
Once again, I find myself wondering: what would a machine learning system find, if it searched the space of possible compressions? To name one tantalising example: calculus discretises the continuous through limits and infinitesimals, a move that feels natural to us. But what might calculus look like for a mind that experiences continuity as primitive? A mind finding whole numbers to be alien and impossibly exact edge-cases.
Instead of rediscovering F=ma, the AI might express mechanics through continuous deformation fields in phase space. In this alien physics, our concepts of ‘force’ and ‘mass’ might never explicitly appear. Yet, the predictions would match our experiments perfectly. It would be a physics of flow rather than objects—empirically equivalent, but conceptually unrecognizable.
The Actual Proposal
Back to the main point. What might we find by letting machine learning discover different compressions? The architectural details matter only insofar as they make possible a concrete experiment allowing us to:
Train an optimizer on experimental data.
Reward compression along multiple axes.
Reward finding many distinct points on the Pareto surface.
Reward different balances of Scope, Accuracy, Description Length, and Computational Cost.
Allow it to invent its own variables.
Allow it to request data needed to differentiate between competing generalisations.
Let it search.
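A deliberately tiny sketch of such a search, under heavy simplifying assumptions: random search over polynomial “laws” fitted to simulated free-fall data, scored on prediction error and description length (crudely counted as nonzero coefficients), keeping the set of mutually non-dominated candidates. All names and choices below are illustrative, not a proposal for the optimizer’s actual architecture:

```python
import random

# Simulated free-fall data: time t versus distance fallen, d = 0.5 * g * t^2.
G = 9.8
data = [(0.1 * k, 0.5 * G * (0.1 * k) ** 2) for k in range(1, 30)]

def prediction_error(coeffs):
    """Mean squared error of the candidate law d(t) = sum(c_i * t^i)."""
    return sum((sum(c * t ** i for i, c in enumerate(coeffs)) - d) ** 2
               for t, d in data) / len(data)

def description_length(coeffs):
    """A crude stand-in for L: the number of nonzero coefficients."""
    return sum(1 for c in coeffs if c != 0.0)

def dominates(a, b):
    """a dominates b if it is no worse on both axes and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

random.seed(0)
pareto = {}  # (error, length) -> coefficients
for _ in range(10_000):
    coeffs = [round(random.uniform(-10, 10), 1) if random.random() < 0.5 else 0.0
              for _ in range(4)]
    score = (prediction_error(coeffs), description_length(coeffs))
    if not any(dominates(s, score) for s in pareto):
        # Keep only candidates the newcomer does not dominate, then add it.
        pareto = {s: c for s, c in pareto.items() if not dominates(score, s)}
        pareto[score] = coeffs
```

Even this toy version exhibits the behaviour the proposal wants to study at scale: re-running with different seeds, data, or scoring axes surfaces different non-dominated compressions rather than a single winner.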
Then observe:
Do independent runs converge?
Do they rediscover our laws?
Do they produce alternative but equivalent compressions?
Do they invent primitives that feel alien but are predictively powerful?
Do they find compressions more computationally efficient for specific domains?
Do they discover different unifications with different scopes?
Do they produce generalisations diverging from our current laws in ways that can be empirically tested?
If convergence occurs, our laws may be structurally privileged. If divergence occurs, physics may be representation-contingent. Either result is interesting.
A further refinement: data fed to such a system could come in three forms. “Simulated” data, generated from our current best understanding with added uncertainty; “experimentally measured” data from real observations; and “specifically requested” data, asked for by the model itself and generated to probe the limits of a given compression’s domain of validity, helping distinguish between competing generalisations.
Finally, a thought that has long hovered at the edges of my mind: what if String Theory had been discovered before Relativity? Would that even have been possible? I raise this only as an example of how contingent our path through physics may have been. This is not a value judgement about which theory is more fundamentally true. It is simply a thought I find tantalising.
There are so many interesting possibilities to explore here, and the only way to obtain real answers is through experimentation. I have a number of concrete ideas for how such an exploration might be started, but those details are for another time.

