All models are wrong, but some are alive.
Somewhere in a data center right now, a virtual bacterium is dividing for the millionth time. Somewhere else, an AI-enhanced model is learning from a patient's tumor, preparing treatment recommendations that didn't exist when the sun rose. Biology has learned to debug itself, and we're just getting started.
This isn't science fiction–it's Tuesday morning in labs around the world. But the story of how we got here begins more than seventy years ago at the Marine Biological Association laboratory in Plymouth–a facility damaged by wartime bombing, where scientists who had begun their experiments before the war had to wait years for the chance to return and complete their revolutionary work.
In 1952, Alan Hodgkin and Andrew Huxley sat watching voltage traces flicker across an oscilloscope, capturing something no one had seen before: the precise mathematical choreography of a nerve firing. Their equations didn't just describe what happened—they predicted it, millisecond by millisecond. For the first time, life was speaking in the language of code.
This moment planted the seed for something extraordinary: the idea that we could build living systems entirely from code. But transforming that vision into reality would require decades of patient work, false starts, and technological leaps that no one could have predicted.
For over forty years, the dream lay mostly dormant. Computers were too slow, biological data too sparse, and the sheer audacity of the goal–simulating every molecule in a living cell–seemed to belong more in science fiction than serious research. Most biologists focused on understanding individual genes or proteins, not entire cellular systems.
Then in the late 1990s, a team of Japanese researchers decided to attempt the impossible. Working with the simplest hypothetical bacterium they could imagine, just 127 genes, they built the first prototype of what they called E-Cell. It was crude, crashed constantly, and bore only passing resemblance to any real organism. But when that virtual bacterium first sustained itself in silico–transcribing genes, making proteins, generating energy–something fundamental had shifted. A whole living system had been captured in code, however imperfectly.
Even so, the gap between simulation and reality remained vast. E-Cell was more proof of concept than practical tool. A sketch of what might be possible rather than a working model of life itself.
The real turning point came more than a decade later, in 2012. This time, researchers weren't working with hypothetical organisms but with Mycoplasma genitalium, the simplest known free-living bacterium. They spent years extracting every measurable parameter from nearly a thousand research papers–reaction rates, protein concentrations, metabolic fluxes–building what amounted to a molecular parts list for life itself.
When they finally assembled these fragments into 28 interconnected modules and hit run, the wait began. Ten hours of computation later, for the first time in history, an organism–however simple–had lived an entire life cycle inside a computer.
But here's what made it revolutionary: they immediately began breaking their virtual cell, deleting genes one by one to see what would happen. When certain deletions killed the simulation but left real bacteria unharmed, something profound was occurring. The model was revealing errors in decades of biological annotations. The digital cell had become teacher to its carbon counterpart.
This reversed relationship–silicon instructing carbon–marked the beginning of a new chapter in biology.
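The knockout logic behind those experiments can be sketched with a toy flux-balance model. This is a minimal illustration, not the actual 2012 model (which integrated 28 submodules spanning far more than metabolism): a hypothetical stoichiometric network is solved as a linear program, a "gene deletion" is simulated by clamping the corresponding reaction flux to zero, and a gene is predicted essential when growth collapses.

```python
import numpy as np
from scipy.optimize import linprog

# Toy flux-balance model (hypothetical network, NOT the real M. genitalium model).
# Reactions: uptake (-> A), r1 (A -> B), r2 (A -> C), r1b (A -> B, an isozyme),
# growth (B + C -> biomass). Rows of S are the metabolites A, B, C.
REACTIONS = ["uptake", "r1", "r2", "r1b", "growth"]
S = np.array([
    [1, -1, -1, -1,  0],   # A: produced by uptake, consumed by r1, r2, r1b
    [0,  1,  0,  1, -1],   # B: produced by r1 or r1b, consumed by growth
    [0,  0,  1,  0, -1],   # C: produced by r2, consumed by growth
])

def max_growth(knockout=None):
    """Maximize growth flux at steady state (S @ v = 0), optionally with one
    reaction disabled to mimic deleting the gene that encodes it."""
    bounds = [(0, 10)] * len(REACTIONS)
    if knockout is not None:
        bounds[REACTIONS.index(knockout)] = (0, 0)   # in-silico gene deletion
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("growth")] = -1                # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds)
    return -res.fun if res.success else 0.0

# Knockout screen: a reaction is predicted essential if growth collapses.
for rxn in REACTIONS[:-1]:
    growth = max_growth(knockout=rxn)
    print(f"KO {rxn:>6}: growth = {growth:.1f}  "
          f"{'ESSENTIAL' if growth < 1e-6 else 'dispensable'}")
```

Knocking out r1 leaves growth intact because the isozyme r1b covers for it, while knocking out r2 is lethal–exactly the kind of prediction that, when it disagreed with wet-lab results, exposed errors in the gene annotations.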
The field exploded from there, but in directions no one had anticipated. Rather than simply making existing models more complex, researchers began asking a different question: what if we could design life from scratch, then simulate our own creations?
J. Craig Venter's team took the audacious leap in 2016, creating JCVI-syn3.0—a synthetic organism stripped down to just 473 genes, the bare minimum needed for autonomous life. It was biology by subtraction, removing every non-essential component until only the core machinery remained. Then came the remarkable next step: a complete computational model of this artificial creation.
By 2021, these engineered bacteria could be simulated in unprecedented detail: every gene, every major protein, and nearly every metabolic reaction in JCVI-syn3A–all 493 genes of the world's simplest synthetic organism–tracked through an entire cell cycle. We had crossed an extraordinary threshold: biology designed in silico, built in carbon, then lived again in silicon with remarkable fidelity.
Meanwhile, other teams kept pushing the boundaries of complexity in the opposite direction. Instead of minimal cells, they tackled organisms like E. coli, with thousands of genes orchestrating the intricate dance of metabolism, DNA replication, and cell division. These models captured not just individual cells but entire bacterial colonies, revealing how microscopic decisions by single cells ripple through populations in ways that would have been hard to predict from single-cell studies alone.
Watching these simulations was like observing digital evolution in fast-forward: mutations, selection, adaptation–all the forces that shaped life on Earth, now playing out in accelerated time inside computers.
The next three years brought the infrastructure and validation that would finally move virtual cells from academic curiosities to clinical tools. In 2022, the FDA's CiPA initiative accepted computational human heart cell models as valid supplements to traditional drug safety testing–the first regulatory blessing for virtual cell predictions. Meanwhile, Stanford's Vivarium platform launched, providing the collaborative software infrastructure that multiple research groups had desperately needed.
But the real acceleration came from an unexpected direction: the explosion in AI capabilities. By 2024, the Chan Zuckerberg Initiative had committed over 1,000 GPUs specifically to AI virtual cell development, while ETH Zürich's new Alps supercomputer brought 435 petaflops of raw computational power to bear on the problem. Instead of purely mechanistic models that took hours to simulate a single cell division, researchers began embedding neural networks into the physics. Machine learning handles the brutally complex gene expression dynamics while traditional equations manage the chemistry. Proof-of-concept studies show that what once required hours of supercomputer time can now run in minutes, and the models learn from new experimental data as it arrives.
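One way to picture this hybrid architecture is a simulation loop where a learned surrogate supplies the rate that is too complex to derive mechanistically, while an ordinary differential equation handles the well-understood chemistry. The sketch below is purely illustrative–the "network" is an untrained stand-in, not weights or an API from any of the projects mentioned here:

```python
import numpy as np

# Toy hybrid simulator step: a neural surrogate supplies the hard-to-model
# gene-expression (synthesis) rate, while a mechanistic term handles protein
# degradation. Weights are random stand-ins for a trained model.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(1, 4))

def expression_rate(state):
    """Neural surrogate for transcription/translation dynamics (toy MLP)."""
    h = np.tanh(W1 @ state)
    return abs((W2 @ h).item())          # nonnegative synthesis rate

def step(protein, metabolite, dt=0.01, k_deg=0.1):
    """One hybrid Euler step: ML term for synthesis, ODE term for decay."""
    synth = expression_rate(np.array([protein, metabolite]))
    return protein + dt * (synth - k_deg * protein)

protein = 1.0
for _ in range(1000):                    # simulate 10 time units
    protein = step(protein, metabolite=0.5)
```

The appeal of this split is that retraining the surrogate on fresh experimental data updates the expensive part of the model without touching the mechanistic equations–which is what lets these hybrid models stay current as new measurements arrive.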
Today, this technology has quietly slipped into the machinery of drug discovery and personalized medicine. Companies run millions of virtual experiments, ranking cancer drug targets before touching a test tube. The FDA now accepts predictions from digital heart cells as supplements to existing pre-clinical assays during drug safety assessment. Early clinical pilots at major cancer centers now test thousands of in-silico regimens overnight and return ranked short-lists to research oncologists.
What strikes me isn't just the technical achievement–it's the philosophical shift. We've moved from studying biology to partnering with it. The lab bench feeds data to the digital twin, the twin suggests which experiments to run next, and somewhere in that feedback loop, the future of medicine is being written by code that dreams in the language of cells.
A timeline
This is certainly far from a complete account of the history and evolution of virtual cells; I've covered the highlights and what struck me as most interesting in my own research into the topic. Below is a more detailed timeline of research in the space over the same period.
1952 – The Hodgkin-Huxley model provides the first quantitative simulation of a neuron's action potential, foreshadowing in silico cell physiology. Working between Cambridge and the Plymouth Marine Biological Association, they resorted to a hand-operated calculator when Cambridge's computer was down for six months.
1999 – Masaru Tomita's E-Cell platform demonstrates a prototype whole-cell simulation for a 127-gene minimal cell–casting whole-cell modeling as a "grand challenge".
2003 – Genome-scale metabolic models (GEMs) of E. coli popularize flux balance analysis (FBA), making in silico knock-outs routine in metabolic engineering.
2005 – The Blue Brain Project launches at EPFL with an IBM Blue Gene supercomputer, targeting a digital reconstruction of a cortical microcircuit at cellular resolution.
2007 – Blue Brain simulates the first biologically detailed neocortical column (~10,000 neurons, 30 million synapses) generating coordinated electrical activity patterns.
2012 – The Mycoplasma genitalium whole-cell model (WCM) becomes the first true whole-cell simulation of a living organism, predicting phenotypes from genotype in 10-hour computations.
2015 – Human Whole-Cell Modeling Project roadmap proposed, outlining the ambitious goal of modeling human cells with their ~20,000 genes and complex organellar structures.
2016 – JCVI-syn3.0 synthetic minimal cell created with just 473 genes, representing the smallest genome capable of autonomous life.
2016 – Human Cell Atlas consortium launches, aiming to map all human cell types as foundation for future whole-cell models.
2019 – JCVI-syn3A engineered with normal cell division, overcoming the irregular division patterns of syn3.0 and enabling more accurate modeling.
2020 – Major scaling breakthroughs: E. coli WCM incorporating 1,214 genes (28% of genome) and S. cerevisiae model spanning 6,447 genes across organelles demonstrate whole-cell modeling of complex organisms.
2021 – JCVI-syn3A whole-cell model published, representing the most complete computational model of any organism—synthetic or natural—incorporating all 493 genes and their interactions.
2022 – Vivarium platform launched by Stanford Covert Lab, providing sophisticated interfaces for integrative multiscale modeling and enabling collaborative whole-cell model development.
2022 – FDA CiPA initiative integrates computational human ventricular cell models into cardiac safety assessment, marking regulatory acceptance of virtual cell predictions.
2023 – Colony-level whole-cell modeling advances enable simulation of bacterial populations with single-cell resolution, revealing emergent behaviors and antibiotic resistance patterns.
2024 – AI Virtual Cell roadmap published in Cell, outlining integration of large-scale machine learning with mechanistic models. Chan Zuckerberg Initiative commits 1,000+ GPU cluster for AI virtual cell development.
2024 – Alps supercomputer launches at ETH Zürich (434.9 petaflops, 10,752 NVIDIA Grace Hopper chips), providing unprecedented computational power for next-generation virtual cell modeling.
2024 – Human Cell Atlas first draft nears completion with over 100 million cells mapped across 18+ organs, providing the foundational data for human whole-cell modeling initiatives.
2025 – Commercial applications mature with companies like Turbine running millions of virtual drug screens, Ginkgo Bioworks integrating AI-powered cell design, and cancer centers deploying patient-specific digital twins for therapy selection.