All models are wrong, but some are alive.
Somewhere in a data center right now, a virtual bacterium is dividing for the millionth time. Somewhere else, an AI-enhanced model is learning from a patient's tumor, preparing treatment recommendations that didn't exist when the sun rose. Biology has learned to debug itself, and we're just getting started.
This isn't science fiction–it's Tuesday morning in labs around the world. But the story of how we got here begins more than seventy years ago at the Marine Biological Association laboratory in Plymouth–a facility damaged by wartime bombing, where scientists who had begun their experiments before the war had to wait years for the chance to return and complete their revolutionary work.
In 1952, Alan Hodgkin and Andrew Huxley sat watching voltage traces flicker across an oscilloscope, capturing something no one had seen before: the precise mathematical choreography of a nerve firing. Their equations didn't just describe what happened—they predicted it, millisecond by millisecond. For the first time, life was speaking in the language of code.
This moment planted the seed for something extraordinary: the idea that we could build living systems entirely from code. But transforming that vision into reality would require decades of patient work, false starts, and technological leaps that no one could have predicted.
For over forty years, the dream lay mostly dormant. Computers were too slow, biological data too sparse, and the sheer audacity of the goal–simulating every molecule in a living cell–seemed to belong more in science fiction than serious research. Most biologists focused on understanding individual genes or proteins, not entire cellular systems.
Then in the late 1990s, a team of Japanese researchers decided to attempt the impossible. Working with the simplest hypothetical bacterium they could imagine, just 127 genes, they built the first prototype of what they called E-Cell. It was crude, crashed constantly, and bore only passing resemblance to any real organism. But when that virtual bacterium first sustained itself in silico–transcribing genes, making proteins, generating energy–something fundamental had shifted. A whole living system had been captured in code, however imperfectly.
Even so, the gap between simulation and reality remained vast. E-Cell was more proof of concept than practical tool. A sketch of what might be possible rather than a working model of life itself.
The real turning point came more than a decade later, in 2012. This time, researchers weren't working with hypothetical organisms but with Mycoplasma genitalium, the simplest known free-living bacterium. They spent years extracting every measurable parameter from nearly a thousand research papers–reaction rates, protein concentrations, metabolic fluxes–building what amounted to a molecular parts list for life itself.
When they finally assembled these fragments into 28 interconnected modules and hit run, the wait began. Ten hours of computation later, for the first time in history, an organism–however simple–had lived an entire life cycle inside a computer.
But here's what made it revolutionary: they immediately began breaking their virtual cell, deleting genes one by one to see what would happen. When certain deletions killed the simulation but left real bacteria unharmed, something profound was occurring. The model was revealing errors in decades of biological annotations. The digital cell had become teacher to its carbon counterpart.
This reversed relationship–silicon instructing carbon–marked the beginning of a new chapter in biology.
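The knockout logic behind those experiments can be sketched with a toy flux-balance model. This is a minimal illustration, not the actual 2012 model (which integrated 28 submodules spanning far more than metabolism): a hypothetical stoichiometric network is solved as a linear program, a "gene deletion" is simulated by clamping the corresponding reaction flux to zero, and a gene is predicted essential when growth collapses.

```python
import numpy as np
from scipy.optimize import linprog

# Toy flux-balance model (hypothetical network, NOT the real M. genitalium model).
# Reactions: uptake (-> A), r1 (A -> B), r2 (A -> C), r1b (A -> B, an isozyme),
# growth (B + C -> biomass). Rows of S are the metabolites A, B, C.
REACTIONS = ["uptake", "r1", "r2", "r1b", "growth"]
S = np.array([
    [1, -1, -1, -1,  0],   # A: produced by uptake, consumed by r1, r2, r1b
    [0,  1,  0,  1, -1],   # B: produced by r1 or r1b, consumed by growth
    [0,  0,  1,  0, -1],   # C: produced by r2, consumed by growth
])

def max_growth(knockout=None):
    """Maximize growth flux at steady state (S @ v = 0), optionally with one
    reaction disabled to mimic deleting the gene that encodes it."""
    bounds = [(0, 10)] * len(REACTIONS)
    if knockout is not None:
        bounds[REACTIONS.index(knockout)] = (0, 0)   # in-silico gene deletion
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("growth")] = -1                # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds)
    return -res.fun if res.success else 0.0

# Knockout screen: a reaction is predicted essential if growth collapses.
for rxn in REACTIONS[:-1]:
    growth = max_growth(knockout=rxn)
    print(f"KO {rxn:>6}: growth = {growth:.1f}  "
          f"{'ESSENTIAL' if growth < 1e-6 else 'dispensable'}")
```

Knocking out r1 leaves growth intact because the isozyme r1b covers for it, while knocking out r2 is lethal–exactly the kind of prediction that, when it disagreed with wet-lab results, exposed errors in the gene annotations.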
The field exploded from there, but in directions no one had anticipated. Rather than simply making existing models more complex, researchers began asking a different question: what if we could design life from scratch, then simulate our own creations?
J. Craig Venter's team took the audacious leap in 2016, creating JCVI-syn3.0—a synthetic organism stripped down to just 473 genes, the bare minimum needed for autonomous life. It was biology by subtraction, removing every non-essential component until only the core machinery remained. Then came the remarkable next step: a complete computational model of this artificial creation.
By 2021, these engineered bacteria could be simulated in unprecedented detail: every gene, every major protein, and nearly every metabolic reaction in JCVI-syn3A–all 493 genes of the world's simplest synthetic organism–tracked through an entire cell cycle. We had crossed an extraordinary threshold: biology designed in silico, built in carbon, then lived again in silicon with remarkable fidelity.
Meanwhile, other teams kept pushing the boundaries of complexity in the opposite direction. Instead of minimal cells, they tackled organisms like E. coli, with thousands of genes orchestrating the intricate dance of metabolism, DNA replication, and cell division. These models captured not just individual cells but entire bacterial colonies, revealing how microscopic decisions by single cells ripple through populations in ways that would have been hard to predict from single-cell studies alone.
Watching these simulations was like observing digital evolution in fast-forward: mutations, selection, adaptation–all the forces that shaped life on Earth, now playing out in accelerated time inside computers.
The next three years brought the infrastructure and validation that would finally move virtual cells from academic curiosities to clinical tools. In 2022, the FDA's CiPA initiative accepted computational human heart cell models as valid supplements to traditional drug safety testing–the first regulatory blessing for virtual cell predictions. Meanwhile, Stanford's Vivarium platform launched, providing the collaborative software infrastructure that multiple research groups had desperately needed.
But the real acceleration came from an unexpected direction: the explosion in AI capabilities. By 2024, the Chan Zuckerberg Initiative had committed over 1,000 GPUs specifically to AI virtual cell development, while ETH Zürich's new Alps supercomputer brought 435 petaflops of raw computational power to bear on the problem. Instead of purely mechanistic models that took hours to simulate a single cell division, researchers began embedding neural networks into the physics. Machine learning handles the brutally complex gene expression dynamics while traditional equations manage the chemistry. Proof-of-concept studies show that what once required hours of supercomputer time can now run in minutes, and the models learn from new experimental data as it arrives.
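One way to picture this hybrid architecture is a simulation loop where a learned surrogate supplies the rate that is too complex to derive mechanistically, while an ordinary differential equation handles the well-understood chemistry. The sketch below is purely illustrative–the "network" is an untrained stand-in, not weights or an API from any of the projects mentioned here:

```python
import numpy as np

# Toy hybrid simulator step: a neural surrogate supplies the hard-to-model
# gene-expression (synthesis) rate, while a mechanistic term handles protein
# degradation. Weights are random stand-ins for a trained model.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(1, 4))

def expression_rate(state):
    """Neural surrogate for transcription/translation dynamics (toy MLP)."""
    h = np.tanh(W1 @ state)
    return abs((W2 @ h).item())          # nonnegative synthesis rate

def step(protein, metabolite, dt=0.01, k_deg=0.1):
    """One hybrid Euler step: ML term for synthesis, ODE term for decay."""
    synth = expression_rate(np.array([protein, metabolite]))
    return protein + dt * (synth - k_deg * protein)

protein = 1.0
for _ in range(1000):                    # simulate 10 time units
    protein = step(protein, metabolite=0.5)
```

The appeal of this split is that retraining the surrogate on fresh experimental data updates the expensive part of the model without touching the mechanistic equations–which is what lets these hybrid models stay current as new measurements arrive.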
Today, this technology has quietly slipped into the machinery of drug discovery and personalized medicine. Companies run millions of virtual experiments, ranking cancer drug targets before touching a test tube. The FDA now accepts predictions from digital heart cells as supplements to existing pre-clinical assays during drug safety assessment. Early clinical pilots at major cancer centers now test thousands of in-silico regimens overnight and return ranked short-lists to research oncologists.
What strikes me isn't just the technical achievement–it's the philosophical shift. We've moved from studying biology to partnering with it. The lab bench feeds data to the digital twin, the twin suggests which experiments to run next, and somewhere in that feedback loop, the future of medicine is being written by code that dreams in the language of cells.
A timeline
This is certainly far from a complete account of the history and evolution of virtual cells; I've covered the highlights and what struck me as most interesting in my own research into the topic. Below is a more detailed timeline of research in the space over the same period.
1952 – The Hodgkin-Huxley model provides the first quantitative simulation of a neuron's action potential, foreshadowing in silico cell physiology. Working between Cambridge and the Plymouth Marine Biological Association, they resorted to a hand-operated calculator when Cambridge's computer was down for six months.
1999 – Masaru Tomita's E-Cell platform demonstrates a prototype whole-cell simulation for a 127-gene minimal cell–casting whole-cell modeling as a "grand challenge".
2003 – Genome-scale metabolic models (GEMs) of E. coli popularize flux balance analysis (FBA), making in silico knock-outs routine in metabolic engineering.
2005 – The Blue Brain Project launches at EPFL with an IBM Blue Gene supercomputer, targeting a digital reconstruction of a cortical microcircuit at cellular resolution.
2007 – Blue Brain simulates the first biologically detailed neocortical column (~10,000 neurons, 30 million synapses) generating coordinated electrical activity patterns.
2012 – The Mycoplasma genitalium whole-cell model (WCM) becomes the first true whole-cell simulation of a living organism, predicting phenotypes from genotype in 10-hour computations.
2015 – Human Whole-Cell Modeling Project roadmap proposed, outlining the ambitious goal of modeling human cells with their ~20,000 genes and complex organellar structures.
2016 – JCVI-syn3.0 synthetic minimal cell created with just 473 genes, representing the smallest genome capable of autonomous life.
2016 – Human Cell Atlas consortium launches, aiming to map all human cell types as foundation for future whole-cell models.
2019 – JCVI-syn3A engineered with normal cell division, overcoming the irregular division patterns of syn3.0 and enabling more accurate modeling.
2020 – Major scaling breakthroughs: E. coli WCM incorporating 1,214 genes (28% of genome) and S. cerevisiae model spanning 6,447 genes across organelles demonstrate whole-cell modeling of complex organisms.
2021 – JCVI-syn3A whole-cell model published, representing the most complete computational model of any organism—synthetic or natural—incorporating all 493 genes and their interactions.
2022 – Vivarium platform launched by Stanford Covert Lab, providing sophisticated interfaces for integrative multiscale modeling and enabling collaborative whole-cell model development.
2022 – FDA CiPA initiative integrates computational human ventricular cell models into cardiac safety assessment, marking regulatory acceptance of virtual cell predictions.
2023 – Colony-level whole-cell modeling advances enable simulation of bacterial populations with single-cell resolution, revealing emergent behaviors and antibiotic resistance patterns.
2024 – AI Virtual Cell roadmap published in Cell, outlining integration of large-scale machine learning with mechanistic models. Chan Zuckerberg Initiative commits 1,000+ GPU cluster for AI virtual cell development.
2024 – Alps supercomputer launches at ETH Zürich (434.9 petaflops, 10,752 NVIDIA Grace Hopper chips), providing unprecedented computational power for next-generation virtual cell modeling.
2024 – Human Cell Atlas first draft nears completion with over 100 million cells mapped across 18+ organs, providing the foundational data for human whole-cell modeling initiatives.
2025 – Commercial applications mature with companies like Turbine running millions of virtual drug screens, Ginkgo Bioworks integrating AI-powered cell design, and cancer centers deploying patient-specific digital twins for therapy selection.