There’s a hidden modelling assumption baked into many AI- or ML-enabled biological datasets: If your models are trained on cell culture data, your media formulation is part of the model.
Here’s how a default choice I’ve seen teams make early on accidentally becomes expensive to unwind later. When companies that use cell biology to train predictive models are first established, cell culture processes are often given relatively little thought. Many companies treat them as basic infrastructure, while the critical part of the model sits downstream in protein folding, functional phenotypes, or biological performance metrics. Unfortunately, by starting from standard cell culture processes, many of these companies end up relying on fetal bovine serum (FBS) or other undefined, variable cell culture media inputs.
These model datasets are intended to become moats, but they’re only as robust as the conditions under which they were generated. Serum and other animal-derived nutritional supplements influence growth rate, shape stress responses, and affect metabolism, signaling, phenotypic baselines and, with them, virtually every facet of cell biology. From a modelling perspective, this means your training data can encode variability that isn’t obvious until conditions change. When they do, it shows up in the dataset as unexplained variability or performance drift.
The major challenge is that this risk is easy to miss early on, when your model data is fairly limited, e.g. it’s been collected in one laboratory, using one batch of FBS, under “pretty constant” conditions. Usually, the risk rears its ugly head when datasets grow. Suddenly, you’ve used up your batch of FBS and you’re switching to a new one, or you’re transferring your findings to a different lab/company/site, and the model’s results aren’t holding up. At that point, performance drift is sometimes blamed on “biology” or “model issues,” when it’s at least partially due to the “basic” cell culture processes that were popped in place three years ago.
The reality is, if your experimental system isn’t controlled, your training data isn’t either, and your model inherits that.
My suggestion here is simple: think about this early! Media formulation should be a deliberate modelling decision, not a background reagent choice that we just fall into because “oh yeah, the literature says DMEM + 10% FBS so let’s go for it.” Teams that commit early to chemically defined, stable culture conditions are less likely to face costly or time-consuming surprises when models are applied, transferred, or scaled. Stable inputs in, more reliable models out, and a healthier data moat over time. That tends to keep everyone happy!
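
To make the point concrete, here is a minimal sketch, assuming every measurement is stored alongside the serum lot it was generated under. The field names (`fbs_lot`, `readout`), the function name, and the numbers are all made up for illustration. It estimates what share of the variance in a readout sits between lots rather than within them (eta squared from a one-way layout), which is one rough way to spot lot-driven variability before it surfaces as model drift.

```python
from collections import defaultdict
from statistics import mean


def variance_explained_by_lot(records, lot_key="fbs_lot", value_key="readout"):
    """Return eta squared: between-lot sum of squares / total sum of squares."""
    # Group the readout values by the culture-condition label (here, serum lot).
    groups = defaultdict(list)
    for record in records:
        groups[record[lot_key]].append(record[value_key])

    values = [v for lot_values in groups.values() for v in lot_values]
    grand_mean = mean(values)

    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variability at all, so nothing to attribute to lots

    # Between-lot sum of squares: how far each lot's mean sits from the grand mean.
    ss_between = sum(
        len(lot_values) * (mean(lot_values) - grand_mean) ** 2
        for lot_values in groups.values()
    )
    return ss_between / ss_total


# Hypothetical data: the same assay run under two different serum lots.
records = [
    {"fbs_lot": "lot_A", "readout": 1.00},
    {"fbs_lot": "lot_A", "readout": 1.10},
    {"fbs_lot": "lot_A", "readout": 0.90},
    {"fbs_lot": "lot_B", "readout": 1.50},
    {"fbs_lot": "lot_B", "readout": 1.60},
    {"fbs_lot": "lot_B", "readout": 1.40},
]

share = variance_explained_by_lot(records)
print(f"Share of readout variance between lots: {share:.2f}")
```

A high share does not prove the serum is the cause, since lot changes often coincide with other changes, but it is only computable at all if the lot was recorded with the data in the first place.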