Before You Can Validate AI, Your Project Needs to Not Fail
December 8, 2025
80% of AI projects fail. That’s not a GxP problem—it’s an AI problem. But in life sciences, the stakes are higher, the data messier, and the consequences more serious. The industry’s unique complexity—and its regulatory and compliance burden—turn common failure modes into structural risks. This article explores the patterns, examines the problems, and suggests where the real value lies.
AI is making significant contributions across many disciplines, including life sciences. There are too many positive indications to think otherwise. It’s this potential that makes addressing the issues identified here so important.
AI Projects Fail
Let’s start with the broader landscape. According to research from the RAND Corporation, approximately 80% of AI projects fail, a rate considerably higher than that of general IT projects from a decade ago. This isn’t a life sciences problem. It’s an AI problem. (Note: links to references are at the end of the article.)
RAND identified five root causes based on interviews with 65 data scientists, engineers, and leaders across industries. The findings are instructive, especially the first two:
Leadership-driven failures dominated the findings. This includes solving the wrong problem, using metrics that don’t align with business goals, overestimating what AI can deliver, and underestimating timelines, which, as we know, is not restricted to AI projects. As one respondent put it: “They think they have great data because they get weekly sales reports, but don’t realize the data may not meet its new purpose.”
Data-driven failures represent perhaps the most fundamental challenge. Data engineering—the unglamorous work of cleaning, structuring, and preparing data—consumes the bulk of effort in any AI project. It’s rarely where the investment or attention goes.
The remaining categories—bottom-up failures (chasing technology rather than solving problems), underinvestment in infrastructure, and immature technology—round out the picture. But leadership and data account for the bulk of the carnage.
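To make the data-engineering point concrete, here is a minimal sketch of the kind of profiling that work starts with, using pandas on a hypothetical batch-record extract (the file and column names are illustrative, not from any real system):

```python
import pandas as pd

# Hypothetical batch-record extract; file and column names are illustrative.
df = pd.read_csv("batch_records.csv", parse_dates=["completed_at"])

# Profile before any modeling: duplicates, missingness, out-of-range values.
print("duplicate batch IDs:", df["batch_id"].duplicated().sum())
print(df.isna().mean().sort_values(ascending=False).head(10))
print("yield out of range:", ((df["yield_pct"] < 0) | (df["yield_pct"] > 100)).sum())

# Free-text fields often hide one value spelled many ways.
print(df["deviation_category"].str.strip().str.lower().value_counts())
```

None of this is glamorous, which is exactly the point: it is where most of the effort, and most of the risk, actually sits.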
These findings are echoed elsewhere. S&P Global reports that 42% of companies scrapped most of their AI initiatives in 2025, up from 17% in 2024. Gartner projects that 30% of generative AI projects will be abandoned after proof of concept by the end of 2025. The trend is worsening, not improving.
A caveat: these are surveys and interviews, not longitudinal outcome studies. The data on AI project success and failure is fragmentary, skewed toward what companies are willing to report. But the pattern is consistent enough to take seriously.
Life Sciences AI Projects
Life sciences organizations face the same failure modes as everyone else, compounded by domain complexity and regulatory requirements. But the “successes” we read about appear to be concentrated in a specific area: drug discovery.
The numbers are genuinely impressive. AI-discovered molecules entering Phase I clinical trials have achieved success rates of 80-90%, substantially higher than the historic industry average of 40-65%. The pipeline of AI-discovered candidates has grown substantially—from 3 in 2016 to 17 in 2020 to 67 in 2023. These are real results.
But it’s worth noting what these numbers represent. The AI that identified these candidates operated in an exploratory, unvalidated context. Discovery AI is R&D—iterative, experimental, tolerant of failure, with a different kind of rigor. If a model suggests a compound and it doesn’t work in preclinical testing, you try another. The backstop is built in.
The successful molecules are then tested using traditional validated infrastructure—the same clinical systems (CTMS, EDC, LIMS, safety databases) that have supported drug development for decades. The AI found the candidate; validated systems did the downstream work.
One could reasonably ask whether the AI success stories we’re reading about represent validated AI succeeding in regulated environments, or unvalidated AI succeeding in discovery—with traditional validated systems handling everything that follows. The distinction matters.
Discovery AI operates in unvalidated, experimental settings where failure is expected and tolerated. GxP AI must operate in validated, regulated environments where failure is not tolerated, results need to be predictable, and validation is mandatory. Success in one doesn’t imply readiness for the other, especially when we contemplate more complex kinds of AI.
GxP Is Where It Gets Harder
The harder problem, and the less visible one, is deploying AI in environments that require validated systems: preclinical testing, Phase I-IV clinical trials, manufacturing, quality systems. This is GxP territory. This is where pilot purgatory lives.
A 2022 article in ISPE’s Pharmaceutical Engineering observed that “if validation is not considered from the beginning, there is considerable risk for AI-based digital pilots to get stuck in the pilot phase and not move on to operations.” This isn’t new wisdom. Early in my consulting career, when talking to clients about validating their systems during implementation, I’d tell them: “Validation is the first thing I think of in the morning and the last thing I think of at night.” If you don’t put it at the center of everything, it will derail your project every time.
Manufacturing AI successes do exist. Companies have achieved yield improvements, deviation reductions, predictive maintenance gains. But these applications tend to operate in monitoring and advisory capacities—closer to advanced statistical process control than adaptive learning systems. The human remains in the loop. The AI recommends; the operator decides.
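For illustration, that advisory pattern can be as simple as a control-limit check that flags a reading for review instead of acting on it. A minimal sketch, with hypothetical limits and field names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Flag:
    batch_id: str
    value: float
    reason: str

def advisory_check(batch_id: str, value: float,
                   mean: float, sd: float) -> Optional[Flag]:
    """Flag readings beyond 3 sigma for operator review; never auto-act."""
    if abs(value - mean) > 3 * sd:
        return Flag(batch_id, value,
                    f"reading {value} outside {mean} +/- {3 * sd:.2f}")
    return None

# The system recommends; the operator decides.
flag = advisory_check("B-1042", value=7.9, mean=7.2, sd=0.15)
if flag:
    print(f"Review required for {flag.batch_id}: {flag.reason}")
```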
It seems likely that the harder problem, truly adaptive AI operating within validated GxP environments, is where progress has been slower. Pilot purgatory, by its nature, doesn’t generate published data. Projects that stall die quietly. We don’t read case studies about them.
The Data Problem Is Foundational
RAND’s research points to dirty data as a primary failure mode. Separately, ISPE has emphasized that validation must be considered from the beginning. To those two factors I’d add, from my own experience, two more: lack of accurate workflow documentation, and a knowledge gap within the team, in validation generally but especially in AI. Computer Software Assurance doesn’t alleviate this; it assumes a baseline competency that may not exist, and it requires that competency as a prerequisite for effective critical thinking and risk management.
The data challenge is particularly acute in life sciences. Data collected for regulatory compliance is not the same as data structured for machine learning. Your batch records were designed to demonstrate that you followed the procedure, not to train a model on process optimization. Your LIMS data captures what the SOP required you to capture, which may or may not be what a model needs to learn from.
Compliance processes—including CSA—focus on critical data elements. That’s appropriate for regulatory purposes. But AI models may depend on fields that were never classified as critical, never prioritized for quality, never subject to the same scrutiny. We don’t always know what the model needs beyond the CDEs to perform its intended use. The data that matters for compliance and the data that matters for the model may not be the same—and we may not discover the gap until the model underperforms.
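One way to surface that gap early is to audit the model’s candidate features with the same scrutiny the CDEs receive. A minimal sketch, assuming hypothetical field lists drawn from the data dictionary and the model’s feature specification:

```python
import pandas as pd

# Hypothetical field lists; in practice these come from the data
# dictionary and the model's feature specification.
critical_data_elements = {"subject_id", "visit_date", "adverse_event"}
model_features = {"subject_id", "visit_date", "site_temp", "operator_shift"}

df = pd.read_csv("study_extract.csv")

# Fields the model depends on that were never held to CDE-level scrutiny.
for field in sorted(model_features - critical_data_elements):
    missing = df[field].isna().mean()
    print(f"{field}: {missing:.1%} missing, {df[field].nunique()} distinct values")
```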
If your training data is inconsistent, your production performance will be unpredictable. This comes before governance frameworks, before validation methodology, before debates about adaptive versus static systems. Without clean, well-characterized data, nothing else matters.
Where Life Sciences Companies Add Unique Value
Here’s where I’ll offer an opinion, clearly marked as such.
The highest-value contribution a life sciences company can make to an AI initiative isn’t building the model. It’s ensuring the model has what it needs to succeed: clean, well-characterized data; clear definition of the problem; domain expertise to evaluate outputs; understanding of what failure looks like in context.
Developers can build sophisticated models. What they can’t do is know which data fields are actually reliable versus which are checkbox compliance. They can’t understand why Protocol A and Protocol B produce different results when they shouldn’t. They can’t recognize when an output is technically possible but biologically implausible. That’s domain expertise. That’s what pharma and biotech uniquely contribute.
In my experience, the companies that struggle most are the ones trying to become something they’re not. Software development is a different discipline with different rhythms—continuous improvement, rapid iteration, tolerance for failure. That’s not how life sciences organizations operate, and for good reason. Trying to build internal AI development capabilities from scratch may lead to the failure modes RAND documented: data scientists who don’t understand the domain, projects that stall, institutional knowledge that walks out the door.
The alternative is to adopt AI as a component in systems you already use. Your LIMS vendor adds AI-assisted anomaly detection? Use it. Your QMS vendor adds NLP for deviation classification? Configure it. Learn what it does, what it gets wrong, where the thresholds are. That’s how your organization builds the muscle memory—understanding what AI outputs actually look like, how to evaluate confidence scores, when to override, what governance means in practice—before you need it at scale.
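One concrete way to build that record is to log every model suggestion next to the human’s final call, then tune your review thresholds from the accumulated data rather than vendor defaults. A minimal sketch (the record IDs and deviation categories are hypothetical):

```python
import csv
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.85  # tune from accumulated override data, not vendor defaults

def record_decision(record_id: str, suggested: str, confidence: float,
                    human_label: str, log_path: str = "ai_decisions.csv") -> None:
    """Append the model suggestion and the reviewer's final call for later analysis."""
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            record_id, suggested, f"{confidence:.3f}",
            human_label, suggested == human_label,
        ])

# Example: the QMS classifier suggested "labeling"; the reviewer corrected it.
record_decision("DEV-2025-0117", suggested="labeling",
                confidence=0.62, human_label="packaging")
```

Over time, that log tells you where the model’s confidence scores are trustworthy and where they aren’t, which is exactly the institutional knowledge you need before scaling up.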
This isn’t a lesser role. It’s the role that determines whether the project works as you intended. It’s core to the collaboration; developers need you to do that part. They can’t do it themselves.
Which is why it’s troubling to read about substantial layoffs at leading life sciences companies, often, it seems, to fund investments in AI-driven efficiency. The people caught in those downsizings are the ones who know whether the data is good, whether the model is solving the right problem, and whether the outputs make sense. They’re the best source for curating the data, defining success, and validating outputs. In organizations lacking effective documentation, RAND observed, “loss of a data engineer means no one knows which datasets are reliable or how meaning shifted over time.”
Conclusion
There’s no formula—not yet. What you can do is reduce the unknown unknowns: understand your data’s limitations before you train on it, retain the people who know what “correct” looks like, define intended use before you build, document the workflows you’re trying to improve. None of this guarantees success. It moves risks from things you didn’t see coming to things you can manage. That’s the best anyone can do right now.
We’re in early days. The data we have on AI success and failure is fragmentary and skewed toward published successes. The regulatory landscape is evolving. The technology is moving faster than our ability to evaluate it.
What I’ve offered here is pattern recognition from watching this domain through previous waves of ground-breaking innovation over 25 years—not proof, but informed observation. The failure rates are real. The data challenges are not specific to AI; they’ve been documented for quite some time. Same with workflows, intended use, and underestimating the complexity of new solutions. The gap between discovery AI and production AI is real. How to address it is not obvious—but gaining clarity on the contributing factors is a necessary first step.
To be clear: I’m fully behind AI. I believe it will contribute to significant improvements in patient safety and health outcomes. But it’s a long row to hoe—and the work starts with the fundamentals, before the models.
References and Additional Reading
RAND Corporation. “What Factors Cause AI Projects to Succeed or Fail?” RR-A2680-1, 2024. https://www.rand.org/pubs/research_reports/RRA2680-1.html
Jayatunga et al. “How successful are AI-discovered drugs in clinical trials? A first analysis and emerging lessons.” Drug Discovery Today, April 2024. https://www.sciencedirect.com/science/article/pii/S135964462400134X
Erdmann et al. “AI Maturity Model for GxP Application: A Foundation for AI Validation.” Pharmaceutical Engineering, March/April 2022. https://ispe.org/pharmaceutical-engineering/march-april-2022/ai-maturity-model-gxp-application-foundation-ai
ISPE. “GAMP Guide: Artificial Intelligence.” July 2025. https://ispe.org/publications/guidance-documents/gamp-guide-artificial-intelligence
McKinsey & Company. “Generative AI in the pharmaceutical industry: Moving from hype to reality.” January 2024. https://www.mckinsey.com/industries/life-sciences/our-insights/generative-ai-in-the-pharmaceutical-industry-moving-from-hype-to-reality
ISPE. “New GAMP Guide Addresses Challenges Posed by AI-Enabled Computerized Systems.” Pharmaceutical Engineering, 2025. https://ispe.org/pharmaceutical-engineering/ispeak/new-gampr-guide-addresses-challenges-posed-ai-enabled
Patel et al. “Regulatory Perspectives for AI/ML Implementation in Pharmaceutical GMP Environments.” PMC, 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12195787/
ISPE. “GAMP RDI Good Practice Guide: Data Integrity by Design.” October 2020. https://ispe.org/publications/guidance-documents/gamp-rdi-good-practice-guide-data-integrity-design