RAG: Retrieval-Augmented Generation for Regulated Environments

Part 2 of AI Foundations for Life Sciences

In the taxonomy article, we identified hallucination as a predictable failure mode of LLMs — they generate plausible text without inherent mechanisms for truth verification. We mentioned RAG (Retrieval-Augmented Generation) as the most common architectural response to this problem, but didn’t explain the mechanics.

This article — and the others that follow in this series — fleshes out concepts the taxonomy introduced. The term “RAG” will come up in vendor conversations, product evaluations, and audit discussions. It’s important to understand what it is, how it works, where it fails, and what it means for validation. That’s what this article covers.


Where RAG Fits in the Taxonomy

In the previous article, we covered seven categories of AI — from rule-based systems through agentic AI. Each category describes what type of AI you’re dealing with: how it learns, how it produces outputs, how predictable its behavior is.

RAG (Retrieval-Augmented Generation) is different. It’s not a category — it’s a design decision about how to feed information to certain types of AI.

A simpler framing

The taxonomy tells you what kind of engine you have. RAG tells you where the engine gets its fuel.

Categories 1–4 operate on logic that is either explicitly configured by humans or embedded at build or training time. In all cases, behavior is constrained to executing that logic — not generating novel content.

LLMs and Generative AI (Categories 5, 6, and 7) are different. They generate novel content — text, images, code — based on massive training datasets. But training data has limits: it’s frozen in time, may contain errors, and can’t cover everything. When these models encounter gaps, they don’t stop. They fill in. That’s hallucination.

RAG addresses this by giving the model access to a curated knowledge base at the moment it generates a response. Before answering, the system retrieves relevant documents and includes them in the prompt. The model generates based on its training plus the retrieved content.
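The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy, not a real implementation: keyword overlap stands in for semantic retrieval, and the knowledge base, document IDs, and function names are all invented for illustration.

```python
# Toy RAG flow: retrieve relevant documents, then assemble the augmented prompt.
# Word overlap is a crude stand-in for real semantic (embedding-based) search.

KNOWLEDGE_BASE = {
    "SOP-001": "Deviations must be logged within 24 hours of discovery.",
    "SOP-002": "Batch records require review by two qualified persons.",
    "SOP-003": "Temperature excursions trigger a quality investigation.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Score each document by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, sources: list[tuple[str, str]]) -> str:
    """Assemble the prompt the model actually sees: sources first, then the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

prompt = augment("When must deviations be logged?",
                 retrieve("When must deviations be logged?"))
```

Note that the model never sees the whole knowledge base — only what retrieval surfaced. That is both the point and, as discussed later, the main failure surface.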

Why only Categories 5, 6, and 7?

RAG is most relevant for LLMs/Generative AI (Category 5), Fine-tuning (Category 6), and Agentic AI (Category 7) — all of which generate new text and can hallucinate. Agentic systems often combine RAG with tool use and multi-step reasoning, but the RAG component addresses the same core problem: grounding generated content in verified sources.

Categories 1–4 do not generate open-ended content. They classify, predict, extract, or route. Retrieval may inform decisions in these systems, but it does not constrain generation — because there is none. A model that flags potential adverse events doesn’t write prose — it returns a prediction. There is no generation step to augment.

RAG exists because generative models are designed to produce novel outputs. In those systems, the retrieval step constrains what the model draws from during generation. Without a generation step, there is nothing to constrain.

RAG isn’t the only approach

RAG is one of several architectural approaches to constraining hallucination. Others — fine-tuning, guardrails, constrained generation, prompt engineering — exist and matter. We’ll cover them in a future article. For now, we’re focused on RAG because for regulated environments, it offers the best balance: you control the knowledge base, outputs are traceable to sources, and you don’t need ML expertise to maintain it. It’s not the only option, but it’s the one you’ll encounter most often in vendor products — and the one that requires the most understanding to validate properly.

What “architectural” means

Architecture refers to how the system is structured — how components connect, what happens before and after the model generates output. An architectural response is a design decision about the system around the model, not a change to the model itself.

RAG is architectural because it modifies the information flow — retrieval happens before generation, augmentation shapes what the model sees — but the model itself is unchanged. Same LLM. Different plumbing.

The acronym unpacked

Each term of the acronym corresponds to a step:

  • Retrieval: search a curated knowledge base for content relevant to the query
  • Augmented: insert the retrieved content into the prompt sent to the model
  • Generation: the model produces output from its training plus the retrieved context

Three steps. Each introduces distinct failure modes. The rest of this article unpacks them.


What About the AI Tools You Already Use?

If you’re using Claude, ChatGPT, or similar tools, you’re using a base LLM — not RAG.

When you upload a document to Claude, it goes into the context window. The model sees it directly, like handing someone a paper to reference while they talk to you. There’s no retrieval step — the document is just there, in full, for the duration of the conversation.

When Claude uses web search, that’s tool use — the model retrieves external information and incorporates it. But there’s no persistent knowledge base, no vector database, no curated corpus that gets searched before every response.

RAG is different. Documents are pre-processed, chunked, embedded, and stored. When you ask a question, the system searches that corpus, retrieves relevant passages, and includes them in the prompt — before the model generates anything. You don’t upload documents in the moment. The knowledge base exists independently and is maintained over time.
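The pre-processing mentioned above begins with chunking: splitting documents into passages small enough to embed and retrieve individually. A minimal word-window chunker, with overlap so passages keep context across boundaries (sizes and the function name are illustrative, and real systems typically chunk by tokens or document structure):

```python
# Split a document into overlapping word-window chunks prior to embedding.
# Overlap means consecutive chunks share words, so a sentence cut at a
# boundary still appears intact in at least one chunk.

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = size - overlap  # advance by less than the window to create overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

Chunking choices matter later: if a chunk boundary splits the relevant passage, retrieval can surface the wrong half — the “chunking problem” discussed below.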

The distinction matters because RAG introduces infrastructure you can govern — knowledge base contents, retrieval logic, source traceability. A base LLM, even with file uploads, doesn’t have that layer.


The Core Concept: Open-Book vs. Closed-Book

A standard LLM is a closed-book exam. The model generates responses based entirely on patterns learned during training. It has no access to external information at inference time. If the answer isn’t encoded in the model’s weights — or if the model’s training data is outdated, incomplete, or wrong — the output reflects that. The model doesn’t know what it doesn’t know. It generates anyway.

RAG turns this into an open-book exam. Before generating a response, the system retrieves relevant documents from a knowledge base and includes them in the prompt context. The model generates based on both its training and the retrieved content. The answer is grounded in source material the system can point to.

This doesn’t make the model smarter. It gives the model better inputs. This distinction—better inputs, not a smarter model—comes up repeatedly, because it’s the most common misconception about RAG.

GxP Analogy: The difference between an operator answering from memory versus an operator referencing the SOP before responding. Same person, different reliability — because one approach includes verification against authoritative sources.

What RAG adds (infrastructure, not model)

What RAG adds is a layer around the model: a document ingestion pipeline that chunks and embeds the corpus, a vector database that stores it, retrieval logic that searches it, prompt assembly that injects retrieved passages, and source citation that links outputs back to documents.

None of this is in the LLM. It’s built around it. The model may be identical in both cases — what differs is the plumbing.


The Three Steps

RAG isn’t a single technology. It’s an architecture pattern with three distinct phases, each with its own potential failure points.

1. Retrieval

User query triggers a search against a knowledge base — typically a vector database where documents have been pre-processed into numerical representations (embeddings). The system identifies documents or passages semantically similar to the query and retrieves them.

What can go wrong:

  • Relevant documents exist but aren’t retrieved (retrieval miss)
  • Irrelevant documents are retrieved (retrieval noise)
  • The right document exists but the relevant passage isn’t surfaced (chunking problem)
  • Query interpretation differs from user intent (semantic gap)

2. Augmentation

Retrieved content is assembled and inserted into the prompt sent to the LLM. This is the “context” the model uses to generate its response. The quality of augmentation depends on what was retrieved, how it’s formatted, and whether it fits within the model’s context window.

What can go wrong:

  • Too much content retrieved, exceeds context window, information truncated
  • Retrieved content is included but not well-organized for the model to use
  • Critical information is present but buried among less relevant passages
  • Context window limits force trade-offs between breadth and depth
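The truncation trade-offs above come down to packing retrieved passages into a fixed budget. A sketch, assuming naive word counting in place of a real tokenizer; the key design point is returning what was dropped, so truncation is visible rather than silent:

```python
# Augmentation sketch: greedily pack retrieved passages (in retrieval order)
# into a fixed context budget. Word count stands in for token count here;
# real systems count tokens with the model's own tokenizer.

def pack_context(passages: list[str], budget: int) -> tuple[list[str], list[str]]:
    """Returns (included, dropped) so callers can log or flag what was cut."""
    included, dropped, used = [], [], 0
    for passage in passages:
        cost = len(passage.split())
        if used + cost <= budget:
            included.append(passage)
            used += cost
        else:
            dropped.append(passage)
    return included, dropped
```

A system that silently discards the `dropped` list exhibits exactly the failure described above: the most relevant passage may be cut, and nothing warns you.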

3. Generation

The LLM produces output based on the augmented prompt — its training plus the retrieved context. If retrieval and augmentation worked well, the output reflects source material. If they didn’t, the model may fall back on training alone or blend retrieved content with hallucinated material.

What can go wrong:

  • Model ignores retrieved content, generates from training anyway
  • Model blends retrieved facts with hallucinated additions
  • Model misinterprets or misrepresents retrieved content
  • Output appears authoritative but source doesn’t support the claim
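Generation drift can be partially surfaced in review by checking whether output sentences are lexically supported by the retrieved passages. This is a crude sketch, not a real grounding verifier: word overlap is a weak proxy for support, and the function name and threshold are invented for illustration.

```python
# Flag output sentences whose content words mostly do NOT appear in any
# retrieved passage. Lexical overlap is a weak proxy for factual support,
# but it makes candidate drift visible for human review.

def unsupported_sentences(output: str, sources: list[str],
                          threshold: float = 0.5) -> list[str]:
    source_words = set(" ".join(sources).lower().split())
    flagged = []
    for sentence in output.split(". "):
        # skip very short tokens, a rough stand-in for stopword removal
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if words and sum(w in source_words for w in words) / len(words) < threshold:
            flagged.append(sentence)
    return flagged
```

Production systems use stronger checks (entailment models, citation verification), but the principle is the same: compare what was generated against what was retrieved.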

Why RAG Matters for Regulated Content

RAG introduces something LLMs alone don’t have: a potential audit trail.

In principle, you can trace an output back to source documents. You can ask: what did the system retrieve? Did the retrieved content support the generated output? Was the source authoritative and current?

This matters in GxP contexts because:

  • Traceability — you can verify whether outputs are grounded in approved sources
  • Source control — you control the knowledge base, not the model vendor
  • Currency — you can update the knowledge base without retraining the model
  • Scope constraints — the system only “knows” what you’ve given it access to

But traceability only works if the architecture exposes it. A RAG system that doesn’t surface retrieved sources, or that blends retrieval with open-ended generation, loses the audit trail advantage.

GxP Analogy: The difference between a system that shows you the referenced SOP section and one that just gives you an answer. Both might be correct. Only one is verifiable.


RAG in Practice: ValKit.ai

ValKit is a GxP validation platform that uses RAG to generate validation documentation — test scripts, requirements, risk assessments — grounded in each customer’s own data. It illustrates one way RAG can be implemented in a regulated context. (Note: I’m a member of ValKit.ai’s Advisory Board.)

How it implements the three steps

What’s validation-relevant

  • Isolated knowledge bases — Each customer’s data stays within organizational boundaries. No cross-contamination between clients. The system only “knows” what that organization has given it.
  • Human-in-the-loop — All outputs are designed for review, not autonomous use. Engineers verify, adjust, approve. The AI drafts; humans finalize.
  • Source grounding — Generated content traces to retrieved documents. You can ask: where did this come from?

What it illustrates about RAG generally

The value isn’t that the LLM is smarter. It’s that the LLM has access to your information at the moment it generates. Source quality, retrieval scope, and human oversight determine whether outputs are defensible — not the model itself.

This doesn’t eliminate the failure modes discussed below. Retrieval can still miss. Generation can still drift. But the architecture makes those failures auditable in ways that pure LLM outputs aren’t.


Failure Modes: Inherent, Not Implementation Defects

These aren’t bugs in specific products. They’re characteristics of the architecture. Any RAG implementation must account for them.

Retrieval Misses

The knowledge base contains the right information, but retrieval doesn’t surface it. Embeddings don’t capture the semantic relationship between query and content. The system answers without the relevant source — functionally equivalent to hallucination from the user’s perspective.

Retrieval Noise

Irrelevant or marginally relevant content is retrieved and included. The model generates based on sources that don’t actually address the query. Output may be well-written and internally consistent but unresponsive to the actual question.

Generation Drift

Even with good retrieval, the model may drift from retrieved content during generation. It may add qualifications, examples, or extensions that aren’t in the source material. The output looks grounded but includes unsupported claims.

Context Window Limits

Models have fixed context windows. If retrieval returns more content than fits, something gets cut. The most relevant passage may be truncated. The system doesn’t warn you — it just works with what fits.

Source Quality

RAG is only as good as the knowledge base. Outdated SOPs, superseded guidance documents, draft content accidentally indexed — the system retrieves what’s there. Garbage in, garbage out, but now with the appearance of authoritative sourcing.

Recency Gaps

Knowledge bases require maintenance. New documents must be ingested, old ones retired. If the knowledge base isn’t current, retrieval surfaces outdated information. The model doesn’t know the document is stale.
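The source-quality and recency failure modes are partly addressable with routine checks over knowledge base metadata. A sketch, assuming each indexed document carries a review date and a superseded flag; the field names and SOP IDs are illustrative, not a real schema:

```python
# Flag knowledge base documents that are past their review date or marked
# superseded, so stale content can be retired before retrieval surfaces it.
from datetime import date

def stale_documents(kb: list[dict], today: date) -> list[str]:
    return [
        doc["id"]
        for doc in kb
        if doc.get("superseded") or doc["next_review"] < today
    ]

kb = [
    {"id": "SOP-001", "next_review": date(2026, 1, 1), "superseded": False},
    {"id": "SOP-002", "next_review": date(2024, 6, 1), "superseded": False},
    {"id": "SOP-003", "next_review": date(2026, 3, 1), "superseded": True},
]
print(stale_documents(kb, today=date(2025, 1, 15)))  # → ['SOP-002', 'SOP-003']
```

The model cannot know a document is stale; only the metadata around the knowledge base can.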


Validation Considerations: Questions to Ask

This isn’t a validation protocol. It’s a way to expose where traditional validation assumptions no longer apply—and a starting point for scoping what validation needs to address.

Knowledge Base Governance

  • What sources are included? Who decides?
  • How are documents ingested, chunked, embedded?
  • What’s the update cadence? Who maintains currency?
  • How are retired or superseded documents handled?
  • Is there version control for the knowledge base itself?

Retrieval Performance

  • How is retrieval accuracy measured?
  • What’s the miss rate? The noise rate?
  • How does the system handle ambiguous queries?
  • Is there a confidence threshold below which retrieval is flagged?
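One concrete way to answer the miss-rate question above is a labeled evaluation set: queries paired with the document IDs a correct retrieval must surface. A sketch, with the retriever passed in as a function; everything here, including the metric definition, is illustrative rather than a standard:

```python
# Measure retrieval miss rate against a labeled evaluation set: a query
# counts as a miss if ANY of its expected documents was not retrieved
# within the top k results.

def miss_rate(eval_set: list[tuple[str, set[str]]], retriever, k: int = 5) -> float:
    misses = 0
    for query, expected in eval_set:
        retrieved = set(retriever(query)[:k])
        if not expected <= retrieved:  # expected must be a subset of retrieved
            misses += 1
    return misses / len(eval_set)
```

A vendor who cannot produce numbers like this has probably never measured retrieval performance at all — which is itself an answer.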

Output Traceability

  • Does the system surface retrieved sources with each output?
  • Can you verify that generated content is supported by retrieved sources?
  • Is there logging of what was retrieved vs. what was generated?
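The logging question above implies a minimal audit record: for every generation event, capture the query, what was retrieved, and what was produced, in one reviewable artifact. A sketch with an invented record structure:

```python
# Serialize one generation event so reviewers can compare retrieved sources
# against generated output. The record structure is illustrative.
import json
from datetime import datetime, timezone

def audit_record(query: str, retrieved_ids: list[str], output: str) -> str:
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "retrieved": retrieved_ids,
        "output": output,
    })
```

Without a record like this, "trace the output back to its sources" is a claim, not a capability.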

Human Review Requirements

  • What review is required before outputs affect regulated processes?
  • How are reviewers trained to spot generation drift?
  • Is there a mechanism to flag outputs for secondary review?

Change Control

  • What triggers revalidation? Knowledge base update? Model update? Embedding model change?
  • How are changes to retrieval logic handled?
  • Is there version control that allows rollback?

What This Enables

Terminology without application is trivia. The point of understanding RAG — and the taxonomy behind it — is to do something with it.

If you can’t distinguish RAG from a plain LLM, you’re at the mercy of vendor marketing. If you don’t understand retrieval, augmentation, and generation as separate steps, you can’t assess where failures occur. If you don’t know the failure modes, you can’t scope validation or ask the right questions.

Understanding the architecture enables:

  • Recognizing whether a vendor actually has RAG or is just wrapping an LLM
  • Knowing what to ask about knowledge base governance, retrieval performance, and output traceability
  • Assessing whether the system supports your validation requirements or leaves you exposed
  • Cutting through “AI-powered” marketing to figure out what you’re actually buying

We’ll cover vendor evaluation in depth in a later piece — what to ask, what to look for, what the red flags are. But that conversation starts with the conceptual foundation laid here and in the taxonomy article.

The concepts aren’t the destination. They’re the prerequisite.


Series Context

This article is part of the AI Foundations series — a level-setting effort to build the conceptual vocabulary for AI in regulated environments. The taxonomy article provided the map of AI categories. This article went deeper on RAG — the most common architectural pattern for constraining hallucination in generative AI.

Next in This Series

Before diving deeper into generative AI and advanced categories, we need to establish the baseline. The next article covers rule-based systems—Category 1 in the taxonomy.

Rule-based systems are deterministic, familiar to validation professionals, and represent the simplest form of AI. They’re everywhere in life sciences: deviation routing, eligibility screening, data quality checks. Most aren’t recognized as “AI” because they don’t feel magical—but from a validation perspective, they matter.

Understanding how to validate deterministic decision logic sets the foundation for everything more complex: machine learning, NLP, and the other approaches to constraining LLM outputs we’ve discussed here. We’re building the taxonomy from the ground up—starting with what’s simplest, not what’s newest.

After rule-based systems, we’ll cover machine learning fundamentals, NLP in life sciences contexts, and eventually return to the frontier: agentic AI, where systems don’t just respond but operate autonomously. That’s where validation frameworks haven’t caught up yet—but understanding the categories in production now prepares you for what’s coming.

About Driftpin

Kevin Shea is Owner and Principal at Driftpin Consulting, a life sciences technology company focused on applying technology to improve outcomes, safety, and data integrity in regulated environments. He serves on the advisory board at ValKit.ai. With 25+ years spanning pharma, biotech, CROs, and software vendors, he works with both tech vendors and their customers. AI is advancing fast and can add enormous value — but only if applied judiciously, pragmatically, and in a validated manner. These articles are meant to engender conversation and hopefully move our collective thinking forward.