Lesson 4 - Grounding and Citations

Welcome to Grounding and Citations

Your pipeline works: it retrieves the right documents and the model answers from them. But “works” and “trustworthy” are different things. A real product gets asked questions your data can’t answer, and it gets used by people who need to verify what the model told them before they act on it. The two techniques in this lesson, grounding and citations, are what close that gap. Grounding instructs the model to answer only from the context you supply, and to admit when the answer isn’t there instead of inventing one. Citations make the model tell you which retrieved facts it used, so a person (or a UI) can trace any answer back to its source. Together they are the line between a demo you show colleagues and a system you put in front of users.

This lesson is short on theory and heavy on real output. Every answer below was produced by the live Claude API and pasted in unchanged.

By the end of this lesson, you will be able to:

  • Write a grounding instruction that ties the model to the retrieved context
  • Make the model decline (“I don’t know”) when the answer isn’t in the data
  • Number your retrieved chunks and get the model to cite the sources it used
  • Return the source documents alongside the answer so a UI can show provenance

Let’s make your RAG system honest.


Grounding: Answer Only from the Context

Grounding is a single instruction with an outsized effect. You tell the model, in plain language, to answer using only the context you provide, not its own training. Without that line, the model may treat your retrieved documents as a suggestion and mix in whatever it already “knows,” which is a common source of hallucinations.

We’ll work with a tiny knowledge base, three facts about a notes app, so the behavior is easy to see. The build_prompt function numbers each fact as it assembles the context, because we’ll need those numbers for citations in a moment:

facts = [
    "Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.",
    "The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.",
    "Notes sync automatically across devices within a few seconds when you are online.",
]

def build_prompt(question, facts):
    context = "\n".join(f"[{i+1}] {f}" for i, f in enumerate(facts))
    return (
        "Answer using ONLY the numbered context below. "
        "Cite the sources you use with their numbers in square brackets, like [1]. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("How long can I recover a deleted note?", facts))

Output:

Answer using ONLY the numbered context below. Cite the sources you use with their numbers in square brackets, like [1]. If the answer is not in the context, say you don't know.

Context:
[1] Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.
[2] The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.
[3] Notes sync automatically across devices within a few seconds when you are online.

Question: How long can I recover a deleted note?

Notice the three jobs that one instruction is doing: bind the answer to the context, ask for citations, and authorize a refusal. Now send it to Claude:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

def ask(question):
    prompt = build_prompt(question, facts)
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

print(ask("How long can I recover a deleted note?"))

Output:

According to the context, deleted notes stay in Trash and are recoverable for 30 days before being permanently removed. [1]

The answer is correct, drawn straight from fact [1], and the model is staying inside the lines you drew. That is grounding doing its job.
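
One variation worth knowing: the Messages API also accepts a top-level system parameter, so the grounding rules can live there while the user turn carries only the context and the question. The sketch below only assembles the request arguments and makes no API call. GROUNDING_RULES and build_request are names invented for this illustration, and whether a system prompt behaves better than the single user message used above is something to test on your own data.

```python
# Hypothetical helper: keeps the fixed grounding rules apart from the
# per-question context. It only builds keyword arguments; nothing is sent.
GROUNDING_RULES = (
    "Answer using ONLY the numbered context in the user's message. "
    "Cite the sources you use with their numbers in square brackets, like [1]. "
    "If the answer is not in the context, say you don't know."
)

def build_request(question, facts, model="claude-haiku-4-5", max_tokens=150):
    # Number the facts exactly as build_prompt does, so citations line up.
    context = "\n".join(f"[{i+1}] {f}" for i, f in enumerate(facts))
    user_content = f"Context:\n{context}\n\nQuestion: {question}"
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": GROUNDING_RULES,
        "messages": [{"role": "user", "content": user_content}],
    }

facts = [
    "Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.",
    "The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.",
]
request = build_request("How long can I recover a deleted note?", facts)
print(request["messages"][0]["content"])
```

With a client in hand, the call would be client.messages.create(**build_request(question, facts)).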


Declining: Saying “I Don’t Know”

The real test of a grounded system is a question it can’t answer. Your three facts say nothing about refunds. A bare, ungrounded model would happily improvise a refund policy that sounds plausible and is completely made up. A grounded one declines. Watch what happens when we ask something outside the context:

print(ask("Can I get a refund if I cancel my subscription?"))

Output:

I don't know. The provided context doesn't contain information about refund policies for subscription cancellations.

This refusal is one of the most valuable behaviors in applied RAG. The model didn’t guess. It recognized that the answer wasn’t in the supplied facts and said so, even naming exactly what was missing. In a support bot, this is the difference between “I don’t have that information” (fine, because the user escalates to a human) and a confidently invented policy that creates a real-world problem.
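
If a support bot is going to escalate on a refusal, something has to recognize the refusal first. Here is a minimal sketch of one way to do that, assuming a simple text heuristic is good enough to start with. The phrase list and the is_refusal and route names are assumptions made for this example, and real outputs will vary in wording, so treat it as a starting point to tune.

```python
import re

# Assumed phrasings; extend this tuple after looking at your own model outputs.
REFUSAL_PHRASES = ("i don't know", "i do not know", "don't have that information")

def is_refusal(answer):
    # Normalize case and curly apostrophes before matching.
    text = answer.lower().replace("\u2019", "'")
    has_citation = re.search(r"\[\d+\]", text) is not None
    # Heuristic: a refusal carries no [n] marker and uses a known phrasing.
    return not has_citation and any(p in text for p in REFUSAL_PHRASES)

def route(answer):
    # Hypothetical routing step for a support bot.
    return "escalate_to_human" if is_refusal(answer) else "show_answer"

refusal = ("I don't know. The provided context doesn't contain information "
           "about refund policies for subscription cancellations.")
grounded = "The Pro plan costs $8 per month and includes unlimited notes and 50 GB of storage [2]."

print(route(refusal))   # escalate_to_human
print(route(grounded))  # show_answer
```

Requiring the absence of a citation keeps a cited answer that happens to contain “don’t know” from being routed to a human by mistake.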

Grounding works because of two cooperating phrases in your instruction: “answer using ONLY the numbered context” tells the model where its knowledge ends, and “if the answer is not in the context, say you don’t know” gives it explicit permission to decline. Drop either one and the model tends to drift back toward filling the gap from training. Keep both, and “I don’t know” becomes a feature instead of a failure.
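
To see the effect of each phrase for yourself, you could build prompt variants that drop one or the other and send them through the same call. The sketch below only constructs the prompts. build_prompt_variant and its two flags are hypothetical names for this experiment, not part of the lesson’s code.

```python
def build_prompt_variant(question, facts, restrict=True, allow_decline=True):
    # Same numbering as build_prompt, with each grounding phrase made optional.
    context = "\n".join(f"[{i+1}] {f}" for i, f in enumerate(facts))
    parts = []
    if restrict:
        parts.append("Answer using ONLY the numbered context below.")
    parts.append("Cite the sources you use with their numbers in square brackets, like [1].")
    if allow_decline:
        parts.append("If the answer is not in the context, say you don't know.")
    instruction = " ".join(parts)
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {question}"

facts = ["Notes sync automatically across devices within a few seconds when you are online."]
question = "Can I get a refund if I cancel my subscription?"

full = build_prompt_variant(question, facts)
no_decline = build_prompt_variant(question, facts, allow_decline=False)
no_restrict = build_prompt_variant(question, facts, restrict=False)
print(no_decline)
```

Asking the same out-of-context question with each variant, several times over, gives you a rough picture of how much each phrase contributes for your model and data.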


Citations: Cite the Sources You Used

Grounding keeps the answer tied to your data; citations make it checkable. By numbering the chunks and asking the model to mark which numbers it used, you give every answer a trail back to the exact fact it came from. You already wired this up with the [i+1] numbering in build_prompt and the “cite the sources… like [1]” instruction. Here it is paying off on a different question:

print(ask("How much does the Pro plan cost and what do I get?"))

Output:

The Pro plan costs $8 per month and includes unlimited notes and 50 GB of storage [2].

The [2] is not decoration. It tells you, and your users, which fact the model says this answer rests on: fact [2]. If that fact is wrong or out of date, you know precisely which document to fix. If a user doubts the answer, they can open [2] and read the source themselves. Citations turn the model from an oracle you have to trust into a research assistant whose work you can audit, which is exactly what a non-technical user needs before they’ll act on an answer.

A few practical notes. Citation numbers refer to the position in the prompt, so keep your numbering consistent between the context you build and any source list you display. The model typically cites only the chunks it drew on, which is usually a subset of what you retrieved. That’s normal and useful, because it hints at which retrieved documents were relevant. And citations stack with grounding: when the model has nothing to cite, the same instruction steers it to decline rather than cite a source that doesn’t support the claim.
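
Because the markers are plain text, you can pull them out and check them before showing anything to a user. The following is a small sketch of that idea, assuming the model writes one number per bracket, as in the outputs above. cited_numbers and check_citations are illustrative names.

```python
import re

def cited_numbers(answer):
    # Distinct numbers from [n] markers, in order of first appearance.
    # Assumes one number per bracket; a marker like [1, 2] would be missed.
    seen = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        if n not in seen:
            seen.append(n)
    return seen

def check_citations(answer, facts):
    # Split cited numbers into those that map to a fact and those that don't.
    cited = cited_numbers(answer)
    valid = [n for n in cited if 1 <= n <= len(facts)]
    invalid = [n for n in cited if n not in valid]
    return valid, invalid

facts = [
    "Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.",
    "The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.",
    "Notes sync automatically across devices within a few seconds when you are online.",
]
answer = "The Pro plan costs $8 per month and includes unlimited notes and 50 GB of storage [2]."
valid, invalid = check_citations(answer, facts)
print(valid, invalid)  # [2] []
```

A non-empty invalid list means the answer points at a source number that was never in the prompt, which is worth logging rather than rendering as a link.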


Returning the Sources Alongside the Answer

The [2] marker is only useful if the user can see what [2] is. In a product, you return the source documents next to the answer so the UI can render “where this came from” — a footnote, a hover card, a list of links. The pattern is simple: package the answer and the retrieved chunks together so the caller has both.

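def answer_with_sources(question, facts):
    # ask() builds its prompt from the module-level facts list, so pass that
    # same list here; otherwise the [n] markers won't match the sources returned.
    answer = ask(question)
    return {"answer": answer, "sources": facts}

result = answer_with_sources("How long can I recover a deleted note?", facts)
print("Answer:", result["answer"])
print("\nSources:")
for i, src in enumerate(result["sources"]):
    print(f"  [{i+1}] {src}")

Output:
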
Answer: You can recover a deleted note for 30 days before it is permanently removed [1].

Sources:
  [1] Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.
  [2] The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.
  [3] Notes sync automatically across devices within a few seconds when you are online.

Now the [1] in the answer lines up with [1] in the sources list, and a front end has everything it needs: the natural-language answer, the citation markers inside it, and the full text of each source to display. In a real app you’d typically return only the chunks you retrieved for this question (with their original document IDs from Chroma) rather than the whole knowledge base, so the source panel shows just the handful of documents the answer was built from.
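
As a rough sketch of that production shape, suppose each retrieved chunk arrives as a dict with an "id" and a "text" key. That shape, the IDs shown, and the package_answer name are all assumptions for this example; adapt them to whatever your retriever actually returns. The function keeps the prompt numbering, attaches the document ID, and flags which sources the answer cited.

```python
import re

def package_answer(answer, retrieved):
    # `retrieved` holds the chunks that were placed in the prompt, in prompt
    # order, so position i corresponds to marker [i+1].
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    sources = [
        {
            "number": i + 1,
            "id": chunk["id"],
            "text": chunk["text"],
            "cited": (i + 1) in cited,
        }
        for i, chunk in enumerate(retrieved)
    ]
    return {"answer": answer, "sources": sources}

# Made-up IDs standing in for whatever your document store assigns.
retrieved = [
    {"id": "doc-trash", "text": "Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed."},
    {"id": "doc-sync", "text": "Notes sync automatically across devices within a few seconds when you are online."},
]
result = package_answer(
    "You can recover a deleted note for 30 days before it is permanently removed [1].",
    retrieved,
)
for src in result["sources"]:
    flag = "cited" if src["cited"] else "not cited"
    print(f"[{src['number']}] ({src['id']}, {flag}) {src['text']}")
```

The cited flag lets a front end highlight the sources the answer leaned on while still listing everything that was retrieved.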

Citations build trust; grounding limits — but doesn’t erase — hallucination

Grounding sharply reduces hallucination, but it doesn’t fully eliminate it: a model can still misread a chunk or stretch a fact slightly beyond what it says. That’s exactly why citations matter. When every claim points back to a numbered source the user can read, a wrong or exaggerated answer becomes visible and correctable instead of silently authoritative. Showing your sources is the safety net under grounding.
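
One cheap automated check in that spirit: compare the numbers in an answer against the numbers in the sources it cites. This is a deliberately crude sketch that only catches numeric mismatches, not paraphrases that stretch a fact, and unsupported_numbers is a hypothetical helper name.

```python
import re

def unsupported_numbers(answer, facts):
    # Which sources does the answer cite?
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    # Remove the [n] markers so they aren't mistaken for figures in the text.
    body = re.sub(r"\[\d+\]", "", answer)
    answer_numbers = set(re.findall(r"\d+", body))
    # Gather every digit run that appears in the cited facts.
    source_numbers = set()
    for n in cited:
        if 1 <= n <= len(facts):
            source_numbers.update(re.findall(r"\d+", facts[n - 1]))
    # Anything left over appears in the answer but in no cited source.
    return sorted(answer_numbers - source_numbers)

facts = [
    "Deleted notes stay in Trash and are recoverable for 30 days before being permanently removed.",
    "The Pro plan costs 8 dollars per month and includes unlimited notes and 50 GB of storage.",
    "Notes sync automatically across devices within a few seconds when you are online.",
]
good = "The Pro plan costs $8 per month and includes unlimited notes and 50 GB of storage [2]."
stretched = "The Pro plan costs $8 per month and includes 100 GB of storage [2]."

print(unsupported_numbers(good, facts))       # []
print(unsupported_numbers(stretched, facts))  # ['100']
```

A flagged number is a prompt for a human to look at the cited source, not proof that the answer is wrong.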


Practice Exercises

Exercise 1: Force a refusal

Using the three facts above, write a question whose answer is genuinely not in the context (for example, about exporting notes to PDF). Predict what the grounded model will do, then describe how that behavior protects your users compared to an ungrounded model.

Hint

None of the three facts mention exporting, so the “if the answer is not in the context, say you don’t know” instruction takes over and the model declines — as it did for the refund question. An ungrounded model would invent an export procedure that sounds right but isn’t, which is far more dangerous than an honest “I don’t know.”

Exercise 2: Predict the citation

You ask, “Do my notes update on my phone and laptop together?” Which numbered fact should the answer cite, and what would the citation marker look like in the response?

Hint

Fact [3] covers automatic syncing across devices, so a grounded answer would describe near-instant syncing while online and end with the marker [3]. If the model cited any other number, that would be a signal your retrieval or numbering is off.

Exercise 3: Wire up the source panel

Imagine the answer comes back as “The Pro plan costs $8 per month and includes unlimited notes and 50 GB of storage [2].” Describe what your UI should display so a user can verify the [2], and why returning only the retrieved chunks (not the whole knowledge base) is better in production.

Hint

The UI should render the answer with [2] as a clickable marker that reveals the text of source [2] — ideally with its original document ID. Returning only the retrieved chunks keeps the source panel short and relevant, avoids leaking unrelated documents, and makes the numbers in the answer line up cleanly with what the user sees.


Summary

A RAG pipeline that retrieves and generates is only half a product; making it trustworthy is the other half. Grounding ties the model to the retrieved context and gives it permission to decline — so it answers “30 days [1]” when the fact is present and “I don’t know” when it isn’t, instead of hallucinating. Citations number the retrieved chunks and have the model mark which ones it used, turning every answer into something a user can verify. Returning the sources alongside the answer gives a UI what it needs to show provenance. You saw all three on real Claude output: a grounded cited answer, an honest refusal on an out-of-context question, and an answer packaged with its sources.

Key Concepts

  • Grounding — instructing the model to answer using only the supplied context, so it stays accurate.
  • Declining — letting the model say “I don’t know” when the context can’t support an answer, instead of guessing.
  • Citations — numbering retrieved chunks and having the model cite the source numbers (e.g. [2]) it used.
  • Returning sources — sending the retrieved documents back with the answer so a UI can show where the answer came from.

Why This Matters

The moment real users touch a RAG system, two things happen: they ask questions your data can’t answer, and they need to trust what it tells them before they act. Grounding handles the first by making refusal a built-in behavior; citations and returned sources handle the second by making every answer auditable. This is what separates a slick demo from a system you can responsibly deploy, and it’s a common pattern behind “chat with your documents” products. Master it and your RAG apps stop merely answering and start earning trust.


Next Steps

Continue to Lesson 5 - Guided Project: Documentation Q&A Bot

Put it all together: build a documentation Q&A bot that retrieves, grounds, cites its sources, and declines when it should.


Continue Building Your Skills

You now have the two techniques that make RAG safe to ship: a model that answers only from your data (and admits when it can’t) and answers you can trace back to their sources. In the next lesson you’ll combine everything — retrieval, grounding, citations, and returned sources — into a complete documentation Q&A bot.