The CRAAP Test and Why AI Should Show Its Work

Somewhere in a school library right now, a student is staring at a source their AI tool called "reliable," with no explanation of why. They copy it. They move on. They learn nothing about evaluation.

I've been a librarian for over two decades. The CRAAP Test wasn't handed down from on high — it was built by librarians because we kept seeing students get burned by sources that looked fine on the surface. Currency, Relevance, Authority, Accuracy, Purpose. Five questions. No magic.

Most AI tools ignore all of it.

What the CRAAP Test Actually Is

The CRAAP Test is a framework — not a rubric, not a score. You evaluate a source against five criteria:

Criterion	What You're Checking
Currency	How recent is this? Has the information been superseded?
Relevance	Does it actually answer the question, or just vaguely orbit it?
Authority	Who wrote this? What's their expertise? What's their agenda?
Accuracy	Is this verifiable? Are there citations? Are there errors?
Purpose	Is this meant to inform, persuade, sell, or entertain?

Librarians love it because it teaches a process. A student can internalize those five questions and apply them to any source — a Wikipedia article, a government report, a TikTok video, a peer-reviewed journal. That's the whole point: evaluation as a skill, not a verdict.
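For readers who think in code, here is a minimal sketch of the five criteria as a reusable checklist. The names `Criterion` and `checklist` are made up for illustration and are not part of any tool; the questions are taken straight from the table above.

```python
from enum import Enum


class Criterion(Enum):
    """The five CRAAP criteria, each paired with its guiding questions."""
    CURRENCY = "How recent is this? Has the information been superseded?"
    RELEVANCE = "Does it actually answer the question, or just vaguely orbit it?"
    AUTHORITY = "Who wrote this? What's their expertise? What's their agenda?"
    ACCURACY = "Is this verifiable? Are there citations? Are there errors?"
    PURPOSE = "Is this meant to inform, persuade, sell, or entertain?"


def checklist(source_title: str) -> list[str]:
    """Return a blank checklist: one unanswered line per criterion."""
    lines = [f"Evaluating: {source_title}"]
    for criterion in Criterion:  # Enum iteration follows definition order
        lines.append(f"[ ] {criterion.name.title()}: {criterion.value}")
    return lines


if __name__ == "__main__":
    print("\n".join(checklist("a Wikipedia article")))
```

The checklist deliberately produces questions rather than answers, which mirrors the idea of evaluation as a process the student carries out.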

The Problem With "Trust Me" AI

Here's what most AI research tools do:

1. You paste a URL or ask a question.
2. The tool says something like "This source is credible" or "Use this."
3. No breakdown. No evidence. No reasoning.

That's not evaluation. That's a black box giving you a verdict.

Students don't learn anything from that. Worse — they can't tell when the tool is wrong. A source can have outdated information, an unknown author, and a clear bias, and the AI still says "looks good." The student has no framework to push back.

I started seeing this pattern years before generative AI was mainstream. Every year, more students arrived with learned helplessness around sources — they could find things, but they couldn't evaluate them. AI tools are making that worse, not better.

Sabia Librarian's Approach

When I built Sabia Librarian, I started with a question: what would a librarian's evaluation actually look like?

Not just a score. Not a thumbs up or down.

A breakdown.

When you evaluate a source with Sabia, you get structured results for every CRAAP criterion: a pass or fail on each point, with evidence and flags. You can see why each criterion passed or failed. You can see exactly where the authority fell apart or why the currency was a problem.

That transparency is the point.

A student who sees "Authority: FAIL — no author listed, no institutional affiliation" can go find that information. They're learning evaluation in real time. The tool didn't just hand them an answer — it showed them the work.
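To make that concrete, here is a rough sketch of what a per-criterion result with evidence and flags could look like. It is not Sabia's actual code or data model: `CriterionResult`, `check_authority`, and the rule "fail if either the author or the affiliation is missing" are all assumptions invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriterionResult:
    """One line of a breakdown: a verdict plus the reasons behind it."""
    criterion: str
    passed: bool
    evidence: list[str] = field(default_factory=list)  # what was found
    flags: list[str] = field(default_factory=list)     # what was missing or wrong

    def summary(self) -> str:
        verdict = "PASS" if self.passed else "FAIL"
        # Show the supporting evidence on a pass, the problems on a fail.
        details = self.evidence if self.passed else self.flags
        reasons = ", ".join(details) if details else "no details recorded"
        return f"{self.criterion}: {verdict} - {reasons}"


def check_authority(author: Optional[str], affiliation: Optional[str]) -> CriterionResult:
    """Hypothetical rule: fail if either the author or the affiliation is missing."""
    evidence: list[str] = []
    flags: list[str] = []
    if author and author.strip():
        evidence.append(f"author: {author.strip()}")
    else:
        flags.append("no author listed")
    if affiliation and affiliation.strip():
        evidence.append(f"affiliation: {affiliation.strip()}")
    else:
        flags.append("no institutional affiliation")
    return CriterionResult("Authority", passed=not flags, evidence=evidence, flags=flags)


if __name__ == "__main__":
    print(check_authority(None, None).summary())
    print(check_authority("Jane Doe", "Example University").summary())
```

The design choice worth noticing is that the verdict never travels alone: every `CriterionResult` carries the reasons a student would need to go and check the source themselves.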

Why Showing Your Work Matters More Than Being Right

Here's the thing about information literacy: the goal is not to find the right source. The goal is to be able to evaluate any source.

That's the difference between teaching and testing. A good test gives you a score. A good lesson teaches you to ask better questions.

Most AI tools are built like tests — they give you an output and move on. Sabia is built like a lesson. The breakdown is the product, not a side effect.

When I say "showing your work," I mean it literally. Our methodology page explains exactly how we evaluate each CRAAP criterion. We publish the criteria. We publish the scoring logic. We don't hide it.

We Scored Ourselves 35 Out of 100

This is the part that makes most tech founders uncomfortable.

We ran Sabia Librarian through the same evaluator we built for sources. Our methodology is transparent, but our product is still young — we have known gaps in coverage, known limitations in certain source types, known areas where the model needs more training data.

Our overall score: 35 out of 100.

We published that. Read it yourself: it's on the site, along with every evaluation we've run. We want users to see what we got wrong so they can tell us what else we've missed.

That's not humility theater. It's the same standard I held my students to: show your reasoning, take accountability for your conclusions, stay teachable.