Medium · Working system · ~17h · 6 milestones

Build and evaluate a RAG assistant over real docs

A support team is drowning in repetitive questions answered in a 400-page docs site. You build an assistant that answers from those docs accurately, cites sources, refuses to hallucinate, and is measurably good.

RAG architecture · Embeddings & vector search · Prompt design · PII detection · LLM evaluation · Containerization · Cost & latency observability · Runbook / incident response

What you'll build

A retrieval-augmented assistant with a vector index over real documents, grounded answers with citations, PII redaction, and an evaluation harness that scores answer quality.
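
To make that concrete, here is a minimal Python sketch of the retrieve-then-ground flow. It is an illustration under stated assumptions, not the project's reference solution: the names `Chunk`, `retrieve`, `build_prompt`, and `REFUSAL`, the 0.5 similarity threshold, and the prompt wording are all hypothetical, and the toy two-dimensional vectors stand in for real embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str          # hypothetical: path or URL of the page the text came from
    text: str
    vector: list[float]  # embedding computed elsewhere; toy values below


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=3, min_score=0.5):
    """Return up to k chunks whose similarity clears min_score, best first."""
    scored = [(cosine(query_vec, c.vector), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


REFUSAL = "I can't find that in the docs."


def build_prompt(question, hits):
    """Return a grounded prompt, or None when there is nothing to ground on."""
    if not hits:
        return None  # caller should answer with REFUSAL and skip the model call
    sources = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them like [1].\n"
        f"If the sources do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


docs = [
    Chunk("docs/install.md", "Run the installer, then reboot.", [1.0, 0.0]),
    Chunk("docs/billing.md", "Invoices are issued monthly.", [0.0, 1.0]),
]
hits = retrieve([0.9, 0.1], docs)
prompt = build_prompt("How do I install?", hits)
```

Returning `None` when nothing clears the threshold keeps the refusal decision in your code rather than leaving it to the model.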

See how we teach, before you sign up

You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:

docker-compose.yml (yaml)
services:
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_PASSWORD: rag
      POSTGRES_DB: rag
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:

Reading this file

  • image: pgvector/pgvector:pg16
    A Postgres 16 image with the pgvector extension preinstalled, so similarity search is available as soon as you enable it with CREATE EXTENSION vector.
  • POSTGRES_PASSWORD: rag
    Sets the database password in plain config. That is fine for local dev, but never hardcode real secrets like this in production.
  • ports: - "5432:5432"
    Publishes the database port on your host so your code can connect to it locally.
  • volumes: - pgdata:...
    Persists the data to a named volume so your index survives container restarts.

A one-command vector store. Reviewers run `docker compose up -d` and the index has somewhere to live.
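
Once the container is up, the next step is a table to hold chunks and a similarity query. The Python sketch below only builds the SQL and the vector literal, so it runs without a database. The `chunks` table, its columns, the 384-dimension size, and the helper `to_vector_literal` are assumptions for illustration; the connection values come from the compose file, and `postgres` is the image's default user.

```python
# Connection values mirror docker-compose.yml; "postgres" is the default user.
DSN = "postgresql://postgres:rag@localhost:5432/rag"

# Assumption: 384 dimensions. Use whatever your embedding model actually emits.
EMBEDDING_DIM = 384

# Hypothetical schema: one row per chunk, with its source page and embedding.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id BIGSERIAL PRIMARY KEY,
    source TEXT NOT NULL,
    body TEXT NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
"""

# <=> is pgvector's cosine-distance operator: smaller means more similar,
# so 1 - distance gives a cosine similarity score.
SEARCH_SQL = """
SELECT source, body, 1 - (embedding <=> %s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format numbers as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


query_literal = to_vector_literal([1, 0.5])
search_params = (query_literal, query_literal, 5)  # top 5 nearest chunks
```

With a driver such as psycopg you would pass `SEARCH_SQL` and `search_params` to `cursor.execute`; the driver choice is left open here.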

That's 1 of 10 explained code blocks in this single project.

The build, milestone by milestone

  1. Ingest & chunk

    5 guided steps

    Retrieval can only return what you indexed well. Bad chunking (splitting mid-sentence, losing headings) silently caps the ceiling of every answer downstream.

  2. Embed & index

    5 guided steps

    This is the search engine under the assistant. If retrieval returns the wrong chunks, no amount of prompt cleverness saves the answer.

  3. Ground the generation

    5 guided steps

    Grounding plus citations is what makes the assistant trustworthy. An answer you cannot trace to a source is just a confident guess wearing a uniform.

  4. Add safety

    5 guided steps

    A RAG system ingests untrusted documents and handles user data; both are attack surfaces. PII handling and injection defense are table stakes before this touches real users.

  5. Evaluate honestly

    5 guided steps

    Without evals you cannot tell whether a "fix" helped or hurt. A scored eval set turns RAG tuning from guesswork into engineering.

  6. Instrument cost, latency & failure

    5 guided steps

    A RAG service spends money and adds latency on every query, and it fails in operator-hostile ways (empty retrieval, provider 429s, a re-index that tanks recall). Observability plus a runbook are what turn a 2am "it's slow and expensive" page into a 5-minute fix.
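
Milestone 1 warns about chunks that split mid-sentence or lose their headings. One way to avoid both is sketched below in Python, assuming the docs are Markdown with `#` headings. The function name `chunk_markdown` and the `max_chars` limit are hypothetical, and the sentence splitter is a deliberately simple regex rather than a full tokenizer.

```python
import re


def chunk_markdown(text, max_chars=500):
    """Split on headings, then pack whole sentences into chunks.

    Every chunk keeps the heading it appeared under. A single sentence longer
    than max_chars is kept whole rather than cut in the middle.
    """
    chunks = []
    heading = ""
    buffer = []  # text lines collected under the current heading

    def flush():
        body = " ".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        # Naive split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", body)
        current = ""
        for sentence in sentences:
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append({"heading": heading, "text": current})
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append({"heading": heading, "text": current})

    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()  # close out the previous section before switching heading
            heading = match.group(1).strip()
        elif line.strip():
            buffer.append(line.strip())
    flush()
    return chunks


sample = (
    "# Install\n"
    "Run the installer. Then reboot.\n"
    "\n"
    "## Configure\n"
    "Edit the config file. Restart the service.\n"
)
small_chunks = chunk_markdown(sample, max_chars=30)
large_chunks = chunk_markdown(sample, max_chars=500)
```

Storing the heading alongside the text lets you prepend it at embedding time, so a chunk like "Then reboot." still carries the context that it belongs to the install section.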

What's inside when you start

4 starter files, ready to clone
6 guided milestones
6 full reference solutions
10 code blocks explained line-by-line
6 "is it working?" checks
4 interview questions it prepares you for

You'll walk away with

A running RAG service with a documented API
An eval report with before/after scores on held-out questions
A note on chunking, retrieval, and safety trade-offs
A cost/latency report (per-query cost, p95 latency, cache hit rate) with stated budgets
A one-page operational runbook for retrieval/model/cost failures
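
The cost/latency report can be computed from per-query logs. The Python sketch below shows one way to do it, assuming a hypothetical `QueryLog` record and placeholder token prices that are not real provider rates. The budgets (2000 ms p95, 0.01 per query) are likewise illustrative defaults.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryLog:
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    cache_hit: bool


# Placeholder prices per 1,000 tokens. Replace with your provider's real rates.
PRICE_IN_PER_1K = 0.001
PRICE_OUT_PER_1K = 0.002


def query_cost(log):
    """Cost of one query from its billed token counts."""
    return (
        log.prompt_tokens / 1000 * PRICE_IN_PER_1K
        + log.completion_tokens / 1000 * PRICE_OUT_PER_1K
    )


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(logs, budget_p95_ms=2000.0, budget_cost=0.01):
    """Summarize per-query cost, p95 latency, and cache hit rate vs. budgets."""
    avg_cost = sum(query_cost(log) for log in logs) / len(logs)
    p95 = percentile([log.latency_ms for log in logs], 95)
    hit_rate = sum(1 for log in logs if log.cache_hit) / len(logs)
    return {
        "avg_cost_per_query": avg_cost,
        "p95_latency_ms": p95,
        "cache_hit_rate": hit_rate,
        "within_budget": p95 <= budget_p95_ms and avg_cost <= budget_cost,
    }


# Synthetic logs for illustration only: latencies of 100 ms to 2000 ms.
logs = [
    QueryLog(latency_ms=100.0 * i, prompt_tokens=1000,
             completion_tokens=500, cache_hit=(i % 4 == 0))
    for i in range(1, 21)
]
report = summarize(logs)
```

Stating the budgets next to the measurements is what makes the report actionable: a reviewer can see at a glance whether p95 latency or per-query cost has drifted past the agreed limit.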

This is portfolio-grade. Build it free.

Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.
