Small · Weekend build · ~6h · 4 milestones

Build a reliable structured-extraction service

A team is copy-pasting invoices and emails into spreadsheets by hand. You build a service that turns messy free text into clean, validated JSON: output you can trust the LLM to produce every time, and a loud failure whenever it cannot.

Structured output / function calling · JSON schema validation · Prompt design · Retry & fallback logic · API design · Token cost tracking & health checks

What you'll build

A small API that extracts structured data from unstructured text using schema-constrained output, with validation, retries, and a fallback when the model is unsure.

See how we teach, before you sign up

You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:

app/schema.py (Python)
from pydantic import BaseModel, Field
from typing import Optional


class Invoice(BaseModel):
    """Structured invoice. Fields the model can't find stay None, never guessed."""
    vendor: str = Field(description="Name of the company that issued the invoice")
    invoice_number: Optional[str] = Field(None, description="Invoice id, if present")
    total: Optional[float] = Field(None, description="Grand total as a number, no currency symbol")
    currency: Optional[str] = Field(None, description="ISO code, e.g. EUR, USD")
    issue_date: Optional[str] = Field(None, description="ISO 8601 date, or None if ambiguous")
    confident: bool = Field(description="False if any field was guessed or the doc looked out of scope")

Reading this file

  • class Invoice(BaseModel): Defines the exact shape your service promises to return, so every downstream system can rely on the same fields.
  • vendor: str: A required field with no default; the model must always supply it, or validation fails loudly.
  • Optional[str] = Field(None, ...): Marks a field as genuinely optional, so the model returns null instead of inventing a value it cannot find.
  • confident: bool: A self-reported uncertainty flag that lets the service route shaky extractions to human review instead of trusting them.

The contract. Optional fields are genuinely optional so the model never has to invent values.
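
To make the contract concrete, here is a minimal sketch of how a service might use it. The `route` function and its three status strings are hypothetical names chosen for illustration, not part of the project's starter files, and the `Invoice` class is a trimmed copy of the one in app/schema.py so the example runs on its own.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    """Trimmed copy of app/schema.py (field descriptions omitted)."""
    vendor: str                           # required: no default
    invoice_number: Optional[str] = None  # optional: None when not found
    total: Optional[float] = None
    currency: Optional[str] = None
    issue_date: Optional[str] = None
    confident: bool                       # required self-reported flag


def route(payload: dict) -> str:
    """Hypothetical router: decide what happens to one model output."""
    try:
        invoice = Invoice(**payload)
    except ValidationError:
        # A missing required field or a wrong type fails here, loudly.
        return "rejected"
    # Valid shape, but the model flagged its own uncertainty.
    return "accepted" if invoice.confident else "needs_review"


print(route({"vendor": "Acme GmbH", "total": 120.5, "confident": True}))
print(route({"vendor": "Acme GmbH", "confident": False}))
print(route({"total": 120.5, "confident": True}))  # no vendor
```

The design point is that "missing" and "unsure" are different outcomes: a payload without `vendor` never reaches downstream code, while a valid but low-confidence payload is kept and sent to a person.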

That's 1 of 8 explained code blocks in this single project.

The build, milestone by milestone

  1. Define the contract

    5 guided steps

    The schema is the contract the rest of the system depends on. Nail down required vs. optional fields and the refusal cases now, or you will be debugging "sometimes it returns a string, sometimes null" forever.

  2. Constrain the model

    5 guided steps

    Free-text "please return JSON" prompts break in production. Schema-constrained output plus server-side validation is the difference between a demo and something a downstream system can trust.

  3. Make it robust

    5 guided steps

    Real inputs are messier than your samples. Bounded retries plus an explicit low-confidence escape hatch are what keep the service from quietly emitting garbage when it is unsure.

  4. Watch cost & health

    5 guided steps

    Even a weekend LLM service spends real money per call. If you cannot see token usage and you have never multiplied cost-per-call by expected volume, you ship a service that can quietly run up a bill no one budgeted for.
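
The ideas in milestones 2 and 3 (server-side validation, bounded retries, and a low-confidence escape hatch) can be sketched together in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's reference solution: `validate`, `extract`, the `call_model` callable, and the status strings are all hypothetical, and the field table simply mirrors the `Invoice` schema shown earlier.

```python
import json
from typing import Callable

# Assumed field table mirroring the Invoice schema (name -> accepted types).
REQUIRED = {"vendor": (str,), "confident": (bool,)}
OPTIONAL = {"invoice_number": (str,), "total": (int, float),
            "currency": (str,), "issue_date": (str,)}


def validate(raw: str) -> dict:
    """Parse model output; raise ValueError on any contract violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, kinds in REQUIRED.items():
        if not isinstance(data.get(name), kinds):
            raise ValueError(f"missing or mistyped required field: {name}")
    for name, kinds in OPTIONAL.items():
        value = data.get(name)
        if value is None:
            continue  # null is allowed: "not found" is never a guess
        # bool is a subclass of int, so reject it explicitly for numbers.
        if isinstance(value, bool) or not isinstance(value, kinds):
            raise ValueError(f"mistyped optional field: {name}")
    return data


def extract(text: str, call_model: Callable[[str], str],
            max_attempts: int = 3) -> dict:
    """Call the model at most max_attempts times, then fail explicitly."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        prompt = text
        if last_error:
            # Feed the validation error back so the retry can correct it.
            prompt += f"\n\nPrevious output was rejected: {last_error}"
        try:
            data = validate(call_model(prompt))
        except ValueError as exc:
            last_error = str(exc)
            continue
        status = "ok" if data["confident"] else "needs_review"
        return {"status": status, "attempts": attempt, "data": data}
    return {"status": "failed", "attempts": max_attempts,
            "error": last_error, "data": None}


# Usage with a fake model that fails once, then returns valid JSON.
replies = iter(["not json",
                '{"vendor": "Acme", "total": 12.5, "confident": true}'])
print(extract("Invoice from Acme, total 12.50", lambda _: next(replies)))
```

Two choices are worth noting. The retry loop is bounded, so a stubborn input ends in an explicit `failed` status rather than an endless loop or a silent bad record. And a valid but unconfident answer is not retried at all; it is returned as `needs_review`, since asking again tends to pressure the model into guessing.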

What's inside when you start

4 starter files, ready to clone
4 guided milestones
4 full reference solutions
8 code blocks explained line-by-line
4 "is it working?" checks
4 interview questions it prepares you for

You'll walk away with

A documented extraction API with a schema
A test set showing valid-rate and how failures are handled
A short README on the schema design and the low-confidence/refusal strategy
A one-page cost estimate (cost-per-doc and projected monthly spend) plus a per-request token/cost log
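
The cost estimate comes down to simple arithmetic, sketched below. Everything here is a placeholder assumption: the `Usage` class, the function names, the prices (1.00 and 2.00 USD per million tokens), and the volume of 500 documents a day are made up for illustration. Substitute your provider's actual rates and the token counts your API responses report.

```python
import json
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request, as reported by the model API."""
    input_tokens: int
    output_tokens: int


def cost_per_call(usage: Usage, input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Cost of one request, given prices per million tokens."""
    total = (usage.input_tokens * input_price_per_mtok
             + usage.output_tokens * output_price_per_mtok)
    return total / 1_000_000


def projected_monthly_spend(avg_cost_per_doc: float, docs_per_day: int,
                            days: int = 30) -> float:
    """Cost-per-doc multiplied by expected volume."""
    return avg_cost_per_doc * docs_per_day * days


def log_line(request_id: str, usage: Usage, cost: float) -> str:
    """One JSON log record per request, easy to sum up later."""
    return json.dumps({
        "request_id": request_id,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": round(cost, 6),
    })


usage = Usage(input_tokens=1000, output_tokens=200)
cost = cost_per_call(usage, input_price_per_mtok=1.0, output_price_per_mtok=2.0)
print(log_line("req-1", usage, cost))
print(projected_monthly_spend(cost, docs_per_day=500))
```

Retries multiply this figure, so a per-request log that records every attempt gives a more honest cost-per-doc than one that records only the successful call.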

This is portfolio-grade. Build it free.

Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.
