Build a reliable structured-extraction service
A team is copy-pasting invoices and emails into spreadsheets by hand. You build a service that turns messy free text into clean, validated JSON: either the LLM's output passes validation every time, or the service fails loudly.
What you'll build
A small API that extracts structured data from unstructured text using schema-constrained output, with validation, retries, and a fallback when the model is unsure.
See how we teach, before you sign up
You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:
from pydantic import BaseModel, Field
from typing import Optional


class Invoice(BaseModel):
    """Structured invoice. Fields the model can't find stay None, never guessed."""

    vendor: str = Field(description="Name of the company that issued the invoice")
    invoice_number: Optional[str] = Field(None, description="Invoice id, if present")
    total: Optional[float] = Field(None, description="Grand total as a number, no currency symbol")
    currency: Optional[str] = Field(None, description="ISO code, e.g. EUR, USD")
    issue_date: Optional[str] = Field(None, description="ISO 8601 date, or None if ambiguous")
    confident: bool = Field(description="False if any field was guessed or the doc looked out of scope")

Reading this file
class Invoice(BaseModel)
Defines the exact shape your service promises to return, so every downstream system can rely on the same fields.
vendor: str
A required field with no default: the model must always supply it, or validation fails loudly.
Optional[str] = Field(None, ...)
Marks a field as genuinely optional so the model returns null instead of inventing a value it cannot find.
confident: bool
A self-reported uncertainty flag that lets the service route shaky extractions to human review instead of trusting them.
The contract. Optional fields are genuinely optional so the model never has to invent values.
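To make "validation fails loudly" concrete, here is a minimal sketch of checking raw model output against the contract. It assumes Pydantic v2 (`model_validate_json`, `model_json_schema`); the `Invoice` class is a trimmed copy of the one above so the snippet runs on its own, and the two JSON payloads are made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Invoice(BaseModel):
    """Trimmed copy of the contract above, so this sketch runs on its own."""

    vendor: str = Field(description="Name of the company that issued the invoice")
    invoice_number: Optional[str] = Field(None, description="Invoice id, if present")
    total: Optional[float] = Field(None, description="Grand total as a number")
    confident: bool = Field(description="False if any field was guessed")


# Made-up model output: a valid payload with invoice_number left out.
good = Invoice.model_validate_json(
    '{"vendor": "Acme GmbH", "total": 120.5, "confident": true}'
)
# The missing optional field falls back to None instead of an invented value.

# Made-up model output: the required vendor field is missing.
failed_fields = []
try:
    Invoice.model_validate_json('{"total": 120.5, "confident": true}')
except ValidationError as exc:
    # Each error records which field broke the contract.
    failed_fields = [err["loc"][0] for err in exc.errors()]

# JSON Schema derived from the same class; schema-constrained output
# features typically take a schema of this general shape.
schema = Invoice.model_json_schema()
```

Deriving the schema from the same class that validates the response keeps the prompt-side constraint and the server-side check from drifting apart.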
That's 1 of 8 explained code blocks in this single project.
The build, milestone by milestone
1. Define the contract
   5 guided steps
   The schema is the contract the rest of the system depends on. Nail down required vs. optional fields and the refusal cases now, or you will be debugging "sometimes it returns a string, sometimes null" forever.
2. Constrain the model
   5 guided steps
   Free-text "please return JSON" prompts break in production. Schema-constrained output plus server-side validation is the difference between a demo and something a downstream system can trust.
3. Make it robust
   5 guided steps
   Real inputs are messier than your samples. Bounded retries plus an explicit low-confidence escape hatch are what keep the service from quietly emitting garbage when it is unsure.
4. Watch cost & health
   5 guided steps
   Even a weekend LLM service spends real money per call. If you cannot see token usage and you have never multiplied cost-per-call by expected volume, you ship a service that can quietly run up a bill no one budgeted for.
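The "bounded retries plus a low-confidence escape hatch" idea from the robustness milestone could be sketched roughly as follows. Everything named here is hypothetical: `call_model` stands in for whatever function calls your LLM, `validate` is a standard-library stand-in for the schema validation, and the `needs_review` status is just one way to mark extractions for a human.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"vendor", "confident"}  # mirrors the required fields in the contract


def validate(raw: str) -> Optional[dict]:
    """Stand-in for schema validation: parse JSON and check required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def extract(text: str, call_model: Callable[[str], str], max_attempts: int = 3) -> dict:
    """Try up to max_attempts times, then hand the input to human review."""
    for attempt in range(1, max_attempts + 1):
        data = validate(call_model(text))
        if data is None:
            continue  # malformed output: retry, but only a bounded number of times
        if not data["confident"]:
            # The model flagged its own answer: route it, do not trust it.
            return {"status": "needs_review", "attempts": attempt, "data": data}
        return {"status": "ok", "attempts": attempt, "data": data}
    # Retries exhausted: say so explicitly instead of emitting garbage.
    return {"status": "needs_review", "attempts": max_attempts, "data": None}


# Fake model for illustration: garbage on the first call, valid JSON on the second.
replies = iter(["not json", '{"vendor": "Acme GmbH", "confident": true}'])
result = extract("Invoice from Acme GmbH, total 120.50 EUR", lambda _text: next(replies))

# A model that never produces valid JSON ends in review, not in an endless loop.
stuck = extract("unreadable scan", lambda _text: "still not json")
```

The point of the design is that every path returns an explicit status, so a downstream system never has to guess whether an extraction can be trusted.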
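Multiplying cost-per-call by expected volume, as the cost milestone suggests, is a few lines of arithmetic. The sketch below is only illustrative: the function name, the token counts, the call volume, and the per-million-token prices are all made-up placeholders, so substitute your provider's real rates and your own measured usage.

```python
def monthly_cost_usd(
    calls_per_day: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend from per-call token usage and expected volume."""
    per_call = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000
    return per_call * calls_per_day * days


# Placeholder numbers, not real prices: 2,000 calls a day, 1,500 input and
# 300 output tokens per call, at 1.00 and 4.00 USD per million tokens.
estimate = monthly_cost_usd(2000, 1500, 300, 1.00, 4.00)
# Roughly 162 USD per month with these placeholder numbers.
```

Retries multiply this figure, so it is worth re-running the estimate with your observed retry rate once the service sees real traffic.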
This is portfolio-grade. Build it free.
Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.