AI-Native UX: Streaming, Optimistic UI, and Designing for Uncertainty

Why AI features break your UX habits

You wire up an LLM endpoint, fire a request, and show a spinner. Eight seconds later a wall of text appears. It works in the demo. Then real users arrive, the spinner feels broken, people click again, costs double, and nobody trusts the answer because they cannot tell where it came from. The model is not the problem here; your interaction model is.

Who this is for

Frontend and full-stack engineers shipping their first (or fifth) LLM-powered feature. You know React. You have an [LLM API working](/blog/working-with-the-llm-api). Now you need a UI that handles latency, non-determinism, and the fact that the model is sometimes confidently wrong.

AI-native UX is the set of patterns that make slow, probabilistic, occasionally-wrong responses feel fast, trustworthy, and in the user's control. Three pillars: stream so latency disappears, give optimistic feedback so actions feel instant, and design for uncertainty so people can verify, edit, and undo.

A new mental model: thinking out loud vs the vending machine

A traditional API call is a vending machine: you pay, you wait, and you get either the exact item or an error. An LLM is a person thinking out loud: they start talking before they have finished the thought, they might be wrong, and you can interrupt them.

In conversation	In your UI
A colleague starts answering before they have the full thought, words arriving one at a time	Token streaming over SSE: render each chunk as it arrives instead of waiting for the full response
You raise a hand and say 'stop, that is not what I meant'	An abort/stop button that cancels the in-flight stream (and the billing)
They cite where they read something so you can check	Showing sources, retrieval chunks, or tool calls alongside the answer
They say 'I think it was Tuesday, but double-check'	Confidence signals and edit/undo affordances instead of presenting output as fact
A vending machine: pay, wait, get the item or a flashing error	The old request/response model: one blocking call, one final payload, a spinner in between

Stop designing AI features like vending machines. Design them like conversations.

The streaming flow, end to end

Before any code, hold the whole pipeline in your head. The browser opens one request to your own server route. Your route calls the model provider, which streams tokens back. Your route relays those tokens to the browser as Server-Sent Events, and the UI appends each chunk to the screen as it lands.

Tokens flow left to right and render progressively; nothing waits for the full answer.

1. User submits. The hook POSTs the full message history to your own route. Never call the provider directly from the browser; that leaks your API key.
2. Route opens a model stream. streamText starts the provider call and returns immediately with a readable stream, not a finished string.
3. Tokens relay as SSE. Each model chunk is forwarded to the browser over a long-lived text/event-stream response. One HTTP request, many events.
4. UI appends per chunk. The hook concatenates each delta into the assistant message and re-renders, so words appear as they are generated.
5. Stream closes. On the final chunk the connection ends, the stop button flips back to send, and you persist the completed message.
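
To make step 3 more concrete, here is a minimal sketch of how a text/event-stream body can be split into events on the receiving side. It assumes every event uses only `data:` lines and ends with a blank line; `parseSseChunk` and `SseParseResult` are hypothetical names for illustration. The Vercel AI SDK handles this for you and uses its own framing on top, so treat this as a picture of the idea rather than the SDK's implementation.

```typescript
// Hypothetical helper: split buffered SSE text into complete events.
// Assumes events are separated by a blank line and only use "data:" fields.
type SseParseResult = { events: string[]; rest: string };

function parseSseChunk(buffer: string): SseParseResult {
  const normalized = buffer.replace(/\r\n/g, '\n');
  const frames = normalized.split('\n\n');
  // The last piece may be an unfinished event; carry it into the next chunk.
  const rest = frames.pop() ?? '';
  const events = frames
    .map((frame) =>
      frame
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n'),
    )
    // Frames without any data (comments, keep-alives) are skipped in this sketch.
    .filter((data) => data.length > 0);
  return { events, rest };
}

// Usage: network chunks rarely line up with event boundaries.
let pending = '';
let answer = '';
for (const chunk of ['data: Hel\n\ndata: lo', '\n\ndata: wor']) {
  const { events, rest } = parseSseChunk(pending + chunk);
  pending = rest;
  answer += events.join('');
}
```

After both chunks, `answer` holds the two complete events and `pending` still holds the unfinished third one, which is why the leftover text has to be carried forward rather than rendered.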

Why SSE and not WebSockets

Streaming an answer is one-directional, server-to-client text, which is exactly what SSE is built for: it is plain HTTP, the browser's EventSource client reconnects automatically, and it needs no extra infrastructure. Reach for WebSockets only when you need true bidirectional, low-latency exchange. See [realtime APIs: WebSockets, SSE, and long polling](/blog/realtime-apis-websockets-sse) for the full comparison.

A streaming chat UI with the Vercel AI SDK

The Vercel AI SDK collapses the entire pipeline above into two pieces: a useChat hook on the client and a streamText call in the route. The hook manages the message list, the streaming append, the input state, and the loading flag. Here is a complete client component, including the all-important stop button.

app/chat/Chat.tsx

tsx

'use client';

import { useChat } from 'ai/react';

export function Chat() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    status,
    stop,
    reload,
    error,
  } = useChat({ api: '/api/chat' });

  const isStreaming = status === 'streaming' || status === 'submitted';

  return (
    <div className="flex flex-col gap-4">
      <div aria-live="polite" aria-atomic="false" className="space-y-3">
        {messages.map((m) => (
          <article key={m.id} data-role={m.role}>
            <span className="text-xs uppercase opacity-60">
              {m.role === 'user' ? 'You' : 'Assistant'}
            </span>
            {/* tokens append here as they arrive */}
            <p className="whitespace-pre-wrap">{m.content}</p>
          </article>
        ))}
      </div>

      {error && (
        <p role="alert" className="text-red-400">
          Something went wrong. <button onClick={() => reload()}>Retry</button>
        </p>
      )}

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything…"
          className="flex-1 rounded border bg-transparent px-3 py-2"
        />
        {isStreaming ? (
          <button type="button" onClick={stop} className="rounded bg-red-600 px-4">
            Stop
          </button>
        ) : (
          <button type="submit" className="rounded bg-amber-600 px-4">
            Send
          </button>
        )}
      </form>
    </div>
  );
}

The route is where tokens are born. streamText kicks off the provider call and hands back a stream you turn straight into a streaming HTTP response. Note that the handler returns a streaming Response rather than a JSON body, and that it is configured to run on the Edge runtime.

app/api/chat/route.ts

typescript

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export const runtime = 'edge';
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: 'You are concise. If unsure, say so and suggest how to verify.',
    messages,
    // abort propagates: if the client disconnects, stop billing
    abortSignal: req.signal,
  });

  // relays tokens to the browser as an SSE-style stream
  return result.toDataStreamResponse();
}
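
The route above hands `messages` from the request body straight to the model. A minimal sketch of checking that body first might look like the following. `sanitizeMessages`, `ChatMessage`, and both limits are assumptions made for illustration, not part of the AI SDK, and a real app may need to accept more roles or richer message shapes.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

const MAX_MESSAGES = 20; // assumed cap on history length
const MAX_CHARS = 4000; // assumed cap per message

function isRole(value: unknown): value is ChatMessage['role'] {
  return value === 'user' || value === 'assistant';
}

// Returns a cleaned message list, or null if the body should be rejected.
function sanitizeMessages(body: unknown): ChatMessage[] | null {
  if (typeof body !== 'object' || body === null) return null;
  const raw = (body as { messages?: unknown }).messages;
  if (!Array.isArray(raw) || raw.length === 0) return null;

  const cleaned: ChatMessage[] = [];
  for (const item of raw) {
    if (typeof item !== 'object' || item === null) return null;
    const { role, content } = item as { role?: unknown; content?: unknown };
    // Reject client-supplied "system" messages: the system prompt is set server-side.
    if (!isRole(role)) return null;
    if (typeof content !== 'string' || content.length > MAX_CHARS) return null;
    cleaned.push({ role, content });
  }
  // Keep only the most recent turns to bound token cost.
  return cleaned.slice(-MAX_MESSAGES);
}
```

In the handler, a `null` result would map to a 400 response before any provider call is made, so malformed or oversized requests never reach the billing meter.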

The system prompt is part of your UX

Telling the model to admit uncertainty ("if unsure, say so") changes what the user sees. Pair it with the security posture from [prompt injection and LLM security](/blog/prompt-injection-and-llm-security): never trust message content as instructions, and never render model output as raw HTML.
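
React escapes text placed in JSX, so rendering `{m.content}` inside a `<p>` as in the component above is safe as written. The risk appears when model output ends up in an HTML string, for example through `dangerouslySetInnerHTML` or a server-rendered email template. A minimal sketch of escaping in that case, where `escapeHtml` is a hypothetical helper rather than a library function:

```typescript
const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace the five characters that can open tags, attributes, or entities.
function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

const modelOutput = '<img src=x onerror="alert(1)">';
const safe = escapeHtml(modelOutput);
```

If you need to render model-generated markdown or rich text, escaping alone is not enough; a dedicated sanitizer is the usual choice there.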

Optimistic UI with rollback

Streaming hides latency for the answer. Optimistic UI hides it for the side effects: saving a generated draft, applying an AI edit, accepting a suggestion. The pattern: update local state immediately as if the server already said yes, fire the request in the background, and if it fails, roll the state back and tell the user.

app/notes/useOptimisticSave.ts

tsx

'use client';

import { useState } from 'react';

type Note = { id: string; text: string; aiGenerated: boolean };

export function useOptimisticNotes(initial: Note[]) {
  const [notes, setNotes] = useState<Note[]>(initial);
  const [errorId, setErrorId] = useState<string | null>(null);

  async function acceptSuggestion(note: Note) {
    const snapshot = notes; // keep a rollback point
    setErrorId(null);

    // 1. optimistic: show it instantly
    setNotes((prev) => [...prev, note]);

    try {
      const res = await fetch('/api/notes', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }, // declare the JSON body
        body: JSON.stringify(note),
      });
      if (!res.ok) throw new Error('save failed');

      // 2. reconcile with the server's canonical record
      const saved: Note = await res.json();
      setNotes((prev) => prev.map((n) => (n.id === note.id ? saved : n)));
    } catch {
      // 3. rollback + surface the failure
      // (restoring the snapshot also drops any notes added while this request was in flight)
      setNotes(snapshot);
      setErrorId(note.id);
    }
  }

  return { notes, errorId, acceptSuggestion };
}

Always keep the snapshot

The single most common optimistic-UI bug is mutating state in place so there is nothing to roll back to. Capture the pre-change value first, then update. React 19's `useOptimistic` hook automates the snapshot/rollback dance when the mutation lives in a transition.
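
When several saves can be in flight at once, one alternative to restoring a whole-array snapshot is to roll back only the item that failed. The sketch below expresses that as a pure reducer that could sit behind React's `useReducer`. `NoteAction`, `notesReducer`, and the `tmp-`/`srv-` ids are hypothetical names for illustration.

```typescript
type Note = { id: string; text: string; aiGenerated: boolean };

type NoteAction =
  | { type: 'optimisticAdd'; note: Note }
  | { type: 'confirm'; tempId: string; saved: Note }
  | { type: 'rollback'; tempId: string };

function notesReducer(state: Note[], action: NoteAction): Note[] {
  switch (action.type) {
    case 'optimisticAdd':
      // Show the note immediately, before the server has answered.
      return [...state, action.note];
    case 'confirm':
      // Swap the temporary record for the server's canonical one.
      return state.map((n) => (n.id === action.tempId ? action.saved : n));
    case 'rollback':
      // Remove only the failed note; other pending notes are untouched.
      return state.filter((n) => n.id !== action.tempId);
  }
}

// Usage: two saves in flight, the first fails, the second succeeds.
const draftA: Note = { id: 'tmp-1', text: 'Draft A', aiGenerated: true };
const draftB: Note = { id: 'tmp-2', text: 'Draft B', aiGenerated: true };

let notesState = notesReducer([], { type: 'optimisticAdd', note: draftA });
notesState = notesReducer(notesState, { type: 'optimisticAdd', note: draftB });
notesState = notesReducer(notesState, { type: 'rollback', tempId: 'tmp-1' });
notesState = notesReducer(notesState, {
  type: 'confirm',
  tempId: 'tmp-2',
  saved: { ...draftB, id: 'srv-9' },
});
```

The trade-off is that each action has to know how to undo itself, whereas a snapshot is simpler when only one mutation can be pending at a time.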

Traditional UX vs AI-native UX

The shift is not cosmetic. Four assumptions baked into classic web UX break the moment an LLM is in the loop.

Dimension	Traditional	AI-native
Latency	Sub-second; a spinner is fine	Seconds; stream tokens, never block
Determinism	Same input, same output	Same input, varying output; design for variance
Errors	4xx/5xx with clear states	Plausible-but-wrong; needs verification UI
Trust	Implicit; the system is right	Earned; show sources, allow edit/undo

What changes when the response is slow, probabilistic, and fallible.

Designing for uncertainty

The hardest part is not technical. A model that is wrong 5% of the time but presented as authoritative can erode trust faster than one that is wrong 15% of the time but honest about it. Your UI is where that honesty lives.

Three rules for honest AI UX

**Show your work**: surface the sources, retrieval chunks, or tool calls behind an answer so users can verify instead of believing.
**Make it editable**: every generated artifact should be tweakable and undoable; AI output is a draft, not a verdict.
**Never hide that it is AI**: label generated content clearly; disguising the model as a human is both a trust and a compliance risk.

Structured output deserves special paranoia. When you ask the model for JSON to drive your UI, treat that JSON as hostile input: it can be truncated mid-stream, contain extra prose, or simply omit a required field. Validate before you render.

lib/parseStructured.ts

typescript

import { z } from 'zod';

const Suggestion = z.object({
  title: z.string().min(1),
  confidence: z.number().min(0).max(1),
});

// model output is untrusted: never assume the shape
export function parseSuggestion(raw: string) {
  try {
    const json = JSON.parse(raw);
    const result = Suggestion.safeParse(json);
    if (!result.success) {
      return { ok: false as const, reason: 'invalid shape' };
    }
    return { ok: true as const, value: result.data };
  } catch {
    // partial or malformed JSON (common mid-stream): show a 'still thinking' state
    return { ok: false as const, reason: 'incomplete' };
  }
}

Render a low-confidence result with a visible badge, not silently as fact.
If parsing fails mid-stream, keep the skeleton; do not flash an error for a response that is simply not finished.
Give the user a one-click way to regenerate when the structured output is unusable.
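
The three rules above can be sketched as a single function that maps raw model output to a view state. This version validates by hand so it runs without dependencies; `viewStateFor`, `ViewState`, and the 0.6 threshold are assumptions for illustration, and in practice a schema library like the zod example above would do the shape check.

```typescript
type Suggestion = { title: string; confidence: number };

type ViewState =
  | { kind: 'skeleton' } // still streaming, nothing usable yet
  | { kind: 'unusable' } // stream finished but output is broken: offer regenerate
  | { kind: 'ready'; suggestion: Suggestion; lowConfidence: boolean };

const LOW_CONFIDENCE_BELOW = 0.6; // assumed threshold for showing a badge

function isSuggestion(value: unknown): value is Suggestion {
  if (typeof value !== 'object' || value === null) return false;
  const { title, confidence } = value as { title?: unknown; confidence?: unknown };
  return (
    typeof title === 'string' &&
    title.length > 0 &&
    typeof confidence === 'number' &&
    confidence >= 0 &&
    confidence <= 1
  );
}

function viewStateFor(raw: string, streamDone: boolean): ViewState {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Unparseable while streaming usually just means "not finished yet".
    return streamDone ? { kind: 'unusable' } : { kind: 'skeleton' };
  }
  if (!isSuggestion(parsed)) {
    return streamDone ? { kind: 'unusable' } : { kind: 'skeleton' };
  }
  return {
    kind: 'ready',
    suggestion: parsed,
    lowConfidence: parsed.confidence < LOW_CONFIDENCE_BELOW,
  };
}
```

The key design choice is that the same parse failure means different things depending on whether the stream has ended, so the caller passes that in explicitly instead of the parser guessing.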

Accessibility of streaming content

Text that appears a token at a time is invisible to screen-reader users unless you tell assistive tech to announce it. The tool is aria-live, but the wrong setting is worse than none: assertive on a streaming region interrupts the user on every single token.

Use aria-live="polite", not assertive

Wrap the message list in `aria-live="polite"` with `aria-atomic="false"` so the screen reader queues updates and reads new text after the current utterance, instead of stuttering on every chunk. Announce the start of generation ("Assistant is responding") and completion so users know the boundaries. Keep the stop button clearly labelled and reachable by keyboard at all times.
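
Even with polite announcements, some screen readers can sound choppy when text arrives in tiny fragments. One possible mitigation, offered here as an assumption rather than something the SDK provides, is to feed the live region only complete sentences and hold back the unfinished tail. `splitForAnnouncement` is a hypothetical helper, and its sentence detection is deliberately naive.

```typescript
type AnnounceSplit = { ready: string; rest: string };

// Split streamed text into the part that ends on a sentence boundary
// and the unfinished remainder that should wait for more tokens.
function splitForAnnouncement(buffer: string): AnnounceSplit {
  // Greedy match up to the last ., ! or ? that is followed by whitespace or end of text.
  const match = /^[\s\S]*[.!?](?=\s|$)/.exec(buffer);
  if (!match) return { ready: '', rest: buffer };
  const ready = match[0];
  return { ready: ready.trim(), rest: buffer.slice(ready.length).trimStart() };
}

const midStream = splitForAnnouncement('Use SSE. It is simple. WebSock');
```

Here `midStream.ready` contains the two finished sentences and `midStream.rest` keeps the partial word for the next pass. Abbreviations such as "e.g." will trigger early splits, so test with real content and real assistive technology before relying on this.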

Cancellation and cost

Every streamed token is a billed token. If a user navigates away or hits stop, an un-cancelled stream keeps generating, and keeps charging you, until the model finishes or hits the max-tokens limit. Wiring up abort is both a UX feature and a cost control.

app/chat/useAbortableStream.ts

tsx

'use client';

import { useRef } from 'react';

export function useAbortableStream() {
  const controllerRef = useRef<AbortController | null>(null);

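  async function run(prompt: string, onToken: (t: string) => void) {
    controllerRef.current?.abort(); // cancel any prior stream
    const controller = new AbortController();
    controllerRef.current = controller;

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // the route above reads `messages`, so wrap the prompt accordingly
        body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
        signal: controller.signal, // aborting closes the connection to the server
      });
      if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

      // assumes the route returns plain text chunks (for example via
      // result.toTextStreamResponse()), not the SDK's data-stream framing
      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        onToken(decoder.decode(value, { stream: true }));
      }
    } catch (err) {
      // fetch() and read() both reject with AbortError on cancel
      if ((err as Error).name === 'AbortError') return; // expected
      throw err;
    }
  }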

  function stop() {
    controllerRef.current?.abort();
  }

  return { run, stop };
}

Abort must reach the provider

Cancelling the browser fetch only stops rendering; the model keeps generating server-side unless you forward the signal. Pass `req.signal` into `streamText` (as in the route above) so the abort propagates all the way to the provider and the meter stops.
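
The difference between forwarding the signal and not forwarding it can be shown with a toy simulation. Nothing here calls a real provider: `generateTokens` is a stand-in that checks an `AbortSignal` before producing each token, and `simulate` is a hypothetical harness that counts how many tokens were generated versus rendered.

```typescript
// Stand-in for a provider: yields tokens until its signal is aborted.
function* generateTokens(tokens: string[], signal: AbortSignal): Generator<string> {
  for (const token of tokens) {
    if (signal.aborted) return; // stop doing (billable) work
    yield token;
  }
}

function simulate(tokens: string[], stopAfter: number, forwardSignal: boolean) {
  const client = new AbortController();
  // Without forwarding, the "provider" gets a signal that never aborts.
  const providerSignal = forwardSignal ? client.signal : new AbortController().signal;

  let generated = 0;
  const rendered: string[] = [];
  for (const token of generateTokens(tokens, providerSignal)) {
    generated++;
    if (!client.signal.aborted) rendered.push(token);
    if (rendered.length === stopAfter) client.abort(); // the user hits Stop
  }
  return { generated, rendered: rendered.join('') };
}

const words = ['The ', 'answer ', 'is ', 'probably ', 'wrong.'];
const forwarded = simulate(words, 2, true);
const notForwarded = simulate(words, 2, false);
```

Both runs render the same two tokens, but only the forwarded run stops generating after them; the other keeps producing all five, which is the part you would pay for.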

Common mistakes that cost hours (and dollars)

A blocking spinner instead of streaming. An 8-second spinner reads as broken; the same 8 seconds with tokens flowing reads as fast. Stream from day one; retrofitting it later means rewriting the data layer.
Trusting structured output blindly. JSON.parse(modelOutput) will crash in production on a truncated or chatty response. Validate with a schema and handle the partial-stream case explicitly.
No stop or cancel. Users feel trapped watching a wrong answer generate, click again out of frustration, and you pay for two full completions. A stop button that aborts the server stream fixes the UX and the bill at once.
Presenting output as fact. With no sources, no edit, and no confidence signal, the first hallucination the user catches destroys trust in the whole feature.
aria-live="assertive" on the stream. It interrupts screen-reader users on every token. Use polite, and announce start and end.
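
For the "click again, pay twice" mistake, a stop button covers the streaming case. If you roll your own fetch logic, a small in-flight guard is one way to make sure a second click cannot start a second completion. `createSubmitGuard` is a hypothetical helper sketched under that assumption.

```typescript
// Tracks whether a request is already running so duplicate submits can be ignored.
function createSubmitGuard() {
  let inFlight = false;
  return {
    // Returns true if the caller may start a request, false if one is already running.
    tryStart(): boolean {
      if (inFlight) return false;
      inFlight = true;
      return true;
    },
    // Call when the request completes, fails, or is aborted.
    finish(): void {
      inFlight = false;
    },
  };
}

const guard = createSubmitGuard();
const firstClick = guard.tryStart();
const secondClick = guard.tryStart(); // impatient double-click
guard.finish();
const afterFinish = guard.tryStart();
```

In a submit handler you would return early when `tryStart()` is false and call `finish()` in a `finally` block so an error or abort never leaves the form locked.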

Takeaways

The whole article in seven lines

LLM features are conversations, not vending machines: slow, probabilistic, sometimes wrong.
Stream tokens over SSE so latency disappears; a blocking spinner is a UX bug.
The Vercel AI SDK reduces it to `useChat` on the client and `streamText` in the route.
Always ship a stop button, and propagate the abort to the provider to stop the bill.
Use optimistic UI with a captured snapshot so failures roll back cleanly.
Design for uncertainty: show sources, allow edit/undo, never hide that it is AI.
Validate structured output as hostile input, and announce streams politely for screen readers.

Where to go next

AI-native UX sits on top of three layers worth deepening. Get the transport right, get the API call right, and lock down the security boundary. The front-end patterns here only shine when those are solid.

[realtime APIs: WebSockets, SSE, and long polling](/blog/realtime-apis-websockets-sse): why SSE is the right transport for one-way token streams.
[working with the LLM API](/blog/working-with-the-llm-api): prompts, tokens, and the server-side call that feeds your stream.
[prompt injection and LLM security](/blog/prompt-injection-and-llm-security): treat model output and user messages as untrusted before you render them.
Practice the request/response fundamentals in the networking lab so the streaming layer has a solid base.
