Lesson 3 - Streaming and Server-Sent Events

Welcome to Streaming and Server-Sent Events

Most endpoints you have built so far compute a complete result, then hand the whole thing back at once. That works beautifully for a small JSON object. But what if the response is huge, or — more interestingly — what if it arrives piece by piece over time? Think of a log file being tailed, a long export, a progress bar ticking up, or an AI model producing one token at a time. Waiting for all of it before sending anything means the client stares at a blank screen, and your server has to hold the entire response in memory first.

Streaming flips that around: you send each chunk the moment it is ready. The client starts receiving immediately, and your server never buffers the whole payload. In this lesson you will meet FastAPI’s StreamingResponse and a delightfully simple browser-native protocol called Server-Sent Events (SSE) for pushing live updates.

By the end of this lesson, you will be able to:

  • Explain when streaming a response beats sending it all at once
  • Wrap a generator in StreamingResponse with the right media_type
  • Push live one-way updates using Server-Sent Events and the data: format
  • Use either a sync or an async generator depending on your data source

This builds directly on the async ideas from Lesson 1. Let’s begin.


Why Stream Instead of Sending It All at Once

A normal endpoint follows a “compute everything, then respond” pattern. The server assembles the full body in memory and sends it as one block. For a small result that is perfect. It becomes a problem in two situations:

  • The response is large. A multi-gigabyte file export or a giant CSV would have to sit fully in memory before the first byte leaves the server. That is wasteful and can crash the process.
  • The result is produced incrementally. Logs, progress updates, and AI tokens all become available over time. If you wait for the last piece before sending the first, the client sees nothing until everything is done.

Streaming solves both. Instead of returning a finished value, you hand FastAPI a generator — a function that yields chunks — and FastAPI sends each chunk as it is produced. The benefits are concrete: the server keeps only one chunk in memory at a time, and the client starts receiving (and rendering) almost immediately instead of waiting for the full payload. That earlier first byte is exactly what makes a progress bar feel live or an AI answer feel like it is “typing.”
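
To make the memory claim concrete, here is a small sketch in plain Python with no FastAPI involved. The names build_all, produce_chunks, and produced are made up for this illustration. It shows that a generator creates a chunk only when the consumer asks for one, whereas the buffered version has to assemble the whole body first.

```python
from typing import Iterator, List

produced: List[int] = []  # records which chunks have actually been created


def build_all(n: int) -> str:
    """Buffered approach: assemble the whole body before returning it."""
    return "".join(f"row {i}\n" for i in range(n))


def produce_chunks(n: int) -> Iterator[str]:
    """Streaming approach: create each chunk only when the consumer asks."""
    for i in range(n):
        produced.append(i)
        yield f"row {i}\n"


stream = produce_chunks(1_000_000)  # calling the function runs none of its body
first = next(stream)                # only the first chunk is created here
print(repr(first), len(produced))   # 'row 0\n' 1
```

Only one of the million rows exists after the first next() call. That is the property a streaming response relies on.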


StreamingResponse Wrapping a Generator

The core tool is StreamingResponse from fastapi.responses. You give it a generator that yields chunks (strings or bytes) and a media_type so the client knows how to interpret the stream.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def number_stream():
    for i in range(1, 6):
        yield f"chunk {i}\n"

@app.get("/stream")
def stream():
    return StreamingResponse(number_stream(), media_type="text/plain")

number_stream is an ordinary generator: each yield produces one piece of the response. StreamingResponse consumes that generator and writes each yielded value to the network as it arrives. Let’s call it and inspect what the client receives:

from fastapi.testclient import TestClient

client = TestClient(app)
r = client.get("/stream")
print("content-type:", r.headers["content-type"])
print("body:", repr(r.text))

Output:

content-type: text/plain; charset=utf-8
body: 'chunk 1\nchunk 2\nchunk 3\nchunk 4\nchunk 5\n'

The body is exactly the five yielded chunks concatenated, and the media_type you passed became the response’s Content-Type header. Crucially, the generator was never asked to build that whole string in memory — each "chunk {i}\n" was sent on its own, the instant it was yielded. For a five-line demo that is invisible, but swap the loop for “read one line from a 2 GB log file” and the memory savings become the whole point.
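
As a rough sketch of that log-file variant, the generator below yields a text file one line at a time. The function name file_line_stream and the temporary demo file are assumptions for illustration, and the small file stands in for a large log.

```python
import os
import tempfile
from typing import Iterator


def file_line_stream(path: str) -> Iterator[str]:
    """Yield a text file one line at a time instead of reading it whole."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:  # the file object reads lazily, line by line
            yield line


# Demo: a small temporary file stands in for a multi-gigabyte log.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

chunks = list(file_line_stream(demo_path))
os.remove(demo_path)
print(chunks)  # ['first\n', 'second\n', 'third\n']
```

In an endpoint, this generator would be handed to StreamingResponse in the same way as number_stream, with media_type="text/plain".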


Server-Sent Events: Pushing Live Updates

Plain text streaming is fine for files, but for live updates — a progress bar, a notification feed, status messages — there is a purpose-built standard: Server-Sent Events (SSE). It is a simple, one-way channel where the server pushes messages to the client over a single long-lived HTTP response. Browsers support it natively through the EventSource API, so the client side is almost free.

SSE is just streaming with two conventions:

  • The media_type must be text/event-stream.
  • Each message is a line beginning with data: , followed by a blank line (that is, it ends with \n\n). The double newline is what tells the client “this event is complete.”

def sse_events():
    for i in range(1, 4):
        yield f"data: event {i}\n\n"

@app.get("/events")
def events():
    return StreamingResponse(sse_events(), media_type="text/event-stream")

It is the same StreamingResponse you already know — only the chunk format and the media type changed. Here is what the client receives:

r = client.get("/events")
print("content-type:", r.headers["content-type"])
print("body:", repr(r.text))

Output:

content-type: text/event-stream; charset=utf-8
body: 'data: event 1\n\ndata: event 2\n\ndata: event 3\n\n'

Each event is a data: line terminated by \n\n. A browser using EventSource would fire a message event for each one as it arrives, handing your JavaScript the text after data: . That is all it takes to drive a live progress bar or a notification toast from the server — no polling, no WebSocket handshake, just an HTTP response that keeps trickling.
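
Writing the data: prefix and the trailing \n\n by hand is easy to get wrong, so one option is a small formatting helper. The sketch below is an assumption for illustration: format_sse is a made-up name, not part of FastAPI. It also uses the optional event: and id: fields, which the SSE format defines but which this lesson does not otherwise cover.

```python
from typing import List, Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> str:
    """Build one SSE message (hypothetical helper, not a FastAPI API)."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A payload containing newlines needs one data: line per payload line;
    # otherwise the text after the newline would not be treated as data.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The blank line (double newline) marks the end of the event.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("event 1")))
# 'data: event 1\n\n'
print(repr(format_sse("line one\nline two", event="log", event_id="7")))
# 'event: log\nid: 7\ndata: line one\ndata: line two\n\n'
```

A generator such as sse_events could then yield format_sse(f"event {i}") instead of building the string inline. Note that a message carrying an event: field is delivered in the browser as an event of that name rather than as the default message event.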


Sync or Async — Match the Generator to Your Source

The generator you stream can be synchronous (def + yield, like the examples above) or asynchronous (async def + yield). The rule mirrors Lesson 1: if producing each chunk involves awaiting real I/O (an async database cursor, an async HTTP client, an AI client that streams tokens), use an async def generator so the event loop stays free to serve other requests between chunks.

import asyncio
from fastapi.responses import StreamingResponse

async def progress_stream():
    for pct in (25, 50, 75, 100):
        await asyncio.sleep(0)   # stand-in for awaitable work
        yield f"data: {pct}%\n\n"

@app.get("/progress")
async def progress():
    return StreamingResponse(progress_stream(), media_type="text/event-stream")

progress_stream is an async generator: it can await between yields. The await asyncio.sleep(0) here stands in for whatever real awaitable produces the next update — a job querying its own status, say. FastAPI handles async generators just as naturally as sync ones:

r = client.get("/progress")
print("content-type:", r.headers["content-type"])
print("body:", repr(r.text))

Output:

content-type: text/event-stream; charset=utf-8
body: 'data: 25%\n\ndata: 50%\n\ndata: 75%\n\ndata: 100%\n\n'

Each progress event is pushed as soon as it is ready. (Order note: the events arrive in yield order — 25, 50, 75, 100.) This is also the exact shape of AI token streaming, which you will build in Lesson 5: an async def generator that awaits the model client and yields each token as a data: event, so the answer appears word by word in the browser. For our Task Manager, the same pattern lets a long-running job stream its progress back to whoever is watching.
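
To see how an async generator is consumed, the standalone sketch below drives one with async for. That is roughly what happens when a streaming response iterates it. The names job_progress and collect are invented for this example, and the short sleep only simulates awaiting real work.

```python
import asyncio
from typing import AsyncIterator, List


async def job_progress(steps: int) -> AsyncIterator[str]:
    """Yield one SSE progress event per completed step of a pretend job."""
    for done in range(1, steps + 1):
        await asyncio.sleep(0.01)  # stand-in for awaiting real work
        pct = done * 100 // steps
        yield f"data: {pct}%\n\n"


async def collect() -> List[str]:
    # async for pulls one event at a time, awaiting between events
    return [event async for event in job_progress(4)]


events = asyncio.run(collect())
print(events)
# ['data: 25%\n\n', 'data: 50%\n\n', 'data: 75%\n\n', 'data: 100%\n\n']
```

While job_progress is suspended at its await, the event loop is free to run other tasks. That is the reason to prefer async def when each chunk depends on awaitable I/O.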

SSE is one-way, and TestClient buffers

Server-Sent Events flow in one direction only: server to client. That is perfect for progress, logs, and notifications, but if you need the client to send messages back over the same connection (chat, collaborative editing), reach for WebSockets — the topic of the next lesson. Also note that TestClient collects the full streamed body into .text for convenient assertions, so the examples above show the complete result. In a real browser or HTTP client, each chunk arrives incrementally, the moment it is yielded — which is the entire reason to stream.
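
To illustrate what incremental arrival means for a client, here is a deliberately simplified sketch of SSE parsing. It assumes LF line endings and looks only at the data field. The function name iter_sse_data is hypothetical, and a real EventSource implementation handles more cases. Network chunks can split an event anywhere, so the parser buffers text until the terminating blank line shows up.

```python
from typing import Iterable, Iterator


def _data_value(line: str) -> str:
    """Strip the 'data:' prefix and at most one following space."""
    value = line[len("data:"):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_data(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event as chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # An event is complete only once its blank line has arrived.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = [
                _data_value(line)
                for line in raw_event.split("\n")
                if line.startswith("data:")
            ]
            if data_lines:
                yield "\n".join(data_lines)


# Chunk boundaries deliberately fall in the middle of events.
arriving = ["data: eve", "nt 1\n\nda", "ta: event 2\n", "\ndata: event 3"]
received = list(iter_sse_data(arriving))
print(received)  # ['event 1', 'event 2']
```

The third event never appears because its closing \n\n has not arrived. An SSE message sent without the double newline leaves the client waiting in the same way.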


Practice Exercises

Exercise 1: Stream a list of lines

You have a list lines = ["alpha", "beta", "gamma"] and want an endpoint /lines that streams each one on its own line as plain text. Sketch the generator and the StreamingResponse.

Hint

Write a generator that loops over lines and yields f"{line}\n" for each, then return StreamingResponse(gen(), media_type="text/plain"). Each yield is sent as its own chunk, so the body comes out as 'alpha\nbeta\ngamma\n'.

Exercise 2: Fix the broken SSE

A developer writes yield f"data: {msg}" (no trailing newlines) and the browser’s EventSource never seems to fire a message event. What is missing?

Hint

Each SSE message must end with a blank line — that is, \n\n. Without the double newline the client can’t tell where one event ends, so it keeps waiting. The fix is yield f"data: {msg}\n\n". The media_type must also be text/event-stream.

Exercise 3: Sync or async generator?

You are streaming tokens from an AI client whose method must be awaited for each token. Should your generator be def or async def, and why?

Hint

Use an async def generator (async def ...: ... yield ...). Because each token comes from an awaitable call, an async generator lets you await between yields without blocking the event loop, keeping the worker free to serve other requests. A plain def generator can’t await, so it wouldn’t be able to call the async client correctly.
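
One possible shape for the answer is sketched below. FakeModelClient and its next_token method are hypothetical stand-ins invented for this example, not a real AI library API. The point is only that the generator awaits each token before yielding it as an SSE event.

```python
import asyncio
from typing import AsyncIterator, List


class FakeModelClient:
    """Hypothetical stand-in for an AI client; not a real library API."""

    def __init__(self, tokens: List[str]) -> None:
        self._tokens = list(tokens)

    async def next_token(self) -> str:
        """Return the next token, or an empty string when finished."""
        await asyncio.sleep(0)  # pretend to wait on the network
        return self._tokens.pop(0) if self._tokens else ""


async def token_events(client: FakeModelClient) -> AsyncIterator[str]:
    """Await each token and yield it as one SSE data event."""
    while True:
        token = await client.next_token()
        if not token:
            break
        yield f"data: {token}\n\n"


async def main() -> List[str]:
    client = FakeModelClient(["Hello", "world"])
    return [event async for event in token_events(client)]


result = asyncio.run(main())
print(result)  # ['data: Hello\n\n', 'data: world\n\n']
```

A plain def generator could not contain the await client.next_token() line at all, which is the practical reason the async form is required here.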


Summary

Streaming sends a response as it is produced instead of all at once, which avoids buffering huge payloads and lets the client start receiving immediately. In FastAPI you stream by returning a StreamingResponse wrapped around a generator that yields chunks, paired with a media_type. Server-Sent Events are streaming specialized for live one-way updates: set media_type="text/event-stream" and format each message as data: ...\n\n. The generator can be sync (def) for in-memory or blocking sources, or async (async def) when each chunk comes from awaitable I/O like a database cursor or an AI client.

Key Concepts

  • Streaming: sending a response chunk by chunk as it is produced, not all at once.
  • StreamingResponse: wraps a generator and writes each yielded chunk to the network.
  • media_type: tells the client how to interpret the stream (text/plain, text/event-stream).
  • Server-Sent Events (SSE): one-way server-to-client push using text/event-stream and data: ...\n\n.
  • Sync vs. async generators: def + yield for simple sources, async def + yield for awaitable I/O.

Why This Matters

Streaming is what makes modern APIs feel responsive: progress bars that tick in real time, logs that tail live, large exports that never blow up server memory, and AI responses that appear word by word. SSE gives you that live-update experience with almost no client-side code, riding on plain HTTP. Master this pattern now and the AI token streaming you build in Lesson 5 will feel like a small, familiar variation rather than a new skill.


Next Steps

Continue to Lesson 4 - WebSockets

Open a two-way, real-time channel between client and server — for chat, live collaboration, and anything that needs to talk back.


Continue Building Your Skills

You can now send data the moment it is ready — streaming large responses chunk by chunk and pushing live updates with Server-Sent Events. That covers everything a server needs to talk to a client over time. Next you’ll add the missing direction: WebSockets, a full two-way channel where the client and server can both send messages on the same open connection.