React 19 Streaming from FastAPI: Building Truly Progressive UI Without Separate Stream and Non-Stream Endpoints

I've watched teams burn engineering hours duplicating endpoints. /api/generate for buffered responses. /api/generate/stream for streaming. Different error handling, different retry logic, different bugs. It's absurd.

Here's the truth: a single endpoint that always streams is enough. Your React 19 client decides whether to buffer the entire response or render it progressively. One source of truth. One place to fix bugs. One endpoint to monitor.

I built this pattern into CitizenApp because we support users on everything from fiber to 3G. Some want instant full responses. Some want to see Claude thinking in real-time. Same endpoint. Different consumption.

Why This Matters

Most teams default to separate endpoints because they think streaming and buffering are fundamentally different operations. They're not. The only difference is where the buffering happens.

Server-side buffering (traditional approach):

Collect entire response → send JSON → client renders instantly
Wastes the whole generation if the user leaves early
Feels slower on slower networks
Server holds memory longer

Client-side buffering (our approach):

Stream chunks → client accumulates → render whenever you want
User sees progress immediately
Same network footprint either way
Server frees memory faster

The endpoint doesn't care which strategy you use. It streams. Period.
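To make the "where you buffer" point concrete, here is a minimal sketch in Python. The names fake_stream, consume_buffered, and consume_progressive are hypothetical and stand in for the real endpoint and client; the only thing it is meant to show is that both consumers read the same stream and differ only in when they hand text to the caller.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def fake_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the server's chunked response."""
    for chunk in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield chunk


async def consume_buffered(stream: AsyncIterator[str]) -> str:
    """Accumulate every chunk and return the full text once at the end."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


async def consume_progressive(
    stream: AsyncIterator[str], on_chunk: Callable[[str], None]
) -> str:
    """Same loop, but surface each chunk to the caller as soon as it arrives."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk)
        on_chunk(chunk)
    return "".join(parts)
```

Both functions end up with the same final text. The progressive one just reports along the way.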

The FastAPI Endpoint

Here's a single endpoint that powers both streaming and buffered consumption:

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
import anthropic
import json

app = FastAPI()

@app.post("/api/generate")
async def generate(request: dict):
    """
    Single endpoint. Always streams via Server-Sent Events.
    Client decides whether to buffer or render progressively.
    """
    prompt = request.get("prompt")
    if not prompt:
        raise HTTPException(status_code=400, detail="prompt required")

    async def event_generator():
        client = anthropic.AsyncAnthropic()

        try:
            # The async client's stream helper is an async context manager,
            # so it needs `async with` and `async for`.
            async with client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=[
                    {"role": "user", "content": prompt}
                ],
            ) as stream:
                async for text in stream.text_stream:
                    # SSE format: "data: " prefix + JSON + double newline
                    chunk = {
                        "type": "text_delta",
                        "delta": text,
                        "stop_reason": None
                    }
                    yield f"data: {json.dumps(chunk)}\n\n"

            # Signal completion
            yield f"data: {json.dumps({'type': 'message_stop', 'stop_reason': 'end_turn'})}\n\n"

        except Exception as e:
            yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )

That's it. No /stream variant. No buffer mode. One endpoint that streams everything via Server-Sent Events (SSE). The format is dead simple:

type: text_delta = chunk arrived
type: message_stop = done
type: error = something broke
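As a rough illustration of how little it takes to consume that format, here is a sketch of a parser in Python. The function names parse_sse_events and collect_text are hypothetical, and the sketch assumes the whole payload is already in memory and that every frame is a single `data: ` line, which is what the endpoint above emits.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(raw: str) -> Iterator[dict]:
    """Yield the JSON payload of each `data: ` line in an SSE payload."""
    for frame in raw.split("\n\n"):  # frames are separated by a blank line
        for line in frame.split("\n"):
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])


def collect_text(events: Iterable[dict]) -> str:
    """Join text_delta events, stop at message_stop, raise on error."""
    parts = []
    for event in events:
        if event["type"] == "text_delta":
            parts.append(event["delta"])
        elif event["type"] == "error":
            raise RuntimeError(event["message"])
        elif event["type"] == "message_stop":
            break
    return "".join(parts)
```

A real client reads incrementally and has to handle frames split across network reads, which is what the buffer handling in the hook below is for.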

The React 19 Client Hook

Now the client hook that consumes this endpoint. Notice how we support both streaming and buffering without separate API calls:

import { useState, useCallback } from 'react';

type StreamMode = 'streaming' | 'buffered';

interface UseGenerateOptions {
  mode?: StreamMode;
  onChunk?: (chunk: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: string) => void;
}

export function useGenerate({
  mode = 'streaming',
  onChunk,
  onComplete,
  onError,
}: UseGenerateOptions) {
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const generate = useCallback(
    async (prompt: string) => {
      setIsLoading(true);
      setError(null);

      try {
        const response = await fetch('/api/generate', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ prompt }),
        });

        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }

        const reader = response.body?.getReader();
        if (!reader) throw new Error('No response body');

        let fullText = '';
        const decoder = new TextDecoder();
        let buffer = '';

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split('\n');
          buffer = lines.pop() || '';

          for (const line of lines) {
            if (!line.startsWith('data: ')) continue;

            let event: any;
            try {
              event = JSON.parse(line.slice(6));
            } catch {
              // Skip lines that are not valid JSON
              continue;
            }

            if (event.type === 'text_delta') {
              fullText += event.delta;

              // Streaming mode: call onChunk immediately
              if (mode === 'streaming' && onChunk) {
                onChunk(event.delta);
              }
            } else if (event.type === 'message_stop') {
              // Done
            } else if (event.type === 'error') {
              // Thrown outside the parse try/catch so the outer catch reports it
              throw new Error(event.message);
            }
          }
        }

        // Buffered mode: call onChunk once with full text
        if (mode === 'buffered' && onChunk) {
          onChunk(fullText);
        }

        onComplete?.(fullText);
      } catch (err) {
        const message = err instanceof Error ? err.message : 'Unknown error';
        setError(message);
        onError?.(message);
      } finally {
        setIsLoading(false);
      }
    },
    [mode, onChunk, onComplete, onError]
  );

  return { generate, isLoading, error };
}

Using It in a Component

Here's how you'd use this in a React component. Notice: no different endpoints, no conditional logic in the component:

export function GenerateForm() {
  const [response, setResponse] = useState('');
  const { generate, isLoading } = useGenerate({
    mode: 'streaming', // or 'buffered' based on user preference
    onChunk: (chunk) => setResponse(prev => prev + chunk),
    onComplete: (full) => console.log('Done:', full),
    onError: (err) => console.error('Failed:', err),
  });

  return (
    <div>
      <button
        onClick={() => {
          setResponse(''); // start each run from an empty response
          generate('Explain React 19 streaming...');
        }}
        disabled={isLoading}
      >
        Generate
      </button>
      <div className="mt-4 whitespace-pre-wrap">
        {response}
      </div>
    </div>
  );
}

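Want to toggle between modes based on network speed? Add a preference hook. Note that navigator.connection is not available in every browser (Safari and Firefox don't ship it), so the hook falls back to its default when the API is missing: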

import { useEffect, useState } from 'react';

function useNetworkMode() {
  const [connection, setConnection] = useState<'4g' | '3g'>('4g');

  useEffect(() => {
    const nav = navigator as any;
    if (nav.connection) {
      const updateMode = () => {
        const effectiveType = nav.connection.effectiveType;
        // effectiveType is one of 'slow-2g', '2g', '3g', '4g'
        setConnection(['3g', '2g', 'slow-2g'].includes(effectiveType) ? '3g' : '4g');
      };
      };
      updateMode();
      nav.connection.addEventListener('change', updateMode);
      return () => nav.connection.removeEventListener('change', updateMode);
    }
  }, []);

  return connection === '3g' ? 'streaming' : 'buffered';
}

Gotcha: SSE and Content-Length

Here's what burned me: I initially didn't set Cache-Control: no-cache on the FastAPI response. Some proxies buffered the entire stream before sending it to the client, defeating the purpose. The headers in the endpoint above fix that.

Also, never add Content-Length to a streaming response. It's impossible to know the length ahead of time, and a wrong value makes clients either cut the response short or hang waiting for bytes that never arrive.
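For reference, here is a small sketch of a header helper in Python. The function name sse_headers and its parameter are hypothetical. The X-Accel-Buffering header is not part of the endpoint above; it is an nginx-specific response header that turns off proxy buffering for that response, so include it only if nginx sits in front of your app. Other proxies have their own switches.

```python
from typing import Dict


def sse_headers(behind_nginx: bool = True) -> Dict[str, str]:
    """Build response headers for an SSE stream (illustrative helper)."""
    headers = {
        # Tell caches and intermediaries not to store the stream
        "Cache-Control": "no-cache",
    }
    if behind_nginx:
        # nginx-specific: disable proxy buffering for this response
        headers["X-Accel-Buffering"] = "no"
    # Deliberately no Content-Length: the body size is unknown up front
    return headers
```

You could pass the result as the headers argument of StreamingResponse in place of the inline dict.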

Why Not WebSockets?

People always ask. WebSockets are overkill here. You're not bidirectional. SSE is simpler, works over HTTP/2, and survives proxies better. Save WebSockets for real-time collaboration.

The Win

One endpoint. One error handler. One place to add retry logic, caching headers, rate limits, or monitoring. Your API surface is minimal. Your cognitive load is lower. Your tests are faster.

This is the pattern I run in production at CitizenApp. It scales to thousands of concurrent streams without duplication headaches.
