React 19 Streaming from FastAPI: Building Truly Progressive UI Without Separate Stream and Non-Stream Endpoints
I've watched teams burn engineering hours duplicating endpoints. /api/generate for buffered responses. /api/generate/stream for streaming. Different error handling, different retry logic, different bugs. It's absurd.
Here's the truth: a single endpoint that always streams is enough. Your React 19 client decides whether to buffer the entire response or render it progressively. One source of truth. One place to fix bugs. One endpoint to monitor.
I built this pattern into CitizenApp because we support users on everything from fiber to 3G. Some want instant full responses. Some want to see Claude thinking in real-time. Same endpoint. Different consumption.
Why This Matters
Most teams default to separate endpoints because they think streaming and buffering are fundamentally different operations. They're not. The only real difference is where the buffering happens.
Server-side buffering (traditional approach):
- Collect entire response → send JSON → client renders instantly
- Wastes bandwidth if user leaves early
- Feels slower on slower networks
- Server holds memory longer
Client-side buffering (our approach):
- Stream chunks → client accumulates → render whenever you want
- User sees progress immediately
- Same network footprint either way
- Server frees memory faster
The endpoint doesn't care which strategy you use. It streams. Period.
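Here is the "where you buffer" idea as a minimal Python sketch, not CitizenApp code. It uses a single producer of text chunks and a consumer whose only mode-dependent decision is when to hand text to the renderer. The names fake_stream and consume are hypothetical stand-ins for the model stream and the client hook.

```python
import asyncio
from typing import AsyncIterator, Callable, Literal

Mode = Literal["streaming", "buffered"]


async def fake_stream(chunks: list[str]) -> AsyncIterator[str]:
    """Hypothetical producer standing in for the model's text stream."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield chunk


async def consume(
    source: AsyncIterator[str],
    mode: Mode,
    on_chunk: Callable[[str], None],
) -> str:
    """Read every chunk; only *when* on_chunk fires depends on the mode."""
    full_text = ""
    async for chunk in source:
        full_text += chunk
        if mode == "streaming":
            on_chunk(chunk)  # render progressively
    if mode == "buffered":
        on_chunk(full_text)  # render once, at the end
    return full_text


async def demo() -> None:
    for mode in ("streaming", "buffered"):
        seen: list[str] = []
        text = await consume(fake_stream(["Hel", "lo", "!"]), mode, seen.append)
        print(mode, seen, text)


if __name__ == "__main__":
    asyncio.run(demo())
```

Both modes read exactly the same chunks over the same connection. Streaming mode invokes the callback once per chunk, and buffered mode invokes it once with the joined text.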
The FastAPI Endpoint
Here's a single endpoint that powers both streaming and buffered consumption:
```python
import json

import anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field

app = FastAPI()
client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


class GenerateRequest(BaseModel):
    prompt: str = Field(min_length=1)  # FastAPI rejects a missing/empty prompt with 422


@app.post("/api/generate")
async def generate(request: GenerateRequest):
    """
    Single endpoint. Always streams via Server-Sent Events.
    Client decides whether to buffer or render progressively.
    """

    async def event_generator():
        try:
            # AsyncAnthropic needs `async with` / `async for`; plain `with`/`for` fails
            async with client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=[{"role": "user", "content": request.prompt}],
            ) as stream:
                async for text in stream.text_stream:
                    # SSE format: "data: " prefix + JSON + blank line
                    chunk = {"type": "text_delta", "delta": text, "stop_reason": None}
                    yield f"data: {json.dumps(chunk)}\n\n"
                final = await stream.get_final_message()
            # Signal completion with the real stop reason ("end_turn", "max_tokens", ...)
            done = {"type": "message_stop", "stop_reason": final.stop_reason}
            yield f"data: {json.dumps(done)}\n\n"
        except Exception as e:
            # The 200 status is already sent, so errors are reported in-band
            yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # tells nginx not to buffer this response
        },
    )
```
That's it. No /stream variant. No buffer mode. One endpoint that streams everything via Server-Sent Events (SSE). The format is dead simple:
- type: text_delta = a chunk arrived
- type: message_stop = done
- type: error = something broke
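Because this framing has nothing to do with Anthropic, it can be pulled out of the endpoint and tested without an API key. The following is a minimal sketch that assumes a hypothetical sse_events helper wrapping any async iterator of text deltas. Inside the endpoint you would pass stream.text_stream. The failing_after generator is a hypothetical upstream used to exercise the error path.

```python
import asyncio
import json
from typing import AsyncIterator


def sse(payload: dict) -> str:
    """Encode one event in SSE wire format: 'data: <json>' plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


async def sse_events(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap a text-delta stream in the three event types listed above."""
    try:
        async for text in deltas:
            yield sse({"type": "text_delta", "delta": text, "stop_reason": None})
        # Assumed stop reason; a real model stream reports its own
        yield sse({"type": "message_stop", "stop_reason": "end_turn"})
    except Exception as e:
        # Headers are already sent, so failures travel as an in-band event
        yield sse({"type": "error", "message": str(e)})


async def failing_after(chunks: list[str]) -> AsyncIterator[str]:
    """Hypothetical upstream that emits some chunks and then fails."""
    for chunk in chunks:
        yield chunk
    raise RuntimeError("upstream closed")


async def collect(gen: AsyncIterator[str]) -> list[str]:
    return [frame async for frame in gen]


if __name__ == "__main__":
    for frame in asyncio.run(collect(sse_events(failing_after(["Hi"])))):
        print(repr(frame))
```

Note that a mid-stream failure yields a text_delta followed by an error event and no message_stop, which is exactly what the client has to be prepared for.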
The React 19 Client Hook
Now the client hook that consumes this endpoint. Notice how we support both streaming and buffering without separate API calls:
```typescript
import { useState, useCallback } from 'react';

type StreamMode = 'streaming' | 'buffered';

interface UseGenerateOptions {
  mode?: StreamMode;
  onChunk?: (chunk: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: string) => void;
}

export function useGenerate({
  mode = 'streaming',
  onChunk,
  onComplete,
  onError,
}: UseGenerateOptions) {
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const generate = useCallback(
    async (prompt: string) => {
      setIsLoading(true);
      setError(null);
      try {
        const response = await fetch('/api/generate', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ prompt }),
        });
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }
        const reader = response.body?.getReader();
        if (!reader) throw new Error('No response body');

        let fullText = '';
        const decoder = new TextDecoder();
        let buffer = '';

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          // stream: true keeps multi-byte characters split across reads intact
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split('\n');
          buffer = lines.pop() ?? ''; // keep the trailing partial line for the next read

          for (const line of lines) {
            if (!line.startsWith('data: ')) continue;
            // Only complete lines get here, so a parse failure means a malformed event
            let event: { type: string; delta?: string; message?: string } | null = null;
            try {
              event = JSON.parse(line.slice(6));
            } catch {
              // skip malformed events
            }
            if (!event) continue;

            if (event.type === 'text_delta' && event.delta) {
              fullText += event.delta;
              // Streaming mode: call onChunk immediately
              if (mode === 'streaming') onChunk?.(event.delta);
            } else if (event.type === 'error') {
              // Thrown outside the parse try/catch so the outer handler sees it
              throw new Error(event.message);
            }
            // 'message_stop' needs no action: the stream closes right after it
          }
        }

        // Buffered mode: call onChunk once with full text
        if (mode === 'buffered') onChunk?.(fullText);
        onComplete?.(fullText);
      } catch (err) {
        const message = err instanceof Error ? err.message : 'Unknown error';
        setError(message);
        onError?.(message);
      } finally {
        setIsLoading(false);
      }
    },
    [mode, onChunk, onComplete, onError]
  );

  return { generate, isLoading, error };
}
```
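The subtle part of the hook is the line buffering. A single network read can end mid-line, and for non-ASCII payloads it can even end mid-character. Here is the same idea sketched in Python so it can be exercised without a browser. SSELineParser is a hypothetical name, and codecs.getincrementaldecoder plays the role of TextDecoder with stream: true.

```python
import codecs
import json
from typing import Any


class SSELineParser:
    """Reassemble 'data: ...' lines from arbitrarily split byte chunks."""

    def __init__(self) -> None:
        # Incremental decoder holds back incomplete UTF-8 sequences between reads
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._buffer = ""

    def feed(self, chunk: bytes) -> list[dict[str, Any]]:
        """Return every event completed by this chunk (possibly none)."""
        self._buffer += self._decoder.decode(chunk)
        lines = self._buffer.split("\n")
        self._buffer = lines.pop()  # last element may be an incomplete line
        events = []
        for line in lines:
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
        return events


if __name__ == "__main__":
    wire = 'data: {"type": "text_delta", "delta": "café"}\n\n'.encode()
    parser = SSELineParser()
    # Split inside the JSON and inside the two-byte "é" to show both cases
    cut = wire.index("é".encode()) + 1
    print(parser.feed(wire[:10]), parser.feed(wire[10:cut]), parser.feed(wire[cut:]))
```

Because only complete lines are ever parsed, a JSON error here signals a genuinely malformed event rather than a partial one. That is why it is not worth silently swallowing errors around the whole event-handling block.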
Using It in a Component
Here's how you'd use this in a React component. Notice: no different endpoints, no conditional logic in the component:
```tsx
import { useState } from 'react';
import { useGenerate } from './useGenerate'; // module path is an assumption

export function GenerateForm() {
  const [response, setResponse] = useState('');
  const { generate, isLoading } = useGenerate({
    mode: 'streaming', // or 'buffered' based on user preference
    onChunk: (chunk) => setResponse((prev) => prev + chunk),
    onComplete: (full) => console.log('Done:', full),
    onError: (err) => console.error('Failed:', err),
  });

  return (
    <div>
      <button
        onClick={() => {
          setResponse(''); // clear the previous answer before a new run
          generate('Explain React 19 streaming...');
        }}
        disabled={isLoading}
      >
        Generate
      </button>
      <div className="mt-4 whitespace-pre-wrap">{response}</div>
    </div>
  );
}
```
Want to toggle between modes based on network speed? Add a preference hook:
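```tsx
import { useEffect, useState } from 'react';

function useNetworkMode(): 'streaming' | 'buffered' {
  const [connection, setConnection] = useState<'4g' | '3g'>('4g');

  useEffect(() => {
    // Network Information API: not available in Firefox or Safari,
    // where we keep the '4g' default
    const nav = navigator as any;
    if (!nav.connection) return;

    const updateMode = () => {
      // effectiveType is one of 'slow-2g', '2g', '3g', '4g'
      const effectiveType = nav.connection.effectiveType;
      setConnection(['slow-2g', '2g', '3g'].includes(effectiveType) ? '3g' : '4g');
    };
    updateMode();
    nav.connection.addEventListener('change', updateMode);
    return () => nav.connection.removeEventListener('change', updateMode);
  }, []);

  // Slow links stream so users see progress; fast links buffer
  return connection === '3g' ? 'streaming' : 'buffered';
}
```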
Gotcha: SSE and Content-Length
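Here's what burned me: I initially didn't set Cache-Control: no-cache on the FastAPI response, and some proxies buffered the entire stream before forwarding it to the client, defeating the purpose. The headers in the endpoint above address that. If you sit behind nginx, also send X-Accel-Buffering: no, which turns off nginx's proxy buffering for that one response.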
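Also, never add Content-Length to a streaming response. You can't know the length ahead of time, and a wrong value makes clients either truncate the body or hang waiting for bytes that never arrive. Leave it off and the server frames the response itself (chunked transfer encoding on HTTP/1.1, DATA frames on HTTP/2).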
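To see why no length is needed, consider what happens on HTTP/1.1 without Content-Length. The body is typically sent with Transfer-Encoding: chunked, where every piece carries its own size in hex and a zero-size chunk marks the end (RFC 9112, section 7.1). The server (Uvicorn, in a typical FastAPI setup) does this framing for you. The hypothetical chunked_body function below only illustrates the wire format and is not something to add to the endpoint.

```python
def chunked_body(chunks: list[bytes]) -> bytes:
    """Frame byte chunks per HTTP/1.1 chunked transfer coding (no trailers)."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk would prematurely signal end of body
        out += f"{len(chunk):X}\r\n".encode("ascii")  # chunk size in hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, then the final CRLF
    return bytes(out)


if __name__ == "__main__":
    frame = b'data: {"type": "message_stop"}\n\n'
    print(chunked_body([frame]))
```

Each SSE event can leave the server as its own chunk the moment it is yielded, which is exactly what a precomputed Content-Length would rule out.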
Why Not WebSockets?
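People always ask. WebSockets are overkill here: the traffic is one-way, server to client. SSE is plain HTTP, works over HTTP/2, and passes through proxies that mishandle WebSocket upgrades. One caveat: the browser's EventSource API only issues GET requests, which is why the hook above reads the SSE stream with fetch and a ReadableStream instead. Save WebSockets for real-time collaboration.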
The Win
One endpoint. One error handler. One place to add retry logic, caching headers, rate limits, or monitoring. Your API surface is minimal. Your cognitive load is lower. Your tests are faster.
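Retry logic is a good example of what a single endpoint simplifies, but streaming adds one rule. It is only safe to retry before the first chunk has gone out, otherwise the user sees duplicated text. The following is a minimal sketch that assumes a hypothetical retry_until_first_chunk wrapper around whatever opens the upstream stream (in the endpoint above, the Anthropic call).

```python
import asyncio
from typing import AsyncIterator, Callable


async def retry_until_first_chunk(
    open_stream: Callable[[], AsyncIterator[str]],
    attempts: int = 3,
    delay: float = 0.0,
) -> AsyncIterator[str]:
    """Retry opening the stream, but never after a chunk has been emitted."""
    for attempt in range(1, attempts + 1):
        started = False
        try:
            async for chunk in open_stream():
                started = True
                yield chunk
            return  # upstream finished normally
        except Exception:
            # Mid-stream failures and the final attempt propagate to the caller
            if started or attempt == attempts:
                raise
            await asyncio.sleep(delay)  # back off before reopening
```

Once started is true, the wrapper re-raises instead of retrying, so the caller (for example, the SSE error event) decides how to report a stream that broke halfway.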
This is the pattern I run in production at CitizenApp. It scales to thousands of concurrent streams without duplication headaches.

