Streaming a pg_restore through Python without deadlocking

You want to restore a database dump that arrived as an upload — a file object, maybe gigabytes — and you don't want to spool the whole thing to disk or load it into memory first. So you reach for the obvious thing: open pg_restore with subprocess, write the dump to its stdin in chunks, capture stderr so you can show the user a real error if it fails.

proc = subprocess.Popen(
    ["pg_restore", "-d", dbname],
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,   # so we can report a real error
)
for chunk in iter(lambda: fileobj.read(65536), b""):
    proc.stdin.write(chunk)
proc.stdin.close()
proc.wait()

It works on your test dump. It works in CI. Then someone restores a real database and the process hangs forever — no error, no progress, both sides just... stop. Welcome to the pipe-buffer deadlock, one of the most reliable ways to wedge a subprocess pipeline.

Why it hangs

A pipe is a fixed-size OS buffer — classically around 64KB. When it fills up, whoever's writing to it blocks until someone reads the other end.

Now trace the standoff:

You're busy in a loop writing the dump to the child's stdin.
pg_restore is chewing through it and writing progress and warnings to stderr — which you said you'd capture (stderr=PIPE) but aren't reading yet, because you're stuck in step 1.
The stderr pipe fills to 64KB. pg_restore now blocks on its next write to stderr.
Blocked on stderr, pg_restore stops reading its stdin.
With nobody draining stdin, your stdin.write() fills its pipe and blocks too.

Both processes are now asleep waiting for the other to move. Nothing times out. Nothing errors. It just hangs. And it only shows up on dumps big or chatty enough to overflow that one buffer — which is exactly why it sails through testing and dies in production.

The usual advice is "use communicate()," and communicate() does solve the deadlock — by reading stdout/stderr on threads for you. But it wants you to hand it the entire input at once, which buffers your whole multi-gigabyte dump in memory. That's the thing we were specifically trying not to do. The moment you want to stream stdin and capture stderr, you're back to handling the pipes yourself.

The fix: drain stderr on a thread

You don't need a bigger buffer. You need someone reading stderr the whole time you're writing stdin. So give that job to a thread:

err_chunks: list[bytes] = []
drainer = threading.Thread(
    target=lambda: err_chunks.extend(iter(lambda: proc.stderr.read(8192), b"")),
    daemon=True,
)
drainer.start()

with proc.stdin as stdin:
    for chunk in iter(lambda: fileobj.read(65536), b""):
        stdin.write(chunk)        # stderr can never back up on us now

proc.wait(timeout=RESTORE_TIMEOUT)
drainer.join(timeout=5)

if proc.returncode != 0:
    raise EngineError(pull_cause(b"".join(err_chunks)))

The drainer reads stderr in a tight read(8192)-until-b"" loop, so the child's stderr buffer never fills. Meanwhile the main thread streams stdin to its heart's content. The with proc.stdin block closes stdin when the dump is exhausted — that EOF is how pg_restore knows the input is done — and then we wait(), join the drainer, and read the captured stderr only after the tool has exited. No buffer, no deadlock, and we still get the real error message on failure.

That's the entire trick. Everything below is the part that turns it from a snippet into something you'd trust with a database.

The details that bite

This is roughly what cli2ui's streaming restore does, and the corners worth knowing about:

Peek the first bytes to pick the tool. A custom-format archive starts with the PGDMP marker and goes through pg_restore; a plain SQL dump goes through psql. So read the leading 5 bytes, decide, then write those bytes back before the rest of the stream:

  head = fileobj.read(5)
  is_custom = head[:5] == b"PGDMP"
  ...
  with proc.stdin as stdin:
      if head:
          stdin.write(head)     # don't lose the bytes you peeked
      for chunk in iter(lambda: fileobj.read(65536), b""):
          stdin.write(chunk)

Catch BrokenPipeError. If the tool hits an error and exits while you're still writing, your next stdin.write() raises BrokenPipeError. That's not a bug to crash on — it means the child died early and the real reason is sitting in the stderr you drained. Catch it, wait(), and report from err_chunks.
Make failure mean failure. By default pg_restore limps past errors and tells you a count at the end; psql happily runs the next statement after a failed one. For a restore you want all-or-nothing, so: pg_restore --exit-on-error, and for the SQL path psql -v ON_ERROR_STOP=1 --single-transaction (the whole restore in one transaction — a failure leaves the database untouched).
Daemon thread + a join timeout. The drainer is daemon=True so a wedged reader can never keep your process alive, and wait()/join() both carry timeouts so a genuinely stuck tool gets killed instead of hanging your request forever — the very thing we set out to avoid.

The takeaway

The deadlock isn't really about Postgres — it's about pipes. Any time you stream into a child's stdin and capture its stderr (or stdout), you've signed up to read the other end concurrently, or the OS buffer will eventually introduce the two of you to a deadlock. A thread is the boring, correct answer: one side writes, the other drains, and they meet at the join.

This is one piece of cli2ui — a local-only web UI over the psql commands you keep half-remembering. No AI, no SaaS. It's MIT-licensed on GitHub. What command do you reach for that should be a button?

Streaming a pg_restore through Python without deadlocking

Why it hangs

The fix: drain stderr on a thread

The details that bite

The takeaway

Tags

Author

Stats

Published

You Might Also Like

Some friends wanted to see how I use DigitalOcean. So I built them the smallest real app I could.

The LLM Visibility Tools Cost $79/Month. Mine is Open Source.

On programming languages, targets, and platforms

I almost added an em-dash remover to my LLM library. Then I tested whether local models even produce em-dashes.

My eval harness paid for itself on the first run: 0.57 0.96, two bugs no unit test could catch

Never lose a training run again: a checkpoint-and-resume playbook for ephemeral GPUs