You want to restore a database dump that arrived as an upload โ a file object, maybe gigabytes โ and you don't want to spool the whole thing to disk or load it into memory first. So you reach for the obvious thing: open pg_restore with subprocess, write the dump to its stdin in chunks, capture stderr so you can show the user a real error if it fails.
proc = subprocess.Popen(
["pg_restore", "-d", dbname],
stdin=subprocess.PIPE,
stderr=subprocess.PIPE, # so we can report a real error
)
for chunk in iter(lambda: fileobj.read(65536), b""):
proc.stdin.write(chunk)
proc.stdin.close()
proc.wait()
It works on your test dump. It works in CI. Then someone restores a real database and the process hangs forever โ no error, no progress, both sides just... stop. Welcome to the pipe-buffer deadlock, one of the most reliable ways to wedge a subprocess pipeline.
Why it hangs
A pipe is a fixed-size OS buffer โ classically around 64KB. When it fills up, whoever's writing to it blocks until someone reads the other end.
Now trace the standoff:
- You're busy in a loop writing the dump to the child's stdin.
-
pg_restoreis chewing through it and writing progress and warnings to stderr โ which you said you'd capture (stderr=PIPE) but aren't reading yet, because you're stuck in step 1. - The stderr pipe fills to 64KB.
pg_restorenow blocks on its next write to stderr. - Blocked on stderr,
pg_restorestops reading its stdin. - With nobody draining stdin, your
stdin.write()fills its pipe and blocks too.
Both processes are now asleep waiting for the other to move. Nothing times out. Nothing errors. It just hangs. And it only shows up on dumps big or chatty enough to overflow that one buffer โ which is exactly why it sails through testing and dies in production.
The usual advice is "use communicate()," and communicate() does solve the deadlock โ by reading stdout/stderr on threads for you. But it wants you to hand it the entire input at once, which buffers your whole multi-gigabyte dump in memory. That's the thing we were specifically trying not to do. The moment you want to stream stdin and capture stderr, you're back to handling the pipes yourself.
The fix: drain stderr on a thread
You don't need a bigger buffer. You need someone reading stderr the whole time you're writing stdin. So give that job to a thread:
err_chunks: list[bytes] = []
drainer = threading.Thread(
target=lambda: err_chunks.extend(iter(lambda: proc.stderr.read(8192), b"")),
daemon=True,
)
drainer.start()
with proc.stdin as stdin:
for chunk in iter(lambda: fileobj.read(65536), b""):
stdin.write(chunk) # stderr can never back up on us now
proc.wait(timeout=RESTORE_TIMEOUT)
drainer.join(timeout=5)
if proc.returncode != 0:
raise EngineError(pull_cause(b"".join(err_chunks)))
The drainer reads stderr in a tight read(8192)-until-b"" loop, so the child's stderr buffer never fills. Meanwhile the main thread streams stdin to its heart's content. The with proc.stdin block closes stdin when the dump is exhausted โ that EOF is how pg_restore knows the input is done โ and then we wait(), join the drainer, and read the captured stderr only after the tool has exited. No buffer, no deadlock, and we still get the real error message on failure.
That's the entire trick. Everything below is the part that turns it from a snippet into something you'd trust with a database.
The details that bite
This is roughly what cli2ui's streaming restore does, and the corners worth knowing about:
-
Peek the first bytes to pick the tool. A custom-format archive starts with the
PGDMPmarker and goes throughpg_restore; a plain SQL dump goes throughpsql. So read the leading 5 bytes, decide, then write those bytes back before the rest of the stream:
head = fileobj.read(5)
is_custom = head[:5] == b"PGDMP"
...
with proc.stdin as stdin:
if head:
stdin.write(head) # don't lose the bytes you peeked
for chunk in iter(lambda: fileobj.read(65536), b""):
stdin.write(chunk)
Catch
BrokenPipeError. If the tool hits an error and exits while you're still writing, your nextstdin.write()raisesBrokenPipeError. That's not a bug to crash on โ it means the child died early and the real reason is sitting in the stderr you drained. Catch it,wait(), and report fromerr_chunks.Make failure mean failure. By default
pg_restorelimps past errors and tells you a count at the end;psqlhappily runs the next statement after a failed one. For a restore you want all-or-nothing, so:pg_restore --exit-on-error, and for the SQL pathpsql -v ON_ERROR_STOP=1 --single-transaction(the whole restore in one transaction โ a failure leaves the database untouched).Daemon thread + a join timeout. The drainer is
daemon=Trueso a wedged reader can never keep your process alive, andwait()/join()both carry timeouts so a genuinely stuck tool gets killed instead of hanging your request forever โ the very thing we set out to avoid.
The takeaway
The deadlock isn't really about Postgres โ it's about pipes. Any time you stream into a child's stdin and capture its stderr (or stdout), you've signed up to read the other end concurrently, or the OS buffer will eventually introduce the two of you to a deadlock. A thread is the boring, correct answer: one side writes, the other drains, and they meet at the join.
This is one piece of cli2ui โ a local-only web UI over the
psqlcommands you keep half-remembering. No AI, no SaaS. It's MIT-licensed on GitHub. What command do you reach for that should be a button?













