
You want your app to have its own copy of a user's email: to search it fast, run analytics over it, or feed it to a model without hitting the provider on every query. The naive version is a one-time pull that's stale the moment it finishes, because new mail keeps arriving and old mail gets read, moved, and deleted. A real sync is two jobs working together: backfill the history once, then track changes forever after. This post builds that out with the Email API and shows the CLI shortcuts for testing each piece.

It's a worked use case rather than an endpoint tour, pulling together list messages, cursor pagination, and the message webhooks from two angles: the HTTP API your backend runs and the nylas CLI for the terminal. I work on the CLI, so the commands below are the ones I reach for when I'm checking what a sync will actually see.

The two phases of a mailbox sync

A durable sync splits into a backfill and a tail. The backfill is a one-time walk through every message already in the mailbox, page by page, writing each into your database. The tail is the ongoing part: webhooks tell you about each new message, each change, and each deletion as it happens, so your copy keeps pace with the real mailbox instead of drifting from it.

The reason to separate them is that they have different shapes. Backfill is a bounded loop over a known, finite set, optimized for throughput, and you run it once per grant. The tail is an unbounded event stream optimized for latency, and it runs for the life of the connection. A mailbox with 40,000 messages takes a while to backfill but only has to be done once, while the tail handles the handful of changes that trickle in every minute after. Build them as two code paths that write to the same store.

Phase 1: backfill with cursor pagination

The backfill walks the mailbox in pages. A GET /v3/grants/{grant_id}/messages call returns a page of messages plus a next_cursor in the response. You pass that cursor back as the page_token query parameter on the next call, and you keep going until a response comes back without a next_cursor, which means you've reached the end.

curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages?limit=50&page_token=<CURSOR>" \
  --header "Authorization: Bearer <NYLAS_API_KEY>"

The limit parameter sets the page size. If you start hitting 429 responses or provider rate limits while listing, the docs suggest dropping limit to 20 and adding query parameters to narrow the results. The loop is simple: call, write the page to your database, read next_cursor, repeat. Persist the cursor between calls rather than holding it only in memory, because that's what makes the backfill resumable if the process restarts midway, which on a large mailbox it eventually will.
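Here's a minimal sketch of that loop. The helper names (fetchPage, saveMessages, loadCursor, saveCursor) are hypothetical stand-ins for your own HTTP and storage code, and the data array on the response is my assumption about the payload shape, so check it against a real response before relying on it. It also assumes Node 18+ for the global fetch.

```javascript
// Resumable backfill loop. All four helpers are hypothetical; supply your own.
async function backfill({ fetchPage, saveMessages, loadCursor, saveCursor }) {
  // Resume from the last persisted cursor, or start at the first page.
  let cursor = await loadCursor();
  let total = 0;
  while (true) {
    // Assumed response shape: { data: [...messages], next_cursor?: "..." }
    const page = await fetchPage(cursor);
    await saveMessages(page.data); // should be an upsert keyed on message ID
    total += page.data.length;
    if (!page.next_cursor) break; // no next_cursor means this was the last page
    cursor = page.next_cursor;
    await saveCursor(cursor); // persist only after the page has been written
  }
  return total;
}

// One way fetchPage might call the API (assumes Node 18+ global fetch).
function makeFetchPage({ grantId, apiKey, limit = 50 }) {
  return async function fetchPage(cursor) {
    const url = new URL(`https://api.us.nylas.com/v3/grants/${grantId}/messages`);
    url.searchParams.set("limit", String(limit));
    if (cursor) url.searchParams.set("page_token", cursor);
    const res = await fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`List messages failed with HTTP ${res.status}`);
    return res.json();
  };
}
```

Because the cursor is saved only after its page is written, a crash between the two steps means the same page gets fetched again on restart, which is harmless as long as the writes are upserts. A real job would also want to record that the walk finished, so a later restart doesn't re-read the last page.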

Back up a mailbox from the CLI

Before writing the backfill loop, it helps to see what it'll pull, and the CLI does the pagination for you. nylas email list --all walks the mailbox with pagination handled internally (the inbox only, by default), and --max caps the total so a test run doesn't walk everything. Add --json to get structured output you can pipe into a file or jq.

# Pull up to 500 messages across all folders as JSON
nylas email list --all --max 500 --all-folders --json > backup.json

By default the command only lists the inbox, so --all-folders is what makes it a real backup rather than an inbox dump. This is the fast way to answer practical questions before you build anything: how many messages are in this mailbox, what the payload looks like, and roughly how long a full walk takes. The --all flag uses the same cursor pagination underneath that your backend loop will, so what you see here is what your sync will get.

Phase 2: stay in sync with webhooks

Once the history is in, the tail keeps your copy current, and it's event-driven rather than polled. You subscribe to three message triggers and act on each. message.created fires when mail arrives, so you insert it. message.updated fires when a message changes (read, starred, or moved), so you update your row. And message.deleted fires when one is removed, so you delete or tombstone yours.

app.post("/webhooks/nylas", async (req, res) => {
  res.sendStatus(200); // acknowledge fast, before any database work
  try {
    const { type, data } = req.body;
    const msg = data.object;
    if (type.startsWith("message.deleted")) {
      await db.deleteMessage(msg.id);
    } else if (type.startsWith("message.created") || type.startsWith("message.updated")) {
      // Payloads over 1 MB arrive as ".truncated" with the body removed; re-fetch it.
      const full = type.endsWith(".truncated") ? await fetchMessage(msg.id) : msg;
      await db.upsertMessage(full);
    }
  } catch (err) {
    // The 200 is already sent, so log the failure rather than let it escape the handler.
    console.error("webhook processing failed", err);
  }
});

Acknowledge the delivery fast with a 200 and do the database write after, so a slow query never causes a webhook timeout and redelivery. Two details make the matching prefix-based rather than exact. A notification over 1 MB arrives with a .truncated suffix on the type and its body field stripped, so you re-query the message to get the full content. And if you turn on field customization, the type gains a .transformed suffix. Created and updated both do the same upsert, since keying on the message ID means a create and a later update land on the same row, which is exactly the idempotency the next section is about. The deletion path is the one most people forget, and skipping it leaves your copy showing mail the user already trashed.
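If the startsWith and endsWith checks start to multiply, one option is to split the type once and branch on the pieces. This is a sketch: parseTrigger is a hypothetical helper of mine, and it assumes the suffixes are dot-separated segments appended after the two-part trigger name, as in the examples above.

```javascript
// Split a webhook type such as "message.updated.truncated" into its base
// trigger and suffix flags. parseTrigger is a hypothetical helper.
function parseTrigger(type) {
  const parts = type.split(".");
  const base = parts.slice(0, 2).join("."); // e.g. "message.updated"
  const flags = new Set(parts.slice(2)); // e.g. Set { "truncated" }
  return {
    base,
    truncated: flags.has("truncated"),
    transformed: flags.has("transformed"),
  };
}

// Example: decide what to do with a notification type.
const trigger = parseTrigger("message.created.truncated");
console.log(trigger.base, trigger.truncated); // base trigger plus the re-fetch flag
```

Matching on an exact base name avoids catching an unrelated trigger that merely shares a prefix, while still tolerating either suffix.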

Make every write idempotent

The single rule that keeps a sync correct is that every write is keyed on the message ID, so doing it twice is the same as doing it once. This matters because the two phases overlap and webhooks repeat. A message can arrive over a webhook while your backfill is still running, and the backfill can then pull the same message. Webhooks also get redelivered, so the same message.created can fire more than once. Without idempotency, you get duplicate rows for one email.

An upsert solves all of it. When you write a message, insert it if the ID is new and overwrite if the ID exists, rather than blindly inserting. Then the backfill and the tail can both touch the same message in any order with no duplicates, a redelivered webhook is harmless, and re-running a failed backfill page just rewrites rows you already had. The message ID that every endpoint and webhook returns is what you key on, so make it the primary key of your table. Get this one detail right and most of the scary concurrency in a sync stops mattering.
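To make the behavior concrete, here's a tiny in-memory stand-in for a table keyed on message ID. MessageStore is a hypothetical class for illustration only. In a real database the same idea is usually one statement, and the SQL in the comment uses PostgreSQL's ON CONFLICT syntax with a made-up messages table.

```javascript
// In-memory stand-in for a table whose primary key is the message ID.
// A PostgreSQL equivalent (table and column names are assumptions):
//   INSERT INTO messages (id, subject, unread) VALUES ($1, $2, $3)
//   ON CONFLICT (id) DO UPDATE
//     SET subject = EXCLUDED.subject, unread = EXCLUDED.unread;
class MessageStore {
  constructor() {
    this.rows = new Map();
  }
  // Insert if the ID is new, overwrite if it already exists.
  upsert(message) {
    this.rows.set(message.id, message);
  }
  // Deleting a missing ID is a no-op, so a redelivered delete is harmless too.
  remove(id) {
    this.rows.delete(id);
  }
  get count() {
    return this.rows.size;
  }
}

const store = new MessageStore();
store.upsert({ id: "m1", subject: "Hello", unread: true }); // from a webhook
store.upsert({ id: "m1", subject: "Hello", unread: true }); // same message from the backfill
store.upsert({ id: "m1", subject: "Hello", unread: false }); // later message.updated
console.log(store.count); // still one row for m1
```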

Resume a backfill with a time window

A backfill over a huge mailbox shouldn't be one unbroken loop, because any failure midway means starting over. Two query parameters let you slice it into resumable windows: received_after and received_before each take a Unix timestamp and bound the results to a date range. You walk the mailbox a month at a time, record which windows are done, and a crash only costs you the window in flight.

curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages?received_after=1704067200&received_before=1706745600&limit=50" \
  --header "Authorization: Bearer <NYLAS_API_KEY>"

Windowing also lets you prioritize. Sync the last 90 days first so the app has recent mail to show immediately, then backfill the older history in the background where nobody's waiting on it. Date windows are an API feature; the CLI doesn't expose date flags, but nylas email list --from <sender> and --folder <name> scope a test pull to a slice of the mailbox when you want to inspect one sender or folder rather than the whole thing. The combination of cursor pagination within a window and date windows across the mailbox is what turns a fragile one-shot job into one you can stop and resume at will.
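Generating the windows is a small date exercise. This sketch builds one window per calendar month in UTC, newest first, so recent mail syncs before older history. monthWindows is a hypothetical helper, and months are numbered 1 to 12.

```javascript
// Build month-wide windows as Unix timestamps in seconds (UTC), newest first.
// Each entry can be passed straight to received_after / received_before.
function monthWindows(fromYear, fromMonth, toYear, toMonth) {
  const windows = [];
  let year = toYear;
  let month = toMonth; // 1-12
  while (year > fromYear || (year === fromYear && month >= fromMonth)) {
    windows.push({
      received_after: Date.UTC(year, month - 1, 1) / 1000,
      // Date.UTC rolls a month index of 12 over into January of the next year.
      received_before: Date.UTC(year, month, 1) / 1000,
    });
    month -= 1;
    if (month === 0) {
      month = 12;
      year -= 1;
    }
  }
  return windows;
}

// December 2023 through January 2024, newest window first.
console.log(monthWindows(2023, 12, 2024, 1));
```

The January 2024 window comes out as 1704067200 to 1706745600, the same pair used in the curl call above. Adjacent windows share a boundary timestamp, so it's worth checking in the API docs whether the bounds are inclusive. With upserts, fetching a boundary message twice is harmless, but missing one would not be.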

Store the right amount, not everything

A message body can be large, and a mailbox has thousands of them, so storing every full body is often the wrong default. The list response includes the body by default, but the select query parameter trims it to just the fields you name, so a metadata-only backfill stays lean. You store what you query on (the ID, sender, subject, date, folders, and read state) and fetch the full body from GET /v3/grants/{grant_id}/messages/{message_id} on demand when a user opens a message.

curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages?select=id,from,subject,date,folders,unread&limit=50" \
  --header "Authorization: Bearer <NYLAS_API_KEY>"

With select narrowing the payload, your database stays small and fast for the list and search views that run constantly, and the heavy body is pulled only for the one message someone's reading.
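On the storage side, a metadata-only sync usually comes down to a URL builder and a mapping function. The sketch below uses hypothetical names throughout (listUrl, toRow, and the row's column names). It also assumes from is an array of { name, email } objects and date is a Unix timestamp in seconds, which you should confirm against a real response.

```javascript
// Fields requested through select, matching the curl example above.
const METADATA_FIELDS = ["id", "from", "subject", "date", "folders", "unread"];

// Build a list-messages URL that asks only for the metadata fields.
function listUrl(grantId, { limit = 50, pageToken } = {}) {
  let url =
    `https://api.us.nylas.com/v3/grants/${grantId}/messages` +
    `?select=${METADATA_FIELDS.join(",")}&limit=${limit}`;
  if (pageToken) url += `&page_token=${encodeURIComponent(pageToken)}`;
  return url;
}

// Flatten an API message into the row you store. Column names are made up.
// Assumes message.from is an array of { name, email } objects.
function toRow(message) {
  const sender = (message.from && message.from[0]) || {};
  return {
    id: message.id, // primary key
    sender_email: sender.email ?? null,
    sender_name: sender.name ?? null,
    subject: message.subject ?? "",
    received_at: message.date, // assumed to be Unix seconds
    folders: message.folders ?? [],
    unread: Boolean(message.unread),
  };
}
```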

Which way you lean depends on the use case. If you're running full-text search or feeding a model, you need the bodies, so store them, ideally in a store built for large text rather than your primary transactional table. If you're building an inbox list view or analytics over headers, metadata alone is plenty and far cheaper to keep current. Decide up front, because backfilling bodies later means a second walk through the whole mailbox.

Where a local sync pays off

This pattern shows up under a lot of different products, and the mechanism is the same each time. A few that map straight onto it:

Fast search and filtering. Querying your own indexed database returns results in milliseconds, where hitting the provider per query is slow and rate-limited. The sync is what makes instant search possible.
Analytics and reporting. Response times, volume by sender, and thread activity all run over your local copy without touching the mailbox, so a heavy report never competes with live traffic.
AI and retrieval. Feeding email to a model for summarization or retrieval needs the content sitting in a store you control, not fetched live on every prompt.
Offline and resilience. Your app keeps working through a provider hiccup because the data it reads is local, and the tail catches up when the connection returns.

Each is the same two-phase build, backfill then tail, with the same idempotent writes underneath. What changes is how much of each message you keep and what you index, not the sync itself.

Things to keep in mind

A short list of details keeps a mailbox sync correct under load.

Key every write on the message ID. Idempotent upserts are what make overlapping backfill and webhook writes safe; without them you get duplicate rows.
Handle message.deleted, not just created. Skipping deletions leaves your copy showing mail the user already removed, which reads as a bug.
Persist the pagination cursor. Storing next_cursor between pages is what makes a backfill resumable after a restart instead of starting over.
Window large backfills by date. received_after and received_before slice the mailbox into resumable chunks and let you sync recent mail first.
Acknowledge webhooks before the database write. Return 200 fast, then process, so a slow query doesn't trigger a redelivery storm.
Store metadata, fetch bodies on demand. Keep your hot table small and pull the full body only when a message is opened, unless search or AI needs the text.

Wrapping up

A mailbox sync is a backfill plus a tail. Walk the history once with cursor pagination, following next_cursor until it runs out, and slice it into date windows so it's resumable. Then keep it current with the message.created, message.updated, and message.deleted webhooks, writing every change as an upsert keyed on the message ID so overlap and redelivery never duplicate a row. Test the walk from the terminal with nylas email list --all, decide whether you need full bodies or just metadata, and you have a local copy that stays honest with the real mailbox.

Where to go next: