The vast majority of our real-world queries followed this general pattern: Prisma generates very convoluted SQL, pulling in every column and stacking up joins. That, combined with the magical Cockroach optimizer, ultimately resulted in extremely high latencies. Here's a real example of a query that runs twenty times faster on Postgres than on Cockroach.
SELECT
"teamTasks"."id",
"teamTasks"."name",
"teamTasks"."description",
"teamTasks"."dueDate",
"teamTasks"."duration",
"teamTasks"."completedTime",
"teamTasks"."assigneeUserId",
"teamTasks"."priorityLevel",
"teamTasks"."statusId",
"teamTasks"."projectId",
"teamTasks"."createdByUserId",
"teamTasks"."createdTime",
"teamTasks"."updatedTime",
"teamTasks"."lastInteractedTime",
"teamTasks"."archivedTime",
"teamTasks"."workspaceId",
"teamTasks"."minimumDuration",
"teamTasks"."scheduledStart",
"teamTasks"."scheduledEnd",
"teamTasks"."startDate",
"teamTasks"."isChunkedTask",
"teamTasks"."isUnfit",
"teamTasks"."type",
"teamTasks"."needsReschedule",
"teamTasks"."deadlineType",
"teamTasks"."schedule",
"teamTasks"."parentChunkTaskId",
"teamTasks"."parentRecurringTaskId",
"teamTasks"."rank",
"teamTasks"."isFixedTimeTask",
"teamTasks"."slug",
"teamTasks"."previousSlugs",
"teamTasks"."endDate",
"teamTasks"."isBusy",
"teamTasks"."isAutoScheduled",
"teamTasks"."ignoreWarnOnPastDue",
"teamTasks"."scheduleOverridden",
"teamTasks"."snoozeUntil",
"teamTasks"."manuallyStarted",
"teamTasks_chunks"."data" AS "chunks",
"teamTasks_labels"."data" AS "labels",
"teamTasks_blockedTasks"."data" AS "blockedTasks",
"teamTasks_blockingTasks"."data" AS "blockingTasks"
FROM
"TeamTask" "teamTasks"
LEFT JOIN LATERAL (
SELECT
coalesce(json_agg(json_build_array("teamTasks_chunks"."id")), '[]'::json) AS "data"
FROM
"TeamTask" "teamTasks_chunks"
WHERE
"teamTasks_chunks"."parentChunkTaskId" = "teamTasks"."id") "teamTasks_chunks" ON TRUE
LEFT JOIN LATERAL (
SELECT
coalesce(json_agg(json_build_array("teamTasks_labels"."labelId")), '[]'::json) AS "data"
FROM
"TeamTaskLabel" "teamTasks_labels"
WHERE
"teamTasks_labels"."taskId" = "teamTasks"."id") "teamTasks_labels" ON TRUE
LEFT JOIN LATERAL (
SELECT
coalesce(json_agg(json_build_array("teamTasks_blockedTasks"."blockingId")), '[]'::json) AS "data"
FROM
"TeamTaskBlocker" "teamTasks_blockedTasks"
WHERE
"teamTasks_blockedTasks"."blockedId" = "teamTasks"."id") "teamTasks_blockedTasks" ON TRUE
LEFT JOIN LATERAL (
SELECT
coalesce(json_agg(json_build_array("teamTasks_blockingTasks"."blockedId")), '[]'::json) AS "data"
FROM
"TeamTaskBlocker" "teamTasks_blockingTasks"
WHERE
"teamTasks_blockingTasks"."blockingId" = "teamTasks"."id") "teamTasks_blockingTasks" ON TRUE
WHERE ("teamTasks"."workspaceId" in(
SELECT
"workspaceId" FROM "WorkspaceMember"
WHERE ("WorkspaceMember"."userId" = '{{user_id}}'
AND "WorkspaceMember"."deletedTime" IS NULL))
and("teamTasks"."workspaceId" in('{{workspace_id}}')
and("teamTasks"."id" in(
SELECT
"taskId" FROM "TeamTaskLabel"
WHERE
"TeamTaskLabel"."labelId" in('{{label_id}}'))
OR "teamTasks"."id" NOT in(
SELECT
"taskId" FROM "TeamTaskLabel"))
AND "teamTasks"."archivedTime" IS NULL
AND "teamTasks"."completedTime" IS NULL))
Most of our queries on the Team Tasks table were, on average, three times slower on Cockroach compared to Postgres.
A handful of UI and operational issues also plagued us throughout the process.
1. Unused indices
The UI for unused indices would list mostly indices that were actually in use, which periodically confused developers. We never got to the bottom of whether this was caused by our use of Prisma or not.
2. Cancelling running queries
By now we knew that running expensive queries on Cockroach was extremely scary. But since it's a distributed cluster, cancelling a query isn't so simple. With Postgres, you can simply hit cancel in TablePlus (or your preferred SQL client). On Cockroach, you have to log into their console, cancel the query there, and pray all nodes cancel it before they fall over. (At least once, they did not. You can probably guess what happened.)
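For comparison, "hitting cancel" in a Postgres client boils down to a cancel request you can also issue yourself. A minimal sketch using the pg driver (the connection string and the one-minute threshold are placeholders), which finds active statements that have been running too long and cancels them:

import { Client } from "pg";

// Find backends running statements for more than a minute and ask Postgres to
// cancel them. This is essentially what a SQL client's cancel button does for
// its own connection.
async function cancelLongRunningQueries(connectionString: string) {
  const client = new Client({ connectionString });
  await client.connect();

  const { rows } = await client.query(`
    SELECT pid, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND now() - query_start > interval '1 minute'
      AND pid <> pg_backend_pid()
  `);

  for (const { pid, query } of rows) {
    console.log(`Cancelling pid ${pid}: ${query}`);
    await client.query("SELECT pg_cancel_backend($1)", [pid]);
  }

  await client.end();
}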
3. Support
First, the support portal is a totally different website that doesn't share auth with the main portal. Second, you have to re-enter a lot of data they already know about you (cluster ID, etc.). And by the time they respond, it's typically been a week. Normally that's fine, but once they rolled out a bug that, of course, took us down immediately. That was not the time to be logging into a separate portal and re-entering mundane details.
Throughout the two-plus years we used Cockroach, we periodically ran into Tailscale connectivity issues.
getaddrinfo ENOTFOUND internal-motion-dev-2022-10-grc.gcp-us-central1.cockroachlabs.cloud
PrismaClientInitializationError: Can't reach database server at 'internal-motion-prod-gnp.gcp-us-central1.cockroachlabs.cloud':'26257'
No matter what we did, these issues would suddenly appear and, an hour later, just as suddenly disappear. They showed up in every environment (Airbyte connectors, CI, and our local TablePlus clients). We never solved it, and we have never had similar issues with Postgres.
By January 2024, our largest table had roughly 100 million rows. As mentioned earlier, while there were many tools built to import data into Cockroach, there were no ETL tools for getting data back out, other than a very alpha Airbyte connector that kept timing out due to a memory leak.
So I decided to build a custom ETL solution.
Around this time, Bun was gaining popularity, so this migration was a convenient excuse to satisfy my curiosity. The Bun script roughly did the following (a stripped-down sketch follows the list):
- Read the database schema and all of its table information
- Dump the data of each table to a dedicated file on disk
- Spawn a Bun child process for each table in our schema
- Each child process would then stream the dumped data for its table
- The stream was just a CSV stream of all the rows in the table
- The “sink” for the stream would simply insert the rows into Postgres
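Here is a stripped-down sketch of that pipeline, assuming the per-table dumps are plain CSV files on disk and using pg with pg-copy-streams on the Postgres side. The table list and file layout are placeholders, and the real script had far more error handling plus the Cockroach-specific transforms described below:

import { createReadStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { Client } from "pg";
import { from as copyFrom } from "pg-copy-streams";

// Parent process: spawn one Bun child per table.
const tables = ["TeamTask", "TeamTaskLabel", "TeamTaskBlocker"];
if (Bun.argv.length === 2) {
  await Promise.all(
    tables.map(
      (table) =>
        Bun.spawn(["bun", Bun.argv[1], table], {
          stdout: "inherit",
          stderr: "inherit",
        }).exited,
    ),
  );
} else {
  // Child process: stream the table's CSV dump straight into Postgres.
  const table = Bun.argv[2];
  const client = new Client({ connectionString: process.env.POSTGRES_URL });
  await client.connect();
  const sink = client.query(copyFrom(`COPY "${table}" FROM STDIN WITH (FORMAT csv)`));
  await pipeline(createReadStream(`dumps/${table}.csv`), sink);
  await client.end();
  console.log(`Copied ${table}`);
}

The sketch uses COPY ... FROM STDIN as the sink, which keeps each table to a single streaming statement; plain batched INSERTs work too, just more slowly.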
I got to this point relatively quickly and should have known things were going too well to be true. It wasn't until I started running test migrations against our production database that I learned Cockroach uses a slightly different byte encoding for JSON and array columns than Postgres does. The next few weeks went into building a custom CSV parsing pipeline on top of Csv-js to transform all of the data from Cockroach into something that was compatible with Postgres, yet identical from a user's perspective.
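The shape of that pipeline is a transform stage sitting between the dump and the Postgres sink. Here's a hedged sketch of the idea, using csv-parse and csv-stringify as stand-ins for the parser we actually used, with fixEncoding as a placeholder for the real Cockroach-to-Postgres conversions (the actual rewrites depended on each table's column types):

import { createReadStream, createWriteStream } from "node:fs";
import { Transform } from "node:stream";
import { pipeline } from "node:stream/promises";
import { parse } from "csv-parse";
import { stringify } from "csv-stringify";

// Placeholder for the real per-column conversions: rewrite Cockroach's JSON and
// array encodings into values Postgres will accept, leave everything else alone.
function fixEncoding(value: string, columnIndex: number): string {
  // ...Cockroach-specific rewrites went here...
  return value;
}

// Parse each dumped row, rewrite the affected columns, and re-serialize as CSV
// that can be piped into the COPY sink from the previous sketch.
async function transformDump(inputPath: string, outputPath: string) {
  const rewriteRow = new Transform({
    objectMode: true,
    transform(row: string[], _enc, done) {
      done(null, row.map((value, i) => fixEncoding(value, i)));
    },
  });

  await pipeline(
    createReadStream(inputPath),
    parse(),      // raw CSV text -> arrays of column values
    rewriteRow,   // apply the Cockroach -> Postgres conversions
    stringify(),  // arrays of column values -> CSV text
    createWriteStream(outputPath),
  );
}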
When the night of the migration came, I spun up the largest VM I could get my hands on in GCP (a 128-core VM), turned on Motion's maintenance mode, and let it rip. The production migration script took roughly 15 minutes to run through the entire DB from start to finish.
Everything you read above was done by one person (me), over the course of several weeks. In total, we were down for just under an hour, from midnight to 1am Pacific time, and of course there was zero data loss. We could have been even more aggressive and limited the downtime to just 15 minutes, but we opted to be super safe and gradually ramp traffic back up.
Afterwards, we saw an immediate 33% drop in aggregated request latencies. But the exciting part was just beginning. Thanks to the Postgres ecosystem and tools like PGAnalyze, we fixed half a dozen unoptimized queries in just a few hours after the migration.
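PGAnalyze did the heavy lifting there, but the raw data it builds on is available to any Postgres user. A small sketch, assuming the pg_stat_statements extension is enabled (column names follow Postgres 13+), of pulling the most expensive statements yourself:

import { Client } from "pg";

// List the ten statements with the highest total execution time, as recorded by
// the pg_stat_statements extension.
async function slowestQueries(connectionString: string) {
  const client = new Client({ connectionString });
  await client.connect();
  const { rows } = await client.query(`
    SELECT query, calls, mean_exec_time, total_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10
  `);
  await client.end();
  return rows;
}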
Despite being conservative and over-provisioning our Postgres cluster, we still ended up saving the business a little over $110,000 a year (and the savings will be even higher once you factor in the continued growth of traffic to the system).