Most codebases go through multiple revisions and changes during their lifetimes. That includes a mid-size Next.js application I’ve helped build from the ground up. Here I analyze which design choices were correct and have been serving us well, and which ones have required migrations or other large refactoring.
This article and its learnings apply primarily to a mid-size app whose main goal is letting authenticated users manage their own data in some way. If you’re building a public-facing ecommerce site or something else entirely, some of these lessons won’t apply in the same way.
The technology stack is essentially self-hosted Next.js, cloud-hosted Supabase, AWS, Tailwind, and Inngest.
Next.js
Use zod for structure types
When defining data structures used in your project, e.g. `User`, `Project`, or `DataMutationRoutePayload`, it is best to define the data structure shape with Zod and use `z.infer` to get the TypeScript type. The Zod schema makes it super easy to validate that objects passed around match the expected structure. While TypeScript types take care of some data structure concerns, things always sneak in, and it’s best to actually validate the data you’re working with, especially just before inserting it into a database.
The resulting schemas are also usable for surprisingly many things, from ChatGPT structured outputs to typesafe server actions.
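As a minimal sketch of the pattern (the `User` shape here is hypothetical):

```ts
import { z } from "zod";

// Hypothetical User schema: the single source of truth for the shape.
const userSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  displayName: z.string().min(1),
});

// The TypeScript type is derived from the schema, not written by hand.
type User = z.infer<typeof userSchema>;

function insertUser(raw: unknown): User {
  // Runtime validation just before a database insert; throws on garbage data.
  const user = userSchema.parse(raw);
  // ...insert `user` into the database here
  return user;
}
```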
Use server components
This is a general recommendation: fetch as much serverside data as possible inside server components.
Keeping data fetches close to where they’re actually used helps with code organization and security (auditing data fetches is easier when they happen near the use site).
Prefer server components to route handlers for data fetching
Let’s say you’re building a feedback form that, for some reason, includes a selector listing every product in your database (thousands of them). Since you probably don’t want to block the initial page load, how should you load the list of products shown in the selector? I think async promises passed from a server component to a client component are the answer.
Current versions of Next.js with React Server Components support passing promises to client components and reading the result with the `use` hook. In my opinion, this is the ultimate answer for passing asynchronous data to the client, because it keeps the data-loading call close to where the data is used and offers native React tools (`Suspense`, the `use` hook) for dealing with the pending promise.
In fact, data loading with RSC promises can even be faster than with route handlers, because the data fetch is already initiated during the serverside render, whereas a route handler only starts loading data after the React frontend has mounted and fired the request. This makes RSC promises a no-brainer choice for most data-loading purposes.
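A minimal sketch of the pattern, with hypothetical `getProducts`, `Product`, and `ProductSelector` names:

```tsx
// page.tsx (server component)
import { Suspense } from "react";
import { getProducts } from "@/lib/products";
import { ProductSelector } from "./product-selector";

export default function FeedbackPage() {
  // Kick off the fetch during the server render, but don't await it here.
  const productsPromise = getProducts();
  return (
    <Suspense fallback={<p>Loading products…</p>}>
      <ProductSelector productsPromise={productsPromise} />
    </Suspense>
  );
}
```

```tsx
// product-selector.tsx (client component)
"use client";
import { use } from "react";
import type { Product } from "@/lib/products";

export function ProductSelector({
  productsPromise,
}: {
  productsPromise: Promise<Product[]>;
}) {
  // `use` suspends this component until the promise settles.
  const products = use(productsPromise);
  return (
    <select>
      {products.map((p) => (
        <option key={p.id} value={p.id}>
          {p.name}
        </option>
      ))}
    </select>
  );
}
```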
An exception to server components and promises is easily cacheable and globally used data. For those, I would still use route handlers.
Prefer server actions to route handlers for data mutation
Server actions work nicely for data mutations, because
- they can be used directly as form actions
- their arguments and return values are type-checked (but not validated! See the next section)
- they tend to be colocated next to the React components that actually call them, which is nice for code organization
- the closure scope is encrypted, which acts as another form of security
Thus, for most data mutation operations it doesn’t make sense to use route handlers.
Server actions are harder to mock than route handlers, but with an e2e testing approach they shouldn’t need to be mocked anyway.
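As a minimal sketch of what a validated server action can look like (the schema and names are hypothetical; the next section covers factoring the validation out into a wrapper):

```ts
// actions.ts
"use server";

import { z } from "zod";

const renameProjectSchema = z.object({
  projectId: z.string().uuid(),
  name: z.string().min(1).max(100),
});

export async function renameProject(input: z.infer<typeof renameProjectSchema>) {
  // The TypeScript signature is not enforced at runtime, so validate here.
  const { projectId, name } = renameProjectSchema.parse(input);

  // ...authorize the caller and run the update (e.g. via Supabase) here
  return { ok: true, projectId, name };
}
```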
Create shared utility wrappers for route handlers, page handlers, and server actions
You should create wrapper higher-order functions for your page handlers, route handlers, and server actions. Next.js essentially operates by exporting normal JS functions, so creating HOFs over these is very easy and saves a lot of effort.
I like having these features in HOFs:
- `authenticated` flag (whether to check that the user is authenticated for this operation)
- Authorization. In my case this means a mandatory `permission` string that the user is verified against. Having it mandatory in the type system makes it harder to forget.
- Parameter schemas. Specifying zod schemas for accepted parameters and (importantly) automatically figuring out the right types for those makes life easy and helps prevent garbage data.
- Error handling integrations. In my case, wrapping server actions in the Sentry handler.
- Rate limiting. Making sure that the more expensive operations don’t crash everything.
- `admin` flag. Sometimes some operations should only be done by admins.
Generally, having one higher-order function per feature type (route, page, server action) works pretty well. Sometimes a more specialized HOF per subset of your web application works even better. For instance, if your application is about managing musical artists and their gigs, giving the `/artists/*` and `/gigs/*` routes their own lower-level HOFs might be beneficial.
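As a minimal sketch of such a wrapper for server actions (`getCurrentUser` and `userHasPermission` stand in for your own auth helpers):

```ts
import { z } from "zod";

// Hypothetical helpers from your own auth layer.
declare function getCurrentUser(): Promise<{ id: string } | null>;
declare function userHasPermission(userId: string, permission: string): Promise<boolean>;

type ActionOptions<S extends z.ZodTypeAny> = {
  authenticated: boolean; // whether to require a signed-in user
  permission: string;     // mandatory, so it can't be forgotten
  schema: S;
};

export function createAction<S extends z.ZodTypeAny, R>(
  options: ActionOptions<S>,
  handler: (input: z.infer<S>, userId: string | null) => Promise<R>,
) {
  return async (rawInput: unknown): Promise<R> => {
    const user = await getCurrentUser();
    if (options.authenticated) {
      if (!user) throw new Error("Not authenticated");
      if (!(await userHasPermission(user.id, options.permission))) {
        throw new Error("Forbidden");
      }
    }
    // Validate before the handler ever sees the data.
    const input = options.schema.parse(rawInput);
    return handler(input, user?.id ?? null);
  };
}

// Usage sketch:
// export const renameProject = createAction(
//   { authenticated: true, permission: "projects:write", schema: renameProjectSchema },
//   async (input, userId) => { /* ... */ },
// );
```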
There are also libraries for this, e.g. zsa for server actions.
Do pagination and filtering on the client
The applicability of this one depends greatly on the details, but assuming you’re operating with at most a few thousand items per tenant, it is often much nicer to just send all the data to the client and handle any filtering, sorting, or pagination operations there.
The math is pretty straightforward: assuming each row has 10 fields with 15 bytes of data each, 2,000 rows amount to 300 kB. While not completely free to transmit over the network, the savings in development costs from being able to work on all the data locally instead of over network requests easily make this a worthwhile decision.
The math is different if the site and search are public (and thus highly cacheable) or you’re operating in higher orders of magnitude of data.
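A minimal sketch of the client-side approach, assuming the full list has already been passed down from a server component:

```tsx
"use client";
import { useMemo, useState } from "react";

// Hypothetical row shape.
type Row = { id: string; name: string };

export function ProductTable({ rows }: { rows: Row[] }) {
  const [query, setQuery] = useState("");
  const [page, setPage] = useState(0);
  const pageSize = 50;

  // All filtering and pagination happens locally; no network round-trips.
  const filtered = useMemo(
    () => rows.filter((r) => r.name.toLowerCase().includes(query.toLowerCase())),
    [rows, query],
  );
  const visible = filtered.slice(page * pageSize, (page + 1) * pageSize);

  return (
    <div>
      <input
        value={query}
        onChange={(e) => {
          setQuery(e.target.value);
          setPage(0); // reset pagination when the filter changes
        }}
      />
      <ul>
        {visible.map((r) => (
          <li key={r.id}>{r.name}</li>
        ))}
      </ul>
      <button
        disabled={(page + 1) * pageSize >= filtered.length}
        onClick={() => setPage((p) => p + 1)}
      >
        Next
      </button>
    </div>
  );
}
```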
Avoid mocking Supabase calls
Mocking Supabase calls sounds good in theory: tests become faster and easier to run since a separate Supabase instance is not needed, and the end result is the same. In practice, though, maintaining mocks is painful and requires remembering the context in which they were created, they fail to accurately represent actual database behavior (and to catch e.g. invalid db function corner cases or foreign key constraint violations), and CI compute is in any case much cheaper than developer/QA working hours.
Run E2E tests on a built app instead of dev mode in CI
Running e2e tests on a built app reduces flakiness (CI can focus on running tests instead of both compiling pages and testing them), and running tests in an environment that resembles production is better in any case.
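If you use Playwright, the relevant piece is the `webServer` config; a sketch (adjust commands to your setup):

```ts
// playwright.config.ts — run e2e tests against a production build, not `next dev`
import { defineConfig } from "@playwright/test";

export default defineConfig({
  webServer: {
    // Build once, then serve the optimized output.
    command: "pnpm build && pnpm start",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    timeout: 120_000,
  },
});
```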
Supabase / Database
Always use surrogate keys
This especially concerns startups and other companies at an early phase. It is impossible to know how data will be used in the future or how business objectives will change, so relying on natural keys being, and staying, a good identifier for your data is painting yourself into a corner.
A good example: a people registry that uses a national ID as the natural key is no longer such a good idea after expanding to another country.
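In SQL terms, a sketch of the surrogate-key version looks like this (table and column names are illustrative):

```sql
-- Surrogate primary key; the natural identifier is just data.
create table person (
  id uuid primary key default gen_random_uuid(),
  country_code text not null,
  national_id text,
  -- Uniqueness can still be enforced per country,
  -- but nothing else ever references the natural key:
  unique (country_code, national_id)
);
```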
Design row-level security (RLS) in a performant way
(a good source on RLS performance that also acted as inspiration here: https://github.com/orgs/supabase/discussions/14576)
Row-level security functions can be a good way to absolutely destroy application performance. This is especially sneaky because, while RLS performance generally follows O(n) complexity, in practice it can cause something reminiscent of the n+1 database query problem: RLS conditions that look quite simple end up causing tons of extra joins on queries that return many (hundreds or thousands of) rows.
In general, following the best practices in the GitHub link above is the way to go. These are some of the tricks I’ve found especially useful:
- Wrap values that don’t change over the query in `select` statements
  - e.g. `auth.uid()` won’t change over the database call, so you can wrap it in `(select auth.uid())` to “cache” it instead of calling `auth.uid()` for each RLS-checked row
- Eliminate database calls altogether, e.g. by querying JWT claims instead of the database or by relying on cached data
- Use batched RLS functions
  - Instead of checking whether a `flower` belongs to a `flower_shop` we have access to by joining the two tables, create an `authorized_flower_shops` function that returns the id of each flower shop we have access to, and use something like `flower.shop_id in (select authorized_flower_shops())`. This caches the result of `authorized_flower_shops`, so querying for authorized flowers becomes substantially faster. (See the sketch after this list.)
- If all else fails, create a server-side verified route (or a security definer db function) that does the authorization check once and then fetches everything else bypassing RLS
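As a sketch of the batched-function trick, assuming a `flower` table with a `shop_id` column and a hypothetical `flower_shop_members` membership table:

```sql
-- Returns the id of every flower shop the current user may access.
create function authorized_flower_shops()
returns setof uuid
language sql
security definer
stable
as $$
  select shop_id
  from flower_shop_members
  where user_id = (select auth.uid())
$$;

-- The subselect is evaluated once for the whole query,
-- instead of joining per RLS-checked row.
create policy "select flowers in authorized shops"
on flower for select
using (shop_id in (select authorized_flower_shops()));
```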
Use RLS for selects. Avoid RLS for mutations
Row-level security for everything is nice in theory: you’re making authorization part of the database itself so nothing can slip past the security veil and you’re then able to do all database operations on the client itself.
In practice, there are many real-world concerns that make this impossible. For instance, you might want to validate some fields of the to-be-inserted records against an external API. While such things are technically doable with RLS, at some point it becomes easier to just draw a hard line between read operations (which can still use RLS) and write operations (which should be routed through a server).
Technically, Column Level Security is an answer to this and more, but it gets too arbitrary for my liking. It’s easier to reason about authorization at the row level.
If you want to stick to the client being able to do everything, you can still utilize security definer database functions to do customized write operations.
Create admin mode using impersonation instead of RLS
Many apps will eventually need an admin mode that can access effectively all rows on demand. Avoid the easy option of adding admin exceptions to RLS rules (e.g. if `user.email` ends with `@mycompany.com`, grant access to everything).
This is for a few reasons:
- auditing admin access is easier if you have to manually enter admin mode instead of just having access to everything
- even as an admin user, you should see the app the same way normal users do. This helps avoid bugs that are only visible to non-admin users.
- admin-bypass for RLS might stop you from seeing some RLS-related performance issues that would otherwise appear
Instead, add some kind of impersonation mode where an admin user can start impersonating a non-admin user and interact with the app exactly as that user sees it. I wrote an article on how to do this in practice.
Package architecture
Use pnpm instead of npm
Oh boy. Many reasons:
- fewer problems with package version and dependency clashes
  - npm likes to keep all dependencies in a flat `node_modules/`, which is problematic if your direct and indirect dependencies have clashing version requirements. This can cause very annoying bugs that are super hard to debug (think axios 0.x being used instead of 1.x, causing some headers to be dropped from every request)
- pnpm is just faster!
- pnpm has built-in patch support, which most apps that do anything complicated will need at some point anyway
Split off clearly independent services into their own packages
Slightly controversial, but monorepos are not evil. pnpm workspaces + turborepo make working with monorepos a breeze, and having each clearly independent service in its own package helps (and forces) you to identify common dependencies and split them off into their own maintainable components. Additionally, monorepo services are faster to build on change (especially with turborepo pruning), because you’re building only what’s needed for an individual feature.
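The workspace setup itself is tiny. A sketch of `pnpm-workspace.yaml`, assuming the conventional apps/packages layout used later in this article:

```yaml
# pnpm-workspace.yaml
packages:
  - "apps/*"
  - "packages/*"
```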
Simple design system component library
The focus is on simple: there should be a set of components that conform to the project’s design guide (or at least establish a de facto design guide through common use of the same components) and that are fun to use.
To me, simplicity means that creating a button should in most cases look like `<Button>hello world</Button>`, i.e. with no extra classes or attributes required. Variants, style modifiers, and loading spinners are fine to support, but they should not be required just for a simple button!
In most projects, I would create the component library as its own package (or folder) with a simple set of components that appear in many places, like `<Button>`, `<TextInput>`, and `<Alert>`.
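A minimal sketch of what such a component can look like (variant names and styles are illustrative):

```tsx
import type { ButtonHTMLAttributes } from "react";

type ButtonProps = ButtonHTMLAttributes<HTMLButtonElement> & {
  variant?: "primary" | "secondary"; // optional: defaults keep simple usage simple
};

export function Button({ variant = "primary", className = "", ...props }: ButtonProps) {
  const base = "rounded px-4 py-2 font-medium";
  const styles =
    variant === "primary"
      ? "bg-blue-600 text-white hover:bg-blue-500"
      : "bg-gray-100 text-gray-900 hover:bg-gray-200";
  return <button className={`${base} ${styles} ${className}`} {...props} />;
}

// Usage: <Button>hello world</Button> — no extra classes or attributes required.
```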
Nowadays, if I were starting a design system component library package from scratch, I would look into utilizing shadcn and perhaps combining it with Tailwind and HeadlessUI.
Build Tailwind only in the apps it’s used in
A slight appendix to the component library section: Tailwind classes should be generated only in the apps that use them, not in the design system component library. (We initially did the opposite, and it somewhat backfired.)
For example, given a structure like this:
- `apps/dashboard`
- `apps/metrics`
- `packages/ui-system`
the Tailwind config in each of `dashboard` and `metrics` should include the `ui-system` source files, rather than `ui-system` building its own Tailwind CSS file. Multiple Tailwind CSS outputs can cause annoying classname clashes and ordering issues, app-specific Tailwind builds are more atomic and performant, and this generally keeps the code simpler.
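In practice this means something like the following in each app (a sketch assuming Tailwind v3-style `content` globs):

```ts
// apps/dashboard/tailwind.config.ts
import type { Config } from "tailwindcss";

export default {
  content: [
    "./app/**/*.{ts,tsx}",
    // Include the shared component sources so their classes are generated here,
    // instead of ui-system shipping its own compiled CSS.
    "../../packages/ui-system/src/**/*.{ts,tsx}",
  ],
} satisfies Config;
```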
Other infrastructure
Prefer something like Inngest over crontabs or home-built scheduling solutions
Scheduling events and reacting to triggers is easy with Inngest, it comes with a cool dashboard, and the background jobs still run on your own servers. Also, the free tier is pretty nice.
What more is there to ask for?
That being said, Inngest is best used as a background task runner and such. For high capacity message queue needs I would use something self-hosted.
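As a sketch, a scheduled background function in Inngest looks roughly like this (assuming Inngest v3; names are illustrative):

```ts
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" });

// Runs every night at 03:00 UTC, on your own servers.
export const dailyCleanup = inngest.createFunction(
  { id: "daily-cleanup" },
  { cron: "0 3 * * *" },
  async ({ step }) => {
    await step.run("delete-expired-sessions", async () => {
      // ...your cleanup logic here
    });
  },
);
```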
Conclusion
These are some of the decisions I’ve made and found useful. If you have other ideas on what you would do differently or want one of these topics expanded, feel free to contact me!