Column Encryption
Encrypt columns using Transparent Column Encryption.
Supabase provides a secure method for encrypting data using Vault, our Postgres secrets manager. Vault is a Postgres extension with an integrated UI intended to act as a secure global secrets management for your project.
In addition to the Vault secret storage table, Supabase also enables an advanced feature called Transparent Column Encryption (TCE) which provides a safe way to encrypt columns in your own tables so that they don't leak into logs and backups. It can also provide row-level authenticated encryption.
Column Encryption comes with tradeoffs that need to be considered before using it.
-
Encryption and decryption both take time, inserting and selecting encrypted data takes more time than a "plain" column of data. Queries that must load a lot of rows with encrypted columns will be slower than those that are not encrypted.
-
Encrypted columns should never be indexed. This is because the index will store the encrypted value of a column, which would not be useful. To look up an encrypted column, you must first look up its row by some other unencrypted column, like a primary key.
-
Encrypted columns can be queried in a
WHERE
clause, but this can also have some negative performance consequences, since the value must be decrypted in order to matched to anyWHERE
qualifiers. For small result sets the impact is small, but queries that match many rows will cause each row to be decrypted in order to check against theWHERE
clause. In the worst case, the entire table must be scanned and decrypted to check against aWHERE
clause. -
While you can encrypt multiple columns in the same table, each column must go through a full encryption cycle, so two columns will take twice the time as one, three columns three time as long, and so on. It is usually better to break up large tables into smaller tables with at most one or two encrypted columns, and associated them with foreign keys that can be used to JOIN them when necessary.
In general, it is a bad idea to over-use column encryption for mundane data or data that you need to search against such as names, user or account types, addresses, country codes, etc. Column encryption is intended to be used for very sensitive data that would cause serious issues if it were to leak, such as API keys, payment keys, highly sensitive personal information, etc.
Encryption Keys#
Encryption requires keys. Keeping the keys in the same database as the encrypted data would be unsafe. It would be like leaving a key in a door lock. For that reason, encryption keys are kept separate from the database.
Supabase column encryption and the Vault both use a low level Postgres extension library called pgsodium. This library in turn wraps a very popular encryption library called libsodium. These libraries provide functionally for doing something called Key Derivation. Supabase uses key derivation to derive the encryption keys used by column encryption. These keys are derived from a Root Key that is stored by Supabase, outside the database. If you need to migrate encrypted data from one system to another, you must ensure that this root key is copied over as well. There is a REST API for getting the root key for a project.
Encrypting columns#
When creating a new column in the Dashboard, you can choose to encrypt a text
or bytea
column. You will choose which key you would like to encrypt it with by selecting an existing key ID or creating a new one.
Once you've created an encrypted column, you can insert data into the table like you would any other table. For example if you were to insert an email address into an encrypted column, you will see that the address is transparently converted into an encrypted value on the new row.
As mentioned above, indexing an encrypted column serves no purpose and has a detrimental effect on performance. Indexing encrypted values is not useful since the index needs to store the unencrypted value to be usable, and that would defeat the purpose of storing encrypted data. Do not create indexes on encrypted columns.
Decrypting data#
Decrypted data is accessed using a special view that is automatically created after adding an encrypted column to a table. This view decrypts the data row-by-row as you access it. By default, this view is called decrypted_<your-table-name>
. In the example below, the decryption view for the profiles
table is called decrypted_profiles
. Notice there is a new column in the view called decrypted_emails
that contains the decrypted email value.
Accessing decrypted data is slower than accessing unencrypted data. Any row you query from this view will go through a decryption function, which takes some time and can be a significant performance pentalty if your query loads lots of rows. It's always advisable to look up rows by some indexed unencrypted key first, like a primary key, so that only one row is decrypted at a time.
Using an Encrypted Table#
Now that you have column encryption setup for a table, it's easy to use by simply inserting data into the table, and querying that data by looking at its generated view. The view is named decrypted_<table_name>
and by default is in the same schema as your table:
_10insert into secrets_10 (secret, account_id)_10values_10 ('1234-5678-8765-4321', 123);
Now that you have inserted data, look at the table and notice how the secret is encrypted. This is the data that is stored on disk, the encrypted card number, the key id, and the account id, but the key itself is not stored. This means if someone gets a backup or dump of your database, they cannot decrypt the secret, they do not have the key, only the key ID:
_10> select * from secrets where account_id = 123;_10-[ RECORD 1 ]------+---------------------------------------------------------------------_10id | 1_10secret | jf8KfImkKTr+j4gzyDZQtLDEFL9eSlFuKjNlNEJvDg+OIKUr2wjF/8NnYcLisb5F9xiN_10account_id | 123_10key_id | 7f753c4f-8c68-457a-8801-1798b2e9f44d_10nonce | \x300a14aa721184ff7cf0f6bf088da267
For you, the developer, you need the unencrypted secret for you application. No problem, you can access that data using the dynamically generated decryption view decrypted_secrets
:
_10> select * from decrypted_secrets where account_id = 123;_10-[ RECORD 1 ]----------------+---------------------------------------------------------------------_10id | 1_10secret | jf8KfImkKTr+j4gzyDZQtLDEFL9eSlFuKjNlNEJvDg+OIKUr2wjF/8NnYcLisb5F9xiN_10decrypted_secret | 1234-5678-8765-4321_10account_id | 123_10key_id | 7f753c4f-8c68-457a-8801-1798b2e9f44d_10nonce | \x300a14aa721184ff7cf0f6bf088da267
Notice how there is a new column called decrypted_secret
. This column is not stored in database or on disk at all, it is generated “on-the-fly” as you select from the view. Database dumps do not contain this information, only the view itself, and most importantly, raw decryption keys are never stored.
Granting API Access to encrypted columns#
When you create an encrypted column in the public
database schema, then that column and the decryption view that is created for it are given public access via the PostgREST API. In general it is not recommended to encrypt columns in the public schema, as this adds some additional security considerations that you need to think about, but it is still possible and useful in some cases depending on your level of security comfort. Only you can decide the risks and rewards of making decrypted data available to public API access. It is very strongly recommended that you must protect your public encrypted columns with a Row Level Security (RLS) Policy.
Even though objects created in the public
schema default to public access, There is an additional layer of security with pgsodium that must be enabled in order for decryption to work with the PostgREST API. The permissions necessary to call pgsodium functions must be granted to one of the API role that is accessing the public objects.
There are three roles available for API access with Supabase, The anon
role represents unauthenticated anonymous users and should never be granted access to pgsodium. The service_role
which is meant to be used by other services in your system, and whose token should never be exposed publicly, and the authenticated
role which represents authenticated users who present a valid JWT token and are identified by the system as a known user.
By default, only the service_user
role is granted access to the column encryption functions. If you wish to make encrypted columns available to your authenticated users, then you must run GRANT pgsodium_keyiduser TO authenticated;
from the Supabase console or logged in as the postgres
role. This table shows the default permission settings for API roles accessing the pgsodium encryption functions:
Role | pgsodium_keyiduser | pgsodium_keymaker |
---|---|---|
anon | X | X |
authenticated | X | X |
service_role | Y | X |
How Key Derivation Works#
The current state-of-the-art in encryption libraries is libsodium.
libsodium offers a range of APIs for authenticated secret and public key encryption, key derivation, encrypted streaming, AEAD, various forms of hashing, and much more.
This powerful API is available to PostgreSQL using the pgsodium extension. pgsodium provides all the functionality of the full libsodium API, but previously it required developers to set up database encryption themselves, which remained a challenge even for those familiar with database administration.
To solve this problem, pgsodium now has a full key management API, primarily via the table pgsodium.key
and the pgsodium.create_key()
function. This key table contains no raw keys, but instead uses libsodium Key IDs to derive keys that are used internally for encryption. A key derivation function is used with an internal root key that is unavailable to SQL and not stored in the database, but rather managed by you externally using flexible scripts, or by Supabase automatically as part of our service offering.
The simplest way to use pgsodium to encrypt and decrypt data is to first create a Key ID. Valid Key IDs are stored in pgsodium in a special extension table, and they can be created using the pgsodium.create_key()
function. This function takes a number of arguments depending on how it's used, but the simplest case is to create a new key with no arguments:
_11select * from pgsodium.create_key();_11-[ RECORD 1 ]---+-------------------------------------_11id | eaa20d8c-c77c-4985-9f73-2a5f5d1f1e6d_11name |_11status | valid_11key_type | aead-det_11key_id | 2_11key_context | \x7067736f6469756d_11created | 2022-11-13 21:19:35.765823+00_11expires |_11associated_data |
This key can now be used with pgsodium encryption functions by its UUID (eaa20d8c-c77c-4985-9f73-2a5f5d1f1e6d
). For example:
_10select * from pgsodium.crypto_aead_det_encrypt (_10 'this is the message', -- a message to encrypt_10 'this is associated data', -- some authenticated associated data_10 'eaa20d8c-c77c-4985-9f73-2a5f5d1f1e6d'::uuid -- key ID_10);
This produces the following encrypted “ciphertext” using the aead-det
algorithm from the libsodium XChaCha20-SIV encryption function.
_10-[ RECORD 1 ]-----------+---------------------------------------------------------------------------------------------------------_10crypto_aead_det_encrypt | \\x099baa820250d7375ed141f8f1936af384bc229f3de1010a6eff6ffdaf3998baffbae75b5cd83d1c469407ff2d3764a428b742
Now to decrypt the ciphertext, pass it to the decryption function with the same Key ID:
_10select *_10from convert_from(pgsodium.crypto_aead_det_decrypt (_10 '\\x099baa820250d7375ed141f8f1936af384bc229f3de1010a6eff6ffdaf3998baffbae75b5cd83d1c469407ff2d3764a428b742',_10 'this is associated data',_10 'eaa20d8c-c77c-4985-9f73-2a5f5d1f1e6d'::uuid_10), 'utf8');
Which recovers the original “plaintext” message:
_10-[ RECORD 1 ]+--------------------_10convert_from | this is the message
In the above example, there is no raw key, only a Key ID which is used to derive the key used to encrypt the message and authenticate it with the associated data. In fact, it is impossible for a SQL user to derive the key used above, and if the Key ID is stored, then no encryption keys or decrypted information will leak into backups, disk storage, or the database WAL stream.
Transparent Column Encryption#
As of pgsodium 3.0.0 and up, the extension offers a simple and declarative Transparent Column Encryption feature (TCE). This feature is now shipped with all Supabase projects. TCE allows you to specify encrypted columns within a table and generates a new view that “wraps” that table to decrypt the contents.
TCE works using two dynamically generated objects for tables that contain encrypted columns:
- an
INSERT UPDATE
trigger that encrypts data when it is inserted or modified - a view that is created to wrap the table to decrypt the data when it is accessed
To “transparently” decrypt the table, access the dynamically generated view instead of the table. For every encrypted column in the table, the view will have an additional decrypted column that shows the decrypted result.
It's worth noting at this point that sometimes there is some confusion about handling encrypted data with TCE. The T
stands for Transparent which means, you can always see decrypted data through the view, where the decrypted data can't be see is when stored on disk, or in pg_dumps, backups, WAL streams, etc. This is often called Encryption At Rest and is one layer in many that may be used to encrypt and protect your data.
Often Transparent encryption is understood to be “Transparent Disk Encryption” or “Full Disk Encryption”, this is where a drive is encrypted but reading and writing that drive is decrypted. TCE is similar to this, where data on disk is encrypted, but it is more fine grained, only particular columns are encrypted. The data is also encrypted in the sense that the table stored on disk contains encrypted data, without the view or the key, pg_dumps and backups still contain encrypted data, this is not possible with disk-only encryption.
For the moment TCE only works for columns of type text
(or types castable to text
like json
). Soon we will also support bytea
and possibly more as use cases and tests get better.
TCE uses one of PostgreSQL's lesser-known features: SECURITY LABEL
. A security label can be thought of as a simple label which is attached to an object (a table, column, etc). Each label is scoped to an extension and that extension can provide security features depending on the label.
Let's see a simple example of using SECURITY LABEL
to encrypt a column using pgsodium.
One Key ID for the Entire Column#
For the simplest case, a column can be encrypted with one Key ID which must be of the type aead-det
(as created above):
_10create table secrets (_10 id bigserial primary key,_10 secret_number text_10);_10_10security label for pgsodium_10 on column secrets.secret_number_10 is 'ENCRYPT WITH KEY ID e348034b-3f07-4878-aad6-000511d12826';
The advantage of this approach is simplicity - the user creates one key and labels a column with it. The cryptographic algorithm for this approach uses a nonceless encryption algorithm called crypto_aead_det_xchacha20()
. This algorithm is written by the author of libsodium and can be found here.
Using one key for an entire column means that whoever can decrypt one row can decrypt them all from a database dump. Also changing (rotating) the key means rewriting the whole table.
One Key ID per Row#
A more fine grained approach would be storing one Key ID per row:
_10create table secrets (_10 id bigserial primary key,_10 secret text,_10 key_id uuid not null default 'e348034b-3f07-4878-aad6-000511d12826'::uuid_10);_10_10security label for pgsodium_10 on column secrets.secret_10 is 'ENCRYPT WITH KEY COLUMN key_id';
This approach ensures that a key for one user doesn't necessarily decrypt any others. While rows can share key IDs, they don't necessarily have to (unlike the first example above where one key id was used for the entire column). It also acts as a natural partition that can work in conjunction with Row Level Security to share distinct keys between owners.
Notice also how there is a DEFAULT
value for the key_id
. In a way, this gives you the best of both approaches - encrypting the column when a per-row key ID is not provided. The downside to this approach is that you need to store one key ID per row, which takes up more disk space (but that's cheap!).
One Key ID per Row with Nonce Support#
The default cryptographic algorithm for the above approach uses a nonceless encryption algorithm called crypto_aead_det_xchacha20()
. This algorithm has the advantage that it does not require nonce values, the disadvantage is that duplicate plaintexts will produce duplicate ciphertexts.
Nonces are some extra cryptographic context that is used in many cryptographic algorithms to produce different ciphertexts, even if the plaintexts are the same. The nonce does not have to be secret, but it does have to be unique. pgsodium comes with a useful function pgsodium.crypto_aead_det_noncegen()
that will generate a cryptographically secure nonce for you, and in almost all cases it's best to use that function unless you know specifically what you are doing. In password hashing approaches, this is often similar to how a “salt” value is used to deduplicate password hashes.
Duplicate ciphertexts cannot be used to “attack the key”, it can only reveal the duplication. However, duplication is still information. In our examples so far, an attacker might be able to use this information to determine that two accounts share the same secret. While not technically breaking the encryption, this still leaks information to an attacker.
_10create table secrets (_10 id bigserial primary key,_10 secret text,_10 key_id uuid not null default 'e348034b-3f07-4878-aad6-000511d12826'::uuid,_10 nonce bytea default pgsodium.crypto_aead_det_noncegen()_10);_10_10security label for pgsodium_10 on column secrets.secret_10 is 'ENCRYPT WITH KEY COLUMN key_id NONCE nonce';
This is the most secure form of TCE - there is a unique key ID and a unique nonce per row.
One Key ID per Row with Associated Data#
The encryption that is used for TCE is one of a family of functions provided by libsodium to do Authenticated Encryption with Associated Data or AEAD Encryption. The “associated” data is plaintext (unencrypted) information that is mixed into the authentication signature of the encrypted data, such that when you authenticate the data, you also know that the associated data is authentic.
AEAD is helpful because often you have metadata associated with a secret, which isn't confidential but must not be forged.
In our secret example, we might associate a "secret
" with an "account_id
". But what if a malicious actor wanted to use someone else's secret on their own account? If someone could forge the account_id
data column, swapping an account_id
with their own account_id
, then you could be tricked into using the wrong secret. By “associating” the account_id
with the secret
, it cannot be forged without throwing an error.
Like above, this is done simply by extending the security label with the associated data column:
_11create table secrets (_11 id bigserial primary key,_11 secret text,_11 account_id integer,_11 key_id uuid not null default 'e348034b-3f07-4878-aad6-000511d12826'::uuid,_11 nonce bytea default pgsodium.crypto_aead_det_noncegen()_11);_11_11security label for pgsodium_11 on column secrets.secret_11 is 'ENCRYPT WITH KEY COLUMN key_id ASSOCIATED (account_id) NONCE nonce';
The new label indicates which column is to be associated with the secret, and that's it! Your account_id
and secret are now protected under the same authentication signature as the secret itself.
Resources#
- Supabase Vault
- Read more about Supabase Vault in the blog post
- Supabase Vault on GitHub