_posts/2025-11-19-encryption-in-duckdb.md
31 additions & 21 deletions
@@ -10,7 +10,7 @@ tags: ["deep dive"]
10
10
11
11
> If you would like to use encryption in DuckDB, we recommend using the latest stable version, v1.4.2. For more details, see the [latest release blog post]({% post_url 2025-11-12-announcing-duckdb-142 %}#vulnerabilities).
12
12
13
-
Many years ago, we read the excellent “[Code Book](https://en.wikipedia.org/wiki/The_Code_Book)” by [Simon Singh](https://en.wikipedia.org/wiki/Simon_Singh). Did you know that [Mary, Queen of Scots](https://en.wikipedia.org/wiki/Mary,_Queen_of_Scots), used an [encryption method harking back to Julius Caesar](https://en.wikipedia.org/wiki/Caesar_cipher) to encrypt her more saucy letters? But alas: The cipher was broken and the contents of the letters got her [executed](https://en.wikipedia.org/wiki/Execution_of_Mary,_Queen_of_Scots).
13
+
Many years ago, we read the excellent “[Code Book](https://en.wikipedia.org/wiki/The_Code_Book)” by [Simon Singh](https://en.wikipedia.org/wiki/Simon_Singh). Did you know that [Mary, Queen of Scots](https://en.wikipedia.org/wiki/Mary,_Queen_of_Scots), used an [encryption method harking back to Julius Caesar](https://en.wikipedia.org/wiki/Caesar_cipher) to encrypt her more saucy letters? But alas: the cipher was broken and the contents of the letters got her [executed](https://en.wikipedia.org/wiki/Execution_of_Mary,_Queen_of_Scots).
14
14
15
15
These [days](https://en.wikipedia.org/wiki/Crypto_Wars), strong encryption software and hardware are a commodity. Modern CPUs [come with specialized cryptography instructions](https://developer.arm.com/documentation/ddi0602/2025-09/SIMD-FP-Instructions/AESE--AES-single-round-encryption-), and operating systems small and big contain [mostly](https://www.heartbleed.com/)-robust cryptography software like OpenSSL.
16
16
@@ -53,14 +53,11 @@ After the main database header, DuckDB stores two 4KB database headers that cont
53
53
54
54
Blocks in DuckDB are by default 256KB, but their size is configurable. At the start of each *plaintext* block there is an 8-byte block header, which stores an 8-byte checksum. The checksum is a simple calculation that is often used in database systems to check for any corrupted data.
55
55
56
+
<img src="{% link images/blog/encryption/checksum.png %}" width="400" />
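
As a rough illustration of the checksum-prefixed layout described above, the following sketch writes and verifies a block whose first 8 bytes hold a checksum. It is not DuckDB's implementation: `stand_in_checksum` (CRC-32), the little-endian encoding, and the function names are assumptions made for the example.

```python
import struct
import zlib

BLOCK_SIZE = 256 * 1024  # default block size mentioned in the text
HEADER_SIZE = 8          # 8-byte block header holding the checksum


def stand_in_checksum(payload: bytes) -> int:
    """Placeholder checksum (CRC-32). This is NOT DuckDB's checksum function."""
    return zlib.crc32(payload)


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with an 8-byte checksum header (little-endian is assumed)."""
    if len(payload) != BLOCK_SIZE - HEADER_SIZE:
        raise ValueError("payload must fill the block exactly")
    return struct.pack("<Q", stand_in_checksum(payload)) + payload


def read_block(block: bytes) -> bytes:
    """Recompute the checksum and return the payload, or raise if it does not match."""
    (stored,) = struct.unpack_from("<Q", block, 0)
    payload = block[HEADER_SIZE:]
    if stored != stand_in_checksum(payload):
        raise ValueError("checksum mismatch: block is corrupted")
    return payload
```

A reader that gets a `ValueError` from `read_block` knows the bytes on disk no longer match what was written, which is the purpose of the checksum.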
56
57
57
-
Figure 3
58
+
For encrypted blocks, however, the block header consists of 64 bytes instead of the 8 bytes used for the checksum. The block header for encrypted blocks contains a 16-byte *nonce/IV* and, optionally, a 16-byte *tag*, depending on which encryption cipher is used. The nonce and tag are stored in plaintext, but the checksum is encrypted for better security. Note that the block header always needs to be 8-byte aligned to calculate the checksum.
58
59
59
-
60
-
For encrypted blocks however, its block header consists of 64-bytes instead of 8 bytes for the checksum (Figure 4). The block header for encrypted blocks contains a 16-byte *nonce/IV* and, optionally, a 16-byte *tag*, depending on which encryption cipher is used. The nonce and tag are stored in plaintext, but the checksum is encrypted for better security. Note that the block header always needs to be 8-bytes aligned to calculate the checksum.
61
-
62
-
63
-
Figure 4
60
+
<img src="{% link images/blog/encryption/encrypted-blocks.png %}" width="400" />
This disables checkpointing when the database is closed, meaning that the WAL does not get merged into the main database file. In addition, setting `wal_autocheckpoint` to a high threshold prevents intermediate checkpoints, so the WAL persists. For example, we can create a persistent WAL file by first setting the above PRAGMAs, then attaching an encrypted database, and then creating a table into which we insert 3 values.
@@ -85,11 +85,11 @@ If we now close the DuckDB process, we can see that there is a `.wal` file shown
85
85
86
86
Before new entries (inserts, updates, deletes) are written to the database, they are logged and appended to the WAL. Only *after* the logged entries are flushed to disk is a transaction considered committed. A plaintext WAL entry has the following structure:
87
87
88
-
Figure 7
88
+
<img src="{% link images/blog/encryption/plaintext-wal-entry.png %}" width="400" />
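
The "flush before commit" rule can be sketched generically as follows. This is a minimal illustration of write-ahead logging, not DuckDB's code; `append_and_commit` and its parameters are hypothetical names.

```python
import os


def append_and_commit(wal_path: str, entry: bytes) -> None:
    """Append one entry to the log and return only after it has been flushed to disk."""
    with open(wal_path, "ab") as wal:
        wal.write(entry)        # append-only: earlier entries are never rewritten
        wal.flush()             # push Python's user-space buffer to the OS
        os.fsync(wal.fileno())  # ask the OS to flush the data to the storage device
    # Only at this point would the transaction be reported as committed.
```

If the process crashes before `os.fsync` returns, the entry may be missing from the log, which is why the commit is acknowledged only afterwards.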
89
89
90
90
Since the WAL is append-only, we encrypt a WAL entry *per value*. For AES-GCM this means that we add a nonce and a tag to each entry. The structure in which we do this is depicted in [image]. When we serialize an encrypted entry to the encrypted WAL, we first store the length in plaintext, because we need to know how many bytes we should decrypt. The length is followed by a nonce, which in turn is followed by the encrypted checksum and the encrypted entry itself. After the entry, a 16-byte tag is stored for verification.
91
91
92
-
Figure 8
92
+
<img src="{% link images/blog/encryption/encrypted-wal-entry.png %}" width="400" />
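
To make the field order concrete, here is a sketch that frames and parses an already-encrypted entry as length, nonce, encrypted checksum plus entry, and tag. Only the 16-byte tag and the field order come from the text. The 8-byte length field, the 16-byte nonce, the 8-byte checksum, the little-endian encoding, the choice that the length counts only the entry, and all names are assumptions. No encryption is performed here; the ciphertext is expected to come from an AES-GCM implementation.

```python
import struct
from typing import NamedTuple, Tuple

LEN_SIZE = 8       # assumed width of the plaintext length field
NONCE_SIZE = 16    # assumed; the text gives 16 bytes for block nonces
CHECKSUM_SIZE = 8  # assumed, matching the 8-byte block checksum
TAG_SIZE = 16      # from the text: a 16-byte tag follows the entry


class EncryptedWalEntry(NamedTuple):
    nonce: bytes
    ciphertext: bytes  # encrypted checksum followed by the encrypted entry
    tag: bytes


def serialize(entry: EncryptedWalEntry) -> bytes:
    """Lay out one entry as: length | nonce | ciphertext | tag."""
    entry_len = len(entry.ciphertext) - CHECKSUM_SIZE
    if entry_len < 0 or len(entry.nonce) != NONCE_SIZE or len(entry.tag) != TAG_SIZE:
        raise ValueError("field has an unexpected size")
    return struct.pack("<Q", entry_len) + entry.nonce + entry.ciphertext + entry.tag


def parse(buf: bytes, offset: int = 0) -> Tuple[EncryptedWalEntry, int]:
    """Read one entry starting at offset; return it with the offset of the next entry."""
    (entry_len,) = struct.unpack_from("<Q", buf, offset)
    pos = offset + LEN_SIZE
    end = pos + NONCE_SIZE + CHECKSUM_SIZE + entry_len + TAG_SIZE
    if end > len(buf):
        raise ValueError("truncated WAL entry")
    nonce = buf[pos:pos + NONCE_SIZE]
    pos += NONCE_SIZE
    ciphertext = buf[pos:pos + CHECKSUM_SIZE + entry_len]
    pos += CHECKSUM_SIZE + entry_len
    tag = buf[pos:end]
    return EncryptedWalEntry(nonce, ciphertext, tag), end
```

Storing the length in plaintext is what lets `parse` find the tag and the next entry without decrypting anything first.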
93
93
94
94
Encrypting the WAL is triggered by default when an encryption key is given for any (un)encrypted database.
95
95
@@ -99,7 +99,7 @@ Temporary files are used to store intermediate data that is often necessary for
99
99
100
100
#### The Structure of Temporary Files
101
101
102
-
There are three different types of temporary files in DuckDB: (1) temporary files that have the same layout as a regular 256KB block (figure 3), (2) compressed temporary files and (3) temporary files that exceed the standard 256KB block size. The former two are suffixed with .tmp, while the latter is distinguished by a suffix with .block. To keep track of the size of .block temporary files, they are always prefixed with its length. As opposed to regular database blocks, temporary files do not contain a checksum to check for data corruption, since the calculation of a checksum is somewhat expensive.
102
+
There are three different types of temporary files in DuckDB: (1) temporary files that have the same layout as a regular 256KB block, (2) compressed temporary files and (3) temporary files that exceed the standard 256KB block size. The first two have the suffix `.tmp`, while the third has the suffix `.block`. To keep track of their size, `.block` temporary files are always prefixed with their length. As opposed to regular database blocks, temporary files do not contain a checksum to check for data corruption, since the calculation of a checksum is somewhat expensive.
103
103
104
104
#### Encrypting Temporary Files
105
105
@@ -109,7 +109,10 @@ To force DuckDB to produce temporary files, you can use a simple trick by just s
@@ -132,7 +135,7 @@ In DuckDB, you can (1) encrypt an existing database, (2) initialize a new, empty
132
135
install tpch;
133
136
load tpch;
134
137
ATTACH 'encrypted.duckdb' AS encrypted (ENCRYPTION_KEY 'asdf');
135
-
ATTACH 'unencrypted.duckdb' as unencrypted;
138
+
ATTACH 'unencrypted.duckdb' AS unencrypted;
136
139
USE unencrypted;
137
140
CALL dbgen(sf=1);
138
141
COPY FROM DATABASE unencrypted TO encrypted;
@@ -141,15 +144,19 @@ There is not a trivial way to prove that a database is encrypted, but correctly
141
144
142
145
When we use ent after executing the above chunk of SQL, i.e., `ent encrypted.duckdb`, this will result in an entropy of 7.99999 bits per byte. If we do the same for the plaintext (unencrypted) database, this results in 7.65876 bits per byte. Note that the plaintext database also has a high entropy, but this is due to compression.
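
The bits-per-byte figure is the Shannon entropy of the file's byte distribution. A minimal sketch of that calculation is shown below; `entropy_bits_per_byte` is a name chosen for this example, and small differences from the output of ent are possible.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value 0..255
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def file_entropy(path: str) -> float:
    """Read a whole file into memory and compute its entropy."""
    with open(path, "rb") as f:
        return entropy_bits_per_byte(f.read())
```

Calling `file_entropy("encrypted.duckdb")` should give a value close to 8 for well-encrypted data, since every byte value is then roughly equally likely.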
143
146
144
-
Let’s now visualize both the plaintext and encrypted data with binocle. For the visualization we created both a plaintext DuckDB database with scale factor 1 of TPC-H data, and an encrypted one. The binary data of the plaintext database is visualized in figure 5, while the encrypted database is visualized in figure 6.
147
+
Let’s now visualize both the plaintext and encrypted data with binocle. For the visualization we created both a plaintext DuckDB database with scale factor 1 of TPC-H data and an encrypted one:
In these figures, we can clearly observe that the encrypted database file (figure 6) seems completely random, while the plaintext database file (figure 5) shows some clear structure in its binary data.
159
+
In these figures, we can clearly observe that the encrypted database file seems completely random, while the plaintext database file shows some clear structure in its binary data.
153
160
154
161
To decrypt an encrypted database, we can use the following SQL:
155
162
@@ -170,7 +177,10 @@ COPY FROM DATABASE encrypted TO new_encrypted;
170
177
The default encryption algorithm is AES GCM. This is recommended since it also authenticates data by calculating a tag. Depending on the use case, you can also use AES CTR. This is faster than AES GCM since it skips calculating a tag after encrypting all data. You can specify the CTR cipher as follows:
Now we use DuckDB’s neat `SUMMARIZE` command three times: Once on the unencrypted database, and once on the encrypted database using MbedTLS and once on the encrypted database using OpenSSL. We set a very low memory limit to force more reading and writing from disk.
218
+
Now we use DuckDB’s neat `SUMMARIZE` command three times: once on the unencrypted database, once on the encrypted database using MbedTLS, and once on the encrypted database using OpenSSL. We set a very low memory limit to force more reading and writing from disk.