ZFS and encrypted backups


About

This blog post describes how to implement a redundant and secure backup solution for personal data.

The implementation is based on ZFS, an advanced file system with three design goals: data integrity, pooled storage, and performance. In addition, ZFS supports native encryption at rest.

Design

The main goal of the backup solution is to maintain data availability, integrity and privacy.

  • Availability: Hard drives will eventually fail. To reduce the risk of data loss, it is recommended to mirror the disks in a pool. ZFS supports mirrored pools.

  • Integrity: ZFS stores checksums of the data. It can therefore detect errors and automatically correct them if mirror or parity blocks are available.

  • Privacy: Access to the data should be restricted using access control mechanisms and encryption.

The personal data is stored on a local Network Attached Storage (NAS) server. However, to ensure data redundancy, the data should be replicated to a remote location. Whether you fully trust the remote location depends on a number of factors. The most secure solution is to implement client-side encryption before transferring data to the remote backup server.

Client-side encryption is implemented using ZFS native encryption at rest.

The backup and restore process

Backup

  1. Create an encrypted dataset on the local server
  2. Store data on the local dataset
  3. Create a local snapshot of the data
  4. Send an encrypted snapshot (raw data) to a remote ZFS file system

Restore

  1. Retrieve raw data from remote ZFS file system
  2. Decrypt raw data and mount dataset
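
The backup and restore steps above can be sketched as a small dry-run script. It only prints the commands for each step and runs nothing. The dataset and snapshot names are borrowed from the examples later in this post, and the host name is a hypothetical placeholder, so treat it as an outline rather than a finished tool.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step instead of running them.
# All names are placeholders; the host name in particular is hypothetical.
LOCAL_DS="zdata/media"
RESTORE_DS="zdata/restore-media"
REMOTE_DS="zbackup/media"
REMOTE_HOST="root@backup.example.org"
SNAP="first"

backup_plan() {
    # 1. Create an encrypted dataset on the local server
    echo "zfs create -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase ${LOCAL_DS}"
    # 2. Storing data is ordinary file copying, so nothing is printed for it
    # 3. Create a local snapshot of the data
    echo "zfs snapshot ${LOCAL_DS}@${SNAP}"
    # 4. Send the snapshot raw (still encrypted) to the remote file system
    echo "zfs send --raw ${LOCAL_DS}@${SNAP} | ssh ${REMOTE_HOST} zfs receive ${REMOTE_DS}"
}

restore_plan() {
    # 1. Retrieve the raw data from the remote file system
    echo "ssh ${REMOTE_HOST} zfs send --raw ${REMOTE_DS}@${SNAP} | zfs receive ${RESTORE_DS}"
    # 2. Load the key (decrypt) and mount the dataset
    echo "zfs load-key ${RESTORE_DS}"
    echo "zfs mount ${RESTORE_DS}"
}

backup_plan
restore_plan
```

Printing the plan first makes it easy to review the exact commands, in particular that every send carries --raw, before anything touches a pool.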

The alternatives

There are alternatives to using native ZFS encryption:

  • geli: block-device layer disk encryption at-rest for FreeBSD
  • cryptsetup and dm-crypt: transparent disk encryption at-rest. Based on the Linux Unified Key Setup (LUKS) specification.

The main advantage of these is that the entire disk is encrypted, because the encryption takes place beneath ZFS. A disadvantage is that every disk in a ZFS pool must be encrypted separately and unlocked before the pool can be imported. Configuring geli or cryptsetup/dm-crypt on a ZFS volume is also an option, but it is not compatible with ZFS’s compression feature. In addition, ZFS native encryption can take advantage of a powerful feature of the ZFS file system: incremental snapshots in combination with raw zfs send and zfs receive.

More details at: Overview of Encryption Implementation (Problems with Non-Native Encryption) [blog.heckel.xyz].

Security

ZFS native encryption is enabled on datasets or volumes. This implies that some of the data is not encrypted:

Encrypted                       Not encrypted
File and Zvol data              Dataset / snapshot names
File attrs, ACLs, permissions   Dataset hierarchy and properties
Directory listings              Pool layout and file size
FUID mappings and data          Deduplication tables
Master encryption keys          Everything in RAM

Note:

  • Deduplicated data is encrypted, even though the deduplication tables are not.
  • Encrypted data remains encrypted in the L2ARC and the ZIL.

Whether plaintext data represents a significant risk has to be assessed in your threat model.

Deploy a remote backup server

Choose a suitable hosting provider and install an operating system with support for ZFS. FreeBSD running on Hetzner is a good option. Install FreeBSD from a rescue image by following this tutorial [community.hetzner.com].

Boot into the bsdinstaller

esp0x31@laptop: [~]$ ssh root@<server IP>
-------------------------------------------------------------------

Welcome to the FreeBSD Rescue System.

To install a new FreeBSD operation system, run 'bsdinstallimage'
and follow the instructions.

More information at http://wiki.hetzner.de

-------------------------------------------------------------------

[root@rescue ~]# export TERM=linux
[root@rescue ~]# curl -s http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/13.0-RELEASE/MANIFEST --output /usr/freebsd-dist/MANIFEST
[root@rescue ~]# bsdinstall

Configure ZFS on remote server

# List disks on server
root@backup: [/root]# geom disk list | grep -A 2 Name
...
1. Name: da1
   Mediasize: 85899345920 (80G)
   Sectorsize: 512
--
root@backup: [/root]#

# Create ZFS pool
root@backup: [/root]# zpool create zbackup /dev/da1

# Verify pools and dataset
root@backup: [/root]# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zbackup  79.5G   408K  79.5G        -         -     0%     0%  1.00x    ONLINE  -
root@backup: [/root]# zfs list
NAME      USED  AVAIL     REFER  MOUNTPOINT
zbackup   336K  77.0G       96K  /zbackup
root@backup: [/root]# 

Configure ZFS on home server

I am running FreeBSD on a local NAS. ZFS is configured using two disks in mirror mode. The data is structured into several encrypted ZFS datasets.

# Create ZFS pool
root@nas: [/root]# zpool create zdata mirror /dev/ada1 /dev/ada2

# Create encrypted dataset
root@nas: [/root]# zfs create -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase zdata/media

# Verify encryption
root@nas: [/root]# zfs get -r -p encryption,keystatus,pbkdf2iters zdata/media
NAME                         PROPERTY     VALUE        SOURCE
zdata/media                  encryption   aes-256-gcm  -
zdata/media                  keystatus    available    -
zdata/media                  pbkdf2iters  350000       -
root@nas: [/root]#

Send encrypted snapshot

First, create a snapshot of the dataset. The snapshot is encrypted with keys derived from the provided passphrase. The snapshot is then sent to the remote backup server over SSH, which provides confidentiality, integrity and authentication in transit. The remote backup server receives the snapshot as raw data and never receives the key, so it cannot decrypt the data.

IMPORTANT: The zfs send operation uses the --raw option. According to zfs-send.8, this flag must be set when sending encrypted datasets:

Note that if you do not use this flag for sending encrypted datasets, data will be sent unencrypted and may be re-encrypted with a different encryption key on the receiving system, which will disable the ability to do a raw send to that system for incrementals.

Create and verify snapshot

# Create snapshot
root@nas: [/zdata]# zfs snapshot zdata/media@first
root@nas: [/zdata]# 

# Verify snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME                   USED  AVAIL     REFER  MOUNTPOINT
zdata/media@first        0B      -     2.07G  -
root@nas: [/zdata]#

# Verify snapshot encryption
root@nas: [/zdata]# zfs get -r -p encryption,keystatus,compression zdata/media@first
NAME               PROPERTY     VALUE           SOURCE
zdata/media@first  encryption   aes-256-gcm     -
zdata/media@first  keystatus    available       -
zdata/media@first  compression  -               -
root@nas: [/zdata]#

Send encrypted snapshot to remote server

# IMPORTANT: use --raw to send raw data without encryption key
root@nas: [/zdata]# zfs send -v --raw zdata/media@first | ssh root@<server IP> "zfs receive zbackup/media@first"
full send of zdata/media@first estimated size is 2.06G
total estimated size is 2.06G
TIME        SENT   SNAPSHOT zdata/media@first
08:57:19   4.46M   zdata/media@first
08:57:20   10.6M   zdata/media@first
08:57:21   16.6M   zdata/media@first
08:57:22   22.6M   zdata/media@first
...
09:03:11   2.06G   zdata/media@first
root@nas: [/zdata]#
root@nas: [/zdata]# 

Verify on remote host

# List snapshots on remote backup server
root@backup: [/zbackup]# zfs list -rt snap zbackup
NAME                               USED  AVAIL     REFER  MOUNTPOINT
zbackup/media@first                  0B      -     2.07G  -
root@backup: [/zbackup]#

# List encryption key status on remote backup server
root@backup: [/zbackup]# zfs get -rt snap -p encryption,keystatus zbackup/media
NAME                 PROPERTY    VALUE        SOURCE
zbackup/media@first  encryption  aes-256-gcm  -
zbackup/media@first  keystatus   unavailable  -
root@backup: [/zbackup]#
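
The key status check above can also be scripted. The following is a minimal sketch, assuming the input is tab-separated name and keystatus pairs as printed by zfs get with the -H and -o name,value options. The function name keys_all_unavailable is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if every "name<TAB>keystatus" line on stdin reports
# "unavailable". Empty input is treated as a failure, so a mistyped dataset
# name does not pass silently.
keys_all_unavailable() {
    awk -F '\t' '
        $2 != "unavailable" { bad = 1 }
        END { if (bad || NR == 0) exit 1 }
    '
}

# On the backup server the input would come from (assumption, not run here):
#   zfs get -H -o name,value -r -t snapshot keystatus zbackup/media

# Demonstration with sample data in the same format:
printf 'zbackup/media@first\tunavailable\n' | keys_all_unavailable \
    && echo "ok: no keys loaded"
```

A check like this could run after each receive, so that an accidentally loaded key on the remote side is noticed early.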

Incremental send

Subsequent backup operations use incremental send. This operation is documented in zfs-send.8:

-I snapshot Generate a stream package that sends all intermediary snapshots from the first snapshot to the second snapshot. For example, -I @a fs@d is similar to -i @a fs@b; -i @b fs@c; -i @c fs@d. The incremental source may be specified as with the -i option.

# Create snapshot
root@nas: [/zdata]# zfs snapshot zdata/media@2022-04-09

# Verify snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME                     USED  AVAIL     REFER  MOUNTPOINT
zdata/media@first       3.10M      -     2.07G  -
zdata/media@2022-04-09     0B      -     2.09G  -
root@nas: [/zdata]#

# Send incremental snapshot to remote backup server
# IMPORTANT: use --raw to send encrypted
root@nas: [/zdata]# zfs send -v --raw -I zdata/media@first zdata/media@2022-04-09 | ssh root@<server IP> "zfs receive zbackup/media"
send from @first to zdata/media@2022-04-09 estimated size is 22.2M
total estimated size is 22.2M
TIME        SENT   SNAPSHOT zdata/media@2022-04-09
11:11:52   7.32M   zdata/media@2022-04-09
11:11:53   13.5M   zdata/media@2022-04-09
11:11:54   19.4M   zdata/media@2022-04-09
root@nas: [/zdata]#

# Verify snapshot on remote server
root@backup: [/zbackup]# zfs list -rt snap zbackup
NAME                               USED  AVAIL     REFER  MOUNTPOINT
zbackup/media@first               3.08M      -     2.07G  -
zbackup/media@2022-04-09             0B      -     2.09G  -
root@backup: [/zbackup]#

# Verify that decryption keys are unavailable on the remote server
root@backup: [/zbackup]# zfs get -rt snap -p encryption,keystatus zbackup/media
NAME                      PROPERTY    VALUE        SOURCE
zbackup/media@first       encryption  aes-256-gcm  -
zbackup/media@first       keystatus   unavailable  -
zbackup/media@2022-04-09  encryption  aes-256-gcm  -
zbackup/media@2022-04-09  keystatus   unavailable  -
root@backup: [/zbackup]#
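
Choosing the base snapshot for the next incremental send can be automated. The sketch below assumes a list of snapshot names, oldest first, and only prints the resulting command. The function names latest_snapshot and incremental_send_cmd are hypothetical.

```shell
#!/bin/sh
# Sketch: build the incremental send command from a list of snapshot names.
# Input on stdin: one "dataset@snapshot" per line, oldest first, for example
# from: zfs list -H -o name -t snapshot -s creation -r zdata/media

# Print the snapshot part (after "@") of the last input line.
latest_snapshot() {
    tail -n 1 | sed 's/^.*@//'
}

# Print (do not run) the raw incremental send for dataset $1,
# from base snapshot $2 to new snapshot $3.
incremental_send_cmd() {
    printf 'zfs send --raw -I %s@%s %s@%s\n' "$1" "$2" "$1" "$3"
}

# Example with the snapshot names used in this post.
base=$(printf 'zdata/media@first\n' | latest_snapshot)
incremental_send_cmd zdata/media "$base" 2022-04-09
```

The sketch assumes the base snapshot also exists on the remote server. If the two sides have diverged, the newest snapshot present on both should be used as the base instead.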

Restore dataset from remote server

The data can be restored from the remote backup server by reversing the send and receive operations. This procedure uses the -R option documented in zfs-send.8:

-R, --replicate Generate a replication stream package, which will replicate the specified file system, and all descendent file systems, up to the named snapshot. When received, all properties, snapshots, descendent file systems, and clones are preserved.

The restore procedure retrieves a raw data stream and stores it in a new dataset. Note that the dataset cannot be accessed until the encryption key is loaded.

# Restore all snapshots from remote server to local NAS
root@nas: [/zdata]# ssh root@<server IP> "zfs send -v --raw -R zbackup/media@2022-04-09" | zfs receive -v zdata/restore-media
full send of zbackup/media@first estimated size is 2.06G
send from @first to zbackup/media@2022-04-09 estimated size is 22.2M
total estimated size is 2.08G
receiving full stream of zbackup/media@first into zdata/restore-media@first
TIME        SENT   SNAPSHOT zbackup/media@first
11:16:34   4.33M   zbackup/media@first
11:16:35   8.72M   zbackup/media@first
11:16:36   21.1M   zbackup/media@first
...
11:18:13   2.00G   zbackup/media@first
11:18:14   2.03G   zbackup/media@first
11:18:15   2.05G   zbackup/media@first
received 2.06G stream in 103 seconds (20.5M/sec)
receiving incremental stream of zbackup/media@2022-04-09 into zdata/restore-media@2022-04-09
TIME        SENT   SNAPSHOT zbackup/media@2022-04-09
11:18:16   2.06G   zbackup/media@2022-04-09
11:18:17   2.07G   zbackup/media@2022-04-09
11:18:18   2.08G   zbackup/media@2022-04-09
received 22.7M stream in 3 seconds (7.56M/sec)
root@nas: [/zdata]#

# Verify local snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME                             USED  AVAIL     REFER  MOUNTPOINT
zdata/media@first               3.10M      -     2.07G  -
zdata/media@2022-04-09             0B      -     2.09G  -
zdata/restore-media@first       3.08M      -     2.07G  -
zdata/restore-media@2022-04-09     0B      -     2.09G  -
root@nas: [/zdata]#

# Mounting fails because the encryption key is not loaded
root@nas: [/zdata]# zfs mount zdata/restore-media
cannot mount 'zdata/restore-media': encryption key not loaded
root@nas: [/zdata]# 

# Load encryption key
root@nas: [/zdata]# zfs load-key -r zdata/restore-media
Enter passphrase for 'zdata/restore-media':
1 / 1 key(s) successfully loaded
root@nas: [/zdata]# zfs mount zdata/restore-media
root@nas: [/zdata]# cat /zdata/restore-media/README.md
.
root@nas: [/zdata]#
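
After a restore, the snapshot listing can be compared against the original. This is a minimal sketch that compares snapshot names only, not dataset contents. The function names snapshot_names and same_snapshots are hypothetical, and the sample listings are copied from the output above.

```shell
#!/bin/sh
# Sketch: check that two snapshot listings contain the same snapshot names.
# Each argument is a newline-separated list of "dataset@snapshot" names, for
# example from: zfs list -H -o name -r -t snapshot zdata/media

# Reduce a listing to its sorted snapshot names (the part after "@").
snapshot_names() {
    printf '%s\n' "$1" | sed 's/^.*@//' | sort
}

# Succeed if both listings yield the same set of snapshot names.
same_snapshots() {
    [ "$(snapshot_names "$1")" = "$(snapshot_names "$2")" ]
}

original='zdata/media@first
zdata/media@2022-04-09'
restored='zdata/restore-media@first
zdata/restore-media@2022-04-09'

if same_snapshots "$original" "$restored"; then
    echo "snapshot names match"
else
    echo "snapshot names differ"
fi
```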

A note about cryptography

Design and implementation

The developers describe the ZFS native encryption implementation in the document Overview of Encryption Implementation [blog.heckel.xyz].

ZFS native encryption uses the block cipher Advanced Encryption Standard (AES), a symmetric-key algorithm. Block ciphers can operate in several modes of operation. ZFS native encryption supports Galois/Counter Mode (GCM) and CCM mode (counter with CBC-MAC).

The default cipher for ZFS native encryption is AES-256-GCM, an authenticated encryption cipher (AEAD) which provides both confidentiality and data origin authentication. The encryption process therefore produces both ciphertext and a message authentication code (MAC). The MAC is used to verify the integrity and authenticity of the data.

Both GCM and CCM require a unique initialization vector (IV), which is used as a nonce for the encryption algorithm. The IV is stored in the file system metadata, and is transferred together with the MAC to the remote server when using zfs send --raw. The IV must never be reused with the same key; see RFC 5288 - AES Galois Counter Mode (GCM) Cipher Suites for TLS, 6.1. Counter Reuse [www.rfc-editor.org]. The IV does not, however, have to be kept secret.

When creating an encrypted dataset using keyformat=passphrase, the user provides a secure passphrase and a number of iterations. The passphrase, the iteration count and a random value from the pseudorandom number generator (PRNG) are used as input arguments to the Password-Based Key Derivation Function 2 (PBKDF2).

The PBKDF2 function returns a wrapping key with high entropy. The wrapping key is then used to encrypt a randomly generated Master key. Finally, an encryption key is generated by an HMAC-based Key Derivation Function (HKDF) using the Master key and a salt as input arguments.

The encryption key, together with the IV and plaintext data are arguments to the Encryption function which produces ciphertext and MAC from plaintext data:

                                                                                       -----------
                                                               -------------           |         |
                                                               | Plaintext |-------->  |         |     -------------- 
            -------------     --------                         -------------           |         | --> | Ciphertext | 
            | MasterKey | --> |      |      --------------     ------------------      |         |     --------------
--------    -------------     | HKDF | -- > | Salt cache | --> | Encryption key | -->  | Encrypt | 
|      |      --------        |      |      --------------     ------------------      |         |     --------------
| PRNG | -->  | Salt |    --> |      |                                                 |         | --> |     MAC    |
|      |      --------        --------                         ----------              |         |     --------------
|      | ----------------------------------------------------> |   IV   | --------->   |         |
-------                                                        ----------              -----------
                                                                                       

Key rotation is managed by ZFS. Changing the user’s key (e.g. passphrase) does not require re-encrypting the entire dataset.

Some interesting milestones

Recommended resources
