ZFS and encrypted backups
About
This blog post describes how to implement a redundant and secure backup solution for personal data.
The implementation is based on ZFS, an advanced file system with three design goals: data integrity, pooled storage, and performance. In addition, ZFS supports native encryption at rest.
Design
The main goal of the backup solution is to maintain data availability, integrity and privacy.
- Availability: Hard drives will eventually fail. To reduce the risk of data loss, it is recommended to mirror disks in a pool. ZFS supports mirrored pools.
- Integrity: ZFS stores checksums of the data. Thus, ZFS can detect corruption and automatically correct errors if mirror or parity blocks are available.
- Privacy: Access to the data should be restricted using access control mechanisms and encryption.
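The integrity goal can be exercised on demand with a scrub. The following is a minimal sketch, assuming a pool named zdata as used later in this post; run is a hypothetical dry-run wrapper that only prints the commands unless RUN=1 is set.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

POOL="${POOL:-zdata}"   # assumed pool name

# Read every block in the pool and verify it against its checksum.
# Damaged blocks are repaired from the mirror copy when one exists.
run zpool scrub "$POOL"

# Show scrub progress and any checksum errors that were found.
run zpool status -v "$POOL"
```

Running a scrub on a schedule is a common way to find silent corruption while a healthy mirror copy still exists.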
The personal data is stored on a local Network Attached Storage (NAS) server. However, to ensure data redundancy, the data should be replicated to a remote location. Whether you fully trust the remote location depends on a number of factors. The most secure solution is to implement client-side encryption before transferring data to the remote backup server.
Client-side encryption is implemented using ZFS native encryption at rest.
The backup and restore process
Backup
- Create an encrypted dataset on the local server
- Store data on the local dataset
- Create a local snapshot of the data
- Send an encrypted snapshot (raw data) to a remote ZFS file system
Restore
- Retrieve raw data from remote ZFS file system
- Decrypt raw data and mount dataset
The alternatives
There are alternatives to using native ZFS encryption:
- geli: block-device layer disk encryption at-rest for FreeBSD
- cryptsetup and dm-crypt: transparent disk encryption at-rest. Based on the Linux Unified Key Setup (LUKS) specification.
The main advantage of these is that the entire disk is encrypted, because the encryption takes place beneath ZFS. A disadvantage is that every disk of a ZFS pool must be encrypted, and unlocked before the pool can be imported. Configuring geli or cryptsetup/dm-crypt on top of a ZFS volume is an option, but ZFS then only sees ciphertext, so its compression feature becomes ineffective. In addition, ZFS native encryption can use a powerful feature of the ZFS file system: incremental snapshots in combination with zfs send --raw and zfs receive.
More details at: Overview of Encryption Implementation (Problems with Non-Native Encryption) [blog.heckel.xyz].
Security
ZFS native encryption is enabled on datasets or volumes. This implies that some of the data is not encrypted:
Encrypted | Not encrypted |
---|---|
File and Zvol data | Dataset / snapshot names |
File attrs, ACLs, permissions | Dataset hierarchy and properties |
Directory listings | Pool layout and file size |
FUID mappings and data | Deduplication tables |
Master encryption keys | Everything in RAM |
Note:
- deduplicated data blocks are encrypted; only the deduplication tables are not.
- encrypted data remains encrypted in the L2ARC and the ZIL.
Whether the unencrypted metadata represents a significant risk has to be assessed in your threat model.
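To get a feeling for the "Not encrypted" column, the commands below list what anyone with access to the pool can read without the key. This is a sketch, assuming the dataset name zbackup/media used later in this post; run is a hypothetical dry-run wrapper that prints the commands unless RUN=1 is set.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

DATASET="${DATASET:-zbackup/media}"   # assumed dataset name

# Dataset and snapshot names, sizes and creation times are readable
# even when the encryption key is not loaded.
run zfs list -r -t all -o name,used,refer,creation "$DATASET"

# Encryption-related properties are plaintext metadata as well.
run zfs get -r encryption,keyformat,pbkdf2iters,keystatus "$DATASET"
```

If snapshot names or sizes would reveal something sensitive, that belongs in the threat model too.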
Deploy a remote backup server
Choose a suitable hosting provider and install an operating system with support for ZFS. FreeBSD running on Hetzner is a good option. Install FreeBSD from a rescue image by following this tutorial [community.hetzner.com].
Boot into the bsdinstaller
esp0x31@laptop: [~]$ ssh root@<server IP>
-------------------------------------------------------------------
Welcome to the FreeBSD Rescue System.
To install a new FreeBSD operation system, run 'bsdinstallimage'
and follow the instructions.
More information at http://wiki.hetzner.de
-------------------------------------------------------------------
[root@rescue ~]# export TERM=linux
[root@rescue ~]# curl -s http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/13.0-RELEASE/MANIFEST --output /usr/freebsd-dist/MANIFEST
[root@rescue ~]# bsdinstall
Configure ZFS on remote server
# List disks on server
root@backup: [/root]# geom disk list | grep -A 2 Name
...
1. Name: da1
Mediasize: 85899345920 (80G)
Sectorsize: 512
--
root@backup: [/root]#
# Create ZFS pool
root@backup: [/root]# zpool create zbackup /dev/da1
# Verify pools and dataset
root@backup: [/root]# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zbackup 79.5G 408K 79.5G - - 0% 0% 1.00x ONLINE -
root@backup: [/root]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zbackup 336K 77.0G 96K /zbackup
root@backup: [/root]#
Configure ZFS on home server
I am running FreeBSD on a local NAS. ZFS is configured using two disks in mirror mode. The data is structured into several encrypted ZFS datasets.
# Create ZFS pool
root@nas: [/root]# zpool create zdata mirror /dev/ada1 /dev/ada2
# Create encrypted dataset
root@nas: [/root]# zfs create -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase zdata/media
# Verify encryption
root@nas: [/root]# zfs get -r -p encryption,keystatus,pbkdf2iters zdata/media
NAME PROPERTY VALUE SOURCE
zdata/media encryption aes-256-gcm -
zdata/media keystatus available -
zdata/media pbkdf2iters 350000 -
root@nas: [/root]#
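The pbkdf2iters value shown above is the default. A higher value makes passphrase guessing slower, at the cost of a slower zfs load-key. The sketch below builds the create command with an explicit iteration count. The helper name encrypted_create_cmd and the dataset zdata/documents are hypothetical, and the lower bound of 100000 is an assumption based on my reading of zfsprops; check the manual page for your version.

```sh
#!/bin/sh
# Print the zfs create command for an encrypted dataset (hypothetical helper).
# $1 = dataset name, $2 = PBKDF2 iteration count (optional, default 350000)
encrypted_create_cmd() {
    dataset="$1"
    iters="${2:-350000}"
    # Assumption: zfsprops documents 100000 as the lowest accepted value.
    if [ "$iters" -lt 100000 ]; then
        echo "pbkdf2iters must be at least 100000" >&2
        return 1
    fi
    printf 'zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt -o pbkdf2iters=%s %s\n' \
        "$iters" "$dataset"
}

# Print the command for review; run it by hand once it looks right.
encrypted_create_cmd zdata/documents 1000000
```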
Send encrypted snapshot
First, create a snapshot of the dataset. The snapshot data is already encrypted on disk with a key that is protected by the provided passphrase. The snapshot is then sent to the remote backup server over SSH, which provides confidentiality, integrity and authentication in transit. The remote backup server receives the snapshot as raw, still encrypted data and never sees the passphrase. Thus, the remote backup server cannot decrypt the data.
IMPORTANT: The zfs send operation must use the --raw option. From zfs-send.8:
Note that if you do not use this flag for sending encrypted datasets, data will be sent unencrypted and may be re-encrypted with a different encryption key on the receiving system, which will disable the ability to do a raw send to that system for incrementals.
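Because forgetting --raw silently sends plaintext, a backup script can derive the flag from the dataset's encryption property instead of relying on memory. This is a minimal sketch; send_flags_for is a hypothetical helper, not part of ZFS.

```sh
#!/bin/sh
# Hypothetical helper: choose zfs send flags from the value of the
# "encryption" property ("off" or a cipher name such as aes-256-gcm).
send_flags_for() {
    case "$1" in
        off) printf '\n' ;;              # unencrypted dataset: no extra flag
        *)   printf '%s\n' '--raw' ;;    # encrypted dataset: always send raw
    esac
}

# Example with the cipher used in this post.
send_flags_for aes-256-gcm
```

On the NAS, the property value can be read with zfs get -H -o value encryption zdata/media and passed to the helper.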
Create and verify snapshot
# Create snapshot
root@nas: [/zdata]# zfs snapshot zdata/media@first
root@nas: [/zdata]#
# Verify snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME USED AVAIL REFER MOUNTPOINT
zdata/media@first 0B - 2.07G -
root@nas: [/zdata]#
# Verify snapshot encryption
root@nas: [/zdata]# zfs get -r -p encryption,keystatus,compression zdata/media@first
NAME PROPERTY VALUE SOURCE
zdata/media@first encryption aes-256-gcm -
zdata/media@first keystatus available -
zdata/media@first compression - -
root@nas: [/zdata]#
Send encrypted snapshot to remote server
# IMPORTANT: use --raw to send raw data without encryption key
root@nas: [/zdata]# zfs send -v --raw zdata/media@first | ssh root@<server IP> "zfs receive zbackup/media@first"
full send of zdata/media@first estimated size is 2.06G
total estimated size is 2.06G
TIME SENT SNAPSHOT zdata/media@first
08:57:19 4.46M zdata/media@first
08:57:20 10.6M zdata/media@first
08:57:21 16.6M zdata/media@first
08:57:22 22.6M zdata/media@first
...
09:03:11 2.06G zdata/media@first
root@nas: [/zdata]#
root@nas: [/zdata]#
Verify on remote host
# List snapshots on remote backup server
root@backup: [/zbackup]# zfs list -rt snap zbackup
NAME USED AVAIL REFER MOUNTPOINT
zbackup/media@first 0B - 2.07G -
root@backup: [/zbackup]#
# List encryption key status on remote backup server
root@backup: [/zbackup]# zfs get -rt snap -p encryption,keystatus zbackup/media
NAME PROPERTY VALUE SOURCE
zbackup/media@first encryption aes-256-gcm -
zbackup/media@first keystatus unavailable -
root@backup: [/zbackup]#
Incremental send
Subsequent backups use incremental send. This operation is documented in zfs-send.8:
-I snapshot Generate a stream package that sends all intermediary snapshots from the first snapshot to the second snapshot. For example, -I @a fs@d is similar to -i @a fs@b; -i @b fs@c; -i @c fs@d. The incremental source may be specified as with the -i option.
# Create snapshot
root@nas: [/zdata]# zfs snapshot zdata/media@2022-04-09
# Verify snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME USED AVAIL REFER MOUNTPOINT
zdata/media@first 3.10M - 2.07G -
zdata/media@2022-04-09 0B - 2.09G -
root@nas: [/zdata]#
# Send incremental snapshot to remote backup server
# IMPORTANT: use --raw to send encrypted data
root@nas: [/zdata]# zfs send -v --raw -I zdata/media@first zdata/media@2022-04-09 | ssh root@<server IP> "zfs receive zbackup/media"
send from @first to zdata/media@2022-04-09 estimated size is 22.2M
total estimated size is 22.2M
TIME SENT SNAPSHOT zdata/media@2022-04-09
11:11:52 7.32M zdata/media@2022-04-09
11:11:53 13.5M zdata/media@2022-04-09
11:11:54 19.4M zdata/media@2022-04-09
root@nas: [/zdata]#
# Verify snapshot on remote server
root@backup: [/zbackup]# zfs list -rt snap zbackup
NAME USED AVAIL REFER MOUNTPOINT
zbackup/media@first 3.08M - 2.07G -
zbackup/media@2022-04-09 0B - 2.09G -
root@backup: [/zbackup]#
# Verify that decryption keys are unavailable on the remote server
root@backup: [/zbackup]# zfs get -rt snap -p encryption,keystatus zbackup/media
NAME PROPERTY VALUE SOURCE
zbackup/media@first encryption aes-256-gcm -
zbackup/media@first keystatus unavailable -
zbackup/media@2022-04-09 encryption aes-256-gcm -
zbackup/media@2022-04-09 keystatus unavailable -
root@backup: [/zbackup]#
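The incremental steps above lend themselves to a small script. The sketch below only builds and prints the commands, so they can be reviewed before use. The helper names, the REMOTE placeholder host and the date-based snapshot naming are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical helpers; names and defaults are assumptions for illustration.
DST="${DST:-zbackup/media}"
REMOTE="${REMOTE:-root@backup.example.org}"   # placeholder host

# Snapshot name for today, e.g. zdata/media@2022-04-09
snapshot_name() {
    printf '%s@%s\n' "$1" "$(date +%Y-%m-%d)"
}

# Read snapshot names sorted by creation time on stdin, print the newest.
latest_snapshot() {
    tail -n 1
}

# Build the raw incremental send pipeline from snapshot $1 to snapshot $2.
incremental_send_cmd() {
    printf 'zfs send --raw -I %s %s | ssh %s "zfs receive %s"\n' \
        "$1" "$2" "$REMOTE" "$DST"
}

# Example with the snapshot names from this post.
incremental_send_cmd zdata/media@first zdata/media@2022-04-09
```

In a real run, the previous snapshot could come from zfs list -H -o name -s creation -t snapshot -d 1 zdata/media piped into latest_snapshot, and the new snapshot from snapshot_name zdata/media.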
Restore dataset from remote server
The data can be restored from the remote backup server by reversing the send and receive operations. This procedure uses the -R option documented in zfs-send.8:
-R, --replicate Generate a replication stream package, which will replicate the specified file system, and all descendent file systems, up to the named snapshot. When received, all properties, snapshots, descendent file systems, and clones are preserved.
The restore procedure retrieves a raw data stream and stores it in a new dataset. Note that the dataset cannot be accessed until the encryption key is loaded.
# Restore all snapshots from remote server to local NAS
root@nas: [/zdata]# ssh root@<server IP> "zfs send -v --raw -R zbackup/media@2022-04-09" | zfs receive -v zdata/restore-media
full send of zbackup/media@first estimated size is 2.06G
send from @first to zbackup/media@2022-04-09 estimated size is 22.2M
total estimated size is 2.08G
receiving full stream of zbackup/media@first into zdata/restore-media@first
TIME SENT SNAPSHOT zbackup/media@first
11:16:34 4.33M zbackup/media@first
11:16:35 8.72M zbackup/media@first
11:16:36 21.1M zbackup/media@first
...
11:18:13 2.00G zbackup/media@first
11:18:14 2.03G zbackup/media@first
11:18:15 2.05G zbackup/media@first
received 2.06G stream in 103 seconds (20.5M/sec)
receiving incremental stream of zbackup/media@2022-04-09 into zdata/restore-media@2022-04-09
TIME SENT SNAPSHOT zbackup/media@2022-04-09
11:18:16 2.06G zbackup/media@2022-04-09
11:18:17 2.07G zbackup/media@2022-04-09
11:18:18 2.08G zbackup/media@2022-04-09
received 22.7M stream in 3 seconds (7.56M/sec)
root@nas: [/zdata]#
# Verify local snapshot
root@nas: [/zdata]# zfs list -rt snapshot zdata
NAME USED AVAIL REFER MOUNTPOINT
zdata/media@first 3.10M - 2.07G -
zdata/media@2022-04-09 0B - 2.09G -
zdata/restore-media@first 3.08M - 2.07G -
zdata/restore-media@2022-04-09 0B - 2.09G -
root@nas: [/zdata]#
# Mounting fails because the encryption key is not loaded
root@nas: [/zdata]# zfs mount zdata/restore-media
cannot mount 'zdata/restore-media': encryption key not loaded
root@nas: [/zdata]#
# Load encryption key
root@nas: [/zdata]# zfs load-key -r zdata/restore-media
Enter passphrase for 'zdata/restore-media':
1 / 1 key(s) successfully loaded
root@nas: [/zdata]# zfs mount zdata/restore-media
root@nas: [/zdata]# cat /zdata/restore-media/README.md
.
root@nas: [/zdata]#
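A restore is only useful if the data matches. The sketch below compares the restored dataset against the snapshot it was sent from. It assumes the mountpoints from this post and that the snapshot is reachable through the hidden .zfs/snapshot directory; verify_restore is a hypothetical helper built on diff -r.

```sh
#!/bin/sh
# Compare two directory trees and succeed only if every file is identical.
# Hypothetical helper, not part of ZFS.
verify_restore() {
    original="$1"
    restored="$2"
    if diff -r "$original" "$restored" >/dev/null; then
        echo "OK: $restored matches $original"
    else
        echo "MISMATCH: $restored differs from $original" >&2
        return 1
    fi
}

# Assumed paths: the snapshot that was backed up and the restored dataset.
SNAPSHOT_DIR=/zdata/media/.zfs/snapshot/2022-04-09
RESTORED_DIR=/zdata/restore-media

# Only run the comparison when both directories exist on this host.
if [ -d "$SNAPSHOT_DIR" ] && [ -d "$RESTORED_DIR" ]; then
    verify_restore "$SNAPSHOT_DIR" "$RESTORED_DIR"
fi
```

Comparing against the snapshot rather than the live dataset avoids false alarms from files that changed after the backup was taken.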
A note about cryptography
Design and implementation
The developers describe the ZFS native encryption implementation in the document Overview of Encryption Implementation [blog.heckel.xyz].
ZFS native encryption uses the block cipher Advanced Encryption Standard (AES), a symmetric-key algorithm. Block ciphers can operate in several modes of operation. ZFS native encryption supports Galois/Counter Mode (GCM) and Counter with CBC-MAC (CCM) mode.
The default cipher for ZFS native encryption is AES-256-GCM, an authenticated encryption with associated data (AEAD) cipher which provides both confidentiality and data origin authentication. Therefore, the encryption process produces both ciphertext and a message authentication code (MAC). The MAC is used to verify the integrity of the data and that it was produced by someone holding the key.
Both GCM and CCM require a unique initialization vector (IV), which is used as a nonce for the encryption algorithm. The IV is stored in the file system metadata, and is transferred together with the MAC to the remote server when using zfs send --raw. The IV must be distinct for every encryption under the same key and must never be reused; see RFC 5288 - AES Galois Counter Mode (GCM) Cipher Suites for TLS, 6.1. Counter Reuse [www.rfc-editor.org]. The IV, however, does not have to be kept secret.
When creating an encrypted dataset using keyformat=passphrase, the user provides a secure passphrase and, optionally, the number of iterations. The passphrase, the iteration count and a random salt from the Pseudo Random Number Generator (PRNG) are used as input arguments to the Password-Based Key Derivation Function 2 (PBKDF2).
PBKDF2 returns a wrapping key. The wrapping key is then used to encrypt a randomly generated master key. Finally, an encryption key is generated with an HMAC-based Key Derivation Function (HKDF) using the master key and a salt as input arguments.
The encryption key, together with the IV and plaintext data are arguments to the Encryption function which produces ciphertext and MAC from plaintext data:
PRNG --> Salt --+
                +--> HKDF --> Salt cache --> Encryption key --+
MasterKey ------+                                             |
                                                              +--> Encrypt --+--> Ciphertext
PRNG --> IV --------------------------------------------------+              |
                                                              |              +--> MAC
Plaintext ----------------------------------------------------+
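The key hierarchy described in this section can also be summarised in a few equations. The symbols are shorthand introduced here, not names from the ZFS source: K_wrap is the wrapping key, K_master the master key, W the wrapped master key, K_enc the encryption key, P the plaintext and C the ciphertext. The two salts are independent random values.

```latex
\begin{aligned}
K_{\text{wrap}} &= \mathrm{PBKDF2}(\text{passphrase},\ \text{salt}_{\text{pw}},\ \text{iterations}) \\
W &= \mathrm{Encrypt}(K_{\text{wrap}},\ K_{\text{master}}) \\
K_{\text{enc}} &= \mathrm{HKDF}(K_{\text{master}},\ \text{salt}) \\
(C,\ \mathrm{MAC}) &= \mathrm{Encrypt}(K_{\text{enc}},\ \mathrm{IV},\ P)
\end{aligned}
```

Only the first two lines depend on the passphrase, while the last two depend only on the master key.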
Key rotation is managed by ZFS. Changing the user’s key (e.g. passphrase) does not require re-encrypting the entire dataset.
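Changing the passphrase only re-wraps the master key, which is why no data has to be re-encrypted. The following is a minimal sketch of changing the passphrase on the dataset from this post; run is a hypothetical dry-run wrapper and the iteration count is an arbitrary example value.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

DATASET="${DATASET:-zdata/media}"   # dataset created earlier in this post

# The key must be loaded before it can be changed.
run zfs get -H -o value keystatus "$DATASET"

# Prompts for a new passphrase and re-wraps the master key with it.
run zfs change-key -o pbkdf2iters=1000000 "$DATASET"
```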
Some interesting milestones
- Nov 2016: ZFS native encryption is developed by Datto. Implementation details were presented in the document Overview of Encryption Implementation. The implementation status was "Fully implemented (except for raw sends)".
- Jun 2017: Tom Caputi, the author of ZFS native encryption, answers questions about the implementation on Hacker News in the thread Playing with ZFS encryption on Linux.
- Aug 2017: The commit Native Encryption for ZFS on Linux describes the implementation details, including the support for raw, encrypted sends and receives.
- Oct 2017: The PR Post-Encryption Followup includes fixes for bugs and documentation issues found after the encryption patch was merged, and general code improvements for long-term maintainability.
- Aug 2018: Support for AES-NI was added in the commit Add support for selecting encryption backend.
- Apr 2019: The encryption documentation was improved in the commit Clarify and improve encryption documentation.
- Feb 2020: The default encryption algorithm was changed to AES-GCM in the commit ICP: Improve AES-GCM performance.
- Aug 2020: Support for ZFS native encryption was included in FreeBSD 13. The ZFS implementation is now provided by OpenZFS.