ZFS and encrypted backups
About
This blog post describes how to implement a redundant and secure backup solution for personal data.
The implementation is based on ZFS, an advanced file system with three design goals: data integrity, pooled storage, and performance. In addition, ZFS supports native encryption at-rest.
Design
The main goal of the backup solution is to maintain data availability, integrity and privacy.
- Availability: Hard drives will eventually fail. To reduce the risk of data loss, mirror the disks in a pool. ZFS supports mirrored pools.
- Integrity: ZFS checksums all data. It can therefore detect corruption and automatically repair it when a mirror or parity copy is available.
- Privacy: Access to the data should be restricted using access control mechanisms and encryption.
The personal data is stored on a local Network Attached Storage (NAS) server. However, to ensure data redundancy, the data should be replicated to a remote location. Whether you fully trust the remote location depends on a number of factors. The most secure solution is to implement client-side encryption before transferring data to the remote backup server.
Client-side encryption is implemented using ZFS native encryption at-rest.
The backup and restore process
Backup
- Create an encrypted dataset on the local server
- Store data on the local dataset
- Create a local snapshot of the data
- Send an encrypted snapshot (raw data) to a remote ZFS file system
Restore
- Retrieve raw data from remote ZFS file system
- Decrypt raw data and mount dataset
The alternatives
There are alternatives to using native ZFS encryption:
- geli: block-device layer disk encryption at-rest for FreeBSD
- cryptsetup and dm-crypt: transparent disk encryption at-rest. Based on the Linux Unified Key Setup (LUKS) specification.
The main advantage of using these is that the entire disk is encrypted, because the encryption takes place beneath ZFS. A disadvantage is that each disk in a ZFS pool must be set up and unlocked individually before the pool can be imported. Running geli or cryptsetup/dm-crypt on top of a ZFS volume (zvol) is also an option, but ZFS compression is then ineffective, because ZFS only sees data that is already encrypted and encrypted data does not compress. In addition, ZFS native encryption can use a powerful feature of the ZFS file system: incremental snapshots in combination with raw zfs send and zfs receive. With encryption beneath ZFS, by contrast, a zfs send stream contains plaintext.
More details: Overview of Encryption Implementation (Problems with Non-Native Encryption) [blog.heckel.xyz].
Security
ZFS native encryption is enabled per dataset or volume, not for the pool as a whole. As a result, some data is not encrypted:
Encrypted | Not encrypted |
---|---|
File and Zvol data | Dataset / snapshot names |
File attrs, ACLs, permissions | Dataset hierarchy and properties |
Directory listings | Pool layout and file size |
FUID mappings and data | Deduplication tables |
Master encryption keys | Everything in RAM |
Note:
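- the deduplicated data itself is encrypted; only the deduplication tables are not.
- data from encrypted datasets stays encrypted in the L2ARC and the ZIL, and compression still works because ZFS compresses blocks before encrypting them.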
Whether this unencrypted metadata represents a significant risk has to be assessed in your threat model.
Deploy a remote backup server
Choose a suitable hosting provider and install an operating system with ZFS support. FreeBSD on a Hetzner server is a good option. Install FreeBSD from a rescue image by following this tutorial [community.hetzner.com].
Boot into the bsdinstaller
Configure ZFS on remote server
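The remote server needs a parent dataset that will hold the received backups. The following is a minimal sketch, not the exact configuration used here: the pool name zroot (the FreeBSD installer's default), the parent dataset backup, the SSH user backup and the helper remote_setup_commands are all hypothetical. The script only prints the commands (a dry run).

```python
from __future__ import annotations

import shlex


def remote_setup_commands(pool: str, parent: str, user: str) -> list[list[str]]:
    """Return the commands that prepare a parent dataset for incoming raw streams.

    Hypothetical names: pool, parent dataset and SSH user are placeholders.
    """
    target = f"{pool}/{parent}"
    return [
        # Raw streams stay locked on this host, so the parent needs no mountpoint.
        ["zfs", "create", "-o", "mountpoint=none", target],
        # Optional: let a non-root user receive streams. According to zfs-allow(8),
        # "receive" also requires the "create" and "mount" permissions.
        ["zfs", "allow", "-u", user, "create,mount,receive", target],
    ]


if __name__ == "__main__":
    # Dry run: print the commands; run them as root on the remote server.
    for argv in remote_setup_commands("zroot", "backup", "backup"):
        print(shlex.join(argv))
```

Delegating receive permissions avoids logging in as root over SSH, but whether a non-root receive works end to end may depend on further OS-level settings.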
Configure ZFS on home server
I am running FreeBSD on a local NAS. ZFS is configured using two disks in mirror mode. The data is structured into several encrypted ZFS datasets.
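A layout like this (a two-disk mirror holding several encrypted datasets) could be created along the lines of the sketch below. It is an illustration under assumptions, not the commands used on this NAS: the pool name tank, the disk names ada1/ada2, the dataset names and the helper home_setup_commands are hypothetical, and the script only prints the commands.

```python
from __future__ import annotations

import shlex


def home_setup_commands(pool: str, disks: list[str], datasets: list[str]) -> list[list[str]]:
    """Return commands for a mirrored pool holding several encrypted datasets."""
    # A two-way mirror survives the failure of either disk.
    commands = [["zpool", "create", pool, "mirror", *disks]]
    for name in datasets:
        # encryption=on selects the default cipher suite; with
        # keylocation=prompt, ZFS asks for the passphrase interactively.
        commands.append([
            "zfs", "create",
            "-o", "encryption=on",
            "-o", "keyformat=passphrase",
            "-o", "keylocation=prompt",
            f"{pool}/{name}",
        ])
    return commands


if __name__ == "__main__":
    # Dry run with hypothetical pool, disk and dataset names.
    for argv in home_setup_commands("tank", ["ada1", "ada2"], ["documents", "photos"]):
        print(shlex.join(argv))
```

Here every dataset becomes its own encryption root with its own passphrase. Creating one encrypted parent dataset and letting the others inherit from it would use a single passphrase instead.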
Send encrypted snapshot
First, create a snapshot of the dataset. Because the dataset is encrypted, the snapshot's data is encrypted with keys derived from the provided passphrase. The snapshot is then sent to the remote backup server using SSH, which provides confidentiality, integrity and authentication in transit. The remote backup server receives the snapshot as raw, still encrypted data and never has access to the key. Thus, the remote backup server cannot decrypt the data.
IMPORTANT: The zfs send operation uses the --raw option. According to zfs-send.8, this flag must be set:
Note that if you do not use this flag for sending encrypted datasets, data will be sent unencrypted and may be re-encrypted with a different encryption key on the receiving system, which will disable the ability to do a raw send to that system for incrementals.
Create and verify snapshot
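Creating and checking a snapshot takes one zfs snapshot call followed by a listing. The sketch below assumes a timestamp-based naming scheme (backup-YYYYMMDD-HHMM in UTC) that is purely hypothetical, as are the dataset name tank/documents and the helper names; it prints the commands instead of running them.

```python
from __future__ import annotations

import shlex
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime) -> str:
    """Hypothetical naming scheme: a sortable UTC timestamp after '@backup-'."""
    return f"{dataset}@backup-{when.astimezone(timezone.utc):%Y%m%d-%H%M}"


def snapshot_commands(dataset: str, when: datetime) -> list[list[str]]:
    """Create a snapshot, then list snapshots and show the encryption state."""
    snap = snapshot_name(dataset, when)
    return [
        ["zfs", "snapshot", snap],
        # Verify: list the snapshots of the dataset (and its descendants) ...
        ["zfs", "list", "-r", "-t", "snapshot", "-o", "name,creation", dataset],
        # ... and confirm that the dataset is encrypted and its key is loaded.
        ["zfs", "get", "encryption,keystatus", dataset],
    ]


if __name__ == "__main__":
    for argv in snapshot_commands("tank/documents", datetime.now(timezone.utc)):
        print(shlex.join(argv))
```

Names that sort chronologically make it easy to pick the newest common snapshot later, when sending incrementally.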
Send encrypted snapshot to remote server
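The first, full backup pipes a raw send into a receive on the remote host over SSH. Below is a minimal sketch of that pipeline, assuming a hypothetical remote login backup@example.org, an existing remote parent dataset zroot/backup, and an SSH user that is allowed to run zfs receive. By default send_full (a hypothetical helper) only returns the equivalent shell command; with dry_run=False it runs both processes connected by a pipe.

```python
from __future__ import annotations

import shlex
import subprocess


def send_full(snapshot: str, remote: str, target: str, dry_run: bool = True) -> str:
    """Send a raw (still encrypted) snapshot to `target` on `remote` over SSH.

    Returns the equivalent shell pipeline; executes it only if dry_run is False.
    """
    send = ["zfs", "send", "--raw", snapshot]
    # -u: do not mount on the receiving side; the key is not available there anyway.
    recv = ["ssh", remote, shlex.join(["zfs", "receive", "-u", target])]
    if not dry_run:
        sender = subprocess.Popen(send, stdout=subprocess.PIPE)
        receiver = subprocess.Popen(recv, stdin=sender.stdout)
        sender.stdout.close()  # let zfs send see a broken pipe if ssh exits early
        recv_rc, send_rc = receiver.wait(), sender.wait()
        if send_rc or recv_rc:
            raise RuntimeError(f"backup failed: send={send_rc} receive={recv_rc}")
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    print(send_full("tank/documents@backup-20240131-1200",
                    "backup@example.org", "zroot/backup/documents"))
```

For a full stream, the target dataset must not exist yet; zfs receive creates it below the existing parent.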
Verify on remote host
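On the remote host, the received dataset should report an encryption suite and a keystatus of unavailable, which shows that the backup server holds the data without the key. The sketch below queries both properties over SSH and parses the tab-separated output of zfs get -H. The host, dataset and helper names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def remote_key_status_cmd(remote: str, dataset: str) -> list[str]:
    """SSH command asking the remote host for the dataset's encryption properties."""
    zfs_get = ["zfs", "get", "-H", "-o", "property,value", "encryption,keystatus", dataset]
    return ["ssh", remote, shlex.join(zfs_get)]


def parse_props(output: str) -> dict[str, str]:
    """Parse `zfs get -H -o property,value` output: one tab-separated pair per line."""
    props = {}
    for line in output.splitlines():
        if line.strip():
            name, value = line.split("\t", 1)
            props[name] = value
    return props


def is_locked_backup(props: dict[str, str]) -> bool:
    """True if the dataset is encrypted and its key is NOT loaded on this host."""
    return props.get("encryption", "off") != "off" and props.get("keystatus") == "unavailable"


def check_remote(remote: str, dataset: str) -> bool:
    """Run the query over SSH and report whether the backup is still locked."""
    result = subprocess.run(remote_key_status_cmd(remote, dataset),
                            capture_output=True, text=True, check=True)
    return is_locked_backup(parse_props(result.stdout))
```

Note that the dataset and snapshot names remain readable on the remote host, in line with the table in the Security section.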
Incremental send
Subsequent backups use incremental sends, which transfer only the changes made since an earlier snapshot. This operation is documented in zfs-send.8:
-I snapshot Generate a stream package that sends all intermediary snapshots from the first snapshot to the second snapshot. For example, -I @a fs@d is similar to -i @a fs@b; -i @b fs@c; -i @c fs@d. The incremental source may be specified as with the -i option.
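An incremental backup needs a base: the newest snapshot that exists both locally and on the remote host. The sketch below picks that base and builds the raw -I pipeline. The snapshot names, hosts and helper names are hypothetical. zfs receive refuses an incremental stream if the remote dataset changed after the base snapshot (the -F option would roll it back first); keeping the remote copy unmounted avoids such changes.

```python
from __future__ import annotations

import shlex


def latest_common(local: list[str], remote: list[str]) -> str | None:
    """Newest snapshot name present on both hosts; lists are ordered oldest to newest."""
    on_remote = set(remote)
    for name in reversed(local):
        if name in on_remote:
            return name
    return None


def incremental_pipeline(dataset: str, base: str, latest: str, remote: str, target: str) -> str:
    """Raw incremental send of every snapshot after `base` up to `latest` (-I)."""
    send = ["zfs", "send", "--raw", "-I", f"@{base}", f"{dataset}@{latest}"]
    recv = shlex.join(["zfs", "receive", "-u", target])
    return f"{shlex.join(send)} | ssh {shlex.quote(remote)} {shlex.quote(recv)}"


if __name__ == "__main__":
    local = ["backup-20240131-1200", "backup-20240207-1200", "backup-20240214-1200"]
    remote = ["backup-20240131-1200"]
    base = latest_common(local, remote)
    if base is None:
        print("no common snapshot: a full send is required")
    else:
        print(incremental_pipeline("tank/documents", base, local[-1],
                                   "backup@example.org", "zroot/backup/documents"))
```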
Restore dataset from remote server
The data can be restored from the remote backup server by reversing the send and receive operations. This procedure uses the -R option documented in zfs-send.8:
-R, --replicate Generate a replication stream package, which will replicate the specified file system, and all descendent file systems, up to the named snapshot. When received, all properties, snapshots, descendent file systems, and clones are preserved.
The restore procedure retrieves a raw data stream and stores it in a new dataset. Note that the dataset cannot be accessed until the encryption key is loaded.
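The restore can be sketched as three steps: pull the raw replication stream into a new local dataset, load the key, and mount. The hosts, the snapshot name, the target tank/restored and the helper restore_commands are hypothetical; the function only returns shell command strings. zfs load-key -L prompt asks for the passphrase regardless of the dataset's keylocation property.

```python
from __future__ import annotations

import shlex


def restore_commands(remote: str, snapshot: str, local_target: str) -> list[str]:
    """Shell commands that pull a raw replication stream back and unlock it."""
    send = shlex.join(["zfs", "send", "--raw", "-R", snapshot])
    return [
        # 1. Pull the raw, still encrypted stream; -u skips mounting (no key yet).
        f"ssh {shlex.quote(remote)} {shlex.quote(send)} | "
        + shlex.join(["zfs", "receive", "-u", local_target]),
        # 2. Prompt for the passphrase, which unwraps the master key.
        shlex.join(["zfs", "load-key", "-L", "prompt", local_target]),
        # 3. Mount the now readable dataset.
        shlex.join(["zfs", "mount", local_target]),
    ]


if __name__ == "__main__":
    for cmd in restore_commands("backup@example.org",
                                "zroot/backup/documents@backup-20240214-1200",
                                "tank/restored"):
        print(cmd)
```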
A note about cryptography
Design and implementation
The developers describe the ZFS native encryption implementation in the document Overview of Encryption Implementation [blog.heckel.xyz].
ZFS native encryption uses the block cipher Advanced Encryption Standard (AES), a symmetric-key algorithm. Block ciphers can operate in several modes of operation. ZFS native encryption supports Galois/Counter Mode (GCM) and CCM mode (counter with CBC-MAC).
The default cipher for ZFS native encryption is AES-256-GCM, an authenticated encryption with associated data (AEAD) mode that provides both confidentiality and data origin authentication. Therefore, the encryption process produces both ciphertext and a message authentication code (MAC). The MAC is used to verify that the data has not been modified and that it was produced by someone holding the key.
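The cipher suite can be pinned explicitly through the encryption property when a dataset is created, instead of relying on encryption=on. The sketch below validates the suite name against the six AES variants (128, 192 or 256 bits, CCM or GCM) and builds the zfs create command; the helper name and dataset name are hypothetical.

```python
from __future__ import annotations

# Cipher suites accepted by the `encryption` property (besides "on" and "off").
SUITES = {f"aes-{bits}-{mode}" for bits in (128, 192, 256) for mode in ("ccm", "gcm")}


def create_encrypted(dataset: str, suite: str = "aes-256-gcm") -> list[str]:
    """Build a `zfs create` command that sets the cipher suite explicitly."""
    if suite not in SUITES:
        raise ValueError(f"unsupported encryption suite: {suite}")
    return ["zfs", "create", "-o", f"encryption={suite}", "-o", "keyformat=passphrase", dataset]
```

The suite can only be chosen when the dataset is created; it cannot be changed afterwards.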
Both GCM and CCM require a unique initialization vector (IV), which serves as a nonce for the encryption algorithm. The IV is stored in the file system metadata and is transferred together with the MAC to the remote server when using zfs send --raw. The IV must be unique and must never be reused with the same key (see RFC 5288 - AES Galois Counter Mode (GCM) Cipher Suites for TLS, 6.1. Counter Reuse [www.rfc-editor.org]). The IV, however, does not have to be kept secret.
When creating an encrypted dataset using keyformat=passphrase, the user provides a secure passphrase and, optionally, the number of iterations (the pbkdf2iters property). The passphrase, the iteration count and a random salt from the Pseudo Random Number Generator (PRNG) are the inputs to the Password-Based Key Derivation Function 2 (PBKDF2).
PBKDF2 returns a wrapping key. The iterations make every passphrase guess expensive, but the wrapping key can be no stronger than the passphrase itself. The wrapping key is then used to encrypt a randomly generated master key. Finally, an encryption key is derived from the master key and a salt using an HMAC-based Key Derivation Function (HKDF).
The encryption key, together with the IV and the plaintext, is passed to the encryption function, which produces the ciphertext and the MAC: (ciphertext, MAC) = Encrypt(encryption key, IV, plaintext).
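The key hierarchy described above can be modeled with Python's standard library. This is a sketch of the idea, not ZFS's code: SHA-1 as the PBKDF2 hash, SHA-512 inside HKDF, the key and salt sizes, and passing the dataset salt as the HKDF salt are all assumptions. Wrapping the master key and the AES-GCM step itself are left out, because the standard library has no AES. The helper names are hypothetical.

```python
import hashlib
import hmac
import secrets


def wrapping_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2 step: stretch the passphrase into a 32-byte wrapping key (SHA-1 assumed)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"), salt, iterations, dklen=32)


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int, hash_name: str = "sha512") -> bytes:
    """HKDF as specified in RFC 5869: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF output too long")
    # Extract: PRK = HMAC(salt, IKM); an empty salt means HashLen zero bytes.
    prk = hmac.new(salt or bytes(hash_len), ikm, hash_name).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # Expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def data_key(master_key: bytes, dataset_salt: bytes) -> bytes:
    """HKDF step: derive the key that encrypts data blocks from the master key."""
    return hkdf(master_key, dataset_salt, b"", 32)


if __name__ == "__main__":
    pbkdf2_salt = secrets.token_bytes(8)   # random salt, stored with the dataset
    master_key = secrets.token_bytes(32)   # random master key, stored wrapped (not shown)
    dataset_salt = secrets.token_bytes(8)  # illustrative size
    wk = wrapping_key("correct horse battery staple", pbkdf2_salt, 350_000)
    dk = data_key(master_key, dataset_salt)
    print("wrapping key:", wk.hex())
    print("data key:    ", dk.hex())
```

Because data_key depends only on the master key and the salt, a new passphrase yields a new wrapping key while the derived data key stays the same.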
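Key rotation is managed by ZFS. Changing the user's key (e.g. the passphrase) does not require re-encrypting the entire dataset, because only the master key is re-encrypted with the new wrapping key; the encryption keys derived from the master key stay the same.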
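In practice, the passphrase is changed with zfs change-key. The sketch below builds that command; the helper name and dataset are hypothetical. The -l flag makes ZFS load the current key first if it is not loaded, and pbkdf2iters optionally raises the PBKDF2 iteration count for the new passphrase.

```python
from __future__ import annotations


def change_passphrase(dataset: str, iterations: int | None = None, load_key: bool = True) -> list[str]:
    """Build a `zfs change-key` command that sets a new passphrase."""
    argv = ["zfs", "change-key"]
    if load_key:
        argv.append("-l")  # load the current key first if it is not loaded yet
    argv += ["-o", "keyformat=passphrase"]
    if iterations is not None:
        argv += ["-o", f"pbkdf2iters={iterations}"]
    return argv + [dataset]
```

ZFS then prompts for the new passphrase and rewrites only the wrapped master key, which is why the operation is fast even for large datasets.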
Some interesting milestones
- Nov 2016: ZFS native encryption is developed by Datto. Implementation details were presented in the document Overview of Encryption Implementation. The implementation status was "Fully implemented (except for raw sends)".
- Jun 2017: Tom Caputi, the author of ZFS native encryption, answers questions about the implementation on Hacker News in the thread "Playing with ZFS encryption on Linux".
- Aug 2017: The commit "Native Encryption for ZFS on Linux" describes the implementation details, including support for raw, encrypted sends and receives.
- Oct 2017: The PR "Post-Encryption Followup" fixes bugs and documentation issues found after the encryption patch was merged, and improves the code for long-term maintainability.
- Aug 2018: Support for AES-NI was added in "Add support for selecting encryption backend".
- Apr 2019: The encryption documentation was improved in "Clarify and improve encryption documentation".
- Feb 2020: The default encryption algorithm was changed to AES-GCM in the commit "ICP: Improve AES-GCM performance".
- Aug 2020: Support for ZFS native encryption was merged into FreeBSD for the FreeBSD 13 release, as FreeBSD switched its ZFS implementation to OpenZFS.