How to create simple encrypted remote backups

30.03.2021 yahe administration linux security

Every once in a while I get asked whether a certain backup scheme is a good idea, and oftentimes the suggested backup solution is beyond what I would use myself. Duplicity, its simplifying frontend Duply, and the not-so-dissimilar contenders Borg and Restic are among the solutions that are mentioned most often, with solutions like Bacula and its offspring Bareos coming up much later.

Unfortunately, I would not trust any of these tools further than I could throw a hard drive containing a backup created with them. The reason for this is somewhat simple: in my opinion, all of these solutions are too complex in a worst-case scenario.

As soon as I mention this opinion, most people I talk to want to know how I do backups and whether I ever tried those integrated solutions. Yes, I used Duplicity for years, until a system of mine broke down. I had an up-to-date backup of that system but still lost a lot of (fortunately not so important) data because the Duplicity backup had become inconsistent over time without notice. I was able to manually extract some data out of the backup, but it was not worth the time. That was the moment when I decided that I did not want to be reliant on such software again.

1. The design goals

There are several design goals that I wanted to achieve with my personal backup solution:

  • It should be encrypted.
  • It should be easy to test the restorability of the backup.
  • It should work with off-the-shelf software that people already know.
  • It should be suitable for cost-efficient remote backups in the cloud.
  • It should work with backup targets without requiring special server software.

There are also some non-goals that are not so important to me:

  • It is not required to back up live systems. A file-based backup is sufficient, and there are ways to prevent files from being modified during the backup process, like temporary filesystem snapshots that are used as the basis for the actual backup (see the sketch after this list).
  • It is not required to have versioned backups. While this would be an added bonus, having an up-to-date backup that can be restored reliably is much more important.
  • It is not required to deduplicate content. Deduplication increases complexity.
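
To illustrate what such a snapshot-based approach could look like on a system that uses LVM: a temporary snapshot is created, mounted read-only to the directory that serves as the unencrypted source of the backup, and removed again afterwards. This is only a sketch with hypothetical volume names (vg0/data); it assumes that the source directory has already been prepared as described in the following sections.

### sketch with hypothetical LVM volume names (vg0/data):
### create a temporary snapshot and mount it read-only
### as the unencrypted source of the backup
lvcreate --snapshot --size 5G --name data-snapshot /dev/vg0/data
mount -o ro /dev/vg0/data-snapshot ./unencrypted

### ...create the backup as described below...

### unmount and remove the temporary snapshot afterwards
umount ./unencrypted
lvremove -y /dev/vg0/data-snapshot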

2. The encryption layer

Let us start with the encryption. For this I have chosen the FUSE wrapper GoCryptFS which is available on GitHub. It is developed by @rfjakob, who was one of the most active maintainers of the well-known EncFS encryption layer back in 2015-2018. The "project was inspired by EncFS and strives to fix its security issues while providing good performance" and it looks like he achieved that goal.

Using GoCryptFS is pretty simple. After downloading the static binary from the GitHub repository you can create the required folders and initialize a so-called reverse repository. A reverse repository takes an unencrypted source directory and provides an encrypted view of the contained files through a second directory. This way you can encrypt files in-memory on-the-fly instead of requiring additional storage space for an encrypted copy of the files. The ad-hoc encryption and decryption of GoCryptFS will come in handy for restores as well.

For our purposes we will create three folders:

  • ./unencrypted will contain our source material to be backed up
  • ./encrypted will contain the ad-hoc encrypted files to be backed up
  • ./decrypted will contain the ad-hoc decrypted files of the backup

### create the local folders
mkdir -p ./unencrypted ./encrypted ./decrypted

### initialize the reverse encryption
gocryptfs --init --reverse ./unencrypted

### you can use the --plaintextnames parameter
### if the file names are not confidential
# gocryptfs --init --reverse --plaintextnames ./unencrypted

### mount the unencrypted folder in reverse mode
gocryptfs --reverse ./unencrypted ./encrypted

After initializing the reverse repository in the ./unencrypted folder you will find a new file called ./unencrypted/.gocryptfs.reverse.conf. This file contains the encryption parameters that are required to encrypt the files. When the reverse repository is mounted into the ./encrypted folder you will find a file called ./encrypted/gocryptfs.conf, which is an exact copy of the ./unencrypted/.gocryptfs.reverse.conf file. It is required to decrypt the files again, so you must not lose it!

There are two possibilities to guard against losing it:

  • You can create a paper-based backup of the ./encrypted/gocryptfs.conf file. As the configuration can only be used in conjunction with the corresponding password, it should be safe to store this paper-based backup somewhere, even if someone should be able to read it.
  • You can create a paper-based backup of the master key that is printed to the screen when initializing the reverse repository. However, you have to make sure that no one can read the master key as it is not password-protected.

If you did not use the --plaintextnames or the --deterministic-names option you should also find a new file called ./encrypted/gocryptfs.diriv. By default GoCryptFS encrypts not only the file contents but also the file names. The directory initialization vectors (dirivs) contained in the gocryptfs.diriv files are required to decrypt the file names again. If you cannot risk losing file names or if your file names are not confidential then it might be better for you to disable the file name encryption or to use the deterministic file name encryption.
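
If you prefer the deterministic file name encryption, the initialization could look like the following sketch. This assumes a reasonably recent GoCryptFS version; check that your version actually supports the --deterministic-names option.

### a sketch: initialize with deterministic file name encryption
### so that no gocryptfs.diriv files are needed
# gocryptfs --init --reverse --deterministic-names ./unencrypted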

3. The remote access

I use a FUSE wrapper to mount the remote storage as if it were a local device. Typically your backup software would have to be able to upload files to the remote storage itself, unnecessarily complicating things. The FUSE wrapper takes this complexity away from the actual backup tool. Using a FUSE wrapper will also come in handy later on when restoring data.

Typical FUSE wrappers include:

  • davfs2 for WebDAV compatible storage
  • s3fs for AWS S3 compatible storage
  • SSHFS for SFTP storage

I personally use SSHFS as I use a remote VM with SFTP access to store my backups. This also reduces the amount of transferred data for the restore test as we will see later.

### create the remote folder
mkdir ./backup

### mount the remote storage
sshfs backup@backup.example.com:/backup ./backup

### you should use additional parameters
### if you run into problems
# sshfs backup@backup.example.com:/backup ./backup -o ServerAliveInterval=15 -o idmap=user -o uid=$(id -u) -o gid=$(id -g) -o rw

### create the remote subfolders
mkdir ./backup/checksums ./backup/files ./backup/snapshots

4. The backup process

Now we are ready to create the actual backup. For this we will use rsync which is used far and wide for such tasks and has some nice benefits:

  • By default, rsync identifies modified files through their size and last-modification date. Thanks to the FUSE wrapper the local and the remote size and last-modification date can easily be compared without having to download the files. So unlike other approaches you do not need a local copy of your whole backup. (If you use an SSH server as a backup target you could also use the integrated SSH support of rsync, as sketched after the following commands.)
  • rsync can easily be restarted should a synchronization fail. Unlike other solutions you do not have to wonder what happens when a backup task really fails. Files are encrypted in-memory thanks to GoCryptFS and rsync just starts comparing the local and remote copy from the beginning when restarting the synchronization process.
  • Thanks to the --backup-dir= and --delete parameters rsync provides a rather simple versioning of files. Files that have changed or that have been deleted between synchronizations are moved to the provided backup directory path and can easily be accessed. If you need more storage you can delete backup directories of earlier synchronizations.

### copy over files and keep modified and deleted files
rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ ./backup/files/

### you should add the --chmod=+w parameter
### if created folders in the backup target are not writable
# rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --chmod=+w --delete ./encrypted/ ./backup/files/
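
As mentioned in the list above, if your backup target is reachable via SSH you could also let rsync use its built-in SSH transport instead of going through the SSHFS mount. This is only a sketch that assumes the same remote folder layout and that rsync is installed on the remote server.

### alternative sketch: let rsync talk to the SSH server directly
### instead of using the SSHFS mount
# rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ backup@backup.example.com:/backup/files/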

5. The restore test (regularly)

You do not have a proper backup unless you have successfully tried to restore it. However, restore-testing remote backups can be resource-intensive. The way I do it is to calculate checksums of the local files and of the remote files, which are then compared to make sure that the remote copy is identical to the local copy.

### enter the encrypted directory
cd ./encrypted

### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../original

### copy the checksums over to the remote server
cp ../original ../backup/checksums/original

### leave the encrypted directory
cd ..

Calculating the checksums of the remote files through the FUSE wrapper is possible. Unfortunately, this would mean downloading the whole backup in the background. As the checksum calculation is separated from the checksum comparison we can optimize things a bit. Given that you have SSH access to the remote target you can log into the remote server, calculate the checksums there and only transfer the checksum file to compare it with the checksums of the original files. This greatly reduces the amount of data that has to be transferred for the restore test.

### on the remote server:
### enter the backup files directory
cd ./backup/files

### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../checksums/backup

### leave the backup files directory
cd ../..

Comparing checksums that have been written to files has one caveat: the entries might be sorted differently. You have to remember this and sort the checksum files before diffing them, otherwise you might find a lot of spurious deviations.

### sort the checksums
sort ./backup/checksums/backup > ./backup/checksums/backup.sorted
sort ./backup/checksums/original > ./backup/checksums/original.sorted

### compare the checksums
diff ./backup/checksums/original.sorted ./backup/checksums/backup.sorted

6. The additional restore test (at least once)

The suggested restore test has one small imperfection. It speeds up the comparison of the local and the remote copy, but it only covers the encrypted files. Normally you would want to make sure that the decrypted files are identical to the original unencrypted files as well. There are two different approaches to achieve this:

  • You could download the whole backup, mount that backup via GoCryptFS and then compare the original unencrypted files and the decrypted backup. However, to do this you would have to transfer a lot of data back to your local storage and keep that second copy for the comparison.
  • Instead of calculating the checksums of the encrypted files you could calculate them for the unencrypted files. You could log into the remote target via SSH, mount the remote backup via GoCryptFS and calculate the checksums of the decrypted backup. However, to do this you would have to trust the remote target, and if you could trust it, the encryption would not be needed in the first place.

The solution that I chose is a bit different: Thanks to the regular restore test I already know that the local encrypted files and the remote files are identical. That also means that the local encrypted files will decrypt to the exact same result as the remote files. So, I can just mount the local encrypted files via GoCryptFS which can then be compared to the original unencrypted files. As the encryption and decryption happen in-memory on-the-fly it is not necessary to keep a second copy of the data around.

### mount the encrypted folder in forward mode
gocryptfs ./encrypted ./decrypted

When calculating the checksums of the original unencrypted files we have to ignore the .gocryptfs.reverse.conf file as it will not be present after the decryption.

### enter the unencrypted directory
cd ./unencrypted

### create checksums of all files
### but ignore the gocryptfs config file
find . -type f ! -path "./.gocryptfs.reverse.conf" -print0 | xargs -0 sha1sum > ../unencrypted

### leave the unencrypted directory
cd ..

Calculating the checksums of the decrypted files might take a bit longer. Remember that in this case each file is read from disk, encrypted and then decrypted before calculating the actual checksum.

### enter the decrypted directory
cd ./decrypted

### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../decrypted

### leave the decrypted directory
cd ..

After sorting the checksum files we can finally compare them.

### sort the checksums
sort ./decrypted > ./decrypted.sorted
sort ./unencrypted > ./unencrypted.sorted

### compare the checksums
diff ./unencrypted.sorted ./decrypted.sorted

This whole process does not necessarily have to be done for each and every backup. It is primarily used to make sure that the encryption layer works as expected. After you are done, do not forget to unmount the decryption folder.

### unmount the directory
fusermount -u ./decrypted

7. The restore

Thanks to the usage of the FUSE wrapper it is pretty easy to restore files from the remote backup. To other applications the FUSE mount looks like any local storage device, which means that you can also mount the remote backup directly via GoCryptFS.

### mount the backup folder in forward mode
gocryptfs ./backup/files ./decrypted

Now it is possible to browse the backup and search for the files that you want to restore. Once you have found them you can just copy them over. During the copy process the files will be downloaded and decrypted in-memory on-the-fly. After you are done, just unmount the decryption folder.

### unmount the directory
fusermount -u ./decrypted

8. Closing up

There we have it. By combining some tools that each do their own job we have created a backup solution that is - in my opinion - easy to understand and use. Together these tools make up a solution that is better than its single parts alone:

  • We used GoCryptFS to encrypt files in-memory on-the-fly.
  • We used SSHFS to seamlessly access the remote target.
  • We used rsync to synchronize the local encrypted files to the remote target.
  • We used sha1sum, sort and diff to test the restorability of the remote backup.

Best of all: those tools are independent of each other. Most of them could be replaced should it become necessary. I hope that you can see the benefits of this approach to simple encrypted remote backups.

So finally, as a last step, do not forget to unmount the used folders. 😃

### unmount the directories
fusermount -u ./backup
fusermount -u ./encrypted
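
If you want to automate the regular backup run, the individual steps can also be chained in a small wrapper script. The following is only a minimal sketch that assumes the folder layout and the remote target from above; GoCryptFS will still prompt for the password interactively, and error handling as well as the restore tests are left out on purpose.

#!/bin/sh
### minimal backup wrapper sketch
### (assumes the folder layout and remote target from above)
set -e

### mount the remote storage and the reverse-encrypted view
sshfs backup@backup.example.com:/backup ./backup
gocryptfs --reverse ./unencrypted ./encrypted

### copy over files and keep modified and deleted files
rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ ./backup/files/

### unmount the directories again
fusermount -u ./encrypted
fusermount -u ./backup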
