Every once in a while I get asked whether a certain backup scheme is a good idea, and oftentimes the suggested backup solution is beyond what I would use myself. Duplicity, its simplification Duply, or the not-so-dissimilar contenders Borg and Restic are among the solutions that are mentioned most often, with solutions like Bacula and its offspring Bareos coming much later.
Unfortunately, I would not trust any of these tools further than I could throw a hard drive containing a backup created with them. The reason for this is somewhat simple: In my opinion, all of these solutions are too complex in a worst-case scenario.
As soon as I mention this opinion, most people I talk to want to know how I do backups and whether I have ever tried those integrated solutions. Yes, I used Duplicity for years until a system of mine broke down. I had an up-to-date backup of that system but still lost a lot of (fortunately not so important) data because the Duplicity backup had become inconsistent over time without notice. I was able to manually extract some data out of the backup, but it was not worth the time. That was the moment when I decided that I did not want to be reliant on such software again.
There are several design goals that I wanted to achieve with my personal backup solution:
There are also some non-goals that are not so important for me:
Let us start with the encryption. For this I have chosen the FUSE wrapper GoCryptFS which is available on GitHub. It is developed by @rfjakob who was one of the most active maintainers of the well-known EncFS encryption layer back in 2015-2018. The "project was inspired by EncFS and strives to fix its security issues while providing good performance" and it looks like he achieved that goal.
Using GoCryptFS is pretty simple. After downloading the static binary from the GitHub repository you can create the required folders and initialize a so-called reverse repository. A reverse repository takes an unencrypted source directory and provides an encrypted version of the contained files through a second directory. This way you can encrypt files in-memory on-the-fly instead of requiring additional storage space for the encrypted copy of the files. The ad-hoc encryption and decryption of GoCryptFS will come in handy for restores as well.
For our purposes we will create three folders:
./unencrypted will contain our source material to be backed up
./encrypted will contain the ad-hoc encrypted files to be backed up
./decrypted will contain the ad-hoc decrypted files of the backup
### create the local folders
mkdir -p ./unencrypted ./encrypted ./decrypted
### initialize the reverse encryption
gocryptfs --init --reverse ./unencrypted
### you can use the --plaintextnames parameter
### if the file names are not confidential
# gocryptfs --init --reverse --plaintextnames ./unencrypted
### mount the unencrypted folder in reverse mode
gocryptfs --reverse ./unencrypted ./encrypted
After initializing the reverse repository in the ./unencrypted folder you will find a new file called ./unencrypted/.gocryptfs.reverse.conf. This file contains relevant encryption parameters that are required to be able to encrypt the files. When the reverse repository is mounted into the ./encrypted folder you will find a file called ./encrypted/gocryptfs.conf which is an exact copy of the previous ./unencrypted/.gocryptfs.reverse.conf file. It is required to be able to decrypt the files again. You must not lose this file!
There are two possibilities to prevent this:
./encrypted/gocryptfs.conf. As the configuration can only be used in conjunction with the corresponding password, it should be safe to store this paper-based backup somewhere even if someone should be able to read it.
If you did not use the --plaintextnames or the -deterministic-names option you should also find a new file called ./encrypted/gocryptfs.diriv. By default GoCryptFS encrypts not only the file content but also the file name. The directory initialization vectors (dirivs) contained in the gocryptfs.diriv files are required to be able to decrypt the file names again. If you cannot risk losing file names or if your file names are not confidential then it might be better for you to disable the file name encryption or use the deterministic file name encryption.
I use a FUSE wrapper to mount the remote storage as if it were a local device. Typically your backup software would have to be able to upload files to the remote storage itself, unnecessarily complicating things. The FUSE wrapper takes this complexity away from the actual backup tool. Using a FUSE wrapper will also come in handy later on when restoring data.
Typical FUSE wrappers include:
I personally use SSHFS as I use a remote VM with SFTP access to store my backups. This also reduces the amount of transferred data for the restore test as we will see later.
### create the remote folder
mkdir ./backup
### mount the remote storage
sshfs firstname.lastname@example.org:/backup ./backup
### you should use additional parameters
### if you run into problems
# sshfs email@example.com:/backup ./backup -o ServerAliveInterval=15 -o idmap=user -o uid=$(id -u) -o gid=$(id -g) -o rw
### create the remote subfolders
mkdir ./backup/checksums ./backup/files ./backup/snapshots
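Since everything that follows writes into ./backup, it can be worth checking that the FUSE mounts are really active before going on; otherwise the files would quietly end up on the local disk. A minimal sketch, assuming the mountpoint utility (util-linux or BusyBox) is installed; require_mounted is a name made up for illustration:

```shell
# Hypothetical helper: fail unless every given directory is an active mount point.
require_mounted() {
    local dir
    for dir in "$@"; do
        if ! mountpoint -q "$dir"; then
            echo "not mounted: $dir" >&2
            return 1
        fi
    done
}

# Example usage with the folders from this article:
# require_mounted ./encrypted ./backup || exit 1
```

Putting such a check in front of the synchronization keeps a failed mount from turning into a confusing backup run.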
Now we are ready to create the actual backup. For this we will use rsync which is used far and wide for such tasks and has some nice benefits:
--delete parameters rsync provides a rather simple versioning of files. Files that have changed or that have been deleted between synchronizations are moved to the provided backup directory path and can easily be accessed. If you need more storage you can delete backup directories of earlier synchronizations.
### copy over files and keep modified and deleted files
rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ ./backup/files/
### you should add the --chmod=+w parameter
### if created folders in the backup target are not writable
# rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --chmod=+w --delete ./encrypted/ ./backup/files/
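Deleting older snapshot directories to free up storage can be scripted as well. The following is only a sketch under the assumption that every directory below ./backup/snapshots carries the timestamp name produced by the rsync call above, so that sorting the names also sorts them by age; prune_snapshots and the number of snapshots to keep are made up for illustration:

```shell
# Hypothetical helper: keep the newest $2 snapshot directories below $1.
prune_snapshots() {
    local root="$1" keep="$2" total excess
    total=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l)
    excess=$((total - keep))
    # Nothing to do if there are not more snapshots than we want to keep.
    [ "$excess" -gt 0 ] || return 0
    # Timestamp names sort chronologically, so the first lines are the oldest.
    find "$root" -mindepth 1 -maxdepth 1 -type d | sort | head -n "$excess" |
    while IFS= read -r dir; do
        echo "removing $dir"
        rm -rf -- "$dir"
    done
}

# Example usage: keep the ten most recent snapshots
# prune_snapshots ./backup/snapshots 10
```

The function deletes data for good, so it is a good idea to try it on a scratch directory first.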
You do not have a proper backup unless you have successfully tried to restore it. However, restore-testing remote backups can be resource-intensive. The way I do it is to calculate checksums of the local files and of the remote files which are then compared to make sure that the remote copy is identical to the local copy.
### enter the encrypted directory
cd ./encrypted
### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../original
### copy the checksums over to the remote server
cp ../original ../backup/checksums/original
### leave the encrypted directory
cd ..
Calculating the checksums of the remote files through the FUSE wrapper is possible. Unfortunately, this would mean downloading the whole backup in the background. As the checksum calculation is separated from the checksum comparison we can optimize things a bit. Given that you have SSH access to the remote target you can log into the remote server, calculate the checksums there and only transfer the checksum file to compare it with the checksums of the original files. This greatly reduces the amount of data that has to be transferred for the restore test.
### enter the backup files directory
cd ./backup/files
### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../checksums/backup
### leave the backup files directory
cd ../..
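If you would rather not log in by hand, the same pipeline can be handed to ssh so that only the checksum list crosses the network. This is a sketch, assuming find, xargs and sha1sum exist on the remote side; checksums_via, the host user@backup.example and the path /backup/files are placeholders and not part of my actual setup:

```shell
# Hypothetical helper: run the checksum pipeline through a command runner.
# $1 is the runner (for example "ssh user@backup.example"), $2 the directory.
checksums_via() {
    # $1 is left unquoted on purpose so that "ssh user@host" splits into words.
    # The directory must not contain single quotes.
    $1 "cd '$2' && find . -type f -print0 | xargs -0 sha1sum | LC_ALL=C sort -k 2"
}

# Example usage; host and path are placeholders:
# checksums_via "ssh user@backup.example" /backup/files > ./backup.sha1
```

Sorting with LC_ALL=C on the server avoids differences that only stem from the two machines using different locales.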
Comparing checksums that have been written to files has one caveat. The files might be sorted differently. You have to remember this and sort the checksum files before diffing them or otherwise you might find a lot of deviations.
### sort the checksums
sort ./backup/checksums/backup > ./backup/checksums/backup.sorted
sort ./backup/checksums/original > ./backup/checksums/original.sorted
### compare the checksums
diff ./backup/checksums/original.sorted ./backup/checksums/backup.sorted
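The sort-then-diff step can be wrapped into a small function that reports the result through its exit code, which is handy when the restore test runs unattended. A minimal sketch; compare_manifests is a made-up name, and the sorted copies go to temporary files instead of the backup folder:

```shell
# Hypothetical helper: succeed only if two checksum files list the same entries.
compare_manifests() {
    local a b status=0
    a=$(mktemp)
    b=$(mktemp)
    # Sort by file name so that the order of the find output does not matter.
    LC_ALL=C sort -k 2 "$1" > "$a"
    LC_ALL=C sort -k 2 "$2" > "$b"
    # diff prints the deviating lines and returns non-zero if there are any.
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example usage with the checksum files from this article:
# compare_manifests ./backup/checksums/original ./backup/checksums/backup && echo "backup is identical"
```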
The suggested restore test has one small imperfection. It may speed up the comparison of the local and remote copy, but this is only true for the encrypted files. Normally you would want to make sure that the decrypted files are identical to the original unencrypted files as well. There are two different approaches to achieve this:
The solution that I chose is a bit different: Thanks to the regular restore test I already know that the local encrypted files and the remote files are identical. That also means that the local encrypted files will decrypt to the exact same result as the remote files. So, I can just mount the local encrypted files via GoCryptFS which can then be compared to the original unencrypted files. As the encryption and decryption happen in-memory on-the-fly it is not necessary to keep a second copy of the data around.
### mount the encrypted folder in forward mode
gocryptfs ./encrypted ./decrypted
When calculating the checksums of the original unencrypted files we have to ignore the .gocryptfs.reverse.conf file as it will not be present after the decryption.
### enter the unencrypted directory
cd ./unencrypted
### create checksums of all files
### but ignore the gocryptfs config file
find . -type f ! -path "./.gocryptfs.reverse.conf" -print0 | xargs -0 sha1sum > ../unencrypted.sha1
### leave the unencrypted directory
cd ..
Calculating the checksums of the decrypted files might take a bit longer. Remember that in this case each file is read from disk, encrypted and then decrypted before calculating the actual checksum.
### enter the decrypted directory
cd ./decrypted
### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../decrypted.sha1
### leave the decrypted directory
cd ..
After sorting the checksum files we can finally compare them.
### sort the checksums
sort ./decrypted.sha1 > ./decrypted.sorted
sort ./unencrypted.sha1 > ./unencrypted.sorted
### compare the checksums
diff ./unencrypted.sorted ./decrypted.sorted
This whole process does not necessarily have to be done for each and every backup. It is primarily used to make sure that the encryption layer works as expected. After you are done, do not forget to unmount the decryption folder.
### unmount the directory
fusermount -u ./decrypted
Thanks to the usage of the FUSE wrapper it is pretty easy to restore files from the remote backup. For other applications the FUSE mount looks like any local storage device which means that you can also mount the remote backup directly via GoCryptFS.
### mount the backup folder in forward mode
gocryptfs ./backup/files ./decrypted
Now it is possible to browse the backup and search for the files that you want to restore. Once you have found them you can just copy them over. During the copy process the files will be downloaded and decrypted in-memory on-the-fly. After you are done, just unmount the decryption folder.
### unmount the directory
fusermount -u ./decrypted
There we have it. By combining some tools that do their own job we have created a backup solution that is - in my opinion - easy to understand and use. Those tools make up a solution that is better than the single parts alone:
Best of all: Those tools are independent of each other. Most of them could be replaced should it become necessary. I hope that you can see the benefits of this approach to simple encrypted remote backups.
So finally, as a last step, do not forget to unmount the used folders. 😃
### unmount the directories
fusermount -u ./backup
fusermount -u ./encrypted