How to create simple encrypted remote backups

30.03.2021 yahe administration linux security

Every once in a while I get asked whether a certain backup scheme is a good idea, and oftentimes the suggested solution is more complex than anything I would use myself. Duplicity, its simplifying wrapper Duply or the not-so-dissimilar contenders Borg and Restic are among the solutions that are mentioned most often, with solutions like Bacula and its offspring Bareos coming up much later.

Unfortunately, I would not trust any of these tools further than I could throw a hard drive containing a backup created with them. The reason for this is rather simple: in my opinion, all of these solutions are too complex in a worst-case scenario.

As soon as I mention this opinion, most people I talk to want to know how I do backups and whether I have ever tried those integrated solutions. Yes, I used Duplicity for years until a system of mine broke down. I had an up-to-date backup of that system but still lost a lot of (fortunately not so important) data because the Duplicity backup had become inconsistent over time without notice. I was able to manually extract some data out of the backup, but it was not worth the time. That was the moment when I decided that I did not want to rely on such software again.

1. The design goals

There are several design goals that I wanted to achieve with my personal backup solution:

  • It should be encrypted.
  • It should be easy to test the restorability of the backup.
  • It should work with off-the-shelf software that people already know.
  • It should be suitable for cost-efficient remote backups in the cloud.
  • It should work with backup targets without requiring special server software.

There are also some non-goals that are not so important to me:

  • It is not required to back up live systems. A file-based backup is sufficient, and there are means to prevent files from being modified during the backup process, like temporary filesystem snapshots that are used as the basis for the actual backup.
  • It is not required to have versioned backups. While this would be an added bonus, having an up-to-date backup that can be restored reliably is much more important.
  • It is not required to deduplicate content. Deduplication increases complexity.

2. The encryption layer

Let us start with the encryption. For this I have chosen the FUSE wrapper GoCryptFS which is available on GitHub. It is developed by @rfjakob who was one of the most active maintainers of the well-known EncFS encryption layer back in 2015-2018. The "project was inspired by EncFS and strives to fix its security issues while providing good performance" and it looks like he achieved that goal.

Using GoCryptFS is pretty simple. After downloading the static binary from the GitHub repository you can create the required folders and initialize a so-called reverse repository. A reverse repository takes an unencrypted source directory and provides an encrypted view of the contained files through a second directory. This way you can encrypt files in-memory on-the-fly instead of requiring additional storage space for an encrypted copy of the files. The ad-hoc encryption and decryption of GoCryptFS will come in handy for restores as well.

For our purposes we will create three folders:

  • ./unencrypted will contain our source material to be backed up
  • ./encrypted will contain the ad-hoc encrypted files to be backed up
  • ./decrypted will contain the ad-hoc decrypted files of the backup

### create the local folders
mkdir -p ./unencrypted ./encrypted ./decrypted

### initialize the reverse encryption
gocryptfs --init --reverse ./unencrypted

### you can use the --plaintextnames parameter
### if the file names are not confidential
# gocryptfs --init --reverse --plaintextnames ./unencrypted

### mount the unencrypted folder in reverse mode
gocryptfs --reverse ./unencrypted ./encrypted

After initializing the reverse repository in the ./unencrypted folder you will find a new file called ./unencrypted/.gocryptfs.reverse.conf. This file contains relevant encryption parameters that are required to be able to encrypt the files. When the reverse repository is mounted into the ./encrypted folder you will find a file called ./encrypted/gocryptfs.conf which is an exact copy of the previous ./unencrypted/.gocryptfs.reverse.conf file. It is required to be able to decrypt the files again. You must not lose this file!
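
As a quick sanity check (just a suggestion, not strictly required) you can list the mounted folder; it should now expose the gocryptfs.conf file and, unless file name encryption is disabled, encrypted file names:

### check that the reverse mount works
ls -la ./encrypted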

There are two possibilities to prevent losing it:

  • You can create a paper-based backup of the ./encrypted/gocryptfs.conf file (a small sketch follows after this list). As the configuration can only be used in conjunction with the corresponding password, it should be safe to store this paper-based backup somewhere even if someone should be able to read it.
  • You can create a paper-based backup of the masterkey that is printed to the screen when initializing the reverse repository. However, you have to make sure that no-one can read the masterkey as it is not password-protected.
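
A minimal sketch of the first option; the target path is just an example and how you put the contents on paper is up to you:

### keep an additional copy of the encryption parameters outside of the backup
cp ./encrypted/gocryptfs.conf ~/gocryptfs.conf.backup

### or print its contents for a paper-based backup
cat ./encrypted/gocryptfs.conf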

If you did not use the --plaintextnames option you should also find a new file called ./encrypted/gocryptfs.diriv. By default, GoCryptFS encrypts not only the file contents but also the file names. The directory initialization vectors (dirivs) contained in the gocryptfs.diriv files are required to be able to decrypt the file names again. If you cannot risk losing file names or if your file names are not confidential, it might be better to disable the file name encryption.

3. The remote access

I use a FUSE wrapper to mount the remote storage as if it were a local device. Typically your backup software would have to be able to upload files to the remote storage itself, unnecessarily complicating things. The FUSE wrapper takes this complexity away from the actual backup tool. Using a FUSE wrapper will also come in handy later on when restoring data.

Typical FUSE wrappers include:

  • davfs2 for WebDAV compatible storage
  • s3fs for AWS S3 compatible storage
  • SSHFS for SFTP storage

I personally use SSHFS as I use a remote VM with SFTP access to store my backups. This also reduces the amount of transferred data for the restore test as we will see later.

### create the remote folder
mkdir ./backup

### mount the remote storage
sshfs backup@backup.example.com:/backup ./backup

### you should use additional parameters
### if you run into problems
# sshfs backup@backup.example.com:/backup ./backup -o ServerAliveInterval=15 -o idmap=user -o uid=$(id -u) -o gid=$(id -g) -o rw

### create the remote subfolders
mkdir ./backup/checksums ./backup/files ./backup/snapshots

4. The backup process

Now we are ready to create the actual backup. For this we will use rsync which is used far and wide for such tasks and has some nice benefits:

  • By default, rsync identifies modified files through their size and last-modification date. Thanks to the FUSE wrapper the local and remote size and last-modification date can easily be compared without having to download the files. So unlike other approaches you do not need a local copy of your whole backup. (If you use an SSH server as a backup target you could also use the integrated SSH support of rsync, as sketched after the following commands.)
  • rsync can easily be restarted should a synchronization fail. Unlike other solutions you do not have to wonder what happens when a backup task really fails. Files are encrypted in-memory thanks to GoCryptFS and rsync just starts comparing the local and remote copy from the beginning when restarting the synchronization process.
  • Thanks to the --backup-dir= and --delete parameters rsync provides a rather simple versioning of files. Files that have been changed or deleted between synchronizations are moved to the provided backup directory path and can easily be accessed. If you need more storage you can delete the backup directories of earlier synchronizations (see the sketch below).

### copy over files and keep modified and deleted files
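### (-a archive mode, -b keep backups of changed files, -E preserve executability,
###  -P keep partially transferred files and show progress)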
rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ ./backup/files/

### you should add the --chmod=+w parameter
### if created folders in the backup target are not writable
# rsync -abEP "--backup-dir=../snapshots/$(date '+%Y%m%d-%H%M%S')/" --chmod=+w --delete ./encrypted/ ./backup/files/
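
Two optional variations, shown only as sketches with example values that would need to be adapted: rsync can use its integrated SSH support instead of going through the SSHFS mount, and snapshot directories of earlier synchronizations can be removed when storage runs low.

### alternatively, let rsync talk to the SSH server directly
### (note the absolute --backup-dir= path on the remote side)
# rsync -abEP "--backup-dir=/backup/snapshots/$(date '+%Y%m%d-%H%M%S')/" --delete ./encrypted/ backup@backup.example.com:/backup/files/

### remove snapshot directories that are older than e.g. 90 days
# find ./backup/snapshots -mindepth 1 -maxdepth 1 -type d -mtime +90 -exec rm -r {} +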

5. The restore test (regularly)

You do not have a proper backup unless you have successfully tried to restore it. However, restore-testing remote backups can be resource-intensive. The way I do it is to calculate checksums of the local files and of the remote files which are then compared to make sure that the remote copy is identical to the local copy.

### enter the encrypted directory
cd ./encrypted

### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../original

### copy the checksums over to the remote server
cp ../original ../backup/checksums/original

### leave the encrypted directory
cd ..

Calculating the checksums of the remote files through the FUSE wrapper is possible. Unfortunately, this would mean downloading the whole backup in the background. As the checksum calculation is separated from the checksum comparison we can optimize things a bit. Given that you have SSH access to the remote target you can log into the remote server, calculate the checksums there and only transfer the checksum file to compare it with the checksums of the original files. This greatly reduces the amount of data that has to be transferred for the restore test.
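
### log into the remote target first and adjust the path below
### to where the backup is located on the server
# ssh backup@backup.example.com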

### enter the backup files directory
cd ./backup/files

### create checksums of all files
find . -type f -print0 | xargs -0 sha1sum > ../checksums/backup

### leave the backup files directory
cd ../..

Comparing checksums that have been written to files has one caveat: the entries might be sorted differently. You have to sort the checksum files before diffing them, otherwise you might find a lot of apparent deviations. If the diff produces no output, the remote copy matches the local one.

### sort the checksums
sort ./backup/checksums/backup > ./backup/checksums/backup.sorted
sort ./backup/checksums/original > ./backup/checksums/original.sorted

### compare the checksums
diff ./backup/checksums/original.sorted ./backup/checksums/backup.sorted

6. The additional restore test (at least once)

The suggested restore test has one small imperfection. It may speed up the comparison of the local and remote copy, but this is only true for the encrypted files. Normally you would want to make sure that the decrypted files are identical to the original unencrypted files as well. There are two different approaches to achieve this:

  • You could download the whole backup, mount that backup via GoCryptFS and then compare the original unencrypted files and the decrypted backup. However, to do this you would have to transfer a lot of data back to your local storage and keep that second copy for the comparison.
  • Instead of calculating the checksums of the encrypted files you could calculate them for the unencrypted files. You could log into the remote target via SSH, mount the remote backup via GoCryptFS and calculate the checksums of the decrypted backup. However, for this you would have to trust the remote target, and if you trusted the remote target, the encryption would not be necessary in the first place.

The solution that I chose is a bit different: Thanks to the regular restore test I already know that the local encrypted files and the remote files are identical. That also means that the local encrypted files will decrypt to the exact same result as the remote files. So, I can just mount the local encrypted files via GoCryptFS which can then be compared to the original unencrypted files. As the encryption and decryption happen in-memory on-the-fly it is not necessary to keep a second copy of the data around.

### mount the encrypted folder in forward mode
gocryptfs ./encrypted ./decrypted

When calculating the checksums of the original unencrypted files we have to ignore the .gocryptfs.reverse.conf file as it will not be present after the decryption.

### enter the unencrypted directory
cd ./unencrypted

### create checksums of all files
### but ignore the gocryptfs config file
### (the output file name must not clash with the ./unencrypted directory)
find . -type f ! -path "./.gocryptfs.reverse.conf" -print0 | xargs -0 sha1sum > ../unencrypted-checksums

### leave the unencrypted directory
cd ..

Calculating the checksums of the decrypted files might take a bit longer. Remember that in this case each file is read from disk, encrypted and then decrypted before calculating the actual checksum.

### enter the decrypted directory
cd ./decrypted

### create checksums of all files
### (again using a file name that does not clash with the ./decrypted directory)
find . -type f -print0 | xargs -0 sha1sum > ../decrypted-checksums

### leave the decrypted directory
cd ..

After sorting the checksum files we can finally compare them.

### sort the checksums
sort ./decrypted-checksums > ./decrypted-checksums.sorted
sort ./unencrypted-checksums > ./unencrypted-checksums.sorted

### compare the checksums
diff ./unencrypted-checksums.sorted ./decrypted-checksums.sorted

This whole process does not necessarily have to be done for each and every backup. It is primarily used to make sure that the encryption layer works as expected. After you are done, do not forget to unmount the decryption folder.

### unmount the directory
fusermount -u ./decrypted

7. The restore

Thanks to the usage of the FUSE wrapper it is pretty easy to restore files from the remote backup. For other applications the FUSE mount looks like any local storage device which means that you can also mount the remote backup directly via GoCryptFS.

### mount the backup folder in forward mode
gocryptfs ./backup/files ./decrypted

Now it is possible to browse the backup and search for the files that you want to restore. Once you have found them you can just copy them over, as sketched below. During the copy process the files will be downloaded and decrypted in-memory on-the-fly. After you are done, just unmount the decryption folder.
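
A minimal sketch of such a restore; the file path is purely hypothetical:

### copy a single file out of the decrypted mount
cp -a "./decrypted/path/to/file" ./restored-file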

### unmount the directory
fusermount -u ./decrypted

8. Closing up

There we have it. By combining some tools that each do their own job we have created a backup solution that is - in my opinion - easy to understand and use. Together these tools make up a solution that is better than the sum of its parts:

  • We used GoCryptFS to encrypt files in-memory on-the-fly.
  • We used SSHFS to seamlessly access the remote target.
  • We used rsync to synchronize the local encrypted files to the remote target.
  • We used sha1sum, sort and diff to test the restorability of the remote backup.

Best of all: those tools are independent of each other. Most of them could be replaced should it become necessary. I hope that you can see the benefits of this approach to simple encrypted remote backups.

So finally, as a last step, do not forget to unmount the used folders. 😃

### unmount the directories
fusermount -u ./backup
fusermount -u ./encrypted

Cryptographic Vulnerabilities within the Nextcloud Server Side Encryption

16.11.2020 yahe publicity security

Nearly a year ago I wrote that I had an extensive look into the server side encryption that is provided by the Default Encryption Module of Nextcloud. I also mentioned that I have written some helpful tools and an elaborate description for people that have to work with its encryption.

What I did not write about at that time was that I had also discovered several cryptographic vulnerabilities. After a full year, these have now finally been fixed, the corresponding HackerOne reports have been disclosed and so I think it is about time to also publish the whitepaper that I have written about these vulnerabilities.

The paper is called "Cryptographic Vulnerabilities and Other Shortcomings of the Nextcloud Server Side Encryption as implemented by the Default Encryption Module" and is available through the Cryptology ePrint Archive as report 2020/1439. The vulnerabilities presented in this paper have received their own CVEs, namely:

  • CVE-2020-8133 went to the vulnerability described in the chapter "Insufficient integrity protection of files leads to breach of integrity (I)". More details can be found in the HackerOne report 661051 and in the Nextcloud Security Advisory NC-SA-2020-038.
  • CVE-2020-8150 went to the vulnerability described in the chapter "Insufficient integrity protection of files leads to breach of integrity (III)". More details can be found in the HackerOne report 742588 and in the Nextcloud Security Advisory NC-SA-2020-039.
  • CVE-2020-8152 went to the vulnerability described in the chapter "Insufficient integrity protection of files leads to breach of integrity (II)". More details can be found in the HackerOne report 743505 and in the Nextcloud Security Advisory NC-SA-2020-040.
  • CVE-2020-8259 went to the vulnerability described in the chapter "Insufficient integrity protection of public keys leads to breach of confidentiality". More details can be found in the HackerOne report 732431 and in the Nextcloud Security Advisory NC-SA-2020-041.

Having such an in-depth look into the implementation of a real-world application has been a lot of fun. However, I am also relieved that this project now finally comes to an end. I am eager to start with something new. 😃


Dropbox: Use the API in 5 Simple Steps

07.01.2020 yahe code

A few weeks ago I wrote about the new cryptographic basis of the Shared-Secrets service. What I did not write about was that one user asked if a file-sharing option could be added. I declined because sharing files is not what the service is meant for. Nevertheless, I tried out whether sharing files would be possible.

To share the files I needed a place to store them, but I did not feel like storing them on the actual server. So I searched for an easy file storage service. The first service that came to my mind was Dropbox. I had never used Dropbox or the Dropbox API before but felt that it should not be too difficult. What I did not expect was just how easy the Dropbox API actually is to use. Because of this simplicity I decided to write a short blog post about it. So here are 5 simple steps to use the Dropbox API...

1. Register your Account

First of all you need to register a Dropbox account. For this you should use a valid e-mail address that you have access to because you will have to verify that e-mail address later on.

Dropbox Account Registration

2. Verify your E-Mail Address

After logging in with your new account you can visit the app creation page where you are requested to verify your e-mail address. You have to proceed with the verification in order to create your own Dropbox App.

Dropbox E-Mail Address Verification

3. Create your Dropbox App

Now that you have verified your e-mail address you can visit the app creation page again to create the actual app. Select to use the Dropbox API, to store the files in a separate App folder, choose a name for your App and you are ready to go.

Dropbox App Creation

4. Retrieve Your Access Token

Once you have created the app, its details will be shown to you. Scroll down a bit and you will find a button labeled "Generate" in the "OAuth 2" section that will generate your individual access token. The Access Token displayed below the "Generated access token" heading is needed to identify your Dropbox account.

Dropbox App 1 Dropbox App 2

5. Use the Dropbox API

Using the Dropbox API itself is relatively simple as most actions can be done with a single REST API call. Here are some PHP examples that illustrate the usage of the Dropbox API. First of all we define the Access Token which is needed by all API calls:

  // see https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/
  define("DROPBOX_ACCESS_TOKEN", "YOUR DROPBOX ACCESS TOKEN");

Now, in order to store a given string ($content) in Dropbox we can use the following function. On failure it returns NULL and on success it returns a random identifier through which the content can be retrieved again:

  function dropbox_upload($content) {
    $result = null;

    // store content in memory
    if ($handler = fopen("php://memory", "rw")) {
      try {
        if (strlen($content) === fwrite($handler, $content)) {
          if (rewind($handler)) {
            // get random filename
            $filename = bin2hex(openssl_random_pseudo_bytes(32, $strong_crypto));

            if ($curl = curl_init("https://content.dropboxapi.com/2/files/upload")) {
              try {
                $curl_headers = ["Authorization: Bearer ".DROPBOX_ACCESS_TOKEN,
                                 "Content-Type: application/octet-stream",
                                 "Dropbox-API-Arg: {\"path\":\"/$filename\"}"];

                curl_setopt($curl, CURLOPT_HTTPHEADER,     $curl_headers);
                curl_setopt($curl, CURLOPT_PUT,            true);
                curl_setopt($curl, CURLOPT_CUSTOMREQUEST,  "POST");
                curl_setopt($curl, CURLOPT_INFILE,         $handler);
                curl_setopt($curl, CURLOPT_INFILESIZE,     strlen($content));
                curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

                $response = curl_exec($curl);
                if (200 === curl_getinfo($curl, CURLINFO_RESPONSE_CODE)) {
                  $result = $filename;
                }
              } finally {
                curl_close($curl);
              }
            }
          }
        }
      } finally {
        fclose($handler);
      }
    }

    return $result;
  }

To retrieve the stored content the following function can be used. It requires the random identifier ($filename) as the input parameter:

  function dropbox_download($filename) {
    $result = null;

    if ($curl = curl_init("https://content.dropboxapi.com/2/files/download")) {
      try {
        $curl_headers = ["Authorization: Bearer ".DROPBOX_ACCESS_TOKEN,
                         "Content-Type: application/octet-stream",
                         "Dropbox-API-Arg: {\"path\":\"/$filename\"}"];

        curl_setopt($curl, CURLOPT_HTTPHEADER,     $curl_headers);
        curl_setopt($curl, CURLOPT_POST,           true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

        $response = curl_exec($curl);
        if (200 === curl_getinfo($curl, CURLINFO_RESPONSE_CODE)) {
          $result = $response;
        }
      } finally {
        curl_close($curl);
      }
    }

    return $result;
  }

In order to delete the stored content the following function can be used. It again requires the random identifier ($filename) as the input parameter:

  function dropbox_delete($filename) {
    $result = null;

    if ($curl = curl_init("https://api.dropboxapi.com/2/files/delete_v2")) {
      try {
        $curl_fields  = json_encode(["path"  => "/$filename"]);
        $curl_headers = ["Authorization: Bearer ".DROPBOX_ACCESS_TOKEN,
                         "Content-Type: application/json"];

        curl_setopt($curl, CURLOPT_HTTPHEADER,     $curl_headers);
        curl_setopt($curl, CURLOPT_POST,           true);
        curl_setopt($curl, CURLOPT_POSTFIELDS,     $curl_fields);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

        $response = curl_exec($curl);
        if (200 === curl_getinfo($curl, CURLINFO_RESPONSE_CODE)) {
          $result = $response;
        }
      } finally {
        curl_close($curl);
      }
    }

    return $result;
  }

And there we have it. With a few lines of code we were able to store, retrieve and delete contents in Dropbox. In my case I also encrypted the content before storing it in Dropbox, something that you should consider as well if you are handling confidential contents. But other than that the shown functions are most of what you will need in the beginning.


Shared-Secrets: Cryptography Reloaded

17.12.2019 yahe code linux security

About 3 years ago I wrote about a tool called Shared-Secrets that I had written. It had the purpose of sharing secrets through encrypted links which should only be retrievable once. Back then I made the decision to base the application on the GnuPG encryption but over the last couple of years I had to learn that this was not the best of all choices. Here are some of the problems that I have found in the meantime:

  • The application started by using the ASCII-armoring of GnuPG to get human-readable outputs for the URL generation. Unfortunately, the ASCII-armoring introduced many possibilities to alter links and thus retrieve secrets more than once.
  • To clean up the interface to GnuPG the application was rewritten to use the GnuPG PECL extension. Unfortunately, this introduced integrity problems and was removed again shortly afterwards.
  • In 2018 the world had to learn through EFail that the integrity protection of GnuPG is actually optional. Thus, the application had to be enhanced to prevent unprotected messages from being decrypted.
  • After this problem I started to poke around GnuPG and the OpenPGP standard and learned that the message format does not support integrity protection for the actual message structure. This means that message packets can be added, moved around or removed. All of these modifications made it possible to alter links and thus retrieve secrets more than once.

As this last issue is a problem with the GnuPG message format itself, its solution required either changing or completely replacing the cryptographic basis of Shared-Secrets. After thinking about the possible alternatives I decided to design simple message formats and completely rewrite the cryptographic foundation. This new version was published a few weeks ago and a running instance is also available at secrets.syseleven.de.

This new implementation should solve the previous problems for good and will in the future allow me to implement fundamental improvements when they become necessary, as I now have a much deeper insight into the used cryptographic algorithms and the design of the message formats.


Nextcloud-Tools: Working with the Nextcloud Server-Side Encryption

02.12.2019 yahe administration code security update

At the beginning of the year we ran into a strange problem with our server-side encrypted Nextcloud installation at work. Files got corrupted when they were moved between folders. We had found another problem with the Nextcloud Desktop client just recently and therefore thought that this was also related to synchronization problems within the Nextcloud Desktop client. Later in the year we bumped into this problem again, but this time it occurred while using the web frontend of Nextcloud. Now we understood that the behaviour did not have anything to do with the client but with the server itself. Interestingly, another user had opened a GitHub issue about this problem at around the same time. As these corruptions led to quite some workload for the restores I decided to dig deeper to find an adequate solution.

After I had found out how to reproduce the problem it was important for us to know whether corrupted files could still be decrypted at all. I wrote a decryption script and proved that corrupted files could in fact be decrypted even when Nextcloud said that they were broken. With this in mind I tried to find out what happened during the encryption and what broke the files while they were being moved. Doing all the research about the server-side encryption of Nextcloud, debugging the software, creating a potential bugfix and coming up with a temporary workaround took about a month of interrupted work.

Even more important than the actual bugfix (as we are currently living with the workaround) is the knowledge we gained about the server-side encryption. Based on this knowledge I developed a bunch of scripts that have been published as nextcloud-tools on GitHub. These scripts can help you to rescue your server-side encrypted files in cases when your database was corrupted or completely lost.

I also wrote an elaborate description about the inner workings of the server-side encryption and tried to get it added to the documentation. It took some time but in the end it worked! For about a week now you can find my description of the Nextcloud Server-Side Encryption Details in the official Nextcloud documentation.

Update

Due to popular demand I wrote the decrypt-all-files.php script that helps you to decrypt all files that have been encrypted with the server-side encryption. It is accompanied by a somewhat extensive description of how to use it.

