Encrypting, transferring and decrypting data with sett

Encrypting files

The sett application allows the encryption of any combination of individual files and directories.

The files are first compressed into a single data.tar.gz archive, which is then encrypted with the public key of one or more recipient(s), and signed with the sender’s key. The encrypted data (data.tar.gz.gpg) is then bundled with a metadata file - a plain text file that contains information about who is sending the file and to whom it should be delivered - into a single .zip file. The specifications of the output .zip files produced by sett are described in the sett packaging specifications section.

sett supports multi-recipient data encryption. This allows the encrypted file to be decrypted by multiple recipients.

sett also ensures the integrity of the transferred files by computing checksums on each file that is packaged, and adding this information to the encrypted data. The integrity of each file is verified automatically upon decryption of the file by sett, providing the guarantee that all files were transferred flawlessly.
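As a rough illustration of the per-file integrity check described above, the following Python sketch records a checksum for each file and later reports the files whose content no longer matches. The SHA-256 algorithm and all function names are assumptions made for this example; they are not taken from sett, whose actual format is defined in the sett packaging specifications.

```python
import hashlib


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_checksums(paths):
    """Map each file path to its checksum (done by the sender at packaging time)."""
    return {str(p): file_checksum(p) for p in paths}


def corrupted_files(expected):
    """Return the paths whose current checksum differs from the recorded one."""
    return [p for p, checksum in expected.items() if file_checksum(p) != checksum]
```

An empty list returned by corrupted_files means that every file still matches the checksum recorded at packaging time.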

BioMedIT

Data Transfer Requests: each data transfer into the BioMedIT network must have an authorized Data Transfer Request ID (DTR ID). This ID must be specified at the time the data is encrypted (see below). The ID is added to the encrypted file’s metadata information by sett. A valid and authorized DTR ID value is mandatory for any data transfer into the BioMedIT network. Non-compliant packages will be rejected.

Recipients: each data transfer into the BioMedIT network must be to a recipient assigned to the role of Data Manager for the given project. The recipient’s PGP key must also be approved by the BioMedIT key validation authority. If these conditions are not met, sett will not encrypt the data.

Tip

sett does not support multi-threading during file compression/decompression. This may lead to long processing times in data encryption/decryption when working with large files (e.g. > 100 GB).

Please refer to the working with large files section of the documentation for details on how to alleviate this issue.

Output file naming scheme

By default, encrypted output files produced by sett are named after the pattern:

<project code>_<YYYYMMDD>T<HHMMSS>_<optional suffix>.zip

where:

  • <project code> is the abbreviation/code associated with the project. If no DTR ID value was provided or if Verify DTR is disabled, no project code is added as a prefix to the output file name.
  • <YYYYMMDD> is the current date (Year, Month, Day).
  • <HHMMSS> is the current time (Hours, Minutes, Seconds).
  • <optional suffix> is an optional, custom text that can be added to the file name.

Example: demo_20220211T143311_sib.zip, here demo is the project code and sib is an optional suffix.
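The naming pattern above can be sketched in Python as follows. The function name and its parameters are hypothetical and only illustrate how the three parts are joined; this is not sett's own code.

```python
from datetime import datetime
from typing import Optional


def package_file_name(
    created: datetime,
    project_code: Optional[str] = None,
    suffix: Optional[str] = None,
) -> str:
    """Build '<project code>_<YYYYMMDD>T<HHMMSS>_<optional suffix>.zip'."""
    parts = []
    if project_code:  # omitted when no DTR ID is given or Verify DTR is disabled
        parts.append(project_code)
    parts.append(created.strftime("%Y%m%dT%H%M%S"))
    if suffix:  # the "_" separator is added automatically
        parts.append(suffix)
    return "_".join(parts) + ".zip"


name = package_file_name(datetime(2022, 2, 11, 14, 33, 11), "demo", "sib")
print(name)  # demo_20220211T143311_sib.zip
```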

The value for the optional suffix can be permanently set in the Settings tab of the sett-gui, or in the sett configuration file.

Using the sett command line, it is possible to completely override the above output file naming scheme by passing the --output option. Overriding the naming scheme is not possible when using sett-gui.

Encrypting data with sett-gui

To encrypt data:

  1. Go to the Encrypt tab of the sett application.

    _images/sett_encrypt_01.png
  2. Select files and/or directories to encrypt: using the Add files and Add directory buttons, select at least one file or directory to encrypt.

    After adding files/directories, they will be listed in the top box of the tab (see figure above).

  3. Select data sender: in the drop-down list found under Sender, select your own PGP key (you are the data sender). For most users, there should in principle be only one key in the Sender drop-down menu: their own key.

    Note

    The Sender key is used to sign the encrypted data, so that the recipient(s) can be confident that the data they receive is genuine.

  4. Select data recipients: add one or more recipients by selecting them from the drop-down list found under Recipients and clicking the + button. Recipients are the people for whom data should be encrypted: their public PGP key will be used to encrypt the data, and only they will be able to decrypt it.

    BioMedIT

    Only recipients assigned to the role of Data Manager of the project for which data is being encrypted are permitted to be used as data recipients.

  5. DTR ID: Data Transfer Request ID associated with the data package that is being encrypted. Specifying a valid DTR ID is mandatory to transfer data into the BioMedIT network.

    For data not intended to be transferred into the BioMedIT network, the DTR ID field can be left empty (or set to any arbitrary value). In this case, Verify DTR must be disabled (in the Settings tab).

    BioMedIT

    The DTR ID field is mandatory. Only files encrypted with a valid and authorized DTR ID value can be transferred into the secure BioMedIT network. For this reason, BioMedIT users should always leave the Verify DTR checkbox enabled.

  6. Purpose: purpose of the data transfer, please select either PRODUCTION or TEST, or leave it empty.

    BioMedIT

    This field is mandatory.

  7. Output suffix (optional): optional suffix value to be appended at the end of the file name. A _ separator is automatically added and does not need to be part of the suffix.

    • Only regular alphanumeric, - and _ characters are allowed in the output suffix.
    • The value for the optional suffix can be permanently set in the Settings tab.
    • For more details on the output file naming scheme used by sett, please refer to the output file naming scheme section.
  8. Output location: directory where the encrypted file should be saved.

    By default, output files are saved to the user’s home directory.

  9. Compression level slider: amount of compression to apply to the input data when packaging it.

    Compression values range between 0 (no compression) and 9 (highest compression). Higher compression levels result in smaller encrypted output files but require more computing time.

    The default compression level of 5 offers a good balance between output compression and time needed to perform the task. An illustration of compression ratio vs. time is given in the sett benchmarks section.

    If compression is not required, e.g. because the input data is already in a compressed form, the compression level should be set to 0 in order to speed up the packaging task. Performing compression outside of sett can be useful when working with large files.

  10. Ignore disk space error: disable disk space check before data encryption.

    By default, sett verifies that there is enough free disk space available to save the output file before starting to compress and encrypt data. If this is not the case an error message is displayed and the operation is aborted. Since the compression ratio of the input data cannot be known in advance, sett uses the conservative estimate that the minimum disk space required is equal to the total size of all input files to be encrypted.

    If users think this is too conservative, this verification can be disabled by ticking the Ignore disk space error checkbox.

  11. Encryption Test: a test run of the data to encrypt can be performed by clicking the Test button. This will check that all the specified input files can be found on disk, and run additional checks if the Verify DTR setting is enabled (default).

    _images/sett_encrypt_02.png
  12. You are now ready to compress and encrypt the data: click Package & Encrypt. A pop-up will appear, asking for the password associated with the sender’s key. After the password is entered, data compression and encryption will start. Progress and error messages are displayed in the Console box.

    When encryption has completed successfully, the Console should display a message that reads: “Completed data encryption” followed by the location and name of the output file, as illustrated in the example below.

    _images/sett_encrypt_03.png

At this point, all input files are compressed, encrypted and bundled into a single .zip file. Data has not yet been transferred to the intended recipient.

Encrypting data on the command line

The sett command to encrypt data is the following. Note that the SENDER and RECIPIENT values can be specified either as a PGP key fingerprint, or as an email address.

# General syntax:
sett encrypt --sender SENDER --recipient RECIPIENT --dtr-id DATA_TRANSFER_ID --purpose PURPOSE --output OUTPUT_FILENAME_OR_DIRECTORY FILES_OR_DIRECTORIES_TO_ENCRYPT

# Example:
# long command line options:
sett encrypt --sender alice@example.com --recipient bob@example.com --dtr-id 42 --purpose PRODUCTION --output test_output ./test_file.txt ./test_directory
# short command line options:
sett encrypt -s alice@example.com -r bob@example.com -t 42 --purpose PRODUCTION -o test_output ./test_file.txt ./test_directory --dry-run

Data can be encrypted for more than one recipient by repeating the --recipient option, e.g. --recipient RECIPIENT1 --recipient RECIPIENT2:

# In this example, Alice encrypts a set of files for both Bob and Chuck.
sett encrypt --sender alice@example.com --recipient bob@example.com --recipient chuck@example.com FILES_OR_DIRECTORIES_TO_ENCRYPT

Adding the --dry-run option will run the encrypt command in test mode, i.e. checks are made but no data is encrypted.

The data compression level used by sett can be manually adjusted using the --compression-level option. Compression level values must be integers between 0 (no compression) and 9 (highest compression). Higher compression levels produce smaller output files but require more computing time, so you may choose a lower level to speed up compression (e.g. --compression-level=1), or a higher level (e.g. --compression-level=9) to produce smaller output files. The default level is 5.
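To get a feel for the size side of this trade-off on your own data, a small Python sketch such as the following can compare a few levels. It uses Python's gzip module as a stand-in (an assumption: sett's internal compressor is not described here beyond producing a .tar.gz archive), and the sample data is made up.

```python
import gzip


def compressed_sizes(data, levels=(0, 1, 5, 9)):
    """Return the gzip-compressed size in bytes of `data` for each level."""
    return {level: len(gzip.compress(data, compresslevel=level)) for level in levels}


# Made-up, highly repetitive sample; real data will compress differently.
sample = b"sample_id,value\n" + b"patient_0001,42\n" * 50_000
sizes = compressed_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of input)")
```

Level 0 stores the data uncompressed, so its output is slightly larger than the input because of the gzip framing.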

Before encrypting data, sett verifies that there is enough free disk space on the local machine to save the encrypted output file. The location checked is the current working directory, or the target folder given via --output. If there is not enough free space, an error message is displayed and the operation is aborted. Since the compression ratio of the input data cannot be known in advance, sett uses the conservative estimate that the minimum disk space required is equal to the total size of all input files to be encrypted. If users think this is too conservative (e.g. because they know that their data compresses well), this verification can be disabled by passing the --force option.
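The conservative estimate described above can be reproduced with a few lines of Python, for instance to check the available space before starting a long-running job. This is a minimal sketch, not sett's implementation; the function names are hypothetical.

```python
import shutil
from pathlib import Path


def required_space(paths):
    """Total size in bytes of all input files (directories are walked recursively)."""
    total = 0
    for path in map(Path, paths):
        if path.is_dir():
            total += sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
        else:
            total += path.stat().st_size
    return total


def has_enough_space(paths, output_dir="."):
    """Apply the rule 'free space at the output location >= total input size'."""
    return shutil.disk_usage(output_dir).free >= required_space(paths)
```

For example, has_enough_space(["./test_file.txt", "./test_directory"], "test_output") applies the rule to the inputs and output location used in the encryption example above.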

To automate the encryption process, the --passphrase-cmd option can be used to specify an external command that returns the PGP key password to the standard output.

Important

When using --passphrase-cmd, make sure that the external command and the password store are secure.
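The contract expected from such an external command is simply "print the password to standard output". The Python sketch below shows how a caller could consume that output. It is an illustration under assumptions (the function name is hypothetical, and how sett itself runs the command is not described here), and the demo command is a harmless stand-in for a real password manager query.

```python
import subprocess
import sys


def passphrase_from_command(command):
    """Run `command` (program plus arguments) and return what it prints to stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # Drop the trailing line break that most commands append to their output.
    return result.stdout.rstrip("\r\n")


# Stand-in command: a real setup would query a password manager instead.
demo_command = [sys.executable, "-c", "print('not-a-real-passphrase')"]
passphrase = passphrase_from_command(demo_command)
```

Passing the command as a list avoids shell interpretation of its arguments, and check=True makes a failing command raise an error instead of silently returning an empty password.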

sett performs DTR verification if verify_dtr is enabled in settings (default). For non-BioMedIT-related transfers, verify_dtr should be set to false (--dtr-id and --purpose are optional in this mode).

BioMedIT

A valid DTR ID must be specified via the --dtr-id option, and the purpose via --purpose.

An optional output suffix can be added to the sett output files by passing the --output-suffix option. Alternatively, the value for the optional suffix can be permanently set in the sett configuration file. For more details on the output file naming scheme used by sett, please refer to the output file naming scheme section.

To completely override the sett output file naming scheme, the --output option can be used to specify the path and name that the output file should have.

Transferring files

Data packages can be transferred to remote servers that support one of the following protocols:

  • SFTP
  • S3 object storage
  • Liquid Files

Important

Only files encrypted with sett, or files that follow the sett packaging specifications can be transferred using sett.

By default, sett verifies data packages before initializing file transfers. These checks are required within the BioMedIT network, but can be skipped in other contexts by disabling Verify DTR, Verify package name, and/or Verify key approval checkboxes in the application settings.

  • Verify DTR: A valid and authorized DTR (Data Transfer Request) ID is required in the package metadata.

  • Verify package name: Package name must match the pattern <project_code>_<date-format>_<package_name_suffix>.zip, where:

    • project_code is the abbreviated project name of the project whose DTR ID was used during data encryption. If no DTR ID value was given at encryption time, or if checking DTR ID is disabled, the pattern against which file names are verified becomes <date-format>_<package_name_suffix>.zip.
    • date-format is the date and time when the data package was created, as specified in the sett packaging format.
    • package_name_suffix is the optional suffix appended to package names when they are created. When using sett-gui, the suffix value is taken from the Encrypt tab (which itself can be taken from the config file). When using the command line, the suffix value is taken from the sett configuration file.

    The objective of this verification is to avoid that users mistakenly include sensitive information in data package file names.

  • Verify key approval: Verify that the PGP keys (sender and recipients) have been approved by the central authority.
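The package name check can be approximated with a regular expression, which is handy for validating file names in a script before attempting a transfer. The sketch below is not sett's verification code. It assumes the YYYYMMDDTHHMMSS timestamp from the output file naming scheme, a project code made of alphanumeric characters and hyphens, and a suffix limited to alphanumeric, - and _ characters.

```python
import re

# Assumed pattern: [<project_code>_]<YYYYMMDD>T<HHMMSS>[_<suffix>].zip
PACKAGE_NAME = re.compile(
    r"(?:(?P<project_code>[A-Za-z0-9-]+)_)?"
    r"(?P<timestamp>\d{8}T\d{6})"
    r"(?:_(?P<suffix>[A-Za-z0-9_-]+))?"
    r"\.zip"
)


def parse_package_name(file_name):
    """Return the name parts as a dict, or None if the name does not match."""
    match = PACKAGE_NAME.fullmatch(file_name)
    return match.groupdict() if match else None


print(parse_package_name("demo_20220211T143311_sib.zip"))
print(parse_package_name("patient_john_doe.zip"))  # None: free text is rejected
```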

Transferring files with sett-gui

To transfer encrypted files:

  1. Go to the Transfer tab of the sett application.

    _images/sett_transfer_01.png
  2. Select encrypted files to transfer: click Add files and select a .zip file that was generated using the sett application.

    • Multiple files can be transferred at the same time by adding more than one file.
    • Only .zip files produced by the sett application can be transferred.
  3. Select the data transfer Protocol (sftp, s3 or liquid_files). The choice depends on the server to which the data should be sent, but in most cases sftp is used.

  4. Set the connection parameters for the transfer:

    • User name: the user name with which to connect to the SFTP/liquid files server.

    • Host URL: URL address of the server where the files should be sent.

    • Destination directory: absolute path of directory where files should be saved on the server.

    • SSH key location: name and full path of the private SSH key used for authentication to the SFTP server. This is only required if the SSH key is in a non-standard location. Only RSA keys are accepted.

      • Do not confuse SSH keys - which are used to authenticate yourself when connecting to an SFTP server during file transfer - with PGP keys - which are used to encrypt and sign data.
    • SSH key password: password associated with the private SSH key given under SSH key location.

      BioMedIT

      For BioMedIT users, the SFTP connection parameters User name, Host URL, and Destination directory will be provided by your local BioMedIT node.

  5. You are now ready to transfer the data. Click Transfer selected files and follow the progress of the transfer using the progress bar and the Console box.

Transferring data on the command line

sett command to transfer data:

# General syntax:
sett transfer --protocol=sftp --protocol-args='{"host": "HOST","username":"USERNAME", "destination_dir":"DIR", "pkey":"PRIVATE_RSA_SSH_KEY"}' FILES_TO_TRANSFER
sett transfer --protocol=liquid_files --protocol-args='{"host": "HOST","subject": "SUBJECT", "message": "MESSAGE","api_key":"APIKEY","chunk_size": 100}' FILES_TO_TRANSFER

# Example:
sett transfer --protocol=sftp --protocol-args='{"host":"10.0.73.1","username":"alice", "destination_dir":"/data", "pkey":"~/.ssh/id_rsa"}' encrypted_data.zip

Note that if you are using the Windows Command Prompt, the above syntax must be modified to use double quotes (") instead of single quotes ('), and doubled-double quotes ("") instead of regular double quotes (") around the --protocol-args. Here is an example:

# Example for Windows command prompt:
sett transfer --protocol=sftp --protocol-args="{""host"":""10.0.73.1"",""username"":""alice"", ""destination_dir"":""/data"", ""pkey"":""~/.ssh/id_rsa""}" encrypted_data.zip
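When sett is called from a script, the quoting differences between shells can be avoided by building the JSON value programmatically and passing the arguments as a list. The following Python sketch only uses the options and fields shown above; the helper name is hypothetical.

```python
import json
import shlex


def sftp_transfer_command(host, username, destination_dir, files, pkey=None):
    """Build the argument list for 'sett transfer' over SFTP."""
    protocol_args = {
        "host": host,
        "username": username,
        "destination_dir": destination_dir,
    }
    if pkey is not None:  # leave out when the key is provided by an SSH agent
        protocol_args["pkey"] = pkey
    return [
        "sett",
        "transfer",
        "--protocol=sftp",
        "--protocol-args=" + json.dumps(protocol_args),
        *files,
    ]


command = sftp_transfer_command(
    "10.0.73.1", "alice", "/data", ["encrypted_data.zip"], pkey="~/.ssh/id_rsa"
)
# Show the command as it would be typed in a POSIX shell.
print(shlex.join(command))
```

The list can be passed directly to subprocess.run(command, check=True); since no shell is involved, the same code works on Windows without doubled quotes.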

For SFTP transfers, an SSH key is required for authentication on the host server. The private SSH key can be provided via two mechanisms:

  • Specifying the location of the key via the pkey argument of --protocol-args. See below for more details.
  • Using an SSH agent to provide the key. The SSH agent is automatically detected by sett and no specific input from the user is needed. In this case, the pkey argument should be skipped.

The --protocol-args value takes a different set of fields depending on the protocol being used:

  • sftp protocol arguments:

    • host

      Address of remote SFTP server to which files should be transferred, e.g. "host":"10.0.73.1".

      Connecting to a specific port on the SFTP server can be done using the syntax: "host":"10.0.73.1:3111" (here 3111 is the port to use). If no port is specified, port 22 is used by default.

    • username

      User name with which to connect to the SFTP service.

    • destination_dir

      Absolute path of the directory on the SFTP server where files should be transferred.

    • pkey

      Path and name of the private SSH key used to authenticate with the SFTP server. This argument is only needed if the authentication method used by the SFTP server is SSH key (not needed with OIDC).

      This argument can also be skipped if the SSH key is provided via an SSH agent - i.e. if pkey is missing, sett will try to find a suitable SSH key from a running agent.

    • pkey_password

      If a private SSH key is passed via the pkey argument, its password should be provided via this argument. If the key is not password-protected (⚠ insecure - not recommended), "pkey_password":"" must be passed. If this argument is missing, the user will be manually prompted to enter the password.

  • liquid_files protocol arguments:

    • host
      Address of remote liquid files server to which files should be transferred.
    • api_key
      The API key from liquid files. You can get it from your account: “Account Settings” -> “API”.
    • subject
      Optional subject to send together with the package.
    • message
      Optional message to send together with the package.
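The host:port syntax described above for the sftp host argument can be illustrated with a short Python sketch. It only demonstrates the convention (an optional port after a colon, 22 otherwise) and makes no claim about how sett parses the value; the function name is hypothetical and IPv6 addresses are not handled.

```python
DEFAULT_SFTP_PORT = 22


def split_host_port(host, default_port=DEFAULT_SFTP_PORT):
    """Split 'address[:port]' into (address, port), using the default port if none is given."""
    address, separator, port = host.rpartition(":")
    if not separator:  # no colon: the whole string is the address
        return host, default_port
    return address, int(port)


print(split_host_port("10.0.73.1:3111"))  # ('10.0.73.1', 3111)
print(split_host_port("10.0.73.1"))       # ('10.0.73.1', 22)
```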

Adding the --dry-run option will run the transfer command in test mode, i.e. checks are made but no data is transferred.

To display help for the transfer command: sett transfer --help.

Decrypting files

The sett application allows the decryption and decompression of files in a single step. However, only files encrypted with the sett application, or files that follow the sett packaging specifications can be decrypted with sett.

Tip

sett does not support multi-threading during file compression/decompression. This may lead to long processing times in data encryption/decryption when working with large files (e.g. > 100 GB).

Please refer to the Working with large files section of the documentation for details on how to alleviate this issue.

Decrypting data with sett-gui

To decrypt and decompress files:

  1. Go to the Decrypt tab of the sett application (see figure below).

    _images/sett_decrypt_01.png
  2. Select one or more files to decrypt with Add files.

  3. Selecting Decompress will both decrypt and decompress files. Deselecting this option will only decrypt the package, outputting a compressed file named data.tar.gz that contains the decrypted data.

  4. Location: select a location where to decrypt/decompress files.

  5. Click Decrypt selected files to start the decryption/decompression process. A pop-up dialog box will appear asking for the password of your private PGP key, i.e. the key whose public part was used to encrypt the files.

Decrypting data on the command line

sett command to decrypt data:

# General syntax:
sett decrypt --output-dir=OUTPUT_DIRECTORY ENCRYPTED_FILES.zip

# Example:
sett decrypt --output-dir=/home/alice/data/unpack_dir /home/alice/data/test_data.zip

To display help for the decrypt command: sett decrypt --help.

To decrypt data without decompressing it, add the --decrypt-only option.

If the --output-dir option is omitted, the data is decrypted in the current working directory.

If you want to automate the decryption process you can use the --passphrase-cmd option with an external command that returns your PGP key password to the standard output.

IMPORTANT: make sure the external command and the password store are secure.

PGP key auto-download and auto-refresh in sett

To ensure that all PGP keys used for data encryption and signing are trustworthy and up to date, sett is configured by default to work in conjunction with the OpenPGP keyserver keys.openpgp.org and the BioMedIT portal. If you are using sett outside of BioMedIT, you will need to disable the usage of the BioMedIT portal via the sett configuration options.

The keyserver is an online server where users upload their public PGP keys, so that people who need these keys - to encrypt data or verify a signature - can easily obtain them.

PGP keys used by sett (data recipients and sender keys) are automatically refreshed from the keyserver each time they are about to be used. This ensures that an up-to-date version of the keys is used. For instance, if a key is revoked, then the user’s local copy of the key will be automatically updated with this information.

If needed, the auto-download/auto-refresh of PGP keys can be disabled via the Allow PGP key auto-download checkbox in the Settings Tab.

The BioMedIT portal is an online service where users can (among other things) register their PGP key and get it approved for usage within BioMedIT. Just before a key is used, sett connects to the portal to verify that it has been approved.

For non-BioMedIT users, key approval should be disabled via the Verify key approval checkbox in the Settings Tab.

BioMedIT

Allow PGP key auto-download and Verify key approval must be enabled.

Setting up predefined connection profiles for frequent use

To avoid retyping connection settings for every transfer, connection profiles can be saved in sett.

You can configure and save connection profiles using the Transfer tab of the sett-gui interface.

Alternatively, you can modify the sett configuration file. An example config file would look like this:

{
  ...

  "connections": {
    "custom_connection": {
      "protocol": "sftp",
      "parameters": {
        "destination_dir": "upload",
        "host": "server.name.com",
        "pkey": "",
        "username": "chuck_norris"
      }
    }
  }
}

Predefined connections can then be selected from the Connection drop-down menu in the Transfer tab of the sett-gui interface.

On the command line, predefined connections can be passed using the --connection option, as illustrated here. In that case, the --protocol and --protocol-args are no longer needed, unless you wish to override part of the predefined connection’s settings:

# General syntax:
sett transfer --connection PREDEFINED_CONNECTION_NAME FILES_TO_TRANSFER

# Example:
sett transfer --connection custom_connection encrypted_data.zip
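To make the relation between a saved profile and the explicit command line options more concrete, the Python sketch below loads a "connections" fragment like the one shown in this section and derives the corresponding --protocol and --protocol-args values. This is an illustration only: the helper name is hypothetical, and how sett resolves a profile internally is not described here.

```python
import json

CONFIG_TEXT = """
{
  "connections": {
    "custom_connection": {
      "protocol": "sftp",
      "parameters": {
        "destination_dir": "upload",
        "host": "server.name.com",
        "pkey": "",
        "username": "chuck_norris"
      }
    }
  }
}
"""


def connection_as_cli_options(config, name):
    """Return the --protocol / --protocol-args options matching a saved connection."""
    connection = config["connections"][name]
    return [
        f"--protocol={connection['protocol']}",
        f"--protocol-args={json.dumps(connection['parameters'])}",
    ]


config = json.loads(CONFIG_TEXT)  # raises json.JSONDecodeError on invalid JSON
options = connection_as_cli_options(config, "custom_connection")
print(options)
```

Since json.loads rejects a missing or trailing comma, loading a hand-edited configuration file this way is also a quick syntax check.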

Working with large files

sett does not support multi-threading during file compression/decompression, and therefore the default encryption and decryption workflows can take a long time when working with large files (> 100 GB).

To alleviate this problem, sett provides the option to skip the compression/decompression steps in both the encryption and decryption workflows. Users can then compress/decompress their data with an external compression tool that supports multi-threading (e.g. pigz on Linux or 7zip on Windows), before encrypting (or after decrypting) it with sett’s compression/decompression option disabled.

Disabling compression/decompression in sett-gui:

  • To disable compression in the encryption workflow, uncheck the Compress input data checkbox in the Encrypt tab.
  • To disable decompression in the decryption workflow, select Decrypt only in the Select data decryption options section of the Decrypt tab.

Disabling compression/decompression in sett command line:

  • To disable compression in the encryption workflow, add the --compression-level=0 option to the sett encrypt command.

  • To disable decompression in the decryption workflow, add the --decrypt-only option to the sett decrypt command.

  • Examples:

    # Data encryption with compression disabled.
    sett encrypt --sender alice.smith@example.com --recipient bob@example.com --dtr-id 42 --compression-level=0 --output test_output ./test_file.txt ./test_directory
    
    # Data decryption with decompression disabled.
    sett decrypt --decrypt-only --output-dir=/home/alice/sett_demo/unpack_dir /home/alice/sett_demo/test_data.zip
    

Split and transfer large files

Transferring a single large file can sometimes prove problematic, e.g. if the connection is unstable. When encountering such problems, one workaround is to split the data to encrypt and transfer into smaller chunks.

Windows - using 7-zip

On Windows, the open source utility 7zip can be used to compress and split data.

  • With 7zip installed, right-click on the folder you want to compress and split: select 7-Zip > Add to archive....
  • In the 7zip dialog box, set the following values:
    • Click on ... to change the name and location of the output archive file.
    • Set Archive Format to zip.
    • Set Split to volumes, bytes to an appropriate size in Megabytes (M) or Gigabytes (G), e.g. 100G for 100 gigabytes chunk size.
  • The output filenames will be numbered like this: archive.zip.001, archive.zip.002 etc.
  • Encrypt these archive files using sett without additional compression, see instructions above.

Once data is transferred and decrypted at the destination, it can be inflated again. This can be done with the unzip command line utility:

  • unzip archive.zip.0xx, where 0xx is the last file, i.e. the file with the highest number. It contains the file structure.

Note: 7zip can also be used from the command line.

  • "C:\Program Files\7-Zip\7z.exe" a -v100g "archive.zip" "folder/to/be/archived"
  • The -v100g option creates chunks (volumes) of 100 gigabytes each.
  • Several files/directories to compress can be specified after "archive.zip", and asterisks * can be used to match multiple files, e.g. "data/*.csv".

Mac OS and Linux - command line

On Mac OS and Linux, the zip command line utility can be used to split and compress data.

  • In the terminal, cd to the directory containing the files/directories to compress.
  • Run the command: zip -s 100g archive.zip <file(s) or directory(ies)>. In this example, data will be split into 100 GB chunks, but the size of chunks can be set as appropriate by replacing 100g with the desired value.
  • This will create numbered files named archive.zip, archive.z01, archive.z02, etc. The archive.zip is the last file created and contains the file structure.
  • Encrypt these archive files using sett without additional compression, see instructions above.

Once data is transferred and decrypted at the destination, it can be inflated using the command unzip archive.zip.

Mac OS with GUI - using Keka

On Mac OS, users who prefer a GUI to the command line can use Keka, a compression app with Finder integration. The app costs 5 CHF via the Mac App Store or is free via their website (donations welcome).

  • Open Keka, and specify the split size in Megabytes or Gigabytes, e.g. 100 GB.
  • In the Settings menu, activate the Finder integration.
  • Right-click on a directory to compress it.

Automating tasks with sett

For users that need to package and transfer data on a regular basis, a number of options exist to simplify the automation of sett tasks. Please refer to the sett automation section for details.

Installing or running sett behind a proxy

If you are running sett behind a proxy, the shell environment variable ALL_PROXY or HTTPS_PROXY must be set. This is the recommended and global way to specify a proxy. Note that, while certain programs support a proxy option (e.g. pip with --proxy), there is currently no such option in sett.

Example:

ALL_PROXY=https://host.domain:port sett-gui

Note

If your proxy is a socks proxy, you need to install sett with socks proxy support:

pip install [ --user ] sett[socks]

In this case, also replace the scheme https:// with socks5://.
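Before starting sett from a script, it can be useful to confirm that one of the two proxy variables is actually set in the environment. The following Python sketch does just that; the function name is hypothetical, and it only looks at the two variable names mentioned in this section.

```python
import os

PROXY_VARIABLES = ("ALL_PROXY", "HTTPS_PROXY")


def configured_proxies(environ=None):
    """Return the proxy variables that are set to a non-empty value."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in PROXY_VARIABLES if env.get(name)}


proxies = configured_proxies()
if proxies:
    for name, value in proxies.items():
        print(f"{name} is set to {value}")
else:
    print("Neither ALL_PROXY nor HTTPS_PROXY is set in this environment.")
```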

Checking whether you are using a proxy

On Windows, you can check if you are using a proxy server to access the internet by going to Start > Settings > Network & Internet > Proxy (Windows 10). If the slider under Use a proxy server is “off”, no proxy is being used.

If you are told that you need to set a proxy, input the Address and Port details and click Save. If in doubt please consult with your IT department.

On Mac OS the proxy information is located under the System Preferences > Network > Advanced > Proxies tab of the network interface, usually Ethernet or Wi-Fi.

Known issues and limitations

SSH private key with non-ASCII characters password

Even though it is possible to create an SSH key pair using a password containing non-ASCII characters, it seems like those characters are encoded differently between different operating systems.

As an SSH key might be moved to a machine with another operating system, or the encoding might change with a new version, it is impossible to guess the correct encoding in all cases. For this reason, we recommend not using non-ASCII characters in passwords protecting SSH private keys.

If this is still desired, there is a configurable option ssh_password_encoding available in the sett config file, which defaults to utf_8 (this is the correct encoding for keys generated with ssh-keygen on Linux and Mac OS). For keys generated with ssh-keygen on Windows 10, cp437 should work to correctly encode non-ASCII characters. Example of config file with SSH password encoding set to cp437:

{
    "ssh_password_encoding": "cp437"
}
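The reason the encoding matters can be seen with a short Python sketch: the same password text is turned into different bytes depending on the encoding, so a key protected under one encoding cannot be unlocked under the other. The password below is made up for the example.

```python
def encode_password(password, encoding="utf_8"):
    """Return the bytes that represent `password` in the given encoding."""
    return password.encode(encoding)


password = "pässword"  # contains one non-ASCII character
utf8_bytes = encode_password(password)            # default, as on Linux / Mac OS
cp437_bytes = encode_password(password, "cp437")  # as suggested for Windows 10

print(utf8_bytes)
print(cp437_bytes)
print("same bytes:", utf8_bytes == cp437_bytes)
```

For a password made of ASCII characters only, both encodings yield identical bytes, which is why such passwords avoid the problem.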