Encrypting, transferring and decrypting data with sett

Encrypting files

The sett application allows the encryption of any combination of individual files and directories. More specifically, the files are first compressed into a single .tar.gz archive, which is then encrypted with the public key of one or more recipient(s), and signed with the sender’s key.

The encrypted data is then bundled with a metadata file - a plain text file that contains information about who is sending the file and to whom it should be delivered - into a single file with a .tar extension. The specifications of the output .tar files produced by sett are described in the sett packaging file format specifications section.

sett supports multi-recipient data encryption: a single encrypted file can be decrypted by each of the recipients it was encrypted for.

sett also ensures the integrity of the transferred files by computing checksums on each file that is packaged, and adding this information to the encrypted data. The integrity of each file is verified automatically upon decryption of the file by sett, providing the guarantee that all files were transferred flawlessly.
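The principle behind this integrity check can be illustrated with a minimal Python sketch. It is not sett's actual implementation: the use of SHA-256 and the helper names file_checksum, compute_checksums and verify_checksums are assumptions made for illustration only.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_checksums(paths: list[Path]) -> dict[str, str]:
    """Map each file path to its checksum (done before packaging)."""
    return {str(path): file_checksum(path) for path in paths}


def verify_checksums(expected: dict[str, str]) -> list[str]:
    """Return the paths whose current checksum differs from the expected one."""
    return [
        path
        for path, digest in expected.items()
        if file_checksum(Path(path)) != digest
    ]
```

In this sketch, an empty list returned by verify_checksums means that every file still matches the checksum recorded at packaging time.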

BioMedIT

Data Transfer Requests: each data transfer into the BioMedIT network must have an authorized Data Transfer Request ID (DTR ID). This ID must be specified at the time the data is encrypted (see below). The ID is added to the encrypted file’s metadata information by sett. Having a valid and authorized DTR ID value is mandatory for any data transfer into the BioMedIT network. Non-compliant packages will be rejected.

Recipients: each data transfer into the BioMedIT network must be to a recipient assigned to the role of a Data Manager for a given project. The recipient must also have a valid (signed) public key on the DCC keyserver. If these conditions are not met, sett will not encrypt the data.

Tip

sett does not support multi-threading during file compression/decompression. This may lead to long processing times in data encryption/decryption when working with large files (e.g. > 100 GB).

Please refer to the working with large files section of the documentation for details on how to alleviate this issue.

Encrypting data with sett-gui

To encrypt data:

  1. Go to the Encrypt tab of the sett application.

    _images/sett_encrypt_01.png
  2. Select files and/or directories to encrypt: using the Add files and Add directory buttons, select at least one file or directory to encrypt. After adding files/directories, they will be listed in the top box of the tab (see figure above).

  3. Select data sender: in the drop-down list found under Sender, select your own PGP key (you are the data sender). For most users, there should in principle be only one key in the Sender drop-down menu: their own key.

    Note

    The Sender key is used to sign the encrypted data, so that the recipient(s) can be confident that the data they receive is genuine.

  4. Select data recipients: add one or more recipients by selecting them from the drop-down list found under Recipients and clicking the + button. Recipients are the people for whom data should be encrypted: their public PGP key will be used to encrypt the data, and only they will be able to decrypt it.

    BioMedIT

    For BioMedIT users, only recipients assigned to the Data Manager role of the project for which data is being encrypted can be used as data recipients.

  5. DTR ID: specifying a valid Data Transfer Request ID is mandatory for data to be transferred into the BioMedIT network. For data not intended to be transferred into the BioMedIT network, the DTR ID field can be left empty (or set to any arbitrary value), but the Verify DTR ID checkbox must be disabled (see below).

    BioMedIT

    For BioMedIT users, the DTR ID field is mandatory. Only files encrypted with a valid and authorized DTR ID value can be transferred into the secure BioMedIT network. For this reason, BioMedIT users should always leave the Verify DTR ID checkbox enabled.

  6. Verify DTR ID checkbox: by default, Verify DTR ID is enabled and this will enforce the following checks (for data not intended to be transferred into the BioMedIT network, this checkbox should be disabled):

    • DTR ID is valid and the transfer is authorized by the DCC.
    • The public PGP keys of the Sender and of all Recipients are signed by the central authority defined in the sett configuration file. By default, the central authority is the BioMedIT DCC.
    • Recipients are officially approved Data Managers of the BioMedIT project for which data is being encrypted.
    • The Verify DTR ID option also triggers auto-downloading/refreshing of the recipients’ PGP keys, to ensure that the user has the latest version of each intended recipient’s key.
  7. Purpose: purpose of the data transfer, please select either PRODUCTION or TEST. Note that this field is mandatory for data to be transferred into the BioMedIT network. For data that will not be transferred into the BioMedIT network, this field can be left empty.

  8. Output suffix (optional): by default, encrypted output files produced by the sett application are named after the current date and time, following the structure YYYYMMDDTHHMMSS.tar (where YYYYMMDD is the current day and HHMMSS the current time).

    If an Output suffix is provided, it gets appended at the end of the file name with a _ separator. The _ separator is automatically added and does not need to be part of the suffix.

    E.g. setting testSuffix in the Output suffix field will generate files that are named YYYYMMDDTHHMMSS_testSuffix.tar. Note: only regular alphanumeric, - and _ characters are allowed in the output suffix.

  9. Output location: select the location where the encrypted file should be saved.

  10. Data package: compress: by default this option is enabled, meaning that sett will compress the input data before encrypting it. If compression is not required, e.g. because the input data is already in a compressed form, then this option can be unchecked. This option can be useful when working with large files.

  11. Encryption Test: a test run of the data to encrypt can be performed by clicking the Test button. This will check that all the specified input files can be found on disk, and run the checks described under the Verify DTR ID option above.

    _images/sett_encrypt_02.png
  12. You are now ready to compress and encrypt the data: click Package & Encrypt. A pop-up will appear, asking for the password associated with the sender’s key. After the password is entered, data compression and encryption will start. Progress and error messages are displayed in the Console box.

    When the encryption has completed successfully, the Console displays a message that reads: “Completed data encryption”, followed by the location and name of the output file, as illustrated in the example below.

    _images/sett_encrypt_03.png

At this point, all input files are compressed, encrypted and bundled into a single .tar file. Data has not yet been transferred to the intended recipient.
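The output naming rule described in step 8 above (timestamp, optional suffix, .tar extension) can be sketched in Python as follows. This is an illustration only, not sett's own code: the function name output_file_name and the regular expression are assumptions derived from the rule as stated in this section.

```python
import re
from datetime import datetime
from typing import Optional

# Only regular alphanumeric characters, "-" and "_" are allowed in the suffix.
SUFFIX_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def output_file_name(now: datetime, suffix: Optional[str] = None) -> str:
    """Build a file name of the form YYYYMMDDTHHMMSS[_suffix].tar."""
    stem = now.strftime("%Y%m%dT%H%M%S")
    if suffix:
        if not SUFFIX_PATTERN.fullmatch(suffix):
            raise ValueError(f"invalid output suffix: {suffix!r}")
        # The "_" separator is added automatically.
        stem = f"{stem}_{suffix}"
    return f"{stem}.tar"


print(output_file_name(datetime(2024, 1, 31, 14, 5, 9), "testSuffix"))
# prints 20240131T140509_testSuffix.tar
```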

Encrypting data on the command line

The sett command to encrypt data is the following. Note that the SENDER and RECIPIENT values can be specified either as a PGP key fingerprint, or as an email address.

# General syntax:
sett encrypt --sender SENDER --recipient RECIPIENT --dtr-id DATA_TRANSFER_ID --purpose PURPOSE --output-name OUTPUT_FILENAME FILES_OR_DIRECTORIES_TO_ENCRYPT

# Example:
# long command line options:
sett encrypt --sender alice@example.com --recipient bob@example.com --dtr-id 42 --purpose PRODUCTION --output-name test_output ./test_file.txt ./test_directory
# short command line options:
sett encrypt -s alice@example.com -r bob@example.com -t 42 --purpose PRODUCTION -o test_output ./test_file.txt ./test_directory --dry-run

Data can be encrypted for more than one recipient by repeating the --recipient option, e.g. --recipient RECIPIENT1 --recipient RECIPIENT2:

# In this example, Alice encrypts a set of files for both Bob and Chuck.
sett encrypt --sender alice@example.com --recipient bob@example.com --recipient chuck@example.com FILES_OR_DIRECTORIES_TO_ENCRYPT

Adding the --dry-run option will run the encrypt command in test mode - i.e. checks are made but no data is encrypted.

The data compression level used by sett can be manually adjusted using the --compression-level option. Compression level values must be integers between 0 (no compression) and 9 (highest compression). Higher compression levels produce smaller output files but require more computing time, so you may choose a lower level to speed up compression (e.g. --compression-level=1), or a higher level (e.g. --compression-level=9) to produce smaller output files. The default level is 6.

If you want to automate the encryption process you can use the --passphrase-cmd option with an external command that returns your PGP key password to the standard output. IMPORTANT: make sure that the external command and the password store are secure.

When encrypting data that is intended to be transferred into the BioMedIT network, a valid DTR ID value must be specified via the --dtr-id option. Specifying a --dtr-id value automatically enables the DTR ID verification and PGP key checking described in the Encrypting data with sett-gui section. For data that is not intended to be transferred into the BioMedIT network, the --dtr-id option can be omitted; by default, this also skips the DTR ID checks.

DTR ID verification and checking of the sender’s public key and the recipients’ role and public key can also be manually enabled/disabled by using the --verify-dtr and --no-verify-dtr options respectively.
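For scripted use, the encrypt command line described above can be assembled programmatically. The following Python sketch only uses options named in this section; the helper build_encrypt_command is hypothetical and is not part of sett.

```python
import shlex
from typing import List, Optional, Sequence


def build_encrypt_command(
    sender: str,
    recipients: Sequence[str],
    files: Sequence[str],
    dtr_id: Optional[int] = None,
    purpose: Optional[str] = None,
    output_name: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble the argument list for a `sett encrypt` call."""
    if not recipients:
        raise ValueError("at least one recipient is required")
    command = ["sett", "encrypt", "--sender", sender]
    for recipient in recipients:
        # The --recipient option is repeated once per recipient.
        command += ["--recipient", recipient]
    if dtr_id is not None:
        command += ["--dtr-id", str(dtr_id)]
    if purpose is not None:
        command += ["--purpose", purpose]
    if output_name is not None:
        command += ["--output-name", output_name]
    if dry_run:
        command.append("--dry-run")
    return command + list(files)


command = build_encrypt_command(
    sender="alice@example.com",
    recipients=["bob@example.com", "chuck@example.com"],
    files=["./test_file.txt", "./test_directory"],
    dtr_id=42,
    purpose="PRODUCTION",
    dry_run=True,
)
print(shlex.join(command))  # shell-quoted form, for display only
```

On a machine where sett is installed, the resulting list could be passed to subprocess.run(command, check=True) to actually run the command.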

Transferring files

The sett application allows transferring encrypted data to any SFTP or liquid_files enabled remote server. Please note that only files encrypted with the sett application, or files that follow the sett packaging file format specifications, can be transferred using sett. In addition, by default, sett verifies that the files to be transferred have a valid DTR (Data Transfer Request) ID value in their metadata. Only files with valid and authorized DTR ID values can be transferred. If you are not transferring files into the BioMedIT network, this behavior should be disabled (instructions below).

Transferring files with sett-gui

To transfer encrypted files:

  1. Go to the Transfer tab of the sett application.

    _images/sett_transfer_01.png
  2. Select encrypted files to transfer: click Add files and select an encrypted .tar file that was generated using the sett application. Note: multiple files can be transferred at the same time by adding more than one file. Note: only .tar files produced by the sett application can be transferred.

  3. Verify DTR ID option:

    • To transfer files into the BioMedIT network, this option is mandatory and must remain selected.
    • To transfer files with no DTR ID value, or a DTR ID value that is not registered within BioMedIT, this option should be deselected.
  4. Select the data transfer Protocol (sftp or liquid_files). The choice depends on the server to which the data should be sent, but in most cases sftp is used.

  5. Set the connection parameters for the transfer:

    • User name: the user name with which to connect to the SFTP/liquid files server.

    • Host URL: URL address of the server where the files should be sent.

    • Destination directory: absolute path of directory where files should be saved on the server.

    • SSH key location: name and full path of the private SSH key used for authentication to the SFTP server. This is only required if the SSH key is in a non-standard location. Only RSA keys are accepted. Note: do not confuse SSH keys - which are used to authenticate yourself when connecting to an SFTP server during file transfer - with PGP keys - which are used to encrypt and sign data.

    • SSH key password: password associated with the private SSH key given under SSH key location.

      BioMedIT

      For BioMedIT users, the SFTP connection parameters User name, Host URL, and Destination directory will be provided by your local BioMedIT node.

  6. You are now ready to transfer the data. Click Transfer selected files and follow the progress of the transfer using the progress bar and the Console box.

Transferring data on the command line

sett command to transfer data:

# General syntax:
sett transfer --protocol=sftp --protocol-args='{"host": "HOST","username":"USERNAME", "destination_dir":"DIR", "pkey":"PRIVATE_RSA_SSH_KEY"}' FILES_TO_TRANSFER
sett transfer --protocol=liquid_files --protocol-args='{"host": "HOST","subject": "SUBJECT", "message": "MESSAGE","api_key":"APIKEY","chunk_size": 100}' FILES_TO_TRANSFER

# Example:
sett transfer --protocol=sftp --protocol-args='{"host":"10.0.73.1","username":"alice", "destination_dir":"/data", "pkey":"~/.ssh/id_rsa"}' encrypted_data.tar

Note that if you are using the Windows Command Prompt, the above syntax must be modified to use double quotes (") instead of single quotes ('), and doubled-double quotes ("") instead of regular double quotes (") around the --protocol-args. Here is an example:

# Example for Windows command prompt:
sett transfer --protocol=sftp --protocol-args="{""host"":""10.0.73.1"",""username"":""alice"", ""destination_dir"":""/data"", ""pkey"":""~/.ssh/id_rsa""}" encrypted_data.tar
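Since the --protocol-args value is a JSON object, one way to sidestep these shell-specific quoting rules is to build the value programmatically. The following Python sketch is an illustration under that assumption; it reuses the placeholder values of the sftp example above and does not run sett itself.

```python
import json
import shlex

# Connection fields documented for the sftp protocol; the values are the
# placeholder values used in the example above.
protocol_args = {
    "host": "10.0.73.1",
    "username": "alice",
    "destination_dir": "/data",
    "pkey": "~/.ssh/id_rsa",
}

command = [
    "sett",
    "transfer",
    "--protocol=sftp",
    # json.dumps takes care of the JSON quoting.
    "--protocol-args=" + json.dumps(protocol_args),
    "encrypted_data.tar",
]

# POSIX shell-quoted form, for display only.
print(shlex.join(command))
```

When such a list is passed to subprocess.run(command, check=True) without shell=True, the arguments are not interpreted by a shell, so the quoting differences between shells described above should not come into play.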

For SFTP transfers, an SSH key is required for authentication on the host server. The private SSH key can be provided via 2 mechanisms:

  • Specifying the location of the private key via the pkey argument (see below for more details).

  • Using an ssh agent that will provide access to the private key. In this method, the ssh-agent is automatically detected and no specific input from the user is needed.

    Windows users: please note that the only ssh agent that is currently supported is Pageant, the ssh agent of PuTTY. The Windows OpenSSH agent is not supported.

The --protocol-args value takes a different set of fields for sftp and liquid_files.

In case of sftp:

host
address of remote SFTP server to which files should be transferred. e.g. “10.0.73.1”
username
username with which you connect to the SFTP service.
destination_dir
absolute path of the directory on the SFTP server where files should be transferred.
pkey

path and name of the private SSH key used to authenticate with the SFTP server. This argument is only needed when the SFTP server uses SSH key authentication. If the server uses OIDC authentication, it is not needed. You will be prompted for the password of the private key. If your private key is not password-protected (⚠ insecure / not recommended), you must explicitly pass "pkey_password":"" as an additional field of --protocol-args.

If you do not specify this argument, sett will try to find a suitable SSH key from a running ssh agent (if available). Windows users: the only ssh agent that is currently supported is “Pageant”, the ssh agent of the “PuTTY” software. The Windows OpenSSH agent is not supported.

Adding the --dry-run option will run the transfer command in test mode, i.e. checks are made but no data is transferred.

Tip

The 2 recommended ways to load your private SSH key are:

  • From the ssh agent: do not specify pkey.
  • From a private key file: pass the path to your private SSH key in the pkey field of --protocol-args. You will be prompted for the password.

Tip

To transfer data for a project that is not registered within BioMedIT, don’t specify --dtr-id. If a data transfer ID is required, use --no-verify-dtr.

In case of liquid_files:

host
address of remote liquid files server to which files should be transferred.
api_key
The API key from liquid files. You can get it from your account: “Account Settings” -> “API”.
subject
Optional subject to send together with the package
message
Optional message to send together with the package

To display help for the transfer command: sett transfer --help.

Decrypting files

The sett application allows the decryption and decompression of files in a single step. However, only files encrypted with the sett application, or files that follow the sett packaging file format specifications can be decrypted with sett.

Tip

sett does not support multi-threading during file compression/decompression. This may lead to long processing times in data encryption/decryption when working with large files (e.g. > 100 GB).

Please refer to the Working with large files section of the documentation for details on how to alleviate this issue.

Decrypting data with sett-gui

To decrypt and decompress files:

  1. Go to the Decrypt tab of the sett application (see figure below).

    _images/sett_decrypt_01.png
  2. Select one or more files to decrypt with Add files.

  3. Selecting Decompress will both decrypt and decompress files. Deselecting this option will only decrypt the package, outputting a compressed file named data.tar.gz that contains the decrypted data.

  4. Location: select the location where the files should be decrypted/decompressed.

  5. Click Decrypt selected files to start the decryption/decompression process. A pop-up dialog box will appear, asking for the password of the private PGP key that matches the public key with which the files were encrypted (i.e. the recipient’s key).

Decrypting data on the command line

sett command to decrypt data:

# General syntax:
sett decrypt --output-dir=OUTPUT_DIRECTORY ENCRYPTED_FILES.tar

# Example:
sett decrypt --output-dir=/home/alice/data/unpack_dir /home/alice/data/test_data.tar

To display help for the decrypt command: sett decrypt --help.

To decrypt data without decompressing it, add the --decrypt-only option.

If the --output-dir option is omitted, the data is decrypted in the current working directory.

If you want to automate the decryption process you can use the --passphrase-cmd option with an external command that returns your PGP key password to the standard output. IMPORTANT: make sure that the external command and the password store are secure.

Setting up predefined connection profiles for frequent use

To avoid retyping connection settings for every transfer, it is possible to store predefined connection profiles in sett.

You can configure connection profiles using the Transfer tab of the sett-gui interface.

Alternatively, you can modify the sett configuration file. An example config file would look like this:

{
  ...

  "connections": {
    "custom_connection": {
      "protocol": "sftp",
      "parameters": {
        "destination_dir": "upload",
        "host": "server.name.com",
        "pkey": "",
        "username": "chuck_norris"
      }
    }
  }
}

Predefined connections can then be selected from the Connection drop-down menu in the Transfer tab of the sett-gui interface.

On the command line, predefined connections can be passed using the --connection option, as illustrated here. Note that the --protocol and --protocol-args are no longer needed, unless you wish to override part of the predefined connection’s settings:

# General syntax:
sett transfer --connection PREDEFINED_CONNECTION_NAME FILES_TO_TRANSFER

# Example:
sett transfer --connection custom_connection encrypted_data.tar
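When editing the configuration file by hand, a misplaced comma is enough to make the JSON invalid. As a hedged alternative, a connection entry can be added with a short Python script. The helper add_connection below is hypothetical, and the location of the sett configuration file is not given in this section, so the sketch works on a temporary file.

```python
import json
import tempfile
from pathlib import Path


def add_connection(config_path: Path, name: str, protocol: str, parameters: dict) -> dict:
    """Add or replace a predefined connection in a sett-style JSON config file."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    connections = config.setdefault("connections", {})
    connections[name] = {"protocol": protocol, "parameters": parameters}
    # json.dumps always writes syntactically valid JSON.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demonstration on a temporary file (replace with the path of your own config).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_config = add_connection(
        demo_path,
        name="custom_connection",
        protocol="sftp",
        parameters={
            "destination_dir": "upload",
            "host": "server.name.com",
            "pkey": "",
            "username": "chuck_norris",
        },
    )
    print(demo_path.read_text())
```

Existing settings in the file are preserved: only the entry with the given name under "connections" is added or replaced.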

Working with large files

sett does not support multi-threading during file compression/decompression, and therefore the default encryption and decryption workflows can take a long time when working with large files (> 100 GB).

To alleviate this problem, sett provides the option to skip the compression/decompression steps in both the encryption and decryption workflows. Users can then compress/decompress their data with an external compression tool that supports multi-threading (e.g. pigz on Linux or 7zip on Windows), before encrypting (or after decrypting) it with sett’s compression/decompression option disabled.

Disabling compression/decompression in sett-gui:

  • To disable compression in the encryption workflow, uncheck the Compress input data checkbox in the Encrypt tab.
  • To disable decompression in the decryption workflow, select Decrypt only in the Select data decryption options section of the Decrypt tab.

Disabling compression/decompression in sett command line:

  • To disable compression in the encryption workflow, add the --compression-level=0 option to the sett encrypt command.

  • To disable decompression in the decryption workflow, add the --decrypt-only option to the sett decrypt command.

  • Examples:

    # Data encryption with compression disabled.
    sett encrypt --sender alice.smith@example.com --recipient bob@example.com --dtr-id 42 --compression-level=0 --output-name test_output ./test_file.txt ./test_directory

    # Data decryption with decompression disabled.
    sett decrypt --decrypt-only --output-dir=/home/alice/sett_demo/unpack_dir /home/alice/sett_demo/test_data.tar
    

Automating tasks with sett

For users that need to package and transfer data on a regular basis, a number of options exist to simplify the automation of sett tasks. Please refer to the sett automation section for details.

Installing or running sett behind a proxy

If you are running sett behind a proxy, the shell environment variable ALL_PROXY or HTTPS_PROXY must be set. This is the recommended and global way to specify a proxy. Note that, while certain programs support a proxy option (e.g. pip with --proxy), there is currently no such option in sett.

Example:

ALL_PROXY=https://host.domain:port sett-gui

Note

If your proxy is a socks proxy, you need to install sett with socks proxy support:

pip install [ --user ] sett[socks]

In this case, also replace the scheme https:// with socks5://.

Checking whether you are using a proxy

On Windows, you can check if you are using a proxy server to access the internet by going to Start > Settings > Network & Internet > Proxy (Windows 10). If the slider under Use a proxy server is “off”, no proxy is being used. If you are told that you need to set a proxy, input the Address and Port details and click Save. If in doubt please consult with your IT department.

On macOS, the proxy information is located under the System Preferences > Network > Advanced > Proxies tab of the network interface, usually Ethernet or Wi-Fi.

Known issues and limitations

SSH private key with non-ASCII characters password

Even though it is possible to create an SSH key pair using a password containing non-ASCII characters, these characters appear to be encoded differently across operating systems.

As an SSH key might be moved to a machine with another operating system, or the encoding might change with a new version, it is impossible to guess the correct encoding in all cases. For this reason, we recommend not using non-ASCII characters in passwords that protect private SSH keys.

If this is still desired, there is a configurable option ssh_password_encoding available in the sett config file, which defaults to utf_8 (this is the correct encoding for keys generated with ssh-keygen on Linux / macOS). For keys generated with ssh-keygen on Windows 10, cp437 should work to correctly encode non-ASCII characters. Example of a config file with SSH password encoding set to cp437:

{
    "ssh_password_encoding": "cp437"
}
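The following short Python sketch illustrates why the encoding matters: the same password yields different bytes under utf_8 and cp437, so a key protected under one encoding cannot be unlocked with the bytes produced by the other. The password used here is a made-up example.

```python
password = "pässword"  # hypothetical password containing a non-ASCII character

utf8_bytes = password.encode("utf_8")
cp437_bytes = password.encode("cp437")

print(utf8_bytes)   # b'p\xc3\xa4ssword'
print(cp437_bytes)  # b'p\x84ssword'
print(utf8_bytes == cp437_bytes)  # False
```

A password made only of ASCII characters gives identical bytes in both encodings, which is why ASCII-only passwords avoid the problem.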