Encrypting, transferring and decrypting data with sett
Encrypting files
sett allows the encryption of any combination of individual files and directories.
The files are first compressed into a single data.tar.gz archive, which is then encrypted with the public key of one or more recipient(s), and signed with the sender’s key. The encrypted data (data.tar.gz.gpg) is then bundled with a metadata file (a plain text file that contains information about who is sending the file and to whom it should be delivered) into a single .zip file. The specifications of the output .zip files produced by sett are described in the sett packaging specifications section.
The following OpenPGP certificates must be present in the local sett certificate store in order to perform data encryption:
The secret certificate of the data sender (i.e. the person encrypting the data). This is needed for signing the data.
The public certificate of all data recipients (people for whom the data is being encrypted). sett supports multi-recipient data encryption, allowing the encrypted file to be decrypted by multiple recipients.
sett also ensures the integrity of the transferred files by computing checksums on each file that is packaged, and adding this information to the encrypted data. The integrity of each file is verified automatically upon decryption of the file by sett, providing the guarantee that all files were transferred flawlessly.
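The per-file checksum idea can be illustrated with a minimal Python sketch. The hash algorithm (SHA-256) and the helper name `file_checksums` are assumptions made for this example; they are not taken from the sett implementation or from the packaging specifications.

```python
import hashlib
from pathlib import Path


def file_checksums(root: Path) -> dict[str, str]:
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    checksums = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB blocks so large files are never fully loaded in memory.
            for block in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(block)
        checksums[path.relative_to(root).as_posix()] = digest.hexdigest()
    return checksums
```

Computing the same mapping before packaging and after unpacking, and comparing the two, is the kind of check that detects a file altered or truncated in transit.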
BioMedIT
Data Transfer Requests: each data transfer into the BioMedIT network must have an authorized Data Transfer Request ID (DTR ID). This ID must be specified at the time the data is encrypted (see below). The ID is added to the encrypted file’s metadata information by sett. A valid and authorized DTR ID value is mandatory for any data transfer into the BioMedIT network. Non-compliant packages will be rejected.
Recipients: each data transfer into the BioMedIT network must be to a recipient assigned to the role of Data Manager for the given project. The recipient’s PGP key must also be approved by the BioMedIT key validation authority. If these conditions are not met, sett will not encrypt the data.
Output file naming scheme
By default, encrypted output files produced by sett are named after the pattern:
<project code>_<YYYYMMDD>T<HHMMSS>_<optional suffix>.zip
where:
<project code> is the abbreviation/code associated with the project. If no DTR ID value was provided or if Verify DTR is disabled, no project code is added as a prefix to the output file name.
<YYYYMMDD> is the current date (Year, Month, Day).
<HHMMSS> is the current time (Hours, Minutes, Seconds).
<optional suffix> is an optional, custom text that can be added to the file name.
Example: demo_20220211T143311_sib.zip, where demo is the project code and sib is an optional suffix.
The value for the optional suffix can be permanently set in the Settings tab of the sett-gui, or in the sett configuration file.
Using the sett command line, it is possible to completely override the above output file naming scheme by passing the --output option. Overriding the naming scheme is not possible when using sett-gui.
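The default naming pattern can be reproduced with a short Python sketch. The function name `output_file_name` is hypothetical, and the handling of a missing project code or suffix (the part is dropped together with its `_` separator) is an assumption of this example rather than documented sett behaviour.

```python
from datetime import datetime
from typing import Optional


def output_file_name(timestamp: datetime,
                     project_code: Optional[str] = None,
                     suffix: Optional[str] = None) -> str:
    """Build a name of the form <project code>_<YYYYMMDD>T<HHMMSS>_<suffix>.zip."""
    parts = []
    if project_code:
        parts.append(project_code)
    parts.append(timestamp.strftime("%Y%m%dT%H%M%S"))
    if suffix:
        parts.append(suffix)
    return "_".join(parts) + ".zip"


print(output_file_name(datetime(2022, 2, 11, 14, 33, 11), "demo", "sib"))
# prints: demo_20220211T143311_sib.zip
```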
Encrypting data with sett-gui
To encrypt data:
Go to the Encrypt tab of the sett application.
Select files and/or directories to encrypt: using the Add files and Add directory buttons, select at least one file or directory to encrypt.
After adding files/directories, they will be listed in the top box of the tab (see figure above).
Select data sender: in the drop-down list found under Sender, select your own PGP key (you are the data sender). For most users, there should in principle be only one key in the Sender drop-down menu: their own key.
Note
The Sender key is used to sign the encrypted data, so that the recipient(s) can be confident that the data they receive is genuine.
Select data recipients: add one or more recipients by selecting them from the drop-down list found under Recipients and clicking the + button. Recipients are the people for whom data should be encrypted: their public PGP key will be used to encrypt the data, and only they will be able to decrypt it.
BioMedIT
Only recipients assigned to the role of Data Manager of the project for which data is being encrypted are permitted as data recipients.
DTR ID: Data Transfer Request ID associated with the data package that is being encrypted. Specifying a valid DTR ID is mandatory to transfer data into the BioMedIT network.
For data not intended to be transferred into the BioMedIT network, the DTR ID field can be left empty (or set to any arbitrary value). In this case, Verify DTR must be disabled (in the Settings tab).
BioMedIT
The DTR ID field is mandatory. Only files encrypted with a valid and authorized DTR ID value can be transferred into the secure BioMedIT network. For this reason, BioMedIT users should always leave the Verify DTR checkbox enabled.
Purpose: purpose of the data transfer. Select either PRODUCTION or TEST, or leave it empty.
BioMedIT
This field is mandatory.
Output suffix (optional): optional suffix value to be appended at the end of the file name. A _ separator is automatically added and does not need to be part of the suffix.
Only regular alphanumeric, - and _ characters are allowed in the output suffix.
The value for the optional suffix can be permanently set in the Settings tab.
For more details on the output file naming scheme used by sett, please refer to the output file naming scheme section.
Output location: directory where the encrypted file should be saved.
By default, output files are saved to the user’s home directory.
Compression level slider: amount of compression to apply to the input data when packaging it.
Compression values range between 0 (no compression) and 9 (highest compression). Higher compression levels result in smaller encrypted output files but require more computing time.
The default compression level of 5 offers a good balance between output compression and time needed to perform the task. An illustration of compression ratio vs. time is given in the sett benchmarks section.
If compression is not required, e.g. because the input data is already in a compressed form, the compression level should be set to 0 in order to speed up the packaging task. Performing compression outside of sett can be useful when working with large files.
Ignore disk space error: disable disk space check before data encryption.
By default, sett verifies that there is enough free disk space available to save the output file before starting to compress and encrypt data. If this is not the case an error message is displayed and the operation is aborted. Since the compression ratio of the input data cannot be known in advance, sett uses the conservative estimate that the minimum disk space required is equal to the total size of all input files to be encrypted.
If users think this is too conservative, this verification can be disabled by turning the Ignore disk space error checkbox on.
Encryption Test: a test run of the data to encrypt can be performed by clicking the Test button. This will check that all the specified input files can be found on disk, and run additional checks if the Verify DTR setting is enabled (default).
You are now ready to compress and encrypt the data: click Package & Encrypt. A pop-up will appear, asking for the password associated with the sender’s key. After the password is entered, data compression and encryption will start. Progress and error messages are displayed in the Console box.
When the encryption has completed successfully, the Console should display a message that reads: “Completed data encryption” followed by the location and name of the output file, as illustrated in the example below.
At this point, all input files are compressed, encrypted and bundled into a single .zip file. Data has not yet been transferred to the intended recipient.
Encrypting data on the command line
The sett command to encrypt data is the following. Note that the SENDER and RECIPIENT values can be specified either as a PGP key fingerprint, or as an email address.
# General syntax:
sett encrypt --sender SENDER --recipient RECIPIENT --dtr-id DATA_TRANSFER_ID --purpose PURPOSE --output OUTPUT_FILENAME_OR_DIRECTORY FILES_OR_DIRECTORIES_TO_ENCRYPT
# Example:
# long command line options:
sett encrypt --sender alice@example.com --recipient bob@example.com --dtr-id 42 --purpose PRODUCTION --output test_output ./test_file.txt ./test_directory
# short command line options:
sett encrypt -s alice@example.com -r bob@example.com -t 42 --purpose PRODUCTION -o test_output ./test_file.txt ./test_directory --dry-run
Data can be encrypted for more than one recipient by repeating the --recipient option, e.g. --recipient RECIPIENT1 --recipient RECIPIENT2:
# In this example, Alice encrypts a set of files for both Bob and Chuck.
sett encrypt --sender alice@example.com --recipient bob@example.com --recipient chuck@example.com FILES_OR_DIRECTORIES_TO_ENCRYPT
Adding the --dry-run option will run the encrypt command in test mode, i.e. checks are made but no data is encrypted.
The data compression level used by sett can be manually adjusted using the --compression-level option. Compression level values must be integers between 0 (no compression) and 9 (highest compression). Higher compression levels produce smaller output files but require more computing time, so you may choose a lower level to speed up compression (e.g. --compression-level=1), or a higher level (e.g. --compression-level=9) to produce smaller output files. The default level is 5.
Before encrypting data, sett verifies that there is enough free disk space available on the local machine to save the encrypted output file (the location checked is the current working directory, or the target directory given by --output). If this is not the case, an error message is displayed and the operation is aborted. Since the compression ratio of the input data cannot be known in advance, sett uses the conservative estimate that the minimum disk space required is equal to the total size of all input files to be encrypted. If users think this is too conservative (e.g. because they know that their data compresses well), this verification can be disabled by passing the --force option.
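The conservative estimate described here (required space equals the total size of all input files) can be sketched in Python as follows. The function names are hypothetical and the code only illustrates the rule; it is not sett's own implementation.

```python
import shutil
from pathlib import Path


def total_input_size(paths: list[Path]) -> int:
    """Sum the sizes (in bytes) of all files, descending into directories."""
    total = 0
    for path in paths:
        if path.is_dir():
            total += sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
        else:
            total += path.stat().st_size
    return total


def has_enough_space(paths: list[Path], output_dir: Path) -> bool:
    """Conservative check: free space must be at least the total input size."""
    return shutil.disk_usage(output_dir).free >= total_input_size(paths)
```

Because compression usually shrinks the data, this check can refuse an operation that would in fact have succeeded, which is the situation the --force option is meant for.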
To automate the encryption process, the --passphrase-cmd option can be used to specify an external command that returns the PGP key password to the standard output.
Important
When using --passphrase-cmd, make sure that the external command and the password store are secure.
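The contract expected from such an external command (write the password to standard output and exit) can be illustrated with a small Python sketch. The helper `read_passphrase` is hypothetical and is not how sett itself is implemented; the demo command merely stands in for a real password-manager call.

```python
import subprocess
import sys


def read_passphrase(command: list[str]) -> str:
    """Run an external command and return what it wrote to standard output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    # Drop the trailing newline that most commands append to their output.
    return completed.stdout.rstrip("\r\n")


# Stand-in for a real password-manager command (for demonstration only).
demo_command = [sys.executable, "-c", "print('not-a-real-passphrase')"]
passphrase = read_passphrase(demo_command)
```

Running a candidate command through a helper like this is a quick way to confirm that it prints only the password, with no prompts or extra text.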
sett performs DTR verification if verify_dtr is enabled in settings (default). For non-BioMedIT-related transfers, verify_dtr should be set to false (--dtr-id and --purpose are optional in this mode).
BioMedIT
A valid DTR ID must be specified via the --dtr-id option, and a purpose via --purpose.
An optional output suffix can be added to the sett output files by passing the --output-suffix option. Alternatively, the value for the optional suffix can be permanently set in the sett configuration file. For more details on the output file naming scheme used by sett, please refer to the output file naming scheme section.
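The character rule stated for the output suffix (only regular alphanumeric, - and _ characters) can be checked before calling sett with a sketch like the one below. Reading "regular alphanumeric" as ASCII letters and digits is an assumption, and `is_valid_suffix` is a hypothetical helper.

```python
import re

# Letters, digits, "-" and "_" only (ASCII is assumed for "regular alphanumeric").
SUFFIX_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_suffix(suffix: str) -> bool:
    """Return True if the suffix contains only the allowed characters."""
    return SUFFIX_PATTERN.fullmatch(suffix) is not None
```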
To completely override the sett output file naming scheme, the --output option can be used to specify the path and name that the output file should have.
Transferring files
Data packages can be transferred to remote servers that support one of the following protocols:
SFTP
S3 object storage
Liquid Files
Important
Only files encrypted with sett, or files that follow the sett packaging specifications can be transferred using sett.
By default, sett verifies data packages before initializing file transfers. These checks are required within the BioMedIT network, but can be skipped in other contexts by disabling Verify DTR, Verify package name, and/or Verify key approval checkboxes in the application settings.
Verify DTR: A valid and authorized DTR (Data Transfer Request) ID is required in the package metadata.
Verify package name: Package name must match the pattern <project_code>_<date_format>_<package_name_suffix>.zip, where:
project_code is the abbreviated project name of the project whose DTR ID was used during data encryption. If no DTR ID value was given at encryption time, or if checking DTR ID is disabled, the pattern against which file names are verified becomes <date_format>_<package_name_suffix>.zip.
date_format is the date and time when the data package was created, as specified in the sett packaging format.
package_name_suffix is the optional suffix appended to package names when they are created. When using sett-gui, the suffix value is taken from the Encrypt tab (which itself can be taken from the config file). When using the command line, the suffix value is taken from the sett configuration file.
The objective of this verification is to prevent users from mistakenly including sensitive information in data package file names.
Verify key approval: Verify that the PGP keys (sender and recipients) have been approved by the central authority.
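The package name check can be approximated with a regular expression, as in the following Python sketch. The timestamp layout is taken from the output file naming scheme (YYYYMMDDTHHMMSS); the character set allowed for the project code and the treatment of the suffix as optional are assumptions of this example, not the exact rule applied by sett.

```python
import re

PACKAGE_NAME = re.compile(
    r"(?:(?P<project_code>[A-Za-z0-9]+)_)?"  # optional project code (assumed charset)
    r"(?P<timestamp>\d{8}T\d{6})"            # YYYYMMDDTHHMMSS
    r"(?:_(?P<suffix>[A-Za-z0-9_-]+))?"      # optional suffix
    r"\.zip"
)


def is_valid_package_name(name: str) -> bool:
    """Return True if the file name follows the expected package pattern."""
    return PACKAGE_NAME.fullmatch(name) is not None
```

A name such as demo_20220211T143311_sib.zip is accepted, while a free-form name like secret_patient_names.zip is rejected, which is the purpose of the verification.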
Transferring files with sett-gui
To transfer encrypted files:
Go to the Transfer tab of the sett application.
Select encrypted files to transfer: click Add files and select a .zip file that was generated using the sett application.
Multiple files can be transferred at the same time by adding more than one file.
Only .zip files produced by the sett application can be transferred.
Select the data transfer Protocol (sftp, s3 or liquid_files). The choice depends on the server to which the data should be sent, but in most cases sftp is used.
Set the connection parameters for the transfer:
User name: the user name with which to connect to the SFTP/liquid files server.
Host URL: URL address of the server where the files should be sent.
Destination directory: absolute path of directory where files should be saved on the server.
SSH key location: name and full path of the private SSH key used for authentication to the SFTP server. This is only required if the SSH key is in a non-standard location. Only RSA keys are accepted.
Do not confuse SSH keys - which are used to authenticate yourself when connecting to an SFTP server during file transfer - with PGP keys - which are used to encrypt and sign data.
SSH key password: password associated with the private SSH key given under SSH key location. If your SSH key password contains non-ASCII characters and this results in an error, please see the SSH private key with non-ASCII characters section of this guide.
BioMedIT
For BioMedIT users, the SFTP connection parameters User name, Host URL, and Destination directory will be provided by your local BioMedIT node.
You are now ready to transfer the data. Click Transfer selected files and follow the progress of the transfer using the progress bar and the Console box.
Transferring data on the command line
sett command to transfer data:
# General syntax:
sett transfer --protocol sftp --protocol-args '{"host": "HOST","username":"USERNAME", "destination_dir":"DIR", "pkey":"PRIVATE_RSA_SSH_KEY"}' FILES_TO_TRANSFER
# 'session_token' is specific to STS credentials
sett transfer --protocol s3 --protocol-args '{"host": "localhost:9000", "secure": false, "bucket":"BUCKET", "access_key":"ACCESS_KEY", "secret_key":"SECRET_KEY", "session_token":"SESSION_TOKEN"}' FILES_TO_TRANSFER
sett transfer --protocol liquid_files --protocol-args '{"host": "HOST","subject": "SUBJECT", "message": "MESSAGE","api_key":"APIKEY","chunk_size": 100}' FILES_TO_TRANSFER
# Example:
sett transfer --protocol sftp --protocol-args '{"host":"10.0.73.1","username":"alice", "destination_dir":"/data", "pkey":"~/.ssh/id_rsa"}' encrypted_data.zip
Note that if you are using the Windows Command Prompt, the above syntax must be modified: the --protocol-args value must be enclosed in double quotes (") instead of single quotes ('), and every double quote inside the value must be doubled (""). Here is an example:
# Example for Windows command prompt:
sett transfer --protocol sftp --protocol-args "{""host"":""10.0.73.1"",""username"":""alice"", ""destination_dir"":""/data"", ""pkey"":""~/.ssh/id_rsa""}" encrypted_data.zip
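When sett is launched from a script, the shell-specific quoting can be avoided altogether by building the argument list programmatically. The following Python sketch assembles the same SFTP command as in the example above; the helper name `sftp_transfer_command` is hypothetical, and only options shown in this section are used.

```python
import json


def sftp_transfer_command(host, username, destination_dir, pkey, files):
    """Return the argument list for an SFTP transfer with sett."""
    protocol_args = {
        "host": host,
        "username": username,
        "destination_dir": destination_dir,
        "pkey": pkey,
    }
    return [
        "sett", "transfer",
        "--protocol", "sftp",
        # json.dumps produces the JSON string expected by --protocol-args.
        "--protocol-args", json.dumps(protocol_args),
        *files,
    ]


command = sftp_transfer_command("10.0.73.1", "alice", "/data", "~/.ssh/id_rsa",
                                ["encrypted_data.zip"])
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Since no shell is involved, the JSON value needs no platform-specific quoting.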
For SFTP transfers, an SSH key is required for authentication on the host server. The private SSH key can be provided via 2 mechanisms:
Specifying the location of the key via the pkey argument of --protocol-args. See below for more details.
Using an SSH agent to provide the key. The SSH agent is automatically detected by sett and no specific input from the user is needed. In this case, the pkey argument should be skipped.
The --protocol-args value takes a different set of fields depending on the protocol being used:
sftp protocol arguments:
- host
Address of remote SFTP server to which files should be transferred, e.g. "host":"10.0.73.1".
Connecting to a specific port on the SFTP server can be done using the syntax "host":"10.0.73.1:3111" (here 3111 is the port to use). If no port is specified, port 22 is used by default.
- username
User name with which to connect to the SFTP service.
- destination_dir
Absolute path of the directory on the SFTP server where files should be transferred.
- pkey
Path + name of the private SSH key used to authenticate with the SFTP server. This argument is only needed if the authentication method used by the SFTP server is SSH key (not needed with OIDC).
This argument can also be skipped if the SSH key is provided via an SSH agent, i.e. if pkey is missing, sett will try to find a suitable SSH key from a running agent.
- pkey_password
If a private SSH key is passed via the pkey argument, its password should be provided via this argument. If the key is not password-protected (⚠ insecure, not recommended), "pkey_password":"" must be passed. If this argument is missing, the user will be manually prompted to enter the password. If your SSH key password contains non-ASCII characters and this results in an error, please see the SSH private key with non-ASCII characters section of this guide.
liquid_files protocol arguments:
- host
Address of remote liquid files server to which files should be transferred.
- api_key
The API key from liquid files. You can get it from your account: “Account Settings” -> “API”.
- subject
Optional subject to send together with the package.
- message
Optional message to send together with the package.
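The host syntax described for the sftp protocol (an optional :PORT after the address, with 22 as the default) can be illustrated with a short Python sketch. The function `split_host_port` is hypothetical and deliberately simple: it does not handle IPv6 literals, and it is not sett's own parser.

```python
def split_host_port(host: str, default_port: int = 22) -> tuple[str, int]:
    """Split "address:port" into (address, port); fall back to the default port."""
    name, separator, port = host.rpartition(":")
    if separator and port.isdigit():
        return name, int(port)
    return host, default_port


print(split_host_port("10.0.73.1:3111"))  # prints: ('10.0.73.1', 3111)
print(split_host_port("10.0.73.1"))       # prints: ('10.0.73.1', 22)
```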
Adding the --dry-run option will run the transfer command in test mode, i.e. checks are made but no data is transferred.
To display help for the transfer command: sett transfer --help.
Decrypting files
The sett application allows the decryption and decompression of files in a single step. However, only files encrypted with the sett application, or files that follow the sett packaging specifications can be decrypted with sett.
The following OpenPGP certificates must be present in the local sett certificate store in order to decrypt data:
The public certificate of the data sender (i.e. the person who encrypted the data). This is needed to verify the authenticity of the data package.
The secret certificate of the data recipient (the person for whom the data has been encrypted).
Decrypting data with sett-gui
To decrypt and decompress files:
Go to the Decrypt tab of the sett application (see figure below).
Select one or more files to decrypt with Add files.
Selecting Decompress will both decrypt and decompress files. Deselecting this option will only decrypt the package, outputting a compressed file named data.tar.gz that contains the decrypted data.
Location: select a location where to decrypt/decompress files.
Click Decrypt selected files to start the decryption/decompression process. A pop-up dialog box will appear asking for the password of the recipient’s secret PGP key, i.e. the key for which the files were encrypted.
Decrypting data on the command line
sett command to decrypt data:
# General syntax:
sett decrypt --output-dir=OUTPUT_DIRECTORY ENCRYPTED_FILES.zip
# Example:
sett decrypt --output-dir=/home/alice/data/unpack_dir /home/alice/data/test_data.zip
To display help for the decrypt command: sett decrypt --help.
To decrypt data without decompressing it, add the --decrypt-only option.
If the --output-dir option is omitted, the data is decrypted in the current working directory.
If you want to automate the decryption process you can use the --passphrase-cmd option with an external command that returns your PGP key password to the standard output.
IMPORTANT: make sure the external command and the password store are secure.
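After a decrypt-only run, the resulting data.tar.gz archive still has to be unpacked with a separate tool. A minimal Python sketch using the standard tarfile module is given below; the helper name `unpack_archive` is hypothetical, and the sketch assumes the archive comes from a trusted sender.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: Path, destination: Path) -> list[str]:
    """Extract a .tar.gz archive and return the names of its members."""
    destination.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the Python version offers it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(destination, **extra)
    return names
```

For very large archives, an external multi-threaded tool may be faster than this single-threaded approach.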
PGP key auto-download and auto-refresh in sett
To ensure that all PGP keys used for data encryption and signing are trustworthy and up to date, sett is configured by default to work in conjunction with the OpenPGP keyserver keys.openpgp.org and the BioMedIT portal. If you are using sett outside of BioMedIT, you will need to disable the usage of the BioMedIT portal via the sett configuration options.
The keyserver is an online server where users upload their public PGP keys, so that people who need these keys - to encrypt data or verify a signature - can easily obtain them.
PGP keys used by sett (data recipients and sender keys) are automatically refreshed from the keyserver each time they are about to be used. This ensures that an up-to-date version of the keys is used. For instance, if a key is revoked, then the user’s local copy of the key will be automatically updated with this information.
If needed, the auto-download/auto-refresh of PGP keys can be disabled via the Allow PGP key auto-download checkbox in the Settings Tab.
The BioMedIT portal is an online service where users can (among other things) register their PGP key and get it approved for usage within BioMedIT. Just before a key is used, sett connects to the portal to verify that it has been approved.
For non-BioMedIT users, key approval should be disabled via the Verify key approval checkbox in the Settings Tab.
BioMedIT
Allow PGP key auto-download and Verify key approval must be enabled.
Setting up predefined connection profiles for frequent use
To avoid retyping connection settings for every transfer, connection profiles can be saved in sett.
You can configure and save connection profiles using the Transfer tab of the sett-gui interface.
Alternatively, you can modify the sett configuration file. An example config file would look like this:
{
    ...
    "connections": {
        "custom_connection": {
            "protocol": "sftp",
            "parameters": {
                "destination_dir": "upload",
                "host": "server.name.com",
                "pkey": "",
                "username": "chuck_norris"
            }
        }
    }
}
Predefined connections can then be selected from the Connection drop-down menu in the Transfer tab of the sett-gui interface.
On the command line, predefined connections can be passed using the --connection option, as illustrated here. In that case, the --protocol and --protocol-args options are no longer needed, unless you wish to override part of the predefined connection’s settings:
# General syntax:
sett transfer --connection PREDEFINED_CONNECTION_NAME FILES_TO_TRANSFER
# Example:
sett transfer --connection custom_connection encrypted_data.zip
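The idea of a predefined connection whose settings can be partly overridden can be sketched in Python as follows. The configuration text mirrors the example above (without the elided part), and `resolve_connection` is a hypothetical helper that illustrates the merge; it does not describe how sett resolves connections internally.

```python
import json

CONFIG_TEXT = """
{
    "connections": {
        "custom_connection": {
            "protocol": "sftp",
            "parameters": {
                "destination_dir": "upload",
                "host": "server.name.com",
                "pkey": "",
                "username": "chuck_norris"
            }
        }
    }
}
"""


def resolve_connection(config, name, overrides=None):
    """Return (protocol, parameters) for a saved connection, applying overrides."""
    connection = config["connections"][name]
    # Values given in overrides take precedence over the saved parameters.
    parameters = {**connection["parameters"], **(overrides or {})}
    return connection["protocol"], parameters


protocol, parameters = resolve_connection(
    json.loads(CONFIG_TEXT), "custom_connection", {"destination_dir": "/data"}
)
```

Parsing the file with `json.loads` is also a quick way to catch syntax errors, such as a missing or trailing comma, after editing the configuration by hand.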
Working with large files
sett does not support multi-threading during file compression/decompression, and therefore the default encryption and decryption workflows can take a long time when working with large files (> 100 GB).
To alleviate this problem, sett provides the option to skip the compression/decompression steps in both the encryption and decryption workflows. Users can then compress/decompress their data with an external compression tool that supports multi-threading (e.g. pigz on Linux or 7zip on Windows), before encrypting (or after decrypting) it with sett’s compression/decompression option disabled.
Disabling compression/decompression in sett-gui:
To disable compression in the encryption workflow, uncheck the Compress input data checkbox in the Encrypt tab.
To disable decompression in the decryption workflow, select Decrypt only in the Select data decryption options section of the Decrypt tab.
Disabling compression/decompression in sett command line:
To disable compression in the encryption workflow, add the --compression-level=0 option to the sett encrypt command.
To disable decompression in the decryption workflow, add the --decrypt-only option to the sett decrypt command.
Examples:
# Data encryption with compression disabled.
sett encrypt --sender alice.smith@example.com --recipient bob@example.com --dtr-id 42 --compression-level=0 --output test_output ./test_file.txt ./test_directory
# Data decryption with decompression disabled.
sett decrypt --decrypt-only --output-dir=/home/alice/sett_demo/unpack_dir /home/alice/sett_demo/test_data.zip
Split and transfer large files
Transferring a single large file can sometimes prove problematic, e.g. if the connection is unstable. When encountering such problems, one workaround is to split the data to encrypt and transfer into smaller chunks.
Windows - using 7-zip
On Windows, the open source utility 7zip can be used to compress and split data.
With 7zip installed, right-click on the folder you want to compress and split, then select 7-Zip > Add to archive...
In the 7zip dialog box, set the following values:
Click on ... to change the name and location of the output archive file.
Set Archive Format to zip.
Set Split to volumes, bytes to an appropriate size in Megabytes (M) or Gigabytes (G), e.g. 100G for 100 gigabytes chunk size.
The output filenames will be numbered like this: archive.zip.001, archive.zip.002, etc.
Encrypt these archive files using sett without additional compression, see instructions above.
Once data is transferred and decrypted at the destination, it can be inflated again. This can be done with the unzip command line utility: unzip archive.zip.0xx, where 0xx is the last file, i.e. the file with the highest number. It contains the file structure.
Note: 7zip can also be used on the command line.
"C:\Program Files\7-Zip\7z.exe" a -v100g "archive.zip" "folder/to/be/archived"
The -v100g option creates 100 Gigabytes chunks (volumes).
Several files/directories to compress can be specified after "archive.zip", and asterisks (*) can be used to match multiple files, e.g. "data/*.csv".
Mac OS and Linux - command line
On Mac OS and Linux, the zip command line utility can be used to split and compress data.
In the terminal, cd to the directory containing the files/directories to compress.
Run the command: zip -s 100g archive.zip <file(s) or directory(ies)>. In this example, data will be split into 100 GB chunks, but the size of chunks can be set as appropriate by replacing 100g with the desired value.
This will create numbered files named archive.zip, archive.z01, archive.z02, etc. The archive.zip file is the last file created and contains the file structure.
Encrypt these archive files using sett without additional compression, see instructions above.
Once data is transferred and decrypted at the destination, it can be inflated using the command unzip archive.zip.
Mac OS with GUI - using Keka
On Mac OS, people who prefer a GUI to the command line can use Keka, a compression app with Finder integration. The app costs 5 CHF via the Mac App Store or is free via their website (donations welcome).
Open Keka, and specify the split size in Megabytes or Gigabytes, e.g. 100 GB.
In the Settings menu, activate the Finder integration.
Right-click on a directory to compress it.
Automating tasks with sett
For users that need to package and transfer data on a regular basis, a number of options exist to simplify the automation of sett tasks. Please refer to the sett automation section for details.
Installing or running sett behind a proxy
If you are running sett behind a proxy, the shell environment variable ALL_PROXY or HTTPS_PROXY must be set. This is the recommended and global way to specify a proxy. Note that, while certain programs support a proxy option (e.g. pip with --proxy), there is currently no such option in sett.
Example:
ALL_PROXY=https://host.domain:port sett-gui
Note
If your proxy is a socks proxy, you need to install sett with socks proxy support:
pip install [ --user ] sett[socks]
In this case, also replace the scheme https:// with socks5://.
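When sett is started from a script rather than from an interactive shell, the proxy variable can be set for that process only. The Python sketch below builds such an environment; the helper name `proxy_environment` and the port used in the usage note are hypothetical.

```python
import os


def proxy_environment(host: str, port: int, socks: bool = False) -> dict[str, str]:
    """Return a copy of the current environment with ALL_PROXY set."""
    scheme = "socks5" if socks else "https"
    env = dict(os.environ)
    env["ALL_PROXY"] = f"{scheme}://{host}:{port}"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run`, e.g. `subprocess.run(["sett-gui"], env=proxy_environment("host.domain", 8080))`, which leaves the environment of the calling process untouched.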
Checking whether you are using a proxy
On Windows, you can check if you are using a proxy server to access the internet by going to Start > Settings > Network & Internet > Proxy (Windows 10). If the slider under Use a proxy server is “off”, no proxy is being used.
If you are told that you need to set a proxy, input the Address and Port details and click Save. If in doubt please consult with your IT department.
On Mac OS the proxy information is located under the System Preferences > Network > Advanced > Proxies tab of the network interface, usually Ethernet or Wi-Fi.
Known issues and limitations
SSH private key with non-ASCII characters password
Even though it is possible to create an SSH key pair using a password containing non-ASCII characters, those characters appear to be encoded differently on different operating systems.
As an SSH key might be moved to a machine with another operating system, or the encoding might change with a new version, it is impossible to guess the correct encoding in all cases. For this reason, we recommend not using non-ASCII characters to protect SSH private keys.
If this is still desired, there is a configurable option ssh_password_encoding available in the sett config file, which defaults to utf_8 (this is the correct encoding for keys generated with ssh-keygen on Linux / Mac). For keys generated with ssh-keygen on Windows 10, cp437 should work to correctly encode non-ASCII characters. Example of config file with SSH password encoding set to cp437:
{
"ssh_password_encoding": "cp437"
}
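Why the encoding matters can be seen from a two-line Python experiment: the same non-ASCII character is turned into different bytes by the two codecs, so a key protected under one encoding cannot be unlocked with a password encoded under the other. The sample character is arbitrary.

```python
character = "é"

utf8_bytes = character.encode("utf_8")   # b'\xc3\xa9' (two bytes)
cp437_bytes = character.encode("cp437")  # b'\x82' (one byte)

print(utf8_bytes == cp437_bytes)  # prints: False
```

ASCII characters, by contrast, are encoded identically by both codecs, which is why ASCII-only passwords avoid the problem.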