Hyperscience Infrastructure Prerequisites

Introduction

Purpose

This article describes the infrastructure prerequisites necessary to install the Hyperscience Application and Trainer. It is oriented toward the System Administrators who will perform this task.

After you complete these steps, you can provision the VMs by following the Hyperscience Virtual Machine Prerequisites article.

Scope

This article is valid for both the Hyperscience Application and Trainer, whether you are installing them for the first time or adding more machines to an existing cluster.

In the most common use case, the following items need to be provisioned and considered as part of the setup:

  • Application

    • Database

    • File Store

    • Load Balancing

    • Authentication

    • Security (SSL/TLS)

  • Trainer

    • Security (SSL/TLS)

Prerequisites

Application

Database

The Hyperscience Application connects to and stores data in a standalone database that you need to provision. All application machines will connect to that database.

We support the databases that are listed in our Infrastructure Requirements article.

Depending on the database you are going to use, follow the relevant guides to prepare the database:

Most of the configuration articles refer to the “.env” file, which stores environment-specific variables like API keys and database credentials outside the source code, making configuration easier and more secure. Keep a record of those variables, as you will be adding them to that file during the installation process.

File Store

You need to create a file store that holds document images. All application machines connect to it to read and write data. Do not alter the file store once you start using it, and make sure the data on it is persistent. Do not set any deletion schedules on it; otherwise, there is a risk of data loss.

There are several options available for the file store. For instructions for each option, refer to the following guides:

Depending on the chosen method of storage, record the necessary “.env” variables that will be set during the installation process.

Load Balancing

As part of High Availability and Disaster Recovery (HA/DR) best practices, we recommend deploying the application on multiple machines and using a load balancer to distribute web requests.

For more information on our recommendations, refer to the Load Balancer article.

HS_CSRF_TRUSTED_ORIGINS

In v38 and later, we validate the Origin header of POST requests against a list of trusted domains. If you are using a load balancer, you need to set the environment variable HS_CSRF_TRUSTED_ORIGINS to the load balancer's domain. Otherwise, "authentication required" errors may occur.

The value should be the complete URL of the load balancer, including the protocol (e.g., https://example-domain.com). If you are using a subdomain, you must use a wildcard character (e.g., https://*.example-subdomain.com).

If you want to set multiple trusted domains, the value must be a string that includes all the domains in a comma-separated list:

 HS_CSRF_TRUSTED_ORIGINS=https://example-domain.com,https://*.example-subdomain.com
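Before you record the value, you may want to sanity-check its format. The following is a minimal Python sketch, not part of the Hyperscience product: the helper name check_trusted_origins is hypothetical, and it assumes only what is described above, namely that each comma-separated entry must include a protocol and a domain. It does not reproduce the application's own validation.

```python
from urllib.parse import urlsplit


def check_trusted_origins(value: str) -> list[str]:
    """Return the entries of a comma-separated origin list that look malformed."""
    malformed = []
    for entry in value.split(","):
        origin = entry.strip()
        parts = urlsplit(origin)
        # Each entry needs an explicit protocol and a domain,
        # e.g. https://example-domain.com or https://*.example-subdomain.com
        if parts.scheme not in ("http", "https") or not parts.netloc:
            malformed.append(origin)
    return malformed
```

An empty list means every entry has a protocol and a domain. An entry such as example-domain.com (no protocol) is returned as malformed.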

Authentication

The application supports both built-in user management and authentication through single sign-on (SSO). In most cases, it is easier to initially set up the application using our built-in user management and then configure SSO if necessary. For more information on the authentication options, refer to their articles:

If you plan to use SSO, you need to set up all relevant authentication groups before implementing this option.

Depending on the chosen method of authentication, record the necessary “.env” variables that will be set during the installation process.

Security (SSL/TLS)

Connections to and from the application’s web user interface can be established either via HTTP or HTTPS over ports 80 or 443, respectively.

Depending on the chosen method, you must ensure that firewall rules allow connection to the VMs running the application through those ports.
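One way to confirm that a firewall rule is in place is to attempt a TCP connection to the port from a machine that needs access. The Python sketch below illustrates this; the function name port_is_reachable and the hostname in the usage note are hypothetical. A successful connection shows only that the port accepts TCP connections, not that the application is correctly configured.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the "with" block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_reachable("app-vm.example.com", 443) checks the HTTPS port on a placeholder host, which must be listening for the check to succeed.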

If you want to use TLS/HTTPS for inbound connections, you must provide a certificate and a key in PEM format.
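If you would like to check the certificate and key before installation, the following Python sketch tries to load them with the standard ssl module. The function name pem_pair_loads is hypothetical, and the sketch assumes the private key is not password-protected. It only confirms that both files are readable PEM and that the key matches the certificate; it does not check expiry or the certificate chain.

```python
import ssl


def pem_pair_loads(cert_path: str, key_path: str) -> bool:
    """Return True if the PEM certificate and key load and match each other."""
    try:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        # load_cert_chain raises ssl.SSLError if either file is not valid PEM
        # or if the private key does not match the certificate.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
        return True
    except OSError:
        # ssl.SSLError and FileNotFoundError are both subclasses of OSError.
        return False
```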

If you want to use TLS/HTTPS for outbound connections, you need to provision a custom bundle of CA certificates to validate against when establishing an HTTPS or TLS connection. This bundle will replace the default set of trusted CA certificates, so it must include all root certificates used by external services.

Make sure you provision the necessary certificates ahead of the installation process, and keep a record of the values for the “.env” variables listed in the Security article.

Trainer

Database

Local database

The trainer uses its own local PostgreSQL database; there is no need to set up a database for the trainer.

Security (SSL/TLS)

The trainer connects to the application or its load balancer via API using either HTTP or HTTPS over ports 80 or 443, respectively.

Depending on the chosen method, you must ensure that firewall rules allow connections to the VMs running the application and trainer through those ports.

If you want to use TLS/HTTPS to connect the trainer to the application VM or the load balancer, you need to provision a custom bundle of CA certificates to validate against when establishing the connection. This bundle will replace the default set of trusted CA certificates, so it must include all root certificates used by the application/load balancer.

The trainer uses only outbound connections to the application, as it has no front-end user interface. Therefore, you do not need to set up SSL for inbound connections.
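To check that a CA bundle can validate the application or load balancer certificate, you can attempt a TLS handshake that trusts only that bundle. The Python sketch below is an illustration under stated assumptions: the function name tls_handshake_ok is hypothetical, and it uses Python's standard ssl module rather than the trainer's own connection code, so a passing result is an indication, not a guarantee.

```python
import socket
import ssl


def tls_handshake_ok(host: str, port: int, ca_bundle_path: str,
                     timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port validates against the bundle."""
    try:
        # Passing cafile means only the certificates in the bundle are trusted;
        # the system's default CA certificates are not loaded.
        context = ssl.create_default_context(cafile=ca_bundle_path)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake, including certificate and hostname verification,
            # happens inside wrap_socket.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers a missing or invalid bundle, connection failures,
        # and certificate verification errors (all subclasses of OSError).
        return False
```

Because the default CA certificates are not loaded, a handshake fails if the bundle lacks the root certificate that signed the server's certificate, which mirrors the replacement behavior described above.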

Make sure you provision the necessary certificates ahead of the installation process, and keep a record of the values for the “.env” variables listed in the Security article.