Trainer Installation

Long-running tasks, including model training for Semi-structured documents, Find Potential Layouts, and Transcription Accuracy Training, run on a trainer application instance deployed on a separate VM from the main application. This separation isolates the trainer's resource usage and performance from those of the main application. The trainer connects to the main application via the API.

We do not recommend installing the trainer on the same machine as the Hyperscience application, because doing so could affect performance.

Requirements

To learn about the technical requirements for the trainer, see Infrastructure Requirements.

Installing the trainer

Use local storage with the trainer. Do not use shared storage, especially if you have multiple trainers of the same version. Using shared storage may cause data to be overwritten and training jobs to fail.

1. Untar the Hyperscience bundle on the trainer server. 

If HS_PATH is the default (i.e., /mnt/hs), use the following commands to clear the .env file, as the trainer does not need it:

rm .env
touch .env

If the HS_PATH is not the default, do not clear the .env file. Instead, set the HS_PATH variable in the .env file to your custom directory (e.g., HS_PATH=/app/HS). 
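The commands later in this guide use the default /mnt/hs path. As a minimal sketch for custom installations, the helper below prints the effective HS_PATH for a given .env file. The function name resolve_hs_path is hypothetical, and the sketch assumes the value is written unquoted, as in the HS_PATH=/app/HS example above.

```shell
# Hypothetical helper, not part of the Hyperscience bundle.
# Prints the HS_PATH set in the given .env file, or the default /mnt/hs
# when the file is missing, empty, or has no HS_PATH line.
# The body runs in a subshell so its variables do not leak into the caller.
resolve_hs_path() (
    env_file="${1:-.env}"
    value=""
    if [ -f "$env_file" ]; then
        # Use the last uncommented HS_PATH= assignment in the file.
        value="$(sed -n 's/^HS_PATH=//p' "$env_file" | tail -n 1)"
    fi
    printf '%s\n' "${value:-/mnt/hs}"
)
```

With this in place, a path such as "$(resolve_hs_path .env)/media" can stand in for /mnt/hs/media in the commands that follow.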

Depending on your version of Hyperscience, you need to create a specific local directory under HS_PATH. See what local directory you need to create in the table below.

Hyperscience version                   Local directory
v30 and earlier                        media
v31.0.0 to v31.0.11                    media
v31.0.12+                              trainer_media
v32.0.0 to v32.0.2                     media
v32.0.3+ and other major versions      trainer_media

To create the “media” directory, use the following commands:

mkdir -p /mnt/hs/media
chown 1000:1000 /mnt/hs/media

To create the “trainer_media” directory, use the following commands:

mkdir -p /mnt/hs/trainer_media
chown 1000:1000 /mnt/hs/trainer_media
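The version-to-directory mapping above can be sketched as a small lookup. The function name media_dir_name is hypothetical. The sketch assumes a numeric major.minor.patch version string, and it assumes that a nonzero minor version of v31 or v32 (e.g., 31.1.0) falls under the "+" ranges, which this guide does not state explicitly.

```shell
# Hypothetical helper, not part of the Hyperscience bundle.
# Prints "media" or "trainer_media" for a version such as 31.0.12.
# The body runs in a subshell so its variables do not leak into the caller.
media_dir_name() (
    # Split "major.minor.patch" on dots.
    IFS=. read -r major minor patch <<EOF
$1
EOF
    minor="${minor:-0}"
    patch="${patch:-0}"

    # First patch release of each x.0 line that uses trainer_media.
    case "$major" in
        31) first_new_patch=12 ;;
        32) first_new_patch=3 ;;
        *)  first_new_patch="" ;;
    esac

    if [ -z "$first_new_patch" ]; then
        # Other major versions: v30 and earlier use media, later ones trainer_media.
        if [ "$major" -le 30 ]; then echo media; else echo trainer_media; fi
    elif [ "$minor" -gt 0 ] || [ "$patch" -ge "$first_new_patch" ]; then
        # Assumption: nonzero minor versions count as later than x.0.<patch>.
        echo trainer_media
    else
        echo media
    fi
)
```

For example, the directory for a given version could then be created with mkdir -p "/mnt/hs/$(media_dir_name 32.0.3)", followed by the matching chown command.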

2. If SELinux is enabled, run the required SELinux-specific commands.

Enter one of the following sets of commands:

Hyperscience v28 or earlier

chcon -t container_file_t /mnt/hs/media
chcon -R -t container_file_t /mnt/hs/postgresql

Hyperscience v30, v31.0.0 to v31.0.11, and v32.0.0 to v32.0.2

chcon -t container_file_t /mnt/hs/media
mkdir -p /mnt/hs/postgres_trainer<major>_<minor>
chcon -R -t container_file_t /mnt/hs/postgres_trainer<major>_<minor>

Hyperscience version numbers are formatted as follows:

<major>.<minor>.<patch>

For example, if your application version is 30.0.7, you would enter:

chcon -t container_file_t /mnt/hs/media
mkdir -p /mnt/hs/postgres_trainer30_0
chcon -R -t container_file_t /mnt/hs/postgres_trainer30_0

Hyperscience v31.0.12+, v32.0.3+, and other major versions

chcon -t container_file_t /mnt/hs/trainer_media
mkdir -p /mnt/hs/postgres_trainer<major>_<minor>
chcon -R -t container_file_t /mnt/hs/postgres_trainer<major>_<minor>

Hyperscience version numbers are formatted as follows:

<major>.<minor>.<patch>

For example, if your application version is 32.0.3, you would enter:

chcon -t container_file_t /mnt/hs/trainer_media
mkdir -p /mnt/hs/postgres_trainer32_0
chcon -R -t container_file_t /mnt/hs/postgres_trainer32_0
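The two worked examples above suggest that the directory name is built from the first two components of the version number. A minimal sketch of that derivation follows. The function name postgres_trainer_dir is hypothetical, and the pattern is inferred from the examples for 30.0.7 and 32.0.3 rather than taken from a formal specification. It does not apply to v28 or earlier, which use /mnt/hs/postgresql.

```shell
# Hypothetical helper, not part of the Hyperscience bundle.
# Prints the trainer's PostgreSQL directory for a version such as 30.0.7.
# Usage: postgres_trainer_dir <major.minor.patch> [hs_path]
# The body runs in a subshell so its variables do not leak into the caller.
postgres_trainer_dir() (
    version="$1"
    hs_path="${2:-/mnt/hs}"     # default HS_PATH used throughout this guide
    major="${version%%.*}"      # text before the first dot
    rest="${version#*.}"        # text after the first dot
    minor="${rest%%.*}"         # text between the first and second dots
    printf '%s/postgres_trainer%s_%s\n' "$hs_path" "$major" "$minor"
)
```

The printed path can then be passed to the mkdir -p and chcon -R -t container_file_t commands shown in this step.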

3. Run the trainer.

Run the following command on the instance the trainer is deployed to:

sudo bash run.sh trainer <application_url> <token>

For example, with values filled in:

sudo bash run.sh trainer https://abcd123.internal 187d88d6s63929cd0ad98

Details for the parameters:

  • <application_url> - This is the URL of the main application. Specify it without a trailing slash.

  • <token> - This is the access token for a user provisioned in the main application that has the API Access permission enabled.

    • We recommend creating a service account, or a "user" whose token will provide API access to the trainer and other services.

    • For information on obtaining tokens, see “Managing API Tokens” for your version of Hyperscience (v35 | v36 | v37 | v38 | v39 | v40).
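Because the application URL must not end with a slash, a wrapper script could normalize it before starting the trainer. The sketch below is one way to do that. The function name strip_trailing_slash and the APP_URL and API_TOKEN variables in the usage comment are hypothetical.

```shell
# Hypothetical helper, not part of the Hyperscience bundle.
# Prints the given URL with any trailing slashes removed.
# The body runs in a subshell so its variables do not leak into the caller.
strip_trailing_slash() (
    url="$1"
    # Remove one trailing slash at a time until none remain.
    while [ "${url%/}" != "$url" ]; do
        url="${url%/}"
    done
    printf '%s\n' "$url"
)

# Possible usage (APP_URL and API_TOKEN are placeholder variables):
#   sudo bash run.sh trainer "$(strip_trailing_slash "$APP_URL")" "$API_TOKEN"
```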

The trainer connects to the main Hyperscience application via the API. Whenever you change the configuration of the connection between the trainer and the main application (e.g., the URL of the Hyperscience application changes, the application load balancer URL changes, a new API authentication token is used, or SSL or LDAP is configured), you must restart the trainer by rerunning the run.sh trainer command.

If you would like to configure the trainer to connect to the application over TLS, follow the guidelines in the "TLS configuration for outbound connections" section of Security.

Changes to the application ".env" file for trainer authentication

In v27.0.8+, v28.0.10+, v28.2.3+, and v30+, we have security measures to:

  • enforce the use of a single authentication method, and

  • periodically invalidate API tokens for API users created with external authentication methods. 

Connecting the trainer with a local user’s credentials

If you're using an external authentication method, you can still connect your trainer to the application through a local user. To do so, you need to add the TRAINER_USER variable to your ".env" file. If you are upgrading Hyperscience, adding this variable ensures that your trainer's access to the application will not be interrupted after upgrading. 

Including TRAINER_USER in your “.env” file creates a local user with the username you specify, if it doesn’t already exist.

To add and activate the TRAINER_USER variable:

  1. Add the TRAINER_USER variable to your application's ".env" file, with the username of your trainer's user as its value:

    TRAINER_USER=<username_of_trainer's_user>

  2. Start the application by running the following commands on the application instance:

    sudo bash run.sh init
    sudo bash run.sh

  3. If you have system_admin privileges:

    1. Log in to the application.

    2. Click on Users, and click on the username you entered in the TRAINER_USER variable.

    3. Find the user's Authentication Token, and click Copy.

    4. Click Done.

  4. Start the trainer by running the following command:

    sudo bash run.sh trainer <application_url> <user's_copied_token>

Once you start the trainer with the new token, your trainer's user will be automatically added to the list of exempted users.
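Adding the TRAINER_USER variable in step 1 can also be scripted. The sketch below sets a KEY=VALUE line in an env file, replacing any existing line for the same key so that the variable is not duplicated. The function name set_env_var and the username trainer_svc are hypothetical, and the sketch assumes simple unquoted KEY=VALUE lines.

```shell
# Hypothetical helper, not part of the Hyperscience bundle.
# Sets KEY=VALUE in an env file. Any existing KEY= line is dropped and the
# new assignment is appended at the end of the file.
# Usage: set_env_var <key> <value> [env_file]
# The body runs in a subshell so its variables do not leak into the caller.
set_env_var() (
    key="$1"
    value="$2"
    env_file="${3:-.env}"
    touch "$env_file"
    tmp="$(mktemp)"
    # Copy every line except existing assignments of this key.
    # grep exits non-zero when it prints nothing, which is fine here.
    grep -v "^${key}=" "$env_file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite the contents in place to keep the file's owner and mode.
    cat "$tmp" > "$env_file"
    rm -f "$tmp"
)

# Possible usage (trainer_svc is a placeholder username):
#   set_env_var TRAINER_USER trainer_svc .env
```

After the variable is set, the remaining steps (restarting the application, copying the token, and starting the trainer) are unchanged.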

Editing the TRAINER_USER variable

Editing the TRAINER_USER variable will create a new user with the username you enter, if it doesn’t already exist. It will not remove the user previously created through the TRAINER_USER variable.

Connecting the trainer through an external authentication method

If you are connecting the trainer through a user created with your external authentication method, you must add the trainer's username to the TOKEN_REVALIDATION_EXEMPTED_USERS variable. In this case, do not add the TRAINER_USER variable to your ".env" file. For more information, see External Authentication Methods and API Users.