Infrastructure Overview

Hyperscience is installed on Linux Virtual Machines (VMs) using Docker or Podman containers. Those VMs can be on-premise or in a private cloud. The application is accessible through a web interface and an API. It requires an application database, a load balancer, and a shared file store.

To meet High Availability and Disaster Recovery (HA/DR) goals, we recommend setting up Hyperscience across two data centers or availability zones. The diagram below demonstrates the recommended setup.

[Diagram: Infrastructure Flow]

System components

While you can customize your implementation of Hyperscience to meet your organization’s needs, each instance of Hyperscience typically contains the components described below.

Application

When users log into Hyperscience, they are logging into the application. The application offers a user interface for the tasks associated with processing submissions, changing accuracy settings, setting up connectors, and more. In the diagram above, the application runs in the "VMs Running Hyperscience."

If application machines are removed from the instance while it is running, any submission jobs or tasks that were being processed by those machines are reassigned to the available machines in the instance after the timeout expires. Therefore, while there may be processing delays, no jobs or tasks fail if machines are removed, and no submissions halt.
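The reassignment behavior described above can be illustrated with a minimal sketch. This is not Hyperscience's scheduler; the Task fields, the heartbeat-based timeout check, and the function name reassign_stale_tasks are all hypothetical, chosen only to show the idea that a task is moved once its machine is gone and its timeout has expired.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    machine: Optional[str]   # machine currently holding the task (hypothetical field)
    last_heartbeat: float    # time, in seconds, of the holder's last progress report


def reassign_stale_tasks(
    tasks: List[Task],
    live_machines: List[str],
    now: float,
    timeout_s: float,
) -> List[str]:
    """Reassign tasks whose machine was removed and whose timeout has expired.

    Returns the IDs of the tasks that were moved. Tasks on live machines, and
    tasks whose timeout has not yet expired, are left untouched.
    """
    if not live_machines:
        return []  # nothing to reassign to; tasks simply wait
    moved: List[str] = []
    for task in tasks:
        holder_gone = task.machine not in live_machines
        timed_out = now - task.last_heartbeat >= timeout_s
        if holder_gone and timed_out:
            # Spread moved tasks across the remaining machines in turn.
            task.machine = live_machines[len(moved) % len(live_machines)]
            task.last_heartbeat = now
            moved.append(task.task_id)
    return moved
```

In this sketch a task is delayed by at most the timeout before another machine picks it up, which matches the "delays but no failures" behavior described above.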

Trainer

With a trainer, you can train models that will automate document classification and the identification of fields and tables, among other tasks. The trainer connects to the application via the API.

Load balancer

As part of HA/DR best practices, we recommend deploying the application on multiple VMs and using the load balancer to distribute web requests.
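As a rough illustration of how a load balancer spreads web requests across several application VMs, the sketch below implements simple round-robin selection that skips unhealthy backends. The class name, the VM names, and the round-robin policy itself are assumptions made for this example; your load balancer may use a different algorithm and its own health-check mechanism.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Pick backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in self._backends}
        self._next = 0  # index of the next backend to try

    def mark(self, backend: str, healthy: bool) -> None:
        """Record the result of a (hypothetical) health check."""
        self._healthy[backend] = healthy

    def pick(self) -> str:
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backends")
```

For example, with backends "vm-a", "vm-b", and "vm-c", marking "vm-b" unhealthy causes requests to alternate between the other two until it is marked healthy again.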

File storage

As the application processes submissions, it saves images of their pages in file storage. The file store contains personally identifiable information (PII), which can be deleted at intervals of your choosing.

Database

The application stores all transcribed and extracted submission data in the database. The data gathered here powers the machine-learning capabilities that allow Hyperscience to automate the data-extraction process. The database also contains application configuration data and other system information.

Logs

The logs store internal debugging data that helps us resolve support requests.

Environments

Hyperscience on-premise and private cloud customers typically run three environments: Development, User Acceptance Testing, and Production. We recommend that non-production environments match the production environment in VM specifications, NAS type, database type, and similar settings. Each environment requires its own databases, file stores, and VMs.
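One way to verify that a non-production environment mirrors production is to compare their specifications key by key. The sketch below is only an illustration: the function name parity_gaps and the setting names used in the usage note (vm_cpus, vm_memory_gb, nas_type, database_type) are hypothetical and do not come from Hyperscience.

```python
from typing import Dict, List


def parity_gaps(production: Dict[str, object], other: Dict[str, object]) -> List[str]:
    """List every production setting that the other environment does not match."""
    gaps: List[str] = []
    for key in sorted(production):
        if other.get(key) != production[key]:
            gaps.append(
                f"{key}: production={production[key]!r}, other={other.get(key)!r}"
            )
    return gaps
```

Calling parity_gaps with a production spec such as {"vm_cpus": 16, "vm_memory_gb": 64} and a User Acceptance Testing spec that differs only in vm_memory_gb returns a single entry naming that setting; an empty list means the two specs match.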