Scaling Hyperscience

Overview

The Hyperscience Platform consists of several different components, each of which must be scaled independently in large deployments. For a refresher, see the AWS Reference Design. Some components, such as blocks and the trainer, can benefit from autoscaling; for the time being, others require static configuration through the values.yaml file.

In general, autoscaling works as follows: the Hyperscience Platform exposes the number of pending tasks, and the HyperOperator uses that data to decide whether or not to scale the respective worker - either the Trainer or a certain type of Block. In Kubernetes terms, the HyperOperator dynamically changes the number of replicas of the appropriate resource.
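
To observe this behavior, you can watch the replica counts change as work is queued. A minimal sketch, assuming the application is installed in a namespace named hyperscience (substitute your own):

# Watch the HyperOperator adjust replica counts as pending tasks accumulate
kubectl get deployments --namespace hyperscience --watch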

Because the feature spans multiple services, there are some additional requirements to set it up.

Autoscaling prerequisites

IMPORTANT: The Hyperscience deployment only scales the Pods used by the application. Scaling the underlying cluster infrastructure is outside the scope of this article.

To make the most of the autoscaling features, the Kubernetes cluster should also have some form of node autoscaling enabled. The Cluster Autoscaler project is a popular, cloud-agnostic choice. For Amazon EKS, Karpenter is worth exploring.

Also, the following versions must be installed:

  • Helm chart >= 8.7.1

  • Hyperscience version >= 37.0.5

  • HyperOperator version >= 5.4.3

As a reminder, use the upstream value of operator.tag and do not pin it in your values.yaml. To get an up-to-date view of the Helm chart options, use the following commands:

helm repo update
helm show values $HS_HELM_CHART

Trainer

The Trainer component can train models that automate document classification and the identification of fields and tables, among other tasks. It does not need to be running at all times for the system to process submissions. Note, however, that it is best to run the trainer on a separate node group by setting the appropriate node labels and nodeAffinity rules, as sketched below.
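
The exact Helm chart keys for trainer scheduling depend on the chart version (check helm show values $HS_HELM_CHART); as a sketch, the underlying Kubernetes pattern is a node label plus a matching nodeAffinity rule. The label key and value below are illustrative:

# Label the nodes reserved for training (illustrative label key/value):
#   kubectl label nodes <node-name> workload=hs-trainer
# Standard Kubernetes nodeAffinity that keeps the trainer on those nodes:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: workload
              operator: In
              values:
                - hs-trainer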

Manual scaling

By default, installing the Helm chart will not create a trainer. To create one, and optionally scale it manually, set the following fields:

trainer:
  tags:
    - container_version: 37.0.8
      replicas: 1
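
After editing values.yaml, apply the change with a standard Helm upgrade. The release name and namespace below are placeholders:

helm upgrade my-hyperscience $HS_HELM_CHART \
  --namespace hyperscience \
  --values values.yaml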

Autoscaling

With trainer autoscaling enabled, the HyperOperator will only start a trainer job when one is scheduled from the Hyperscience application. To enable trainer autoscaling, set the following fields in your values.yaml:

trainer:
  autoscaling:
    enabled: true
  tags:
    - 37.0.8 # Autoscaling will be enabled for this version

For a more in-depth configuration, take a look at the available trainer.autoscaling keys in the default Helm chart values. For example, the number of trainer jobs that can run in parallel defaults to 1.
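
For instance, the relevant defaults can be printed without scrolling through the entire values file. A small sketch, assuming the Go-based yq v4 is installed:

# Show only the trainer autoscaling defaults from the chart
helm show values $HS_HELM_CHART | yq '.trainer.autoscaling'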

Blocks

To process a submission end to end, the Hyperscience Platform queues several tasks. Depending on the type of task, a dedicated worker picks it up and processes it. However, the type and number of tasks required vary widely depending on the use case (structured, semi-structured, or unstructured), the flow specification, the number of submissions coming into the system, and so on. Guessing those numbers in advance and provisioning the necessary infrastructure is a challenge.

There are two approaches to block scaling - manual and auto.

IMPORTANT: Using both manual and auto scaling for the same block is not supported. The manual configuration takes precedence if both are enabled.

Manual scaling

The number of replicas for each block can be independently configured through an environment variable. Below is an example that increases the scale of the FULL_PAGE_OICR block to 2.

blocks:
  env:
    BLOCK_FULL_PAGE_OICR_SCALE: "2"

Similarly, each block type can be scaled with an environment variable of the form BLOCK_<BLOCK_NAME>_SCALE. The following block names can be used as <BLOCK_NAME> replacements (a combined example follows the list):

  • CUSTOM_CODE

  • CUSTOM_ENTITY_DETECTION

  • DB_ACCESS

  • DISTRIBUTE_TO_STRUCTURED_DOCUMENTS_2

  • DOCUMENT_REPLACEMENT

  • EASY_OCR

  • EXPORT_EMAIL

  • EXPORT_UIPATH

  • FIELD_LOCATOR_2

  • FINETUNE_2

  • FULL_PAGE_OICR

  • HTTP_DOWNLOADER

  • HTTP_EXPORT

  • IMAGE_CORRECTION

  • IMAP_TRIGGER

  • KOFAX_FOLDER_TRIGGER

  • MQ_LISTENER

  • MQ_NOTIFIER

  • NER

  • NLC_2

  • NORMALIZER

  • OCS_DOWNLOADER

  • OICR

  • PYTHON_CODE

  • S3_DOWNLOADER

  • SALESFORCE_NOTIFIER

  • SALESFORCE_TRIGGER

  • SDM_DOWNLOADER

  • SEGMENTATION

  • SLACK_NOTIFIER

  • SOAP_REQ

  • TABLE_LOCATOR_2

  • TEXT_CLASSIFICATION

  • VPC_2
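
For example, several blocks can be scaled at once by combining the corresponding variables under blocks.env. The values below are arbitrary and only illustrate the pattern:

blocks:
  env:
    BLOCK_FULL_PAGE_OICR_SCALE: "4"
    BLOCK_SEGMENTATION_SCALE: "2"
    BLOCK_PYTHON_CODE_SCALE: "2"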

Autoscaling

Similar to how the trainer scales only when there is work to do, block replicas are scaled only when the Hyperscience Platform reports an increase in pending tasks. To enable block autoscaling, set the following fields in your values.yaml:

blocks:
  autoscaling:
    enabled: true

For a more in-depth configuration, take a look at the available blocks.autoscaling keys in the default Helm chart values. For example, the default limit for maximum replicas is 2, while we suggest setting it to the desired number of nodes (assuming 8-core, 32 GB memory nodes).
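
As a sketch, raising the limit to match an eight-node cluster would look like the following. Confirm the exact key name against the default chart values:

blocks:
  autoscaling:
    enabled: true
    max_replicas: 8  # match the number of 8-core / 32 GB worker nodes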

Frontend and backend

For now, autoscaling does not apply to frontend and backend deployments. Their replica count can be modified from the values.yaml file:

app:
  replicas:
    backend: 1
    frontend: 1
    hyperflow_engine: 1
    idp_sync_manager: 1

As a rough recommendation, each of those should be scaled linearly with the number of block workers in the system. For example, if there are N blocks, the following ratios are suggested:

app:
  replicas:
    backend: N / 80
    frontend: N / 30
    hyperflow_engine: N / 20
    idp_sync_manager: N / 6

If autoscaling is enabled, then N will be equal to the number of processing blocks used by the flow, multiplied by the value of blocks.autoscaling.max_replicas. If not, N will be the sum of all BLOCK_<BLOCK_NAME>_SCALE variables.
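
For example, if a flow uses 12 processing blocks and blocks.autoscaling.max_replicas is set to 20, then N = 12 × 20 = 240, which suggests (rounding up):

app:
  replicas:
    backend: 3            # 240 / 80
    frontend: 8           # 240 / 30
    hyperflow_engine: 12  # 240 / 20
    idp_sync_manager: 40  # 240 / 6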

Node distribution and high availability

When the number of worker, frontend, and backend Pods starts piling up, it's best to give Kubernetes some guidance on how to distribute them across the nodes. Topology spread constraints help spread the load more evenly and protect against downtime in case of node failure:

topologySpreadConstraints:
  node:
    enabled: true
    maxSkew: 1
    # DoNotSchedule makes the node-level constraint mandatory
    whenUnsatisfiable: DoNotSchedule
  zone:
    enabled: true
    maxSkew: 5
    # ScheduleAnyway treats the zone-level constraint as a preference only
    whenUnsatisfiable: ScheduleAnyway

Another feature that helps reduce downtime is the Pod Disruption Budget. It is enabled by default with some sensible values, but they can be modified or disabled completely:

app:
  # Can be either `frontend`, `backend`, `hyperflow_engine` or `idp_sync_manager`
  backend:
    podDisruptionBudget:
      enabled: true
      minAvailable: 1
      maxUnavailable: ""
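
For example, to turn the budget off for the frontend while leaving the backend defaults in place:

app:
  frontend:
    podDisruptionBudget:
      enabled: false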

Fine-tuning

Setting all of these configuration values correctly in such a dynamic system is an iterative process that is highly individualized for every installation. Below is a rough guideline on how many pages per hour the system can process, based on the size of the underlying infrastructure. The table assumes 8-core, 32 GB memory machines, a db.m5.xlarge database (4 vCPUs and 16 GB of memory), and approximately 20 fields per page. For more specific recommendations, please contact your Hyperscience representative.

Document distribution                   2 VMs   4 VMs   8 VMs
100% structured                          1500    4000    8000
50% structured / 50% semi-structured     1000    2000    4500
100% semi-structured                      800    1500    3000