Overview
The Hyperscience Platform consists of several components, each of which must be scaled independently in large deployments. For a refresher, see the AWS Reference Design. Some parts, such as blocks and the trainer, can benefit from autoscaling; for the time being, others require static configuration through the values.yaml file.
In general, autoscaling works as follows: the Hyperscience Platform exposes the number of tasks that have yet to be completed, and the HyperOperator uses that data to decide whether to scale the respective worker - either the Trainer or a certain type of Block. In Kubernetes terms, the HyperOperator dynamically changes the number of replicas of the appropriate resource.
Because the feature spans multiple services, there are some additional setup requirements.
Autoscaling prerequisites
IMPORTANT: The Hyperscience deployment only scales the Pods used by the application. Scaling the underlying cluster infrastructure is outside the scope of this article.
To make the most of the autoscaling features, the Kubernetes cluster should also have some form of Node Autoscaling enabled. The Cluster Autoscaler project is a popular, cloud-agnostic choice. For AWS EKS, Karpenter is worth exploring.
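As an illustration only, a minimal Karpenter NodePool for the worker nodes might look like the sketch below. It assumes Karpenter's v1 NodePool API and an existing EC2NodeClass named default; adapt it to your cluster rather than applying it as-is.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: hyperscience-workers   # assumed name, not part of the Hyperscience chart
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "64"   # cap on the total CPU the pool may provision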
Also, the following versions must be installed:
Helm chart >= 8.7.1
Hyperscience version >= 37.0.5
HyperOperator version >= 5.4.3
As a reminder, please use the upstream value of the operator.tag and do not keep it in your values.yaml. To get an up-to-date version of the Helm chart options, use the following command:
helm repo update
helm show values $HS_HELM_CHART
Trainer
The Trainer component trains models that automate document classification and the identification of fields and tables, among other tasks. It does not need to be running at all times for the system to process submissions. Note, however, that it is best to run the trainer on a separate node group by setting the appropriate node labels and nodeAffinity rules.
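As a sketch only, such a node assignment might look like the snippet below. The trainer.affinity key and the node-group label are assumptions rather than documented chart options, so check helm show values $HS_HELM_CHART for the chart's actual affinity and label settings.
trainer:
  # Assumed key -- verify against the default chart values before using.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-group        # hypothetical label applied to the trainer node group
                operator: In
                values: ["trainer"]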
Manual scaling
By default, installing the Helm chart will not create a trainer. To create one, and optionally scale it manually, set the following fields:
trainer:
  tags:
    - container_version: 37.0.8
      replicas: 1
Autoscaling
With trainer autoscaling enabled, the HyperOperator will only start a trainer job when one is scheduled from the Hyperscience application. To enable trainer autoscaling, set the following fields in your values.yaml:
trainer:
  autoscaling:
    enabled: true
  tags:
    - 37.0.8 # Autoscaling will be enabled for this version
For a more in-depth configuration, take a look at the available trainer.autoscaling keys in the default Helm chart values. For example, the number of trainer jobs that can run in parallel defaults to 1.
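For illustration only, raising that parallelism limit might look like the snippet below. The max_parallel_jobs key is a placeholder, not a documented option, so look up the real key name in the default chart values first.
trainer:
  autoscaling:
    enabled: true
    # Placeholder key name -- confirm the actual option with `helm show values $HS_HELM_CHART`.
    max_parallel_jobs: 2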
Blocks
To process a submission end to end, several tasks are queued by the Hyperscience Platform. Depending on the type of the task, a dedicated worker picks it up and processes it. However, the type and number of tasks required vary widely depending on the use case (structured, semi-structured, or unstructured), the flow specification, the number of submissions coming into the system, and so on. Estimating those numbers in advance and provisioning the necessary infrastructure is a challenge.
There are two approaches to block scaling - manual and auto.
IMPORTANT: Using both manual and auto scaling for the same block is not supported. The manual configuration takes precedence if both are enabled.
Manual scaling
The number of replicas for each block can be independently configured through an environment variable. Below is an example that increases the scale of the FULL_PAGE_OICR block to 2.
blocks:
  env:
    BLOCK_FULL_PAGE_OICR_SCALE: "2"
Similarly, each block type can be scaled with a variable of the form BLOCK_<BLOCK_NAME>_SCALE (a combined example follows the list below). Here is a list of block names that can be used as replacements:
CUSTOM_CODE
CUSTOM_ENTITY_DETECTION
DB_ACCESS
DISTRIBUTE_TO_STRUCTURED_DOCUMENTS_2
DOCUMENT_REPLACEMENT
EASY_OCR
EXPORT_EMAIL
EXPORT_UIPATH
FIELD_LOCATOR_2
FINETUNE_2
FULL_PAGE_OICR
HTTP_DOWNLOADER
HTTP_EXPORT
IMAGE_CORRECTION
IMAP_TRIGGER
KOFAX_FOLDER_TRIGGER
MQ_LISTENER
MQ_NOTIFIER
NER
NLC_2
NORMALIZER
OCS_DOWNLOADER
OICR
PYTHON_CODE
S3_DOWNLOADER
SALESFORCE_NOTIFIER
SALESFORCE_TRIGGER
SDM_DOWNLOADER
SEGMENTATION
SLACK_NOTIFIER
SOAP_REQ
TABLE_LOCATOR_2
TEXT_CLASSIFICATION
VPC_2
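For instance, several block types can be scaled at once by combining such variables; the block names below come from the list above, and the replica counts are arbitrary examples.
blocks:
  env:
    BLOCK_SEGMENTATION_SCALE: "3"
    BLOCK_FULL_PAGE_OICR_SCALE: "2"
    BLOCK_NLC_2_SCALE: "2"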
Autoscaling
Similar to how the trainer scales only when there is work to do, block replicas are scaled only when the Hyperscience Platform reports an increase in pending tasks. To enable block autoscaling, set the following fields in your values.yaml:
blocks:
  autoscaling:
    enabled: true
For a more in-depth configuration, take a look at the available blocks.autoscaling keys in the default Helm chart values. For example, the default limit for maximum replicas is 2, but we suggest setting it to the desired number of worker nodes (assuming 8-core, 32GB-memory machines).
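For example, the snippet below raises that ceiling via the blocks.autoscaling.max_replicas key referenced later in this article; the value of 8 is illustrative, chosen for a cluster with eight such worker nodes.
blocks:
  autoscaling:
    enabled: true
    # Default is 2; set to roughly the number of 8-core, 32GB worker nodes.
    max_replicas: 8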
Frontend and backend
For now, autoscaling does not apply to frontend and backend deployments. Their replica count can be modified from the values.yaml file:
app:
  replicas:
    backend: 1
    frontend: 1
    hyperflow_engine: 1
    idp_sync_manager: 1
As a rough recommendation, each of those should be scaled linearly with the number of block workers in the system. For example, if there are N blocks, the following ratios are suggested:
app:
  replicas:
    backend: N / 80
    frontend: N / 30
    hyperflow_engine: N / 20
    idp_sync_manager: N / 6
If autoscaling is enabled, then N is equal to the number of processing blocks used by the flow, multiplied by the value of blocks.autoscaling.max_replicas. If not, N is the sum of all BLOCK_*_SCALE variables.
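As a worked example, suppose a flow uses 4 processing blocks and blocks.autoscaling.max_replicas is set to 30 (both numbers are assumptions for illustration), giving N = 120. Rounding up to whole replicas, the ratios above yield:
app:
  replicas:
    backend: 2            # 120 / 80, rounded up
    frontend: 4           # 120 / 30
    hyperflow_engine: 6   # 120 / 20
    idp_sync_manager: 20  # 120 / 6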
Node distribution and high availability
When the number of worker, frontend, and backend pods starts piling up, it's best to give Kubernetes some guidance on how to distribute them across the nodes. Topology spread constraints come to the rescue: they help spread the load more evenly and protect against downtime in case of node failure:
topologySpreadConstraints:
  node:
    enabled: true
    maxSkew: 1
    whenUnsatisfiable: DoNotSchedule
  zone:
    enabled: true
    maxSkew: 5
    whenUnsatisfiable: ScheduleAnyway
Another feature that helps reduce downtime is the Pod Disruption Budget. It is enabled by default with some sensible values, but they can be modified or disabled completely:
app:
  # Can be either `frontend`, `backend`, `hyperflow_engine` or `idp_sync_manager`
  backend:
    podDisruptionBudget:
      enabled: true
      minAvailable: 1
      maxUnavailable: ""
Fine-tuning
Setting all the configuration values correctly in this dynamic system will be an iterative process and highly individualized for every installation. Below is a rough guideline on how many pages per hour the system can process, based on the size of the underlying infrastructure. The table assumes 8-core, 32GB-memory machines, a db.m5.xlarge database (4 vCPU and 16GB memory), and approximately 20 fields per page. For more specific recommendations, please contact your Hyperscience representative.
| Document distribution | 2 VMs | 4 VMs | 8 VMs |
| --- | --- | --- | --- |
| 100% structured | 1500 | 4000 | 8000 |
| 50% structured / 50% semi-structured | 1000 | 2000 | 4500 |
| 100% semi-structured | 800 | 1500 | 3000 |