To help us support you as efficiently as possible, we would like you to provide the following information about your system. This article also gives recommendations and explains our reasons for gathering this data.
Application servers / virtual machines (VMs)
The specifications of your application machines—and the number of machines in your instance—determine your system's overall throughput. Our recommendations for these specifications depend on your anticipated use of Hyperscience.
The application machines also serve the Hyperscience application to users at your organization.
Number of application machines
Application machine CPUs
Application machine RAM
Application machine OS
Is SSL enabled on the load balancer?
Sharing these specifications allows us to estimate your system's potential throughput and troubleshoot reduced performance. It also helps us determine reasons for changes in the application's responsiveness.
Trainer servers / VMs
The Hyperscience solution relies on the training of models that identify and transcribe data, as well as classify the documents that you ingest into the application. This training occurs in your trainer machines.
The specifications of these machines determine how quickly your models can be trained, particularly if you are processing Semi-structured documents. Our sizing recommendations depend on the type and volume of documents you plan on processing through Hyperscience.
Number of trainer machines
Trainer machine CPUs
Trainer machine RAM
Trainer machine OS
When you share these specifications with Hyperscience, we can compare your trainer's potential performance with the actual speed of your training tasks.
Database
The application database stores all extracted submission data, which the models use to increase the automation of submission processing.
Specifications
Like the specifications for application and trainer machines, our recommendations for your database specification are based on your required system throughput.
Number of database machines
Database machine CPUs
Database instance RAM
Database machine OS
DBMS Edition (e.g., Microsoft SQL Server Standard, Microsoft SQL Server Enterprise)
Database server patch version
Do you have a lower/non-production environment? If so, what are its specifications (e.g., number and type of servers / VMs (physical servers or private cloud), clustering, networking details, etc.)?
Sharing the specifications of your database machines helps us estimate your system's potential throughput. It also helps us troubleshoot issues where the system isn't making full use of the machines' available space.
Database maintenance
Your answers to these questions let us know what troubleshooting tools are available to us. They also help us anticipate the overall health of your database.
Is there a database-monitoring tool in use, either internal or external (e.g., SolarWinds Database Performance Analyzer, Datadog Database Monitoring, Percona, etc.)?
Is database maintenance completed regularly (e.g., index rebuilds, DBCCs)? If so, how frequently?
Database maintenance keeps the database clean and well organized, which enhances performance and prevents loss of functionality. We recommend scheduling and running database maintenance at regular intervals. For example, index rebuilds, database consistency checks, and statistics updates should be configured and scheduled.
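As a minimal sketch of what scheduled maintenance can look like, assuming a Microsoft SQL Server database reached through the pyodbc driver (the connection string, database name, and table name below are placeholders), a script like the following runs a consistency check, rebuilds indexes, and refreshes statistics. In practice, these tasks are usually scheduled through SQL Server Agent or a similar job scheduler rather than run by hand.

```python
# Minimal maintenance sketch - assumes Microsoft SQL Server and the pyodbc driver.
# Connection string, database name, and table name are illustrative placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=db.example.internal;DATABASE=hyperscience;"
    "UID=maintenance_user;PWD=********"
)
conn.autocommit = True  # DBCC and ALTER INDEX run outside an explicit transaction
cursor = conn.cursor()

# Database consistency check (DBCC)
cursor.execute("DBCC CHECKDB ('hyperscience') WITH NO_INFOMSGS;")

# Rebuild all indexes on a heavily used table (placeholder name)
cursor.execute("ALTER INDEX ALL ON dbo.example_table REBUILD;")

# Refresh optimizer statistics for the same table
cursor.execute("UPDATE STATISTICS dbo.example_table;")

conn.close()
```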
Backups and security
Your responses to these questions allow us to determine the data-recovery options available in the event of a disaster or security threat.
Are the database backups configured and running?
We recommend running regular database backups. In the event of a disaster, a defined recovery process and recent backups make it possible to restore systems and data after files have been destroyed or removed. Database backups are crucial for preventing data loss, which can otherwise fully interrupt business operations.
Your backup strategy should be tailored to the needs of your business. For example, if it is acceptable to lose data in the event of a disk failure, you may not need to perform frequent backups. If your database must be available twenty-four hours a day, seven days a week, it needs to be backed up frequently. The frequency and types of backups you perform are determined in large part by the needs of your business and your defined Recovery Point Objective (RPO).
Are database backups stored offline (e.g., in AWS S3 buckets or other cloud storage, onsite backup servers, digital optical drives, conventional RAID arrays)?
Sending backups offsite ensures systems and servers can be reloaded with the latest data in the event of a disaster, accidental error, or system crash. It also ensures that there is a copy of pertinent data that is not stored onsite.
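Purely as an illustration, and assuming the AWS boto3 SDK with credentials already configured (the bucket name, key, and local path are placeholders), copying a finished backup to offsite object storage such as Amazon S3 might look like this:

```python
# Hypothetical offsite-copy sketch using the AWS boto3 SDK.
# Bucket name, object key, and local path are placeholders.
import boto3

s3 = boto3.client("s3")
local_backup = "/var/backups/hyperscience/hyperscience_20240101.bak"
s3.upload_file(local_backup, "example-offsite-backups", "db/hyperscience_20240101.bak")
print("Backup copied offsite")
```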
Is your database encrypted? If so, please elaborate.
Let us know the type of encryption you are using:
Encrypting data at rest, which means encrypting data while it is stored on the underlying file storage.
Encrypting data in transit, which means encrypting data while it travels through private or public network communication channels.
Transparent Data Encryption (TDE) is another method employed by both Microsoft and Oracle to encrypt database files. TDE offers encryption at the database's file level. PostgreSQL does not provide native TDE, though third-party tools are available in the market.
Because you own the database, you are responsible for storage-level encryption.
While database-level encryption is often considered best practice, for compatibility reasons Hyperscience products do not expect any encryption at levels above physical storage, such as the database level. Hyperscience products do not support column-level encryption at this time.
The Hyperscience product supports encryption of data in transit, such as SSL and TLS.
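As a minimal sketch, assuming a PostgreSQL database and the psycopg2 driver (the hostname, credentials, and certificate path are placeholders), enforcing TLS on the application's database connection can be done by requiring certificate verification in the connection parameters and confirming that the session is encrypted:

```python
# Hypothetical TLS-enforced connection; assumes PostgreSQL and psycopg2.
# Host, credentials, and certificate path are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="db.example.internal",
    dbname="hyperscience",
    user="app_user",
    password="********",
    sslmode="verify-full",            # require TLS and verify the server certificate
    sslrootcert="/etc/ssl/certs/db-ca.pem",
)
with conn.cursor() as cur:
    cur.execute("SELECT ssl FROM pg_stat_ssl WHERE pid = pg_backend_pid();")
    print("TLS in use:", cur.fetchone()[0])
conn.close()
```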
Is High Availability configured?
High Availability (HA) is a characteristic of a system that aims to ensure an agreed-upon level of operational performance (usually uptime) for a higher than normal period.
The following are the key principles of High Availability:
Eliminate any single point of failure: Add redundancy so that the failure of any one component does not lead to the collapse of the entire system.
Reliable crossover: In a redundant system, the crossover point itself can become a single point of failure. Fault-tolerant systems must provide a reliable crossover or automatic switchover mechanism to avoid failure.
Detection of failures: If the two principles above are in place and failures are proactively monitored, a user may never see a system failure.
Microsoft SQL Server, Oracle, and PostgreSQL provide various options to configure a High Availability solution. Your strategy will be determined in large part by the needs of your business and defined RPO.
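To make this concrete, here is a minimal sketch, assuming PostgreSQL streaming replication and the psycopg2 driver (SQL Server Always On and Oracle Data Guard expose equivalent views of their own; connection details are placeholders), that checks whether standbys are attached and how far they lag behind the primary:

```python
# Hypothetical replication-health check; assumes PostgreSQL streaming replication
# and the psycopg2 driver. Connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host="db-primary.example.internal", dbname="postgres",
                        user="monitor", password="********")
with conn.cursor() as cur:
    cur.execute("SELECT client_addr, state, replay_lag FROM pg_stat_replication;")
    standbys = cur.fetchall()

if not standbys:
    print("WARNING: no standbys connected - single point of failure")
for addr, state, lag in standbys:
    print(f"standby {addr}: state={state}, replay lag={lag}")
conn.close()
```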
Is there a disaster recovery site in place?
A disaster recovery site is a place that a company can temporarily relocate to following a security breach or natural disaster. A disaster recovery site ensures that a company can continue operations until it becomes safe to resume work at its usual location or a new permanent location.
We recommend designing a disaster recovery plan with a defined Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
Is there an RPO (Recovery Point Objective) defined?
RPO refers to the maximum amount of data you can afford to lose. It is aligned with your backup and recovery strategies. For example, if you back up the database every hour, you can lose up to one hour of data; hourly backups therefore cannot satisfy an RPO of 10 minutes.
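To illustrate the arithmetic in the example above with hypothetical numbers:

```python
from datetime import timedelta

backup_interval = timedelta(hours=1)   # full backup every hour (example above)
rpo = timedelta(minutes=10)            # business tolerates at most 10 minutes of loss

# In the worst case, failure strikes just before the next backup,
# so an entire interval of data can be lost.
worst_case_loss = backup_interval
print("RPO met" if worst_case_loss <= rpo else
      f"RPO not met: up to {worst_case_loss} of data could be lost")
```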
Is there an RTO (Recovery Time Objective) defined?
RTO is a measure of how long your organization can afford to have its databases inaccessible before normal conditions are restored. We recommend designing your system and disaster-recovery process so that you can recover in less time than the RTO.