This article explains the v35 upgrade process and lets you know what you can expect after upgrading.
Changes to the upgrade process for v35
Models created in v33 and v34 are compatible with v35. As a result, you do not need to connect a v35 trainer to a v33 or v34 application when upgrading to v35.
Although the model upgrade process has been decoupled from the application upgrade process when upgrading from v33 or v34, you still need to train new versions of your v33 and v34 models. Because of this compatibility, you can complete that training after the application upgrade is finished, which reduces the amount of work and time needed during the application upgrade process.
More details about models and their impact on the upgrade process can be found in the sections below.
Preparing to upgrade
Review what’s new — Before upgrading to v35, we recommend reading the release notes for v35 and any versions between your current version and v35. These release notes describe the changes and new features we’ve introduced in each version of the application. For links to these release notes, see the Release Notes section of our site.
Review our upgrade documentation — For an overview of the upgrade process, review our upgrade documentation:
Upgrade Considerations and Known Issues, particularly the version-specific considerations
Make sure your infrastructure meets the requirements for v35 — Over the past few versions, we’ve made the following changes to our infrastructure requirements:
Databases:
Deprecated support for PostgreSQL 9.5: Beginning with v30, we’re no longer supporting the use of PostgreSQL 9.5 databases.
Deprecated support for Oracle 12.x: Beginning with v34, we’re no longer supporting the use of Oracle 12.x databases.
Trainer VM CPU cores: Beginning with v31, we strongly recommend having 16 cores for each CPU in a trainer VM if you are processing Semi-structured documents. If you have only 8 cores for these CPUs, you can expect 60-70% longer training times and an increased risk of crashes during training, particularly on datasets with longer, denser documents.
Trainer RAM: Beginning with v31, we strongly recommend that your trainer have 64GB of RAM, which will maximize the performance of the 16-core CPUs described above.
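As a rough illustration of the trainer recommendations above, the following sketch compares a machine's resources against the 16-core and 64GB figures. The function name `check_trainer_specs` and the constants are hypothetical names chosen for this example; they are not part of Hyperscience.

```python
import os

# Recommended minimums taken from the text above (v31 and later,
# when processing Semi-structured documents).
RECOMMENDED_CORES = 16
RECOMMENDED_RAM_GB = 64


def check_trainer_specs(cores: int, ram_gb: float) -> list[str]:
    """Return human-readable warnings for a trainer VM's resources."""
    warnings = []
    if cores < RECOMMENDED_CORES:
        warnings.append(
            f"{cores} cores found; {RECOMMENDED_CORES} recommended "
            "(expect longer training times and a higher crash risk)."
        )
    if ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(
            f"{ram_gb:g} GB RAM found; {RECOMMENDED_RAM_GB} GB recommended."
        )
    return warnings


if __name__ == "__main__":
    # os.cpu_count() reports the logical CPUs visible to Python. RAM is
    # passed in by hand because reading it portably needs extra tooling.
    for line in check_trainer_specs(os.cpu_count() or 0, ram_gb=64):
        print(line)
```

An empty list means both recommendations are met. Your Hyperscience representative remains the authority on sizing for your workload.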
Beyond these requirements, your Hyperscience representative can recommend specific infrastructure improvements, based on your anticipated use of Hyperscience v35.
Increased upgrade times – In v33, we introduced various performance improvements that required creating more database indexes and migrating data stored in the application’s database. These changes also affect versions after v33.
On-premises customers who process more than one million pages per month may see an increase in the time it takes to upgrade their application services to v35 and should plan up to an extra day in their upgrade window. The impact of these changes is minimal for customers processing lower volumes of pages.
Our v35 database migrations are executed after running ./run.sh init for the first time within the new bundle, and migrations must be completed before the Hyperscience application can be used. To reduce the size of your database and the time required to upgrade, you can configure a shorter Submission Record Deletion period in Settings. To learn more about Submission Record Deletion, see PII Data Deletion.
The upgrade process
The steps involved in upgrading to v35 depend on what version of Hyperscience you're currently using.
Upgrading from v28
If you are upgrading from v28, you cannot upgrade directly to v35, as you cannot train a v35 trainer on the v28 application.
You also need to drain your instance of submissions created in v28 or earlier before upgrading to v32 on your way to upgrading to v35. See Draining submissions for more information.
Follow these steps to upgrade to v35:
Train a v31 trainer on your v28 application.
When all the training tasks have finished, upgrade your application to v31 on all machines running the application. You do not need to upgrade to v30 before upgrading to v31.
To learn more about upgrading to v31, see Upgrading from v28 to v31. The key infrastructure and feature considerations explained in the article also apply to v35. Following the outlined steps will establish a foundation for a successful overall upgrade process from v28 to v35.
At this point, you should drain your instance of submissions created in v28 or earlier, if you haven’t already.
Train a v33 trainer on your v31 application.
When all the training tasks have finished, upgrade your application to v32 on at least one machine running the application.
As soon as possible, upgrade your application to v33 on all machines running the application. In order to avoid a possible reduction in model performance, we recommend minimizing the amount of time your instance runs v32.
Upgrade your application to v35 on all machines running the application.
If you plan on creating or training models in v35, install a v35 trainer.
For best results, we recommend upgrading to the latest versions of v30, v31, v32, and v33 before upgrading to v35.
When upgrading to v30, your v30 version needs to be v30.0.13 or later.
When upgrading to v32, your v32 version needs to be v32.0.8 or later.
If you are not using the latest versions of each version in your upgrade path, make sure the versions you upgrade to are compatible with each other:
Compatibility between V30 and V31 versions | Compatibility between V31 and V32 versions |
---|---|
V30.0.14 and earlier ➜ V31.0.0 and later | V31.0.5 and earlier ➜ V32.0.0 and later |
V30.0.15 and later ➜ V31.0.12 and later | V31.0.6 to V31.0.11 ➜ V32.0.1 and later |
| | V31.0.12 and later ➜ V32.0.3 and later |
Draining submissions
Before upgrading, you must complete all submissions that were created three versions before your target version. The table below lists, for each target version, the version whose submissions must be completed first.
Before upgrading to… | …submissions from this version must be completed: |
---|---|
v35 | v32 |
v34 | v31 |
v33 | v30 |
v32 | v28 (there is no v29) |
For example, before upgrading to v35, any submissions created in v32 need to be completed in v32, v33, or v34.
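For a multi-step upgrade, the table above can be applied to each version along the path. The sketch below is illustrative only; `MUST_COMPLETE_BEFORE` and `drain_requirements` are hypothetical names, and the mapping simply restates the table.

```python
# Target version -> version whose submissions must be completed first,
# copied from the table above.
MUST_COMPLETE_BEFORE = {35: 32, 34: 31, 33: 30, 32: 28}


def drain_requirements(upgrade_path: list[int]) -> dict[int, int]:
    """Map each target version on the path to the version to drain first.

    Versions that do not appear in the table (for example, v31) are
    skipped because the table lists no requirement for them.
    """
    return {
        target: MUST_COMPLETE_BEFORE[target]
        for target in upgrade_path
        if target in MUST_COMPLETE_BEFORE
    }


if __name__ == "__main__":
    for target, drain in drain_requirements([31, 32, 33, 35]).items():
        print(f"Before upgrading to v{target}, complete v{drain} submissions.")
```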
There is no filter in the application to determine which submissions were created in which versions, but you can look for older submissions in one of the following ways:
Use the submission date and the dates in your upgrade history as a guide — For example, submissions created in v28 or earlier have a Submission Date that predates your upgrade to v30 or v31.
Check Administration > Jobs — You can use the Legacy Jobs view of the Jobs table to find submissions created in older versions of Hyperscience. For example, before upgrading to v32, look for jobs with Created dates that predate your upgrade to v30 or v31 and contain submission tasks. These submissions were created in v28 or earlier and need to be completed before upgrading to v32.
Ensure submissions from incompatible versions have a Completed status before upgrading. If they are not completed when you upgrade to your target version, they will likely be halted, and you will need to resubmit or recreate them in order for them to be processed.
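The date-based approach described above amounts to filtering for submissions that predate a given upgrade and are not yet completed. The sketch below assumes you have already gathered submission records into plain dictionaries by some means of your own; the field names `id`, `created`, and `status`, the function name, and the sample records are all made up for illustration and do not reflect a Hyperscience API or real data.

```python
from datetime import date


def needs_draining(submissions: list[dict], upgrade_date: date) -> list[int]:
    """Return IDs of submissions created before upgrade_date
    that do not have a Completed status."""
    return [
        s["id"]
        for s in submissions
        if s["created"] < upgrade_date and s["status"] != "Completed"
    ]


if __name__ == "__main__":
    # Invented sample records; replace with data from your own instance.
    sample = [
        {"id": 1, "created": date(2022, 1, 10), "status": "Completed"},
        {"id": 2, "created": date(2022, 2, 3), "status": "Supervision"},
        {"id": 3, "created": date(2022, 6, 1), "status": "Processing"},
    ]
    # Hypothetical date on which the instance was upgraded.
    print(needs_draining(sample, upgrade_date=date(2022, 3, 1)))
```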
Upgrading from v30
When upgrading from v30, you need to upgrade to v31, v32, and v33 before upgrading to v35.
Before upgrading to v35, follow the steps in Draining submissions to make sure that all submissions in your system that were created in v28 or v30 are completed.
Follow these steps to upgrade from v30 to v35:
Train a v32 trainer on your v30 application.
When all the training tasks have finished, upgrade your application to v31 on at least one machine running the application.
As soon as possible, upgrade your application to v32 on all machines running the application. In order to avoid a possible reduction in model performance, we recommend minimizing the amount of time your instance runs v31.
Train a v33 trainer on your v32 application.
When all the training tasks have finished, upgrade your application to v33 on at least one machine running the application.
Upgrade your application to v35 on all machines running the application.
If you plan on creating or training models in v35, install a v35 trainer.
For best results, we recommend upgrading to the latest versions of v31, v32, and v33 before upgrading to v35.
When upgrading to v32, your v32 version needs to be v32.0.8 or later.
If you are not using the latest versions of each version in your upgrade path, make sure the versions you upgrade to are compatible with each other:
Compatibility between V30 and V31 versions | Compatibility between V31 and V32 versions |
---|---|
V30.0.14 and earlier ➜ V31.0.0 and later | V31.0.5 and earlier ➜ V32.0.0 and later |
V30.0.15 and later ➜ V31.0.12 and later | V31.0.6 to V31.0.11 ➜ V32.0.1 and later |
| | V31.0.12 and later ➜ V32.0.3 and later |
Upgrading from v31
When upgrading from v31, you need to upgrade to v32 and v33 before upgrading to v35.
Follow the steps in Draining submissions to make sure that all submissions in your system that were created in v28 or v30 are completed before upgrading to v33. If you have submissions that were created in v31, make sure those submissions are completed in either v31 or v33 before upgrading to v35.
Follow these steps to upgrade from v31 to v35:
Train a v33 trainer on your v31 application.
When all the training tasks have finished, upgrade your application to v32 on at least one machine running the application.
As soon as possible, upgrade your application to v33 on all machines running the application. In order to avoid a possible reduction in model performance, we recommend minimizing the amount of time your instance runs v32.
Upgrade your application to v35 on all machines running the application.
If you plan on creating or training models in v35, install a v35 trainer.
For best results, we recommend upgrading to the latest versions of v32 and v33 before upgrading to v35. If you are not using the latest release of each version in your upgrade path, make sure the versions you upgrade to are compatible with each other:
Compatibility between V31 and V32 versions |
---|
V31.0.5 and earlier ➜ V32.0.0 and later |
V31.0.6 to V31.0.11 ➜ V32.0.1 and later |
V31.0.12 and later ➜ V32.0.3 and later |
Upgrading from v32
Before upgrading to v35, follow the steps in Draining submissions to make sure that all submissions in your system that were created in v30, v31, or v32 are completed.
Follow these steps to upgrade from v32 to v35:
Train a v33 trainer on your v32 application.
When all the training tasks have finished, upgrade your application to v33 on at least one machine running the application.
Upgrade your application to v35 on all machines running the application.
If you plan on creating or training models in v35, install a v35 trainer.
For best results, we recommend upgrading to the latest version of v33 during the upgrade process.
Upgrading from v33
Before upgrading to v35, follow the steps in Draining submissions to make sure that all submissions in your system that were created in v32 are completed. This step is only required if you are still using v32 flows.
Upgrade directly to v35 by deploying v35 on all machines running the application. You must skip upgrading to v34 in the upgrade process, as this version will invalidate your Field Locator and Table Locator models. To learn more about why you must upgrade directly to v35, see the “Upgrading from v33 to v35” section of the Upgrade Considerations and Known Issues article.
If you plan on creating or training models in v35, you also need to install a v35 trainer after upgrading the application.
Upgrading from v34
Before upgrading to v35, follow the steps in Draining submissions to make sure that all submissions in your system that were created in v32 are completed.
To upgrade to v35, deploy v35 on all machines running the application. If you plan on creating or training models in v35, you also need to install a v35 trainer after upgrading the application.
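The application upgrade sequences described in the sections above can be summarized as a lookup from your current version to the versions you deploy, in order. This is only a summary sketch with hypothetical names (`APPLICATION_PATHS`, `application_path`). It leaves out the trainer, draining, and per-machine details that each section spells out.

```python
# Current version -> application versions to deploy, in order,
# as listed in the per-version sections above.
APPLICATION_PATHS = {
    28: [31, 32, 33, 35],
    30: [31, 32, 33, 35],
    31: [32, 33, 35],
    32: [33, 35],
    33: [35],  # v34 is deliberately skipped when coming from v33
    34: [35],
}


def application_path(current: int) -> list[int]:
    """Return the ordered application versions to deploy to reach v35."""
    try:
        # Return a copy so callers cannot modify the shared table.
        return list(APPLICATION_PATHS[current])
    except KeyError:
        raise ValueError(f"No documented path from v{current} to v35") from None


if __name__ == "__main__":
    for version in sorted(APPLICATION_PATHS):
        steps = " -> ".join(f"v{v}" for v in application_path(version))
        print(f"v{version}: {steps}")
```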
What to expect after upgrading
After upgrading to v35, you may notice some changes in your Flow Library and slowness in processing Structured documents. You may also need to complete some additional steps to restore full functionality to your instance, as described in the sections below.
Machine Classification
After upgrading, your first submissions with Structured documents might take up to a few hours to complete. The Machine Classification Block uses precomputed data to classify Structured documents. After upgrading, this precomputed data is invalidated. The system regenerates this data the first time a submission goes through Machine Classification after upgrading.
If processing submissions with Structured documents takes longer than expected, you should check the logs from the Activity Log section of the Submission Output page. Verify that the Machine Classification task is the one that takes more time than expected.
Flows
When upgrading from v30, v31, v32, v33, or v34, the system will automatically move your flows to v35 and add new ones for v35. No changes will be made to your pre-v35 flows as part of this process. Submissions will continue to be processed through your existing default flow (likely “Document Processing,” “Document Processing (V31),” “Document Processing (V32),” “Document Processing (V33),” or “Document Processing (V34)”). This flow will be active and deployed in v35.
Any flows and Custom Code Blocks created in v33 or v34 will continue to work in v35. However, we cannot guarantee that flows created in v30, v31, or v32, including the “Document Processing” and “Document Processing Notifications” flows, will work in v35, and we do not officially support the use of these flows.
When migrating your processing to “Document Processing V35,” we recommend testing the migration in a lower environment (e.g., development, UAT) first. After this testing, you can replicate the migration in your production environment.
To migrate your processing to “Document Processing V35”:
Go to your lower environment.
Duplicate the newly created “Document Processing” flow by following the steps in Managing Flows.
Configure the duplicated flow, using the configuration settings and notifications of the “Document Processing” flow you are using to process submissions in your production environment.
Re-train any Identification, Classification, and Transcription models associated with the previous version's flows by clicking the Run training button on the model details page for each model. Doing so ensures that those models are trained on the latest version of the application and are compatible with the new version's flows.
Assign a release to the duplicated “Document Processing” flow by following the steps in Assigning a Release to a Flow.
Deploy the duplicated “Document Processing” flow by following the steps in Managing Flows.
To test the duplicated “Document Processing” flow, manually upload and process a few submissions through this flow.
If you use an integration to upload submissions, change your integration’s target flow UUID to the UUID of the duplicated “Document Processing” flow. You can copy the UUID of your duplicated “Document Processing” flow by clicking the Copy link at the bottom of the Flow Settings sidebar on the left-hand side of the Flow Studio.
Disable the old “Document Processing” flow by following the steps in Managing Flows.
Repeat steps 2-7 in your production environment, or export the newly created flows and models from your lower environment and import them to your higher environment.
If you are using the “Document Processing Notifications” flow, set up “Submission State Notifications V35” to meet your needs.
If you are actively using custom flows created in v30, v31, or v32, work with your Hyperscience representative to recreate these flows in v35.
System-generated flows
When you upgrade to v35, new "Document Processing" and "Submission State Notification" flows appear alongside the flows that were in your previous version of Hyperscience.
Document Processing V35
A new, uncustomized "Document Processing V35" flow will be added and will be disabled. This flow is a modified version of the default "Document Processing" and “Document Processing (V3x)” flows and contains features introduced in v35. You can work with your Hyperscience representative to determine if this updated flow would be beneficial for your business.
If you decide to use the new flow, you will need to add Input Blocks and Output Blocks to it, as they are not included by default.
Document Processing Notifications
The system will also move your “Document Processing Notifications” flows, which contain Notification Blocks that execute mid-flow, to v35. Each of these flows remains unchanged during the upgrade. In addition, a new “Submission State Notifications for Document Processing V35” flow is created during the upgrade process. Contact your Hyperscience representative if you need to enable or disable connections in these flows after upgrading to v35.
Importing flows
Starting in v33, flows are exported as ZIP files. These ZIP files contain the flow’s JSON file and the Python files for any Custom Code Blocks in the flow.
In the unlikely event that you need to import a v33, v34, or v35 flow to an instance using a previous version of Hyperscience, you need to manually upload your flow’s Python files after importing your flow.
More information can be found in Managing Flows.
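If you need to see which Python files a flow export contains before uploading them manually, you can inspect the ZIP with standard tooling. The sketch below assumes only what is stated above, namely that the archive holds a JSON file and Python files. The function name and the file names in the demo are invented, and the real internal layout of an export may differ.

```python
import io
import zipfile


def split_flow_export(zip_bytes: bytes) -> tuple[list[str], list[str]]:
    """Return (json_files, python_files) found in an exported flow ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    json_files = sorted(n for n in names if n.lower().endswith(".json"))
    python_files = sorted(n for n in names if n.lower().endswith(".py"))
    return json_files, python_files


if __name__ == "__main__":
    # Build a stand-in archive in memory; these file names are made up.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("flow.json", "{}")
        archive.writestr("custom_block.py", "pass\n")
    print(split_flow_export(buffer.getvalue()))
```

For a real export, read the file with `open(path, "rb").read()` and pass the bytes to the function.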
Submission Initialization Block
In v32 and v33, we added settings that allow you to configure your submission retrieval store in the Flow Studio.
S3 Submission Retrieval Store
If:
you are using an S3 bucket as your submission retrieval store,
you are not authenticating through IAM roles, and
you are planning to use the “Document Processing (V33),” “Document Processing (V34),” or “Document Processing V35” flow after upgrading,
you can use the S3 Submission Retrieval Store field of the Submission Initialization Block to enter the values for AWS access key ID and secret access key. You do not need to edit your “.env” file.
If you are not using an S3 bucket for your submission retrieval store, or if you are authenticating to S3 through IAM roles, this field should contain an empty object (i.e., {}).
To learn more about entering your credentials in the S3 Submission Retrieval Store field, see the Submission Initialization section of Flow Blocks.
S3 Submission Retrieval Endpoint URL
If:
you use a submission retrieval store that is not in the public cloud (i.e., its URL does not point to s3.amazonaws.com — for example, a government cloud or an S3-compatible internal setup), and
you are planning to use the “Document Processing (V33),” “Document Processing (V34),” or “Document Processing V35” flow,
make sure you can see your S3 submission retrieval endpoint URL in the Submission Initialization Block. It should appear in the S3 Submission Retrieval Endpoint URL field.
If this value is missing or incorrect, you can edit it directly in the Submission Initialization Block. You do not need to edit your “.env” file.
If the bucket you’re using as your submission retrieval store is in a public cloud (as opposed to a government cloud or an S3-compatible internal setup), this field should be blank.
OCS Configuration
If:
you use an OCS instance as your submission retrieval store, and
you are planning to use the “Document Processing (V33),” “Document Processing (V34),” or “Document Processing V35” flow,
you can use the OCS Configuration field of the Submission Initialization Block to enter the host URL, username, password, and SSL certification information for your OCS instance. You do not need to edit your “.env” file.
To learn more about entering your credentials in the OCS Configuration field, see the Submission Initialization section of Flow Blocks.
Generic Web Storage (HTTP/HTTPS) Configuration
If:
you use a generic web storage solution as your submission retrieval store, and
you are planning to use the “Document Processing (V33),” “Document Processing (V34),” or “Document Processing V35” flow,
you can use the Generic Web Storage Configuration field of the Submission Initialization Block to enter the username, password, and SSL certification information for your storage service. You do not need to edit your “.env” file.
To learn more about entering your credentials in the Generic Web Storage Configuration field, see the Submission Initialization section of Flow Blocks.
Additional considerations when upgrading from v28, v30, or v31
If you’re not currently using v32, you should be aware of the following changes that we made in v32, as they can affect your use of v35.
Flow-specific transcription models
If you are processing Structured documents, you have the option of training flow-specific transcription, or finetuning, models in v32. By default, flows created in previous versions use the system-level transcription model. To learn more, see Managing Transcription Models Across Flows.
If you do not run recalibration on a v32 trainer on a v31 application during the upgrade process, you may experience delays in processing submissions. Recalibration runs on any flows that are live when the upgrade occurs. We recommend completing this recalibration as a best practice for upgrades, even if you are not planning to use flow-specific transcription models.
Message Queue Listener connections
Starting in v32, the Hyperscience application includes an updated version of the Message Queue Listener connector. This version fixes several performance and reliability issues found in previous versions. It also changes the behavior of the connector when processing malformed or erroneous input, delegating the job of surfacing errors to the message broker. To ensure the message broker is ready to handle these errors, a Dead Letter Queue (DLQ) should be set and configured so messages that can’t be processed by the input connector are removed from the queue and moved to a different queue for automatic or manual supervision.
Details on how to set up a DLQ and create a policy for it can be found in these articles:
SQS — Amazon's Amazon SQS dead-letter queues
IBM MQ — IBM's Dead-letter queues
ActiveMQ — ActiveMQ's Message Redelivery and DLQ Handling
RabbitMQ — RabbitMQ's Dead Letter Exchanges
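As one concrete illustration for SQS, a dead-letter queue is attached to a source queue through the queue's `RedrivePolicy` attribute, a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`. The sketch below only builds that attribute. The helper name, the example ARN, and the default of 5 receives are assumptions made for this example; choose values that fit your own setup and follow the Amazon article listed above for the full procedure.

```python
import json


def redrive_policy_attributes(
    dlq_arn: str, max_receive_count: int = 5
) -> dict[str, str]:
    """Build the queue attributes that point a source queue at a DLQ.

    max_receive_count is the number of times a message can be received
    before SQS moves it to the dead-letter queue.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    attrs = redrive_policy_attributes(
        "arn:aws:sqs:us-east-1:123456789012:example-dlq"
    )
    print(attrs)
```

With boto3, the returned mapping could be passed as the `Attributes` argument of the SQS client's `set_queue_attributes` call, together with the `QueueUrl` of the source queue.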