V30 Release Notes

30.0.16 (25 Feb 2022)

Submission Processing

Fixed

Supervision task duration and halted submissions – We’ve fixed an issue that caused submissions to halt when Supervision tasks had negative durations.

Infrastructure

Updated

Killing database sessions using local time – Previously, when local time was used in an Azure SQL Managed Instance, the system overestimated how long blocker and sleeping database sessions had lasted. As a result, the system killed such database sessions as soon as they moved to a blocker or sleeping state. A fix for this issue is included in v30.0.16.
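A minimal sketch of how this kind of overestimate can arise, assuming the session start time is recorded in UTC while the current time is read as local wall-clock time with the offset discarded. The function names and the five-hour offset are illustrative assumptions, not details of the Hyperscience implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone five hours ahead of UTC (illustrative only).
LOCAL_ZONE = timezone(timedelta(hours=5))


def naive_age(session_start_utc: datetime, now: datetime) -> timedelta:
    """Buggy: compares wall-clock values and ignores the UTC offset."""
    local_wall_clock = now.astimezone(LOCAL_ZONE).replace(tzinfo=None)
    start_wall_clock = session_start_utc.replace(tzinfo=None)
    return local_wall_clock - start_wall_clock


def aware_age(session_start_utc: datetime, now: datetime) -> timedelta:
    """Correct: both timestamps are timezone-aware, so offsets cancel out."""
    return now - session_start_utc


start = datetime(2022, 2, 1, 12, 0, tzinfo=timezone.utc)
now = start + timedelta(seconds=30)  # the session is really 30 seconds old

print(naive_age(start, now))  # 5:00:30
print(aware_age(start, now))  # 0:00:30
```

With any idle-time threshold shorter than the offset, the naive calculation would flag a brand-new session as long-running.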

Security

Updated

Upgrading Python dependencies – To fix some security issues related to arbitrary code execution, we have upgraded the following Python dependencies:

  • Django: 2.2.26 ➔ 2.2.27

  • ipython: 6.4.0 ➔ 7.31.1

  • jupyter-console: 5.2.0 ➔ 6.4.0

30.0.15 (27 Jan 2022)

Flows

Updated

Adding endTime to flows exported from the Jobs page – We've added the endTime field to flow JSON files exported from the Jobs page of the application.

Storing flow-related JSONs in the database – We now store flow-related JSON files as compressed binary large objects (BLOBs) in the database.
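As a rough illustration of the idea, the sketch below serializes a JSON document and compresses it into bytes suitable for a BLOB column. The use of zlib and the sample payload are assumptions; the release note does not state which compression method the application uses.

```python
import json
import zlib


def to_blob(flow: dict) -> bytes:
    """Serialize a flow definition to compact JSON and compress it."""
    raw = json.dumps(flow, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def from_blob(blob: bytes) -> dict:
    """Reverse to_blob: decompress the bytes and parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical flow payload with repetitive structure.
flow = {"blocks": [{"id": i, "type": "example"} for i in range(200)]}
blob = to_blob(flow)

# The round trip is lossless, and the compressed form is smaller than the text.
assert from_blob(blob) == flow
print(len(json.dumps(flow)), "bytes as JSON text,", len(blob), "bytes compressed")
```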

Output Connections

Updated

Improved error handling for OAuth2 URLs – When an invalid OAuth2 Authorization URL is entered in the settings for an HTTP Rest Block or HTTP Notifier Output Block, the system now shows an HTTP 404 error message. Previously, the error message did not clearly state that the URL was invalid.

Databases

Updated

Domain accounts for SQL Server connections – You can now connect Hyperscience to SQL Server using domain account login credentials. 

Showing error messages only once for certain users – Users who do not have the VIEW SERVER STATE or ALTER ANY CONNECTION permissions now see each database error message only once.

Fixed

Type-conversion errors in Oracle databases – We've fixed a type-conversion issue in Oracle databases that caused a large number of trace files to be created. These files quickly filled system databases in some instances.

Upgrades 

Fixed

Preventing submissions from halting after upgrading – We've fixed an issue that caused in-progress submissions to halt while in Machine Transcription after upgrading.

Note that the changes included in v30.0.15+ and v31.0.12+ make the following upgrade path necessary:

  • v30.0.15+ should only be upgraded to v31.0.12+.

  • v31.0.12+ should only be upgraded to v32.0.3+.
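The upgrade path above can be sketched as a small pre-upgrade check. This is only an illustration: the function names are hypothetical, and the sketch assumes that "v31.0.12+" means a 31.x release at or above 31.0.12 (and likewise for 32.x).

```python
def parse_version(text: str) -> tuple:
    """Turn 'v30.0.15' or '30.0.15' into (30, 0, 15)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Rules from the note above, keyed by the major version being upgraded from.
# Each entry: (minimum source version the rule covers, minimum target version).
UPGRADE_RULES = {
    30: ((30, 0, 15), (31, 0, 12)),
    31: ((31, 0, 12), (32, 0, 3)),
}


def is_supported_upgrade(current: str, target: str) -> bool:
    """Return True if the target satisfies the rule for the current version."""
    cur, tgt = parse_version(current), parse_version(target)
    rule = UPGRADE_RULES.get(cur[0])
    if rule is None or cur < rule[0]:
        # This sketch only encodes the two rules stated in the note.
        raise ValueError(f"no rule encoded for {current}")
    min_target = rule[1]
    # Assumption: the target must be in the same major line as the minimum.
    return tgt[0] == min_target[0] and tgt >= min_target


print(is_supported_upgrade("v30.0.16", "v31.0.12"))  # True
print(is_supported_upgrade("v30.0.15", "v31.0.11"))  # False
```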

30.0.14 (3 Dec 2021)

Data Types

Fixed

Creating custom field data types (CFDTs) from patterns – We’ve fixed an issue that prevented users from creating a pattern CFDT.

Submission Processing

Updated

Task polling for blocks – To reduce database queries, we've implemented a task-polling mechanism for blocks, which lets the system know the resources each block has available to complete tasks.

Input Connections 

Fixed

ERROR logs for Message Queue (MQ) Listener connections – We’ve resolved an issue that caused ERROR logs to be created in some flows that were working properly. The system now creates INFO logs in these situations.

Output Connections

Updated

Logging details in UiPath error responses – We now log error messages from UiPath error responses, which contain details that can be helpful in debugging. Previously, we logged only the status codes in these messages.

API payload for UiPath Notifier Output connections – We've updated the API payload for UiPath Notifier Output connections to be consistent with the connections’ v28 payloads and the payloads of other output notifiers.

Installations

Updated

PostgreSQL 12.9-alpine Docker image – Our installations now include PostgreSQL 12.9-alpine Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

Database 

Updated

Moving data from Structured classification to the file store – To increase database efficiency, we now store data generated during the automatic classification of Structured documents in the file store rather than the database. 

Deleting records of completed tasks at designated points – To reduce the amount of data stored in the database, we now delete records of completed flow tasks after their results are saved.

File Storage

Updated

Support for AWS Signature Version 2 in S3 file stores – We've added support for AWS Signature Version 2 in requests sent to S3 file stores.

API 

Updated

Support for Base64-encoded JSON data in submission creation – You can now send submission data in JSON format when creating submissions via the Submission Creation endpoint. To do so, include the Content-Type: application/json header in your request. When sending requests with this header, note that the request body has a different format than requests sent as multipart/form-data or application/x-www-form-urlencoded.
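The sketch below shows one way to prepare such a request body: the file is Base64-encoded and embedded in a JSON document, and the Content-Type header is set accordingly. The field names "files", "file_name", and "file_content_base64" are hypothetical placeholders, not the documented request format; check the API documentation for the actual body structure.

```python
import base64
import json


def build_json_submission(filename: str, content: bytes) -> tuple:
    """Return (headers, body) for a JSON submission request.

    The JSON field names used here are hypothetical placeholders.
    """
    encoded = base64.b64encode(content).decode("ascii")
    body = json.dumps(
        {"files": [{"file_name": filename, "file_content_base64": encoded}]}
    )
    headers = {"Content-Type": "application/json"}
    return headers, body


headers, body = build_json_submission("invoice.pdf", b"%PDF-1.4 example bytes")
print(headers["Content-Type"])  # application/json
```

Base64 keeps binary file content safe inside a JSON string, at the cost of a body roughly one third larger than the raw file.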

30.0.13 (8 Nov 2021)

NOTE: Before upgrading to v30.0.13, check your ".env" file for the following variables:

BLOCK_SCALE_VPC
BLOCK_THREADS_VPC

If these variables are in your ".env" file, add the following variables:

BLOCK_SCALE_VPC_2
BLOCK_THREADS_VPC_2

Set the values of these variables to match the values for BLOCK_SCALE_VPC and BLOCK_THREADS_VPC, respectively.
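The check described in this note can be sketched as a short script. It is an illustrative helper rather than a supplied tool: the function name and the sample values are assumptions, and the parser handles only simple KEY=VALUE lines.

```python
def missing_vpc2_settings(env_text: str) -> dict:
    """Return the *_VPC_2 variables that should be added, with their values."""
    settings = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()

    additions = {}
    for name in ("BLOCK_SCALE_VPC", "BLOCK_THREADS_VPC"):
        # Only suggest the new variable if the old one exists and the new one does not.
        if name in settings and f"{name}_2" not in settings:
            additions[f"{name}_2"] = settings[name]
    return additions


sample = "BLOCK_SCALE_VPC=2\nBLOCK_THREADS_VPC=4\n"  # illustrative values
print(missing_vpc2_settings(sample))
# {'BLOCK_SCALE_VPC_2': '2', 'BLOCK_THREADS_VPC_2': '4'}
```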

Data Types

Fixed

Existing ML Configuration drop-down list in IE 11 – We've resolved an issue that prevented the options in the Existing ML Configuration drop-down list from appearing when creating a data type in IE 11.

Creating a data type from a list of duplicate values – We've fixed an issue where users were allowed to create data types from a list of values containing duplicate values.

Training

Updated

Use of segments found during submission processing – The trainer now uses text segments detected during submission processing when training field locator models.

Submission Processing

Fixed

Task synchronization across flow blocks – We've fixed a task-synchronization issue across flow blocks that caused submissions to fail, particularly in high-volume instances.

Message Queue Connections

Fixed

SSL CipherSuites and IBM Message Queue (MQ) connections – We've fixed an issue that caused input and output connections to IBM MQ servers to fail when SSL CipherSuites were enabled.

Installations

Updated

PostgreSQL 12.8-alpine Docker image – Our installations now include PostgreSQL 12.8-alpine Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

TVE Instances

Updated

Memory usage – We've introduced memory-management changes in TVE instances to reduce total memory usage.

Databases

Updated

Changes to LISTEN/NOTIFY and polling – To improve scalability, we’ve removed the LISTEN/NOTIFY health check for the database-provided notification mechanism in production instances, and we're enabling polling for job queues and block-process manager channels. The database-notification health checks remain enabled in TVE instances.

Storage of text-segment data – We now group text-segment data by page, improving scalability and system performance. 

Moving large JSON files and flow backups to the file store – To increase database efficiency, we now store the following data in the file store rather than the database:

  • large JSON files from completed tasks

  • large JSON files from short-lived tasks

  • backups of completed flows

Security

Fixed

Next parameter in URLs after successful logins – We've fixed an issue in our built-in login page that allowed attackers to execute code or redirect users through the next URL parameter after successful logins.
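For readers unfamiliar with this class of issue, the sketch below shows a common way to validate a post-login redirect target: accept only same-site paths and fall back to a default otherwise. It is a generic illustration using the Python standard library, not the fix that was applied in the application, and it is not an exhaustive defense.

```python
from urllib.parse import urlparse


def safe_next(next_url: str, default: str = "/") -> str:
    """Return next_url only if it is a plain same-site path; otherwise default."""
    if not next_url or "\\" in next_url:
        return default
    parsed = urlparse(next_url)
    # Reject absolute URLs and schemes such as "javascript:".
    if parsed.scheme or parsed.netloc:
        return default
    # Require a single leading slash; "//host" forms are treated as off-site.
    if not next_url.startswith("/") or next_url.startswith("//"):
        return default
    return next_url


print(safe_next("/submissions"))           # /submissions
print(safe_next("https://evil.example/"))  # /
print(safe_next("javascript:alert(1)"))    # /
```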

OPTIONS methods in API requests – We've disabled the use of OPTIONS methods in requests to our API endpoints, preventing unauthorized users from retrieving information about the application.

30.0.12 (29 Oct 2021)

Layouts

Updated

Removal of Routing option for fields – Because we deprecated the Routing feature in a previous version, we've removed the Routing option for fields in the Layout Editor and in the Fields and Customizations tab of the Layout Details page.

Submissions Table

Fixed

Unexpected error when filtering by halted submissions – We’ve fixed an issue that resulted in an unexpected error when filtering the Submissions table by halted submissions.

Submission Processing

Fixed

S3 File Storage credentials and submission processing – Previously, if an instance had an S3 file store configured with missing or incorrect credentials, submissions containing files in the file store were still processed to completion. With the fix included in v30.0.12+, the system halts these submissions, as it cannot retrieve the files from the file store.

Compatibility for tasks without workflow_info parameter – In a previous version, to make task polling more efficient, we added a workflow_info parameter to task outputs. We’ve fixed an issue that caused old tasks in the database without this parameter to result in an error.

Flows

Updated

Performance optimizations for flows – We've made the following improvements to the processing of flows:

  • Created customized database indexes and queries

  • Capped the number of tasks that can be processed in a single query

  • Ensured that block-process managers do not quit when their connections to the database are lost

  • Made changes to custom flows to reduce the risk of partial processing

  • Modified sharding functions so they are informed by the health records of each individual machine

  • Added checks for sub-processes 

Fixed

Block-process manager performance with a large number of resources – We've fixed a resource-detection issue in block-process managers that caused delays when a large number of input resources were present (e.g., images in a Folder Listener's folder).

Classification

Updated

Improved release-loading times for Structured document classification – We've reduced the amount of time needed for release information to be sent to Structured document classifiers in Machine Classification Blocks.

Clean-up script for Structured document classification – We've added a script to clean up caches of data used in the machine classification of Structured documents.

Custom Code Blocks

Fixed

In-memory import of Python scripts for Custom Code Block tasks – Previously, the system saved Python scripts to a local file when importing them, and the file was sometimes empty. With the fix included in v30.0.12+, the system imports Python scripts in memory instead of saving them to a local file.

PII Data Deletion

Fixed

PII data deletion in transformed outputs – We've resolved an issue that prevented PII data from being deleted from outputs transformed by post-processing Custom Code Blocks.

Infrastructure

Updated

Trainer machines and sharding – We've updated the system's sharding functions to exclude trainer machines. This change prevents flow tasks from being sent to trainer machines, improving efficiency in both the trainer and the application.

Health check for LISTEN/NOTIFY – We've added a check for LISTEN/NOTIFY to our system health check. This check also sends an alert when Service Broker is disabled in instances with MSSQL databases.

Passing scaling variables to all blocks – To override default scaling, the system now passes all scaling variables to all blocks. 

Improved management of database sessions – We’ve improved database session management in instances with MSSQL databases. Specifically, the system will kill sessions that:

  • are blocking other database sessions, 

  • are unusable, or

  • have been idle for a long time.

Cron jobs polling – To increase performance and reduce dependency on Service Broker, we’ve updated the cron jobs channel to use polling instead of LISTEN/NOTIFY. 

Trainer

Updated

Ability to add Field ID and Table ID training tasks with no trainer attached – Users can now create Field ID and Table ID training tasks for Semi-structured layouts even if there is no trainer attached. This functionality gives users the ability to shut down the trainer machine when the training tasks queue is empty.

Databases

Updated

Maximum age of database connections – We've increased the default maximum age for database connections, reducing both the need to create new connections and the overall load on the database.

Fixed

Saving layout changes on MSSQL databases – We’ve fixed an issue for MSSQL databases that caused saving changes to existing layouts to result in system slowness.

Submission Retrieval Stores

Fixed

Handling failed downloads from AWS S3 submission retrieval stores – We've fixed an issue that prevented flows from failing when the system could not download files from S3 submission retrieval stores.

30.0.11 (11 Oct 2021)

Installations 

Updated

PostgreSQL 12.7 Docker image – We’ve reverted the Docker image included in our installations from PostgreSQL 12.8 (updated in v30.0.10) to 12.7.

Infrastructure

Updated

Database polling with a single thread – We've updated our database-polling function to use a single thread, which reduces unnecessary operations and improves responsiveness in high-volume instances.

Databases 

Fixed

Recovering from failed connections – We've fixed an issue that prevented the system from reconnecting to the database after connectivity errors or idle periods. This issue resulted in processing failures after lost connections.  

MSSQL queries and system performance – In previous v30 versions, execution plans in MSSQL databases generated inefficient queries, causing spikes in CPU usage. These spikes resulted in halted submissions, processing delays, and reduced application responsiveness. A fix for this issue is included in v30.0.11.

Submission Retrieval Stores

Updated

Support for AWS S3 ".env" file variable – We've restored support for the SUBMISSION_RETRIEVAL_STORE_S3_ENDPOINT_URL variable in ".env" files.

30.0.10 (1 Oct 2021)

Submission Processing

Fixed

Supervision task duration and halted submissions – We’ve fixed an issue that caused submissions to halt when Supervision tasks had negative durations. 

Installations

Updated

PostgreSQL 12.8 Docker image – Our installations now include PostgreSQL 12.8 Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

Databases

Fixed

MSSQL and system responsiveness – We've fixed an issue that caused some database queries to perform an index scan when only an index seek was needed. The issue resulted in system slowness in instances with MSSQL databases.

Message Queue Listener

Fixed

Concurrency of Message Queue (MQ) Listener tasks – We've made changes to how the system completes MQ Listener tasks, including the elimination of intermediate data structures and delays in message processing. These improvements also prevent issues related to updating and selecting entries in the databases.

30.0.9 (17 Sep 2021)

Field ID Supervision

Fixed

Retrieval of fields to be identified – We've refactored how the system retrieves the list of fields to be identified in a document, improving efficiency by reducing the amount of irrelevant data retrieved.

Table Identification Models

Fixed

Importing a table model after updating a Semi-structured layout – We've resolved an issue that caused the table model-import process to fail after updating a Semi-structured layout that contained the model's table. Specifically, an HTTP 500 response would be returned after adding a new column to the table, locking the new version, and making it live. In v30.0.9+, the import process completes successfully, and the system opens the model's page in the Model Library to show any incompatible columns.

Viewing models and Semi-structured layouts with tables – We've fixed an issue that caused "Unexpected errors" to occur when viewing the table models and Semi-structured layouts affected by the issue described above.

Layout Tags

Fixed

Source of routing tags – We've fixed an issue that caused the system to read layout tags from individual layout variations instead of layouts (i.e., groups of layout variations).

Infrastructure

Fixed

System task scheduling – We've resolved a task-scheduling issue that prevented routine system tasks, such as reporting-data aggregation and model training, from running when expected. 

Please note that the system will not immediately repopulate reporting data after upgrading to v30.0.9.  

Trainer 

Fixed

Connection to application – We've made improvements to the trainer-application connection to prevent the trainer from losing contact with the application indefinitely. If the connection is lost, the trainer will time out the connection and automatically attempt to reconnect.
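The general pattern of timing out and retrying can be sketched as follows. The helper name, retry count, and backoff are illustrative assumptions; the trainer's actual reconnection logic is not described in these notes.

```python
import time


def call_with_reconnect(connect, attempts: int = 3, delay_seconds: float = 0.0):
    """Call connect() until it succeeds or the attempts run out.

    connect is expected to raise TimeoutError or ConnectionError on failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Simulated connection that times out twice and then succeeds.
outcomes = iter([TimeoutError("timed out"), TimeoutError("timed out"), "connected"])


def fake_connect():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_reconnect(fake_connect)
print(result)  # connected
```

Bounding each attempt with a timeout is what keeps a silent network failure from leaving the caller waiting indefinitely.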

30.0.8 (1 Sep 2021)

Submission Processing

Updated

Rotation detection on gray backgrounds – We’ve added a step to submission pre-processing that improves rotation detection on gray backgrounds.

30.0.7 (18 Aug 2021)

Permissions

New

Trainer API User permission group – Because we have API endpoints that are meant to be accessed only by the trainer, we've added a Trainer API User permission group that restricts access to these endpoints. This group contains only the Trainer API Access permission. 

Upon upgrading to v30.0.7, all users with the API Access permission will be added to the Trainer API User permission group to avoid potential disruptions in access. To maximize the security benefits of the new group, we recommend removing all users from the Trainer API User group other than the trainer's user. 

Security

New

Excluding X-Extract-Backend-Server HTTP headers from application responses – You can now exclude the X-Extract-Backend-Server HTTP header from responses returned by the application. To do so, add the NGINX_DISABLE_BACKEND_SERVER_HEADER variable to your ".env" file and set its value to yes.

For more information, see Security.

Fixed

Invalidation of user sessions after logout – We’ve resolved an issue that sometimes prevented the invalidation of sessions after users logged out of the application. 

Installations

Updated

PostgreSQL 12.7 Docker image – Our installations now include PostgreSQL 12.7 Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

HTTP Rest Blocks

Fixed

Successful processing with 4xx and 5xx errors – We've resolved an issue that prevented block failure when a block's HTTP request resulted in a 4xx or 5xx error.
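The intended behavior can be illustrated with a small status check that treats any 4xx or 5xx response as a failure. The exception class and function name are hypothetical and are not part of the block's actual code.

```python
class BlockRequestError(Exception):
    """Hypothetical error raised when a block's HTTP request fails."""


def check_response_status(status_code: int) -> int:
    """Return the status code for non-error responses; raise for 4xx and 5xx."""
    if 400 <= status_code <= 599:
        raise BlockRequestError(f"HTTP request failed with status {status_code}")
    return status_code


print(check_response_status(200))  # 200
try:
    check_response_status(404)
except BlockRequestError as error:
    print(error)  # HTTP request failed with status 404
```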

30.0.6 (4 Aug 2021)

Databases

Updated

Support for Microsoft SQL Server (MSSQL) 2019 – We've added support for MSSQL 2019. Service Broker must be enabled when connecting an MSSQL database to the Hyperscience application.

Quality Assurance

Fixed

Updating QA sample rates – We've resolved an issue that prevented changes to the Field Identification Sample Rate and Transcription Sample Rate Flow settings from updating the system's sample rates.

Submission Processing

Fixed

State cleanups and processing times – We've fixed a state-cleanup issue that caused layouts to be reloaded frequently, resulting in processing delays.

Layout Editor

Fixed

Enabling Auto-Clone – We've fixed an image-resolution issue that sometimes caused the Layout Editor to crash when the Auto-Clone feature was enabled.

30.0.5 (14 Jul 2021)

Submission Processing

Fixed

Halted jobs in v30.0.2 – We've resolved a database-connectivity issue that caused a large number of jobs to be halted in v30.0.2.

In order to implement this fix, you will need to add the FORMS_DB_CONN_MAX_AGE variable to your ".env" file after upgrading to v30.0.4 or later. This variable sets the maximum length of time that database connections can be kept and reused. For more information, see Database Overview.
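As a rough illustration of what a connection max-age setting controls, the sketch below reads the variable from the environment and passes it to a Django-style database configuration. The mapping to Django's CONN_MAX_AGE option, the unit (seconds), the engine name, and the value 600 are all assumptions made for the example; see Database Overview for the supported values.

```python
import os


def database_settings(environ=os.environ) -> dict:
    """Build a Django-style DATABASES entry from environment variables."""
    # Assumption: the value is a number of seconds, as with Django's CONN_MAX_AGE.
    max_age = int(environ.get("FORMS_DB_CONN_MAX_AGE", "0"))
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # In Django, 0 closes the connection at the end of each request.
            "CONN_MAX_AGE": max_age,
        }
    }


settings = database_settings({"FORMS_DB_CONN_MAX_AGE": "600"})
print(settings["default"]["CONN_MAX_AGE"])  # 600
```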

Submissions Table

Fixed

Changing the number of submissions per table page – We've fixed a database-collation issue that caused errors when users changed the number of submissions shown per Submissions table page.

30.0.4 (12 Jul 2021)

Telemetry

Updated

Audit mode enabled by default – We've updated Telemetry to be enabled in audit mode by default, allowing you to preview telemetry data at any time before giving Hyperscience permission to collect it. Prior to this version, audit mode required the addition of a variable to the ".env" file. 

Initial data transfer to Hyperscience – Previously, when Telemetry was first enabled, the system would send all available telemetry data to Hyperscience. Beginning in v30.0.4, the initial transfer contains data from the past 24 hours only.

Exclusion of user names – The system now filters the user names of logged-in users from telemetry data before the data is stored or shared.

30.0.3 (22 Jun 2021)

Supervision and Quality Assurance

Updated

Pressing shortcut keys and timing of results – We've changed how the system responds to the pressing of shortcut keys so that results occur when the keys are released, not when they're pressed. In particular, this change prevents task input from being submitted before the keyer has finished the task.

Quality Assurance

Updated

Ordering of QA tasks – We've changed the ordering of QA tasks in the QA card on the Task Overview page so that tasks for the newest submissions will be at the top of the queue. This change ensures that tasks for the most current submissions will be completed first, increasing the relevancy of obtained QA data.

Fixed

Consensus for currency fields – We've resolved an issue that prevented formatting from being applied to currency fields during consensus tasks, causing the system to reject the input and force the user to retranscribe the value indefinitely.

Table Identification

Fixed

Template tool and empty pages – We've fixed an issue that caused the application to become unresponsive when the user switched to the template tool while viewing an empty submission page.

Submissions and Documents

Fixed

Clearing the date range on Submissions and Documents pages – We've fixed an issue that caused three requests to be sent to the system upon clearing the date range filter on the Submissions and Documents pages.

Security

Fixed

Access to client secrets for Input Blocks and Output Blocks – We've fixed an issue that allowed any authenticated user to access the client secrets for their account via the API. Now, only system admins and the service accounts for the blocks have access to this information.

SAML

Fixed

Synchronization of SAML groups – We've resolved an issue that prevented SAML groups from being synchronized with Hyperscience permission groups in certain instances, causing users to have permissions from SAML groups they no longer belonged to.

30.0.2 (7 Jun 2021)

Input Connections

Fixed

Resubmitting submissions via Amazon SQS input connections – Previously, resubmitting halted SQS-ingested submissions with external IDs created duplicate documents in the original submissions. We've included a fix for this issue in v30.0.2+, and reingesting these submissions will create separate submissions with the same numbers of documents. 

Ingesting malformed JSON – We've resolved an issue that caused the system to go into an infinite loop when malformed JSON was ingested via a Message Queue Listener input connection. 
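A generic way to keep one bad message from stalling a listener is to parse each message once and set malformed ones aside rather than retrying them. The sketch below illustrates that pattern with plain strings standing in for queue messages; it does not reflect the listener's actual implementation.

```python
import json


def process_messages(messages):
    """Parse each message once; set malformed ones aside instead of retrying."""
    parsed, rejected = [], []
    for raw in messages:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as error:
            # Record the failure and move on so one bad message cannot stall the queue.
            rejected.append((raw, str(error)))
    return parsed, rejected


parsed, rejected = process_messages(['{"id": 1}', '{"id": 2', '{"id": 3}'])
print(len(parsed), len(rejected))  # 2 1
```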

Output Connections

Fixed

Delivery of notifications to RabbitMQ queues – We've fixed an issue in the binding of RabbitMQ exchanges to message queues, which prevented notifications from being delivered. 

System & Health

Updated

Submissions, Flows, Jobs, and Trainer Tasks metrics – We've updated the metrics and descriptions in the Submissions, Flows, Jobs, and Trainer Tasks cards to clarify the data these cards provide. In particular, we now count the number of new submissions created during the covered time period, along with the number of submissions that were completed or halted during that period, regardless of when they were created.

Fixed

Health statistics and system performance – We've fixed an issue in our gathering of health statistics that slowed down system performance, particularly in high-volume instances of Hyperscience.

Custom Data Types

Fixed

Normalization of field values in Structured layouts – We've fixed an issue that caused normalization errors to occur when Structured layouts contained fields with pattern-based custom data types.

Submission Processing 

Fixed

Document pre-processing tasks for unmatched Structured pages – We've fixed an issue that caused the system to complete document pre-processing tasks for unmatched Structured pages, slowing system performance.

Submissions Table

Fixed

Filtering by layout tags – We've fixed an issue that caused the Submissions table to be empty after filtering submissions with the Tags Document filter.

Clearing the date range – We've resolved an issue that caused requests to time out when the date range filter above the Submissions table was cleared.

Dashboard

Fixed

Disabled "Perform Tasks" links – We've fixed an issue in the dashboard (Tasks > Overview) that caused the appearance of disabled "Perform Tasks" links to be inconsistent between the Supervision Tasks and Quality Assurance Tasks cards.

30.0.1 (21 May 2021)

Upgrades 

Fixed

TLS verification and upgrades – We've resolved an issue that caused customers with HL_TLS_VERIFY_ENABLED set to false to have unstable systems after upgrading to v30.

PostgreSQL versions and upgrading – Previously, when upgrading from v28 to v30, running v28 and v30 trainers side by side failed if PostgreSQL 9.5 was used in v28 and PostgreSQL 12 was used in v30. A fix for this issue is included in v30.0.1+.

Block Settings

Fixed

Saving changes to integer values – We've fixed an issue that prevented the system from saving changes to settings with integer values (e.g., Port number). 

Input Connections

Fixed

Message Queue (MQ) connections and Submission Processing Deadlines – We've fixed an issue that caused Submissions created through an MQ connection to have the system-default Submission Processing Deadline rather than the one set for the connection.  

Testing AWS SQS connections – We've fixed an issue that caused AWS SQS connection tests to be successful when invalid Username/Access key ID values were entered. 

Output Blocks

Fixed

Initialization of IBM MQ notifications – We've resolved an issue that caused IBM MQ notifications to fail after their flows were redeployed.  

MQ notification logs read as error messages – We've fixed an issue that caused the information logs generated by MQ Output Blocks to be read as error messages by some downstream systems.

Output consistency with legacy connectors – We've changed the output format of Amazon SQS, IBM MQ, and HTTP Export connections to match the output formats of their respective legacy Output Connectors (v28 and earlier). 

Submission Processing

Fixed

Page collation in submissions with multiple documents – Previously, when submissions had several documents that matched to the same layout, pages from different documents were sometimes identified as belonging to the same document. A fix for this issue is included in v30.0.1+. 

AVX2 support – We've resolved an issue that caused document pre-processing tasks to crash and submissions to halt with application VMs that don't support AVX2 instructions.

Document Processing

Fixed

Efficiency of document pre-processing tasks (Semi-structured only) – We've increased the efficiency of document pre-processing tasks for Semi-structured documents. 

Task Processing

Updated

Timeouts for block processes – We've reduced the amount of time it takes for a block process to time out, allowing for faster retries.

Dashboard

Updated

Additional counts for Supervision Tasks and Quality Assurance Tasks – The Supervision Tasks and Quality Assurance Tasks cards now have page counts for Document Classification tasks and field counts for Identification and Transcription tasks.

Submissions Table

Updated

Tooltip positioning – We've changed the positioning of tooltips to appear above the content you hover over rather than below it. Doing so allows you to see and click on the contents in the table. 

Fixed

Loading time with large submissions – We've reduced the loading time of the Submissions table when it contains submissions with more than 700 pages.  

Submission Output Page

Updated

View Transformed Output – We've added a View Transformed Output option to the Actions drop-down menu, allowing you to view the version of the Submission's JSON output that has been transformed by Custom Code Blocks.

Fixed

Accessing Submission output page for completed submissions – We've fixed an issue that caused errors to occur when users viewed the Submission output page for a completed submission. 

Task Queue

Updated

Name of “Entries” Task Queue column – We've changed the name of the “Entries” column in the Task Queue table to “Fields” to indicate its contents more clearly.

Supervision

Updated

Handling of Supervision tasks – We've increased efficiencies around the closing of Supervision tasks, along with the loading of tasks that occurs when clicking links in the Supervision Tasks and Quality Assurance Tasks dashboard cards.

Field ID Supervision

Updated

Alignment of multiline field names in Field ID Supervision (Semi-structured only) – We've changed the alignment of multiline field names in Field ID Supervision tasks from right aligned to left aligned.

Identification Supervision

Fixed

Attribution of identification source – We've fixed an issue that caused the Manual Identification Block to sometimes pass the incorrect identification source to the Submission object. 

Signature Fields

Fixed

Transcription Supervision tasks for Signature fields (Semi-structured only) – We've resolved an issue that caused the system to generate Transcription Supervision tasks for identified Signature fields in Semi-structured documents. 

Logs

Fixed

Secret values in block logs – We've resolved an issue that sometimes caused secret values to be stored in block logs when exceptions occurred.

Configuration

Fixed

Reading the entire contents of the ".env" file – We've resolved an issue that prevented the final variable in the ".env" file from being read if it wasn't followed by a new-line character. 

SAML

Updated

Removal of SAML_STAFF_PERMISSONS_ROLES variable – We've removed support for the SAML_STAFF_PERMISSONS_ROLES ".env" file variable, as the LDAPGroup configuration serves the same purpose.

Fixed

Time difference between identity provider (IDP) and Hyperscience – To prevent internal server errors caused by the IDP server's time being later than the instance's server time, we've added a time skew to SAML configurations.
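The effect of a time skew can be sketched with a validity-window check that tolerates a configurable clock difference. The function name, the 20-second clock difference, and the 60-second skew are illustrative assumptions; the notes do not state the skew value used by the application.

```python
from datetime import datetime, timedelta, timezone


def assertion_is_current(not_before, not_on_or_after, now, skew=timedelta(0)) -> bool:
    """Check a validity window, tolerating clock differences up to skew."""
    return (not_before - skew) <= now < (not_on_or_after + skew)


# The IDP clock runs 20 seconds ahead of the instance clock, so the
# assertion's start time appears to be in the future.
instance_now = datetime(2021, 5, 21, 12, 0, 0, tzinfo=timezone.utc)
not_before = instance_now + timedelta(seconds=20)
not_on_or_after = not_before + timedelta(minutes=5)

strict = assertion_is_current(not_before, not_on_or_after, instance_now)
tolerant = assertion_is_current(
    not_before, not_on_or_after, instance_now, skew=timedelta(seconds=60)
)
print(strict, tolerant)  # False True
```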

Databases

Updated

Offloading flow payloads – After a flow is completed, its payload will be moved from database storage to file storage.

Fixed

Efficiency of database queries – We've increased the efficiency of database queries to prevent database CPU usage from reaching 100%.

30.0.0 (23 Apr 2021)

V30 of Hyperscience contains the features described below, as well as all of the enhancements and fixes introduced in prior versions. Before upgrading, we encourage you to read the release notes for versions that follow your current version of Hyperscience.

Flows

New

With this release, we're changing the architecture of our solution to make it more flexible, modular, and integrated with your organization's data sources. The introduction of Flows allows you to define multiple workflows to accommodate your lines of business. Each of these workflows, or flows, can be customized to meet the needs of each of your teams. The flexibility of flows lies in blocks, which are self-contained programs that run in the sequence you choose.

With blocks, you are no longer limited to the standard "Classification, Identification, and Transcription" process. Instead, you can choose to have manual or machine versions of these steps, or you can choose to have both in your flows:

V28V30Blocks.png

Blocks also take the place of input and output connectors, and our new API blocks give you the ability to connect to your organization's other data sources. You can add the data from the API to your submissions' output, or you can use it to validate submissions before they reach your downstream systems. When coupled with our new Routing blocks, you can create paths in your flow based on the decision criteria you specify, allowing you to send complete, accurate data to the appropriate downstream systems. If none of our pre-built blocks meet your needs, we can create a custom code block and add it to your flow.

Upon upgrading to v30, you will have a standard flow that looks like the one below, which will be used to process all of your submissions: 

ExampleFlowInFlowStudio.png

If you would like to make changes to this flow, contact your Hyperscience representative for assistance.

Settings in v30

Due to the modular nature of flows, many settings that previously applied to an instance as a whole are now specific to individual blocks. 

Limitations in v30

Because the flows feature represents a fundamental shift in the Hyperscience solution, there are some limitations in v30 that you should be aware of before upgrading. 

Key limitations include the following:

  • You will need to reach out to your Hyperscience representative if you would like to add new flows or make changes to your current ones.

  • These settings and configurations will continue to apply to all of your flows, with flow-specific customizations expected in future releases: 

    • One Live release

    • A single transcription automation training (or "finetuning") model

    • A single Target accuracy setting

Connectors

Updated

Changes to Connector functionality – As part of our transition to flows, we've replaced Connectors with Input Blocks and Output Blocks. This change has caused the following changes to Connector functionality:

  • Notification Filters and Source Tags – With the introduction of Routing Blocks, we are deprecating the Notification Filter and Source Tag features. Previously configured notification filters based on source tags or submission properties will continue to work after upgrading to v30, but they will be removed in a future version.

  • Installation and settings – In the past, Connectors needed to be uploaded to your Hyperscience instance and were configured in a dedicated part of the application. In v30, you do not need to upload Input or Output Blocks to your instance, as they are built into the application. Like other types of blocks, these blocks can be configured in Flow Studio. However, you will need Hyperscience's assistance to edit certain properties of Input and Output Blocks.

  • FileNet and ActiveMQ – FileNet and ActiveMQ Output Blocks are not included in v30. If you would like to add these blocks to your instance, contact your Hyperscience representative.

Known Issues

Message Queue Output Block configuration – A Message Queue notification may fail after its block's configuration is changed and its flow is redeployed. This issue affects all Message Queue notification types, and we are working on a fix for it.

User Interface

Updated

To reflect our updated offerings and to enhance our user experience, we've rebranded and refined the application user interface.

Here are some of the notable changes you can expect to find in v30:

Navigation updates

Upon opening v30 of the application, you will notice these changes in the left-hand sidebar:

  • We've renamed the Work Queue section of the application to "Tasks."

  • We added a section for Flows.

  • We changed the order of the sections, with Tasks and Submissions now appearing at the top of the list.

Direct links to tasks

We’ve added a "Tasks" column to the Submissions and Documents tables, containing direct links to Supervision tasks.

Flow column and filter for submissions and documents

On the Submissions > Submissions and Submissions > Documents pages, we've added Flow columns and filters, allowing you to find content processed through a certain flow.

Notifications and User Profile now in top navigation

We've moved the Notifications and the User Profile from the left-hand sidebar to the top-right corner of the page, making them more discoverable and easily accessible to users.

Submission documents no longer on the Submissions table

We've removed the drop-down list of documents for individual entries in the Submissions table. In its place, we've added a link to the number of documents in the Documents column. Upon clicking the link, the user is taken to the Submissions > Documents page, with the Documents table filtered to show the submission's documents.

Statuses of submissions and documents

We've updated the possible statuses of submissions and documents to the following:

  • "Processing"

  • "[Block Name]" if there are pending Supervision tasks

  • "Completed"

  • "Halted"

Cards on the Tasks > Overview page

Notable changes to the cards on the Tasks > Overview page (formerly Work Queue > Overview) include:

New "Supervision Tasks" and "Quality Assurance Tasks" cards

On the Tasks > Overview page (formerly Work Queue > Overview), we've added "Supervision Tasks" and "Quality Assurance Tasks" cards, showing the number and types of open tasks. Users can perform tasks directly from the cards by clicking the "Perform Tasks" links, and they can see how many tasks are overdue. Users also have the option of clearing QA tasks of a certain type.

Renamed and enhanced "In Progress" and "Active Supervision Users" cards

We've renamed the "In Progress" and "Active Supervision Users" cards to "In Queue" and "Active Workers," respectively. Users can filter the "In Queue" card's content by flow, and the "Active Workers" card shows a list of all active users performing supervision tasks.

Ability to select multiple tasks for assignment

Users can now select multiple tasks in the Task Queue, allowing them to claim the selected tasks for completion. After completing the tasks, the user is taken back to the Task Queue. 

All tasks now in a single Task Queue

Instead of dividing tasks by type, we've removed the Page Sorting, Identification, and Transcription tabs and created a unified view of tasks in the Task Queue. Users can still filter by task type, and they can also choose which columns to show in the task table.

Machine Learning

New

Field Locator 2.2 (Semi-structured only) – We've upgraded our default training model for Field Identification and Table Identification to Field Locator 2.2. Field Locator 2.2 offers the following benefits:

  • Increased automation and straight-through processing – Field Locator 2.2 has been shown to increase the percentage of documents that are processed without human involvement in Field Identification, with 30-60% of documents being completed with fully automated Field processing.

  • Improved sub-segment extraction – You can now extract information from a portion of a text more easily, increasing automation in use cases where only certain values in a body of text should be transcribed. For example, you can identify a date within a paragraph as a field to be transcribed. You can also refine field location predictions in identification documents, where multiple data points can appear in a small area, to exclude surrounding text.

  • Faster training – Training jobs will take less time to complete with Field Locator 2.2 than they did with previous versions.

  • Greater tolerance for annotation inconsistencies – Instead of location-based matching, Field Locator 2.2 uses text to predict the location of fields. This change allows keyers to annotate fields more quickly, since they do not need to be as precise in their annotations to maintain automation rates. We also expect that keyers will need to complete fewer Model Validation Tasks with this upgrade.

  • Enhanced performance for documents outside of the training set – In past versions of Field Locator, the system relied entirely on the provided training documents when locating fields. With Field Locator 2.2, the system is better able to locate fields on documents that it has not been trained on. For example, if you've trained your model to locate fields on invoices, and you start processing invoices that are formatted differently from the training documents, the system will be able to recognize fields on the new invoices. 

The input limits for Field Locator 2.2 are the same as those for Field Locator 2.0. While we still recommend providing 400 training documents for each dataset, we only require 120 documents for Field Locator 2.2. 

If you are upgrading from a version of Hyperscience earlier than v28, you will need to submit more training documents than you did previously to achieve optimum model performance. 

Layouts that used Field Locator 2.0 or earlier will remain on their current versions of Field Locator after upgrading the application to v30. However, you can configure these layouts to use Field Locator 2.2.

Updated

Improved speed of document classification – We've improved the speed of certain pre-processing tasks, which has allowed our system to classify both Structured and Semi-structured documents more quickly.

Enhancements to text pre-processing – We've made improvements to how our system detects and separates portions of text, helping to optimize model performance and improve our predicted bounding boxes in Field and Table Identification tasks.

Serviceability

New

System & Health page – The new System & Health page (Administration > System & Health) lets you monitor the health of your system and its components.

  • At a glance, you can check whether your application machines and trainer machines are operating as expected. Clicking the link in the "Application Machines" or "Trainer Machines" cards reveals additional machine-specific status details.

  • You can see if there are errors related to submissions, flows, jobs, or trainer tasks. The cards for each of these elements also contain links that take you to the relevant pages in the application.

  • The System & Health page also contains the Connector logs that were formerly on the Connectors page, which has been removed in this version.

  • In addition to the Connectors page, we've removed the About page and included version information at the bottom of the System & Health page.

Jobs

Updated

Changes to Jobs page functionality – With the introduction of Flows, we’ve made the following changes to the functionality of the Jobs page:

  • Submissions and Flows – After upgrading to v30, submissions will be associated with flows rather than jobs. Just as you retried jobs for failed submissions in the past, you can now retry flows to reattempt the processing of failed submissions. In rare circumstances, there may be halted jobs associated with submissions created before the upgrade. You can retry these jobs by clicking on Legacy Jobs in the drop-down list on the Jobs page and retrying the jobs as you normally would.

  • Filters and retrying jobs or flows – Instead of retrying all halted jobs or flows in the instance, the system will only retry jobs or flows that meet the current filter criteria.
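
The filter-aware retry behavior described above can be pictured with a short sketch. It is a simplified model rather than the application's code; the FlowRun class, the status strings, and the flow names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowRun:
    flow_name: str
    status: str


def select_for_retry(runs, flow_name=None):
    """Return halted runs that also match the active filter, if one is set."""
    return [
        run
        for run in runs
        if run.status == "HALTED"
        and (flow_name is None or run.flow_name == flow_name)
    ]


runs = [
    FlowRun("Invoices", "HALTED"),
    FlowRun("Claims", "HALTED"),
    FlowRun("Invoices", "COMPLETED"),
]

# With a filter applied, only the halted "Invoices" run is retried;
# the halted "Claims" run is left alone.
to_retry = select_for_retry(runs, flow_name="Invoices")
```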

Languages

New

Support for Dutch submissions – We now support automation on Structured and Semi-structured documents written in Dutch. The Dutch language model allows our system to extract data from Dutch documents, and keyers can complete Transcription Supervision and Transcription QA tasks by entering Dutch text.

Submission Processing

New

Optimized task assignment and submission deletion – Previously, customers with a backlog of PII-wiped submissions may have experienced slowness in the application, preventing them from assigning or completing tasks. These delays were caused by inefficiencies in our task-assignment processes and our management of PII-wiped submissions. Because the PII-wiping process does not delete submissions, the system still considered PII-wiped submissions as eligible for task assignment. To remedy these issues, we're introducing the following improvements in this release:

  • Optimized mechanisms for task retrieval and assignment

  • An optional "Submission record deletion policy" setting that enables the automatic deletion of PII-wiped submissions 

In addition, the "PII data deletion" setting will no longer be enabled by default. Any changes you make to either the "PII data deletion" or "Submission record deletion policy" will be applied retroactively to processed submissions still in the system.

Submission record deletion policy

If you enable the new "Submission record deletion policy" setting, the system will delete PII-wiped submissions that have already passed the Transcription Automation Training window configured in the Transcription settings. The time interval you enter in this setting will be added to either that training window or the PII data deletion window, whichever is longer.
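
The timing rule above can be worked through with a small example. The sketch assumes that both windows are measured from the same starting point (here, a hypothetical completion time), which the description above does not specify; the function name and the sample values of 30, 14, and 7 days are also assumptions.

```python
from datetime import datetime, timedelta


def record_deletion_time(start, training_window, pii_window, extra_interval):
    """Add the configured interval to the longer of the two windows."""
    return start + max(training_window, pii_window) + extra_interval


# Hypothetical settings: a 30-day training window, a 14-day PII data
# deletion window, and a 7-day interval in the deletion policy setting.
completed = datetime(2021, 4, 1)
deletion = record_deletion_time(
    completed,
    training_window=timedelta(days=30),
    pii_window=timedelta(days=14),
    extra_interval=timedelta(days=7),
)
```

With these values, the longer window is the 30-day training window, so the record becomes eligible for deletion 37 days after the starting point, on 8 May 2021.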

Permissions

New

Knowledge Worker permission group – Users assigned to our new Knowledge Worker permission group can access both the Tasks view and the Submissions view.

Updated

Task Restrictions for Submissions page tasks – Previously, Task Restrictions were applied to tasks accessed in the Work Queue (now the Tasks section of the application), but not to tasks on the Submissions page. With this release, Task Restrictions will be applied to all tasks, regardless of whether users access them from the Tasks pages or the Submissions page.

Block-level Task Restrictions – You can now assign Task Restrictions to tasks generated by specific Manual Identification or Manual Transcription blocks. This feature extends the capabilities of Task Restrictions, which allow you to limit the users who can perform tasks on selected submissions or layouts.

Reporting

New

Telemetry – You can now choose to automatically send usage data to Hyperscience. In this way, Telemetry eliminates the need for you to send monthly Usage Reports to your Hyperscience representative. If you enable this feature in Telemetry settings, data will be sent daily from your instance to a Hyperscience-hosted repository via a mutually authenticated TLS connection. Telemetry data consists of information from the Usage Report, which now includes aggregated, anonymized product analytics data. No proprietary or personally identifiable information (PII) will be shared with Hyperscience.

This information enables us to make more informed decisions about potential product improvements. We invite you to review this data before sharing it with Hyperscience.  

Layouts

Updated

Removal of “Routing” field option – We have temporarily removed the “Routing” option for layout fields, as we are not supporting the autorouting feature in v30. Fields with the “Routing” option selected were automatically routed to specific user groups for Supervision and QA. We expect this option to become available again in an upcoming version.

Field Identification

Updated

Updates to one-click bounding boxes – In v30, we've made the following enhancements to our one-click bounding box feature:

  • In addition to Field ID Supervision tasks, you can now use one-click bounding boxes to indicate the location of fields during Field ID QA and Field ID Model Validation Tasks (MVTs).

  • Previously, our one-click bounding boxes contained entire lines of text. In v30, you can now create a one-click bounding box for an individual word in a line by clicking on that word. These word-level bounding boxes can be created during Field ID Supervision, Field ID QA, and Field ID MVTs.

Table Identification

Updated

New keyboard shortcuts for Table ID Supervision (Semi-structured only) – We've added keyboard shortcuts that allow keyers to move through the cells in a table using only their keyboards:

  • Ctrl + E selects the next cell requiring supervision.

  • Shift + Ctrl + E selects the previous cell requiring supervision. 

Fixed

Mac right-click keyboard shortcut in the Template Tool (Semi-structured only) – We've fixed an issue that prevented the Template Tool's contextual menu from appearing when keyers used the Ctrl + CLICK shortcut on Macs.

Table Transcription

Fixed

"Manual review of table output" and Table Transcription tasks – We've resolved an issue that prevented Table Transcription tasks from appearing in the Work Queue (now the "Task Queue") when the "Manual review of table output" setting was enabled.

Installations

Updated

PostgreSQL 12.5 Docker image – Our installations now include PostgreSQL 12.5 Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

Configuration

Updated

VM CPU core recommendation (Semi-structured only) – While we still support 8 cores for each CPU in a VM running Hyperscience, with v30's Field Locator 2.2, we recommend having 16 cores for each CPU in a trainer’s VM. If you use 8 cores, you can expect 60-70% longer training times and an increased risk of crashes during training, particularly on datasets with longer, denser documents.

Using the Health Check Status endpoint – The Health Check Status endpoint is designed to help you monitor the health of your system’s components. If any component tested by the Health Check Status API is in an error state, the endpoint will return a 503 error code. For example, if you enter the Health Check Status endpoint as your load balancer's health check URL, an issue in one server will cause all servers to return an error code to the load balancer. This response will prevent traffic from being routed to your entire system, even if healthy servers are available.

As described in the API section of these release notes, we're returning data about additional components in Health Check Status responses. While we do not recommend using this endpoint as your load balancer's health check URL in any version of our application, the additional component data returned in v30 increases the likelihood that a component error will prevent traffic from being routed to your system. For this reason, we strongly recommend that you do not use this endpoint as a health check endpoint for individual servers.
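
The following sketch models why a system-wide health check is a poor fit for per-server load balancer checks. It is a simplified model, not the endpoint's implementation: the state strings "HEALTHY" and "ERROR", the 200 code for the healthy case, and the server names are assumptions, while the component names come from the API section of these release notes.

```python
def overall_status_code(component_states: dict) -> int:
    """Return 503 if any component is in an error state, otherwise 200."""
    has_error = any(state == "ERROR" for state in component_states.values())
    return 503 if has_error else 200


# One failing component is enough to produce a 503 for the whole system.
states = {"DB": "HEALTHY", "MACHINES": "HEALTHY", "TRAINER_CONNECTIONS": "ERROR"}
code = overall_status_code(states)

# Every server reports the same system-wide result, so a load balancer
# polling this check would stop routing traffic to all of them.
servers = ["app-1", "app-2", "app-3"]
routable = [server for server in servers if overall_status_code(states) == 200]
```

In this model, routable ends up empty even though the application servers themselves may be healthy.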

Databases

Updated

PostgreSQL 9.5 no longer supported – We no longer support the use of PostgreSQL 9.5 in application databases. We will still support PostgreSQL 10, 12.4, and 12.5.

API

New

Field definition attribute for Duplicates – To support our Duplicate field extraction feature, we've added duplicate to the attributes included in field_definition_attributes. To learn more, see our API documentation.

Page property for document page number – We’ve added the document_page_number field to the Page object model. If a page matches to a layout, the document_page_number field in its Page object indicates which page of the document it represents. More details can be found in our API documentation.  

Endpoint for retrieving transformed Submission data – As part of the Post-Processing Customization feature, we’ve added an endpoint that retrieves transformed Submission data. For more information, see our API documentation.

Updated

Removed endpoints – We've removed the Settings Import and the Transcriptions-Only endpoints.

New data returned in Health Check Status endpoint – The Health Check Status endpoint now returns TRAINER_CONNECTIONS, TRAINER_TASK_STATS, MACHINES, and SUBSYSTEMS properties, offering you more insight into the health of individual system components. These properties are in addition to the STORE_XXX, STORE_$XXX, and DB properties previously returned.

For more information on using this endpoint, see the Configuration section of these release notes.