V31 Release Notes

31.0.18 (27 Jun 2022)

Databases

Fixed

PostgreSQL queries and system performance – We’ve fixed an issue with executing unnecessary PostgreSQL queries related to flows that caused spikes in CPU usage. 

Security

Updated

Changing the javax.el library to jakarta.el and updating it – To address a security vulnerability, we’ve changed the Maven javax.el library to jakarta.el and updated it to 3.0.4.

Updating jackson-databind and jackson-dataformat-cbor – To fix security issues, we’ve updated jackson-databind to 2.13.2.2 and jackson-dataformat-cbor to 2.13.2.

31.0.17 (11 May 2022)

User Experience

Updated

Custom login warning message – You can now create a custom login warning message, which is displayed when a user first logs in to your Hyperscience application. The message is shown as a dialog box that the user can dismiss by clicking the dialog's Continue button.

To add this message to your login experience, add the LOGIN_WARNING_BANNER_TEXT variable to your “.env” file, and enter your message as the variable’s value.

Connections

Updated

"Audience" setting for HTTP Notifier and HTTP REST API Blocks – We've made the Audience setting optional in HTTP Notifier and HTTP REST API Blocks.

Security

Fixed

Updating OpenSSL and OS packages – When creating an installation bundle for a new version of Hyperscience, we now use the latest available version of OpenSSL, and we update relevant OS packages.

31.0.16 (13 Apr 2022)

Submissions Table

Fixed

Adding the Manual Classification option to the Status filter – We’ve added the Manual Classification option back to the Submissions table’s Status filter. 

Data Types

Updated

Creating custom field data types (CFDTs) from patterns with “(space)” – Previously, if a user selected the (space) option in the Define Normalization dialog box when creating CFDTs from patterns, the “space” symbol did not appear in the list of characters stripped in output. With the update included in this version, the symbol now appears as “(space)” in the list of stripped characters.

Reporting

Fixed

Daylight Saving Time and KeyerProjection.csv – We’ve fixed an issue related to Daylight Saving Time in KeyerPerformance.csv, which is part of the Keyer Projection report (Reporting > User Performance). The issue caused KeyerPerformance.csv to include data that is outside of the report’s date range.

Security

Fixed

Audit log editing permissions – We've fixed an issue that allowed System Admins to edit the audit log.

Updating com.google.code.gson to 2.8.9 – To fix a security issue, we’ve updated com.google.code.gson to 2.8.9.

Updating lxml to 4.7.1 – To fix an issue with Cross-site scripting (XSS), we’ve updated lxml to 4.7.1.

TVE Instances

Fixed

Database schema in the SQL Explorer tool – We’ve fixed an issue that prevented the database schema from being displayed in the SQL Explorer tool in TVE instances.

31.0.15 (15 Mar 2022)

Submission Processing

Updated

“Upload Submissions” dialog box and default layouts – The Upload Submissions dialog box no longer defaults to any particular Semi-structured layout. 

Models

Fixed

Model-import error messages in IE 11 – Previously, if an attempt to import a model in IE 11 was not successful, the error message shown was not completely contained in the dialog box. A fix for this issue is included in v31.0.15, and the user no longer needs to scroll the dialog box horizontally to view the complete message.

S3 Submission Retrieval Store

Updated

Support for Signature Version 2 in S3 Submission Retrieval Store flow setting – We’ve added support for Signature Version 2 (SigV2) in the S3 Submission Retrieval Store flow setting.  

Installations

Updated

PostgreSQL 12.10-alpine Docker image – Our installations now include PostgreSQL 12.10-alpine Docker images.

31.0.14 (8 Mar 2022)

Machine Learning

New

Support for single-character fields – We’ve made improvements to our machine learning models to support single-character fields.

Upgrades

Fixed

Processing Structured documents and upgrading the application – We’ve fixed an issue that caused Structured submissions to halt when upgrading from v28 to v31.

31.0.13 (3 Mar 2022)

User Experience

Fixed

Font size of the “Archive layouts” dialog box’s text – We’ve made the “Archive layouts” dialog box’s text size consistent.

Table Identification

Updated

Table Identification for large documents – Previously, you were always able to scroll through all pages of a document during Table Identification. As a result, Table Identification actions took between 2 and 5 seconds to load for documents with more than 18 pages. In v31.0.13, we’ve added a button to the top toolbar that, when activated, restricts scrolling between pages and loads a single page at a time. Clicking this button improves the loading time for Table Identification actions for large documents. You can still navigate between pages via shortcut keys.

Fixed

Row prediction for large documents – We’ve fixed an issue that caused row prediction to fail for documents that have a large number of pages.

Flows

Updated

Adding endTime to flows exported from the Jobs page – We've added the endTime field to flow JSON files exported from the Jobs page of the application.

Flow Blocks

New

PDF Decrypt flow block – We’ve added a PDF Decrypt flow block that utilizes the QPDF command-line tool to decrypt PDF submissions prior to processing. 
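As a rough illustration of the kind of QPDF call involved, the sketch below builds and runs a qpdf decryption command from Python. It is not the flow block's actual implementation: the function names, file paths, and the way the password is supplied are assumptions made for the example, and it requires the qpdf command-line tool to be installed.

```python
import subprocess


def build_qpdf_decrypt_command(source, destination, password=""):
    """Build a qpdf command line that writes a decrypted copy of a PDF.

    Hypothetical helper for illustration; the PDF Decrypt flow block may
    invoke qpdf differently.
    """
    return ["qpdf", f"--password={password}", "--decrypt", source, destination]


def decrypt_pdf(source, destination, password=""):
    """Run qpdf and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(
        build_qpdf_decrypt_command(source, destination, password),
        check=True,
    )
```

For example, decrypt_pdf("encrypted.pdf", "decrypted.pdf", "secret") would write a decrypted copy of encrypted.pdf to decrypted.pdf.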

Input and Output Connections

Updated

“Headers to Include” setting for Email Listener – We’ve added a Headers to Include setting for the Email Listener input connector. This setting allows you to include headers from emails ingested into Hyperscience via the Email Listener connector. 

Databases

Updated

Domain accounts for SQL Server connections – You can now connect Hyperscience to SQL Server using domain account login credentials. 

Deleting records of completed tasks at designated points – To reduce the amount of data stored in the database, we now delete records of completed flow tasks after their results are saved.

Infrastructure

Updated

Killing database sessions using local time – Previously, using local time in an Azure SQL Managed instance caused the system to measure blocker and sleeping database sessions as lasting longer than they actually had. As a result, the system killed such database sessions as soon as they moved to a blocker or sleeping state. A fix for this issue is included in v31.0.13.

Security

Fixed

Viewing HTML submissions – We’ve fixed an issue with viewing HTML submissions that allowed Cross-site scripting (XSS) in IE 11.

31.0.12 (14 Jan 2022)

Table Identification

Fixed

Inconsistent Table Identification QA task creation – Previously, Table Identification QA tasks would sometimes be created when they should not have been, and at other times, the system was prevented from creating these tasks as expected. A fix for this issue is included in v31.0.12, and the system creates Table Identification QA tasks consistently.

Table Transcription

Fixed

“Next empty cell” keyboard shortcut and hidden columns – We've resolved an issue in Google Chrome on Windows where if the right-hand sidebar hid a table column during Table Transcription and a keyer pressed Control + E, the keyer wasn't moved to the next empty cell.

Permissions

Fixed

Submission-level task restrictions coming from the API – We’ve fixed an issue that prevented submission-level task restrictions that were set via the API from being applied to Supervision tasks.

Trainer

Updated

Changing the trainer’s mount directory – We’ve changed the trainer’s mount directory from media to trainer_media to prevent the trainer and the application from sharing the same mount directory.

Input Connections

Fixed

Hiding “Exchange” and “Routing Key” settings for RabbitMQ Listener – We’ve hidden the Exchange and Routing Key settings for RabbitMQ Listener. Previously, these settings were displayed for RabbitMQ Listener but were unnecessary.

31.0.11 (3 Jan 2022)

Data Types

New

New currency and email data types – We've added the following default data types to the system:

  • Currency data types for trailing signs – These data types allow keyers to express negative currency values by surrounding the value in parentheses or by entering a negative sign or "CR" after the value:

    • Currency Trailing Sign - X,XXX.XX

    • Currency Trailing Sign - X.XXX,XX

    • Separated Currency Trailing Sign - X,XXX XX

    • Separated Currency Trailing Sign - X.XXX XX

      For example, with these data types, 200-, 200CR, and (200) are all equivalent to -200.

  • Email data type for international email addresses – The Email Address International data type uses a regular expression rather than a language model for validation. During normalization, all letters are made upper case, and no other changes are made. With this data type, email addresses are transcribed agnostic to the layout language selected, which leads to increased performance on documents that contain emails from multiple countries and languages.
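The trailing-sign equivalence described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the application's normalization code: the function name is hypothetical, and the sketch assumes the X,XXX.XX style (comma as the thousands separator, period as the decimal mark).

```python
from decimal import Decimal


def parse_trailing_sign(text: str) -> Decimal:
    """Interpret "200-", "200CR", and "(200)" as negative amounts.

    Hypothetical helper; assumes the X,XXX.XX number style.
    """
    value = text.strip()
    negative = False
    if value.startswith("(") and value.endswith(")"):
        # Surrounding parentheses mark a negative value.
        negative = True
        value = value[1:-1]
    elif value.upper().endswith("CR"):
        # A trailing "CR" marks a negative value.
        negative = True
        value = value[:-2]
    elif value.endswith("-"):
        # A trailing negative sign marks a negative value.
        negative = True
        value = value[:-1]
    amount = Decimal(value.strip().replace(",", ""))
    return -amount if negative else amount
```

With this sketch, parse_trailing_sign("200-"), parse_trailing_sign("200CR"), and parse_trailing_sign("(200)") all return the same value as parse_trailing_sign("-200").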

Reporting

Updated

Threading and scheduling of data-aggregation tasks – We've improved how the system distributes and schedules data-aggregation tasks for reports. With these changes, report-creation background processes can now be scheduled outside of business hours, and report generation is now faster and more reliable.

Output Connections

Updated

Improved error handling for OAuth2 URLs – When an invalid OAuth2 Authorization URL is entered in the settings for an HTTP Rest Block or HTTP Notifier Output Block, the system shows an HTTP 404 error message. Previously, the error message did not clearly state that the URL was invalid.

Databases

New

PgBouncer – We now include PgBouncer in our installation bundles, which allows you to minimize database usage by limiting the number of database connections the application can create. PgBouncer is supported for PostgreSQL databases only and is recommended for instances that have more than 100 CPU cores across all application machines. 

Updated

Moving transformed outputs to the file store – To reduce the amount of data in the database, we've moved transformed submission outputs to the file store.

Fixed

Type-conversion errors in Oracle databases – We've fixed a type-conversion issue in Oracle databases that caused a large number of trace files to be created. These files quickly filled system databases in some instances.

API 

Updated

Support for Base64-encoded JSON data in submission creation – You can now send submission data in JSON format when creating submissions via the Submission Creation endpoint. To do so, include the Content-Type: application/json header in your request. When sending requests with this header, note that the request body has a different format than requests sent as multipart/form-data or application/x-www-form-urlencoded.

31.0.10 (3 Dec 2021)

Table Identification

Fixed

Row indexing and halted submissions – We've fixed a row-indexing issue that caused some row indices to have negative values. These invalid values caused affected submissions to be halted.

31.0.9 (30 Nov 2021)

Data Types

Fixed

Creating custom field data types (CFDTs) from patterns – We’ve fixed an issue that prevented users from creating a pattern CFDT.

Table Identification

Fixed

Cells and rows during table-identification review – We’ve fixed an issue that sometimes caused row cells to be misaligned during table-identification review. For example, in the following image, cells from row 5 are located in rows 6 and 7.

ImageFor31.0.9.png

Output Connections

Updated

Logging details in UiPath error responses – We now log error messages from UiPath error responses, which contain details that can be helpful in debugging. Previously, we logged only the status codes in these messages.

Trainer

Fixed

Running the trainer on a non-English Ubuntu operating system – We’ve fixed an issue that prevented users from running the trainer on an Ubuntu operating system that is configured with a non-English language.

31.0.8 (24 Nov 2021)

Submission Processing

Updated

Task polling for blocks – To reduce database queries, we've implemented a task-polling mechanism for blocks, which lets the system know the resources each block has available to complete tasks.

Logs

Updated

Error logs for missing task handlers – We've removed an error-level log for missing Table Identification Supervision task handlers that was not actionable by users, helping to reduce non-actionable content in the log.

Changing some error logs to info logs – We've moved some error logs to the info level because users could not take action on them, nor did they represent failure scenarios.

Image Deletion

Fixed

Adding image-deletion tasks to list of permitted tasks – We've resolved an issue that prevented image-deletion tasks from being permitted by the system in some cases. This issue caused unnecessary data to accumulate in the file store. 

Installations

Updated

PostgreSQL 12.9-alpine Docker image – Our installations now include PostgreSQL 12.9-alpine Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

31.0.7 (19 Nov 2021)

Layouts

Updated

Enabling Find Potential Layout task creation with no trainer attached – The system can now create Find Potential Layout tasks even if there is no trainer attached. This functionality gives users the ability to shut down the trainer machine without affecting the creation and queuing of trainer tasks.

Classification

Fixed

Models for classifying Semi-structured and Additional documents – We've resolved an issue that prevented the system from using records generated by classification models for Semi-structured and Additional documents.

Submission Processing

Updated

Improvements to submission processing – We've made the following enhancements to the system's processing of submissions:

  • Changed the sorting of records in the database to prevent deadlocking

  • Optimized processes for submission deletion, PII data deletion, and data cleanup to allow several jobs to run in parallel

Table Identification

Updated

Excluding documents with many text segments from Table Identification QA – You can now prevent the system from generating Table Identification QA tasks for documents whose text-segment count exceeds the limit you specify. For assistance in setting this limit, contact your Hyperscience representative.

Comparing annotations during Table Identification QA consensus – We've removed redundant annotation-comparison tasks in Table Identification QA consensus processes.

Fixed

Table Identification QA tasks and processing delays – We've fixed an issue that caused system resources to be consumed by Table Identification QA tasks. This issue prevented other tasks from being completed, resulting in submission-processing delays.

Output Connections

Updated

API payload for UiPath Notifier Output connections – We've updated the API payload for UiPath Notifier Output connections to be consistent with the connections’ v28 payloads and the payloads of other output notifiers.

Authentication

Fixed

Creating users with third-party authentication providers – Previously, if an error occurred when creating a user through a third-party authentication provider, that user would not be able to log in, even after the error was resolved. A fix for this issue is included in v31.0.7 and later.

File Storage

Updated

Support for AWS Signature Version 2 in S3 file stores – We've added support for AWS Signature Version 2 in HTTP requests sent to S3 file stores.

Upgrades

Fixed

Classification of in-progress Structured submissions after upgrading from v28.3.2 – An issue in v31.0.6 caused Structured submissions not yet classified to be marked as No Layout Found when upgrading from v28.3.2. A fix for this issue is included in v31.0.7 and later.

31.0.6 (12 Nov 2021)

NOTE: Before upgrading to v31.0.6, check your ".env" file for the following variables:

BLOCK_SCALE_VPC
BLOCK_THREADS_VPC

If these variables are in your ".env" file, add the following variables:

BLOCK_SCALE_VPC_2
BLOCK_THREADS_VPC_2

Set the values of these variables to match the values for BLOCK_SCALE_VPC and BLOCK_THREADS_VPC, respectively.
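If you want to script this check, the following Python sketch shows one way to do it. It is an illustration only: the function name is hypothetical, and it assumes your ".env" file contains simple KEY=VALUE lines. Review the result before using it in your instance.

```python
def add_vpc_2_variables(env_text: str) -> str:
    """Append BLOCK_SCALE_VPC_2 and BLOCK_THREADS_VPC_2 when their
    counterparts are present, copying the existing values.

    Hypothetical helper; assumes plain KEY=VALUE lines.
    """
    pairs = {
        "BLOCK_SCALE_VPC": "BLOCK_SCALE_VPC_2",
        "BLOCK_THREADS_VPC": "BLOCK_THREADS_VPC_2",
    }
    values = {}
    for line in env_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    additions = [
        f"{new}={values[old]}"
        for old, new in pairs.items()
        if old in values and new not in values
    ]
    if not additions:
        return env_text
    separator = "" if env_text.endswith("\n") or not env_text else "\n"
    return env_text + separator + "\n".join(additions) + "\n"
```

The function leaves the text unchanged when neither variable is present or when the new variables have already been added.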

User Experience

Fixed

November 2021 calendars in date filters – We’ve resolved an issue related to Daylight Saving Time that caused November 7 to appear twice in November 2021 calendars in date filters.

Layouts

Updated

Removal of Routing option for fields – Because we deprecated the Routing feature in a previous version, we've removed the Routing option for fields in the Layout Editor and in the Fields and Customizations tab of the Layout Details page.

Layout Editor

Fixed

Switching to a different Structured layout variation from the Layout Editor – We’ve fixed an issue that prevented users from switching between different Structured layout variations when using the Layout Editor’s drop-down menu in the right-hand sidebar.

Data Types

Fixed

Existing ML Configuration drop-down list in IE 11 – We've resolved an issue that prevented the options in the Existing ML Configuration drop-down list from appearing when creating a data type in IE 11.

Creating a data type from a list of duplicate values – We've fixed an issue where users were allowed to create data types from a list of values containing duplicate values.

Training

Updated

Use of segments found during submission processing – The trainer now uses text segments detected during submission processing when training field locator models.

Processing and scheduling of Field Identification QA and Model Validation Tasks (MVTs) – We’ve reduced the number of database queries related to Field Identification QA and MVTs, improving system efficiency in high-volume instances.

Flows

Updated

Performance optimizations for flows – We've made the following improvements to the processing of flows:

  • Created customized database indexes and queries

  • Capped the number of tasks that can be processed in a single query

  • Ensured that block-process managers do not quit when their connections to the database are lost

  • Made changes to custom flows to reduce the risk of partial processing

  • Modified sharding functions so they are informed by the health records of each individual machine

  • Added checks for sub-processes 

Fixed

Duplicate Folder Listener connections across flows – We’ve fixed a flow-validation issue that prevented the system from creating error messages when identical Folder Listener connections were added to different flows. Having Folder Listener connections with the same settings across flows causes resource-allocation issues and is not permitted.

Duplicating “Document Processing (V31)” flows and S3 settings – We’ve resolved a flow-validation issue that caused an error message to appear when duplicating the “Document Processing (V31)” flow. This error was due to the duplication of the S3 Submission Retrieval Store setting across flows, even if the field was set to the default value in the original flow.

Block-process manager performance with a large number of resources – We've fixed a resource-detection issue in block-process managers that caused delays when a large number of input resources were present (e.g., images in a Folder Listener's folder).

Submission Processing

Fixed

Task synchronization across flow blocks – We've fixed a task-synchronization issue across flow blocks that caused submissions to fail, particularly in high-volume instances.

Supervision task duration and halted submissions – We’ve fixed an issue that caused submissions to halt when Supervision tasks had negative durations. 

Halted submissions in Processing state – We’ve fixed an issue that caused the submission of non-existing files from S3 file stores to result in halted submissions that had a Processing status. With the fix included in v31.0.6+, the system completes these submissions and returns an exception.

Submissions Table

Fixed

Unexpected error when filtering by halted submissions – We’ve fixed an issue that resulted in an unexpected error when filtering the Submissions table by halted submissions.

Classification

Updated

Clean-up script for Structured document classification – We've added a script to clean up caches of data used in the machine classification of Structured documents.

Improved release-loading times for Structured document classification – We've reduced the amount of time needed for release information to be sent to Structured document classifiers in Machine Classification Blocks.

Transcription

Updated

Entering transcriptions containing more than 2000 characters – You can now remove the 2000-character limit for text field transcriptions. When this limit has been removed, keyers can enter more than 2000 characters for a field during Transcription Supervision and QA tasks, including Flexible Extraction.

Note that entering longer transcriptions may affect application responsiveness. To learn more, and to remove the character limit in your instance, contact your Hyperscience representative. 

Table Identification

Fixed

Table Identification QA and multiple-page submissions – We've resolved an issue that caused table cells to be omitted during Table ID QA and in the resulting data when multiple pages in a document contained rows. 

Distinguishing cells in identical locations across pages – We've fixed an issue that prevented the system from distinguishing cells in identical locations on different pages. This issue primarily occurred in low-quality images and with long segments of text.

Creation of Table Identification QA tasks after consensus – We’ve resolved an issue that caused the system to generate Table Identification QA tasks after consensus was reached.

Resource allocation for Table Identification QA tasks – We’ve fixed an issue that caused Table Identification QA resources to be deadlocked in some situations. This issue caused jobs in affected instances to fail.

Custom Code Blocks

Fixed

In-memory import of Python scripts for Custom Code Block tasks – Previously, imported Python scripts were saved to a local file, and an issue sometimes caused that file to be empty. With the fix included in 30.0.12+, we import Python scripts in memory instead of saving them to a local file.

Folder Listener

Fixed

Time zones for Warm-Up Interval comparisons – To resolve Warm-Up Time Interval calculation issues, the system now uses a file's last-modified time in the local time zone rather than UTC when determining if the file is eligible for processing.   
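The Python sketch below illustrates the general idea of comparing a file's last-modified time and the current time in the same local frame of reference. It is not the Folder Listener's actual code: the function name and parameters are hypothetical. A mismatch of this kind, such as reading the last-modified time as UTC while reading the current time as local, skews the comparison by the UTC offset.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_past_warm_up(last_modified_epoch: float,
                    warm_up: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True when a file has been unmodified for at least warm_up.

    Hypothetical helper for illustration only.
    """
    # fromtimestamp() without a tz argument yields naive local time,
    # the same frame of reference as datetime.now().
    modified_local = datetime.fromtimestamp(last_modified_epoch)
    current_local = now if now is not None else datetime.now()
    return current_local - modified_local >= warm_up
```

In practice, last_modified_epoch could come from os.path.getmtime() for the file being checked.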

Message Queue Connections

Fixed

ERROR logs for Message Queue (MQ) Listener connections – We’ve resolved an issue that caused ERROR logs to be created in some flows that were working properly. The system now creates INFO logs in these situations.

SSL CipherSuites and IBM Message Queue (MQ) connections – We've fixed an issue that caused input and output connections to IBM MQ servers to fail when SSL CipherSuites were enabled.

Installations

Updated

PostgreSQL 12.8-alpine Docker image – Our installations now include PostgreSQL 12.8-alpine Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

Infrastructure

Updated

Trainer machines and sharding – We've updated the system's sharding functions to exclude trainer machines. This change prevents flow tasks from being sent to trainer machines, improving efficiency in both the trainer and the application.

Cron jobs polling – To increase performance and reduce dependency on Service Broker, we’ve updated the cron jobs channel to use polling instead of the database-provided notification system. 

Improved management of database sessions – We’ve improved database session management in instances with MSSQL databases. Specifically, the system will kill sessions that:

  • are blocking other database sessions, 

  • are unusable, or

  • have been idle for a long time.

Health check for database-provided notification mechanisms – We've added a check for database-provided notification mechanisms (e.g., LISTEN/NOTIFY) to our system health check. This check also sends an alert when Service Broker is disabled in instances with MSSQL databases.

TVE Instances

Updated

Memory consumption and Field Locator models – We've fixed an issue that caused the loading of Field Locator models to consume an excessive amount of memory. While this fix applies to all installations, the memory-usage issue was most noticeable in TVE instances.

Health Statistics

Updated

Disabling the gathering of health data – You can now disable the automatic gathering of health data by adding the HS_HEALTH_STATISTICS_ENABLED and HS_WFE_HEALTH_STATISTICS_ENABLED variables to your “.env” file and setting them to false. Setting these variables to false may help to improve system responsiveness in high-volume instances. However, doing so also affects the data shown on the System & Health page in the application.
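If you manage your ".env" file with a script, a sketch like the following could set both variables. The helper name is hypothetical, and the sketch assumes the file contains simple KEY=VALUE lines. It replaces existing entries for the given variables and appends any that are missing.

```python
def set_env_variables(env_text: str, updates: dict) -> str:
    """Return env_text with each variable in updates set to its new value.

    Hypothetical helper; assumes plain KEY=VALUE lines.
    """
    remaining = dict(updates)
    lines = []
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the existing value for this variable.
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    # Append variables that were not already in the file.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


HEALTH_SETTINGS = {
    "HS_HEALTH_STATISTICS_ENABLED": "false",
    "HS_WFE_HEALTH_STATISTICS_ENABLED": "false",
}
```

Calling set_env_variables(env_text, HEALTH_SETTINGS) returns the updated file contents, which you can review before saving.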

Trainer

Updated

Ability to add Field ID and Table ID training tasks with no trainer attached – Users can now create Field ID and Table ID training tasks for Semi-structured layouts even if there is no trainer attached. This functionality gives users the ability to shut down the trainer machine when the training tasks queue is empty.

Databases

Fixed

Saving layout changes on MSSQL databases – We’ve fixed an issue for MSSQL databases that caused saving changes to existing layouts to result in system slowness.

Updated

Changes to database-notification health checks and polling – To improve scalability, we’ve removed the health check for the database-provided notification mechanism in production instances, and we're enabling polling for job queues and block-process manager channels. The database-notification health checks remain enabled in TVE instances.

Storage of text-segment data – We now group text-segment data by page, improving scalability and system performance. 

Executing system tasks in memory – System tasks are now executed in memory rather than in the database, helping to reduce overall database load.

Moving large JSON files and flow backups to the file store – To increase database efficiency, we now store the following data in the file store rather than the database:

  • large JSON files from completed tasks

  • large JSON files from short-lived tasks

  • backups of completed flows

  • data generated during the automatic classification of Structured documents

Maximum age of database connections – We've increased the default maximum age for database connections, reducing both the need to create new connections and the overall load on the database.

31.0.5 (25 Oct 2021)

Models

Fixed

Compatibility of Field Locator models across v31 versions – In v31.0.0-v31.0.4, the application can only use Field Locator models trained in v31.0.0-v31.0.4. For example, if you are running v31.0.0 or v31.0.4 of the application, you can use Field Locator models trained in v31.0.3, but not models trained in v31.0.5 or later. With the update included in v31.0.5 and later, a Field Locator model trained in any version of v31 will be compatible with the application.

Security

Fixed

Redirecting local users from the login URL – We’ve fixed an issue that allowed logged-in users to be redirected to external websites from the login URL when using local authentication.

31.0.4 (8 Oct 2021)

Input Connections 

Fixed

Processing times and high-volume sources – We've fixed an issue that caused processing delays for submissions ingested from high-volume sources (e.g., the folder scanned by a Folder Listener connection). 

Infrastructure

Updated

Database polling with a single thread – We've updated our database-polling function to use a single thread, which reduces unnecessary operations and improves responsiveness in high-volume instances.

Databases

Fixed

Recovering from failed connections – We've fixed an issue that prevented the system from reconnecting to the database after connectivity errors or idle periods. This issue resulted in processing failures after lost connections.  

MSSQL queries and system performance – In previous v31 versions, execution plans in MSSQL databases generated inefficient queries, causing spikes in CPU usage. These spikes resulted in halted submissions, processing delays, and reduced application responsiveness. A fix for this issue is included in v31.0.4.

Submission Retrieval Stores

Updated

Support for AWS S3 ".env" file variable – We've restored support for the SUBMISSION_RETRIEVAL_STORE_S3_ENDPOINT_URL variable in ".env" files.

31.0.3 (6 Oct 2021)

Table Identification Models

Fixed

Automation levels for high target accuracies – We've fixed an issue that caused projected and actual automation levels to be lower than what table models were capable of delivering. Also, automation rates could sometimes drop drastically for high target accuracies. After upgrading to v31.0.3, new trainings will update the Projected Automation vs. Target Accuracy graph. If you are not retraining a model, the model's performance will still improve, but the graph will not be updated.

Table Identification Tasks

Updated

Table Identification introduction text – We’ve updated the text on the Table Identification introduction page to present the best practices for using the Template tool. 

Fixed

Negative row indices – We've fixed an issue that caused a row to have a negative index if one of its cells was moved outside of the row's boundaries during Table ID Supervision. This negative index prevented the table's QA data from being saved in the database. 

Reporting

Updated

Table Identification data in Performance Distribution report – We've added a Table Identification filter to the Performance Distribution report (Reporting > User Performance).  

Message Queue Listener

Fixed

Concurrency of Message Queue (MQ) Listener tasks – We've made changes to how the system completes MQ Listener tasks, including the elimination of intermediate data structures and delays in message processing. These improvements also prevent issues related to updating and selecting entries in the databases.

Notifications

Fixed

Behavior of has_layout_tag Routing Filter functions – We've fixed an issue that caused the system to read layout tags incorrectly, which affected the behavior of has_layout_tag functions in Routing Filters.

Databases

Fixed

MSSQL and system responsiveness – We've fixed an issue that caused some queries to perform an indexed scan when only an indexed seek was needed. The issue resulted in system slowness in instances with MSSQL databases. 

31.0.2 (15 Sep 2021)

Field ID Supervision

Fixed

Retrieval of fields to be identified – We've refactored how the system retrieves the list of fields to be identified in a document, improving efficiency by reducing the amount of irrelevant data retrieved.

Table Identification

Fixed

Generating Supervision tasks for unidentified tables – We've fixed an issue that prevented the system from generating Table ID Supervision tasks for tables in which it could not identify any columns. Because no table columns were identified by the system or by keyers, the system generated Table ID QA tasks for empty tables, resulting in errors.

Table Identification Supervision

Fixed

Identifying cells below their rows' boundaries – We've resolved an issue that caused submissions to halt when keyers identified cells below the boundaries of their rows.

Table Transcription Supervision

Fixed

Shuffling of table columns – We've fixed an issue that caused table columns to appear out of order on the Document Output page after Transcription Supervision.

Bounding boxes for deleted rows – We've fixed an issue that caused row bounding boxes to remain visible after being deleted during Table Transcription Supervision.

Submission CSVs

Fixed

Field Identification Source in Table-level Data – We've resolved an issue that caused a field's normalized transcription to appear in the Field Identification Source column of Table-level Data CSVs.

Reporting

Updated

All Users Performance Summary report – We've made the following updates to the All Users Performance Summary report (Reporting > User Performance):

  • Added a filter for Field Identification and Table Identification data

  • Added the following columns to the Table Identification view:

    • Table ID Sampled Accuracy

    • Table ID Sample Size

    • Table ID Supervision

    • Table ID QA

  • Deleted the Table ID Entries column

Model Training

Updated

Continuous model training and target accuracies – When automatically deploying new models, the system will now use the greatest of the flow-specific target accuracies from all live flows. Formerly, it used the instance-level target accuracy configured in v30 or earlier.
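
The selection rule described above can be sketched in Python as follows. This is only an illustration of "use the greatest flow-specific target accuracy among live flows"; the flow dictionaries and the is_live and target_accuracy keys are hypothetical names, not part of the Hyperscience data model.

```python
def deployment_target_accuracy(flows):
    """Return the highest target accuracy among live flows, or None if no flow is live."""
    # Only live flows take part in the comparison (keys are hypothetical).
    live_targets = [f["target_accuracy"] for f in flows if f["is_live"]]
    return max(live_targets) if live_targets else None


flows = [
    {"name": "Flow A", "is_live": True, "target_accuracy": 0.95},
    {"name": "Flow B", "is_live": True, "target_accuracy": 0.99},
    {"name": "Flow C", "is_live": False, "target_accuracy": 0.999},
]
print(deployment_target_accuracy(flows))  # 0.99 (Flow C is ignored because it is not live)
```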

Model Deployment

Fixed

Closing the Deploy Model dialog box – We've fixed an issue that caused the Model Deployed dialog box to appear when the Deploy Model dialog box was closed without deploying.

Trainer 

Fixed

Connection to application – We've made improvements to the trainer-application connection to prevent the trainer from losing contact with the application indefinitely. If the connection is lost, the trainer will time out the connection and automatically attempt to reconnect.

Folder Listener

Fixed

Metadata and file deletion – We've fixed an issue that prevented files in invalid subfolders from being deleted if metadata was enabled. In these situations, the system would delete the metadata file, but not the image of the document.

31.0.1 (3 Sep 2021)

Flows

Fixed

Parsing of field values – We've improved how the system reads string fields and detects JSON-like field values in flow settings.
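
One common way to tell JSON-like values apart from plain strings is sketched below. It is an assumption about the general technique, not a description of how Hyperscience parses flow settings, and parse_field_value is a hypothetical helper name.

```python
import json


def parse_field_value(raw):
    """Return parsed JSON for object- or array-like strings, else the original string."""
    text = raw.strip()
    # Only strings that start like a JSON object or array are candidates.
    if text[:1] in ("{", "["):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass  # Looks like JSON but is not valid; fall through and keep the string.
    return raw


print(parse_field_value('{"retries": 3}'))  # {'retries': 3}
print(parse_field_value("plain text"))      # plain text
```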

Releases

Fixed

Layout variations with disabled fields – We've fixed an issue that caused an error to occur when creating a release that contained layout variations with disabled fields.

Case Collation

Fixed

"Add to an existing case" dropdown list in IE 11 – Previously, when manually uploading a submission using Internet Explorer 11, clicking the Add to an existing case dropdown list did not show the list of available cases. A fix for this issue is included in v31.0.1+.

Field Identification

Fixed

F4 keyboard shortcut – We've fixed an issue that prevented the F4 keyboard shortcut from changing the label location during Field Identification Supervision, QA, and model-validation tasks (MVTs).

Table Identification

Fixed

Zooming out and in after row predictions are displayed – We've fixed an issue that caused an error to occur after zooming out and then zooming in on a table whose row locations had been predicted by the system.

Field Normalization

Fixed

Normalization passes when it should fail – In submissions that contained fields with custom data type patterns, an issue caused normalization to be successful when manually transcribed values broke the pattern. However, normalization should have failed in these situations. A fix for this issue is included in v31.0.1+.

Submission Output

Fixed

Exceptions for fields that are not present – We've fixed an issue that caused Illegible Field exceptions to be added to fields not found by locator models.

Attribution of manually identified and transcribed fields – We’ve resolved an issue that caused the system to mark fields that were both identified and transcribed manually as being machine identified.

Document Output

Fixed

Table output on the Document Output page – Previously, when a document had Field ID Supervision tasks and a table automatically identified by a Table ID model, the table would not be visible on the Document Output page. A fix for this issue is included in v31.0.1+, and the system now shows the table output.

Installations

Updated

PostgreSQL 12.7 Docker image – Our installations now include PostgreSQL 12.7 Docker images.

Because previous installations included earlier versions of PostgreSQL Docker images, upgrading TVE instances with v28 or earlier to v30 or later requires a database migration. This migration is not required in production instances.

TVE Environments

Updated

Availability of SQL Explorer – We've reintroduced the SQL Explorer beta feature for TVE environments only.

Input Connections

Updated

Testing connections before saving settings – You can now test connections before saving their settings. Previously, testing connections prior to saving resulted in failed tests.

Folder Listener

Fixed

Warm-Up Interval and future timestamps – We've changed the implementation of the Warm-Up Interval option to prevent the deletion of files with future timestamps. Instead of deleting these files, the system will add a message to the log explaining that the file will not be processed until its timestamp matches the server clock's time.
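
The behavior for future timestamps can be pictured with the following minimal sketch. The function name, logger name, and arguments are hypothetical, and the sketch covers only the "log and wait instead of delete" decision, not the rest of the Warm-Up Interval logic.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("folder_listener_sketch")


def should_process_now(file_name, file_timestamp, server_now):
    """Return True if the file can be processed, False if it must wait.

    Files stamped later than the server clock are kept (not deleted), and a
    log message explains why processing is deferred.
    """
    if file_timestamp > server_now:
        logger.info(
            "%s has a future timestamp (%s); it will not be processed until "
            "the server clock (%s) reaches that time.",
            file_name, file_timestamp.isoformat(), server_now.isoformat(),
        )
        return False
    return True


now = datetime(2021, 9, 3, 12, 0, tzinfo=timezone.utc)
print(should_process_now("a.pdf", now - timedelta(minutes=5), now))  # True
print(should_process_now("b.pdf", now + timedelta(minutes=5), now))  # False
```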

RabbitMQ Input and Output Connections

Fixed

Default exchanges for RabbitMQ input and output connections – We've resolved an issue that caused the system to pass the default "" exchange value as null.

Output Connections 

Updated

Routing Filter tooltip – We've updated the Routing Filter tooltip with a more detailed description of how the filter works.

Amazon SQS Output Connections

Updated

Group ID for FIFO Queues field for Amazon SQS – We've added a Group ID for FIFO Queues field to the Amazon SQS connection settings, which ensures that the system sends messages in the correct order.
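
For context, Amazon SQS preserves ordering among FIFO messages that share a message group ID. The sketch below builds the parameters for a boto3 send_message call; the helper name and the queue URL are placeholders, and how Hyperscience passes the configured value internally is not documented here.

```python
def build_sqs_message(queue_url, body, group_id=None):
    """Build keyword arguments for boto3's SQS send_message call."""
    params = {"QueueUrl": queue_url, "MessageBody": body}
    # FIFO queue names end in ".fifo", and FIFO messages need a MessageGroupId.
    if queue_url.endswith(".fifo"):
        if not group_id:
            raise ValueError("FIFO queues require a message group ID")
        params["MessageGroupId"] = group_id
        # A MessageDeduplicationId is also needed unless content-based
        # deduplication is enabled on the queue.
    return params


params = build_sqs_message(
    "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue.fifo",  # placeholder URL
    '{"submission_id": 1}',
    group_id="example-group",
)
print(params["MessageGroupId"])  # example-group
# With boto3 installed and credentials configured:
# boto3.client("sqs").send_message(**params)
```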

Infrastructure

Updated

New flags for run.sh – We've added support for the following run.sh flags:

  • --restart — restarts the Docker container,

  • --stop — stops the Docker container, and 

  • --clean — recreates the Docker container upon stopping and restarting it.

File Storage

Fixed

S3 File Storage credentials and submission processing – Previously, if an instance had an S3 file store configured with missing or incorrect credentials, submissions containing files in the file store were still processed to completion. With the fix included in v31.0.1+, the system halts these submissions, as it cannot retrieve the files from the file store.

S3 Submission Retrieval Store

Updated

AWS credentials in flow settings – When setting up your S3 submission retrieval store, enter your AWS credentials in the S3 Submission Retrieval Store field in your "Document Processing (V31)" flow's settings. This field replaces the s3_downloader field in /admin/sdm/systemsecret/

Trainer

Updated

Effect of run.sh on the media directory – Running the run.sh command on the trainer machine now deletes the trainer's media directory. This update prevents the accumulation of unnecessary files on the trainer machine.

Security

New

Excluding X-Extract-Backend-Server HTTP headers from application responses – You can now exclude the X-Extract-Backend-Server HTTP header from responses returned by the application. To do so, add the NGINX_DISABLE_BACKEND_SERVER_HEADER variable to your ".env" file and set its value to yes.

For more information, see Security.
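
After setting NGINX_DISABLE_BACKEND_SERVER_HEADER to yes in your ".env" file, one way to confirm the result is to inspect the headers of any application response. The sketch below is a hypothetical helper that works on a plain mapping of header names to values; the sample headers are made up for illustration.

```python
def backend_header_hidden(response_headers):
    """Return True when X-Extract-Backend-Server is absent from the headers.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    names = {name.lower() for name in response_headers}
    return "x-extract-backend-server" not in names


# Made-up sample headers for illustration only.
before = {"Content-Type": "text/html", "X-Extract-Backend-Server": "backend-1"}
after = {"Content-Type": "text/html"}
print(backend_header_hidden(before))  # False
print(backend_header_hidden(after))   # True
```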

31.0.0 (17 Aug 2021)

SaaS

New

SaaS implementations of Hyperscience – We're proud to announce that we now offer Hyperscience as a highly available SaaS solution. SaaS deployments of Hyperscience include the same platform as on-premise and private cloud deployments, giving you the benefits of Hyperscience without having to manage installations, upgrades, system resources, and overall environment setup.

In addition to managing the configuration of your SaaS environments, we will offer the same level of support that you have come to expect from Hyperscience. You can also continue to rely on us for your Hyperscience onboarding and training needs.

Highlights of our SaaS offering include:

  • SOC 2 compliance

  • Data encryption while at rest and in transit

  • Industry-standard security controls, data-access protection, and data isolation

  • Automatic data backups

  • High Availability with redundancy of components across multiple availability zones

  • Okta authentication out of the box, with the option to integrate with SAML or OpenID Connect providers

  • A client library to integrate with our API and streamline API authentication

  • API account-management tools built into the platform

In v31, SaaS implementations will be hosted on AWS in the US East region. If your organization is able to use a service that hosts and processes data in the United States, you can take advantage of our SaaS offering. 

Flows

Updated

Changes to flow blocks – To expand the capabilities of flows—and increase your control over those capabilities—we're adding new blocks to our block library and updating some existing blocks.

We're introducing the following blocks in v31:

  • Database Blocks – enable you to connect your flows to your organization’s databases

  • Flexible Extraction Blocks – designed to work with Custom Code, Routing, and Manual Transcription Blocks to allow your keyers to validate or transcribe fields at specific points in your flows. These blocks can also be paired with API and Database Blocks for data lookups, as well as with Manual Identification Blocks to transcribe fields that cannot be manually identified.

  • Machine Collation Blocks – allow you to automatically collect related documents, files, and pages together under a single case ID

We've made updates to the following blocks:

  • Input Blocks and Output Blocks – business users can now add and remove connections, as well as manage client secrets in the Flow Studio

  • Manual Classification – Manual Classification Blocks can now create Classification Supervision tasks for unmatched Structured pages

  • Custom Code Blocks – software engineers at your organization can now modify code in existing Custom Code Blocks by updating Python scripts

Note that you will still need Hyperscience's assistance to add new blocks to your flows.

More information about our new and updated blocks can be found in these release notes.

Flow-specific reporting – You can now filter reporting data by flow, whether it's shown on the Reporting page or retrieved from our Reporting API endpoints.

More details about flow-specific metrics can be found in the Reporting section of these release notes. 

Input Blocks and Output Blocks

Updated

Modifying client secrets – Previously, client secrets in Input Blocks and Output Blocks could not be edited without support from Hyperscience. To give you more control over your flow's settings, users with “Edit Connections” permissions can now update client secrets in Input Block and Output Block settings. These users can also use the API endpoint dedicated to updating and creating secrets, which is described in the API section of these release notes. 

Adding and removing connections – To give you more control over your flows, we've updated the settings of Input Blocks and Output Blocks to allow you to add and remove connections. You also can enable or disable any connections you add as your business needs change.

Custom Code Blocks

Updated

Modifying Custom Code Blocks – In v31, we're giving our customers the ability to modify the code in Custom Code Blocks, which previously could only be done by Hyperscience employees. This expanded access to custom code allows your organization to update the logic for validations and data transformations as your business needs change. Because Custom Code Blocks contain Python code, we recommend giving custom coding permissions only to software engineers. 

While you can now edit existing Custom Code Blocks, you will still need Hyperscience's assistance to add new Custom Code Blocks to your flows. Hyperscience can also support you in editing Custom Code Blocks, if needed.

Note that modifying custom code in SaaS instances will still require assistance from Hyperscience.

Database Blocks

New

With the introduction of Database Blocks, you can connect your flow to your organization's MSSQL, PostgreSQL, or Oracle databases to validate data or add to a submission's output. You can also use data from Database Blocks in other blocks to increase the efficiency of your keyers and your overall Hyperscience implementation.

Potential use cases for Database Blocks include:

  • Adding data from your database to a submission's output to make downstream processing more streamlined

  • Using your database data to determine which downstream system a submission should be routed to

  • Automatically populating certain fields with data from your databases, as part of our data lookup and validation capabilities described in the Supervision section of these release notes

Classification

New

Manual Classification for Structured documents – Previously, pages in Structured documents that weren't matched to a layout were marked as "No Layout Found," and no further work could be done on those pages in Hyperscience. To reduce these kinds of "dead ends" in our platform, we've expanded the functionality of Manual Classification Blocks to allow keyers to classify unmatched Structured pages. 

The process of classifying Structured pages closely resembles that of classifying Semi-structured pages. In the Manual Classification task for a Structured submission, the keyer will match each unclassified page to a Structured layout. In v31, the system will create an individual document for each of these pages. 

After Manual Classification is completed for a Structured submission, you can send the resulting document to Flexible Extraction, where a keyer can transcribe its fields.

Supervision

New

Data Lookup and Validation with Flow Blocks – You can now use Custom Code, API, and Database Blocks to control the fields that are sent to Supervision and the logic that determines whether they should be sent.

Supported use cases in v31 include:

  • Sending a "Required" field—and only that field—to Field ID Supervision, and then marking the document as "Not In Good Order" if it is missing. If the field is present, the rest of the fields are sent to Field ID Supervision.

  • Looking up a field's value in a database and sending it to Transcription Supervision if its value cannot be validated

  • Automatically populating specific fields, eliminating the need for keyers to transcribe them during Transcription Supervision

Note that the logic must be configured in a Custom Code Block and cannot be modified directly in the platform. Also, automatically populated fields will not be included in Quality Assurance tasks and will not be used in model training. 
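
The first two use cases above amount to simple decision rules, sketched here in plain Python. The document structure, function names, and route labels are hypothetical; they do not reflect the actual Custom Code Block interface.

```python
def route_required_field(document, required_field):
    """Decide the next step once the required field has been through Field ID Supervision."""
    value = document.get("fields", {}).get(required_field)
    if not value:
        # The required field is missing, so the document is not in good order.
        return "NOT_IN_GOOD_ORDER"
    # The required field is present, so the remaining fields can be sent on.
    return "FIELD_ID_SUPERVISION_FOR_REMAINING_FIELDS"


def needs_transcription_supervision(value, valid_values):
    """A value that cannot be validated against lookup data goes to Transcription Supervision."""
    return value not in valid_values


doc = {"fields": {"policy_number": "P-1001", "name": "A. Smith"}}
known_policies = {"P-1001", "P-1002"}  # stand-in for the result of a database lookup
print(route_required_field(doc, "policy_number"))
print(needs_transcription_supervision("P-9999", known_policies))  # True
```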

Flexible Extraction Blocks and Supervision tasks – To increase the exception-handling capabilities of our solution, we're making Supervision more robust with the addition of Flexible Extraction. Flexible Extraction Blocks are designed to be used in tandem with Custom Code, Routing, and Manual Classification Blocks to allow keyers to transcribe data at multiple points in a flow. In Flexible Extraction Supervision tasks, keyers will be able to review a full document to transcribe its fields or verify the content of fields flagged for validation. 

In v31, Flexible Extraction supports the following use cases:

  • Supervision for Structured documents that the system couldn't match to layouts through Machine Classification. Flexible Extraction, coupled with v31's Manual Classification for Structured documents, allows you to extract data from these documents directly in Hyperscience for the first time.

  • Supervision for documents that require validation based on the logic in Custom Code Blocks

We've also added reporting metrics for Flexible Extraction tasks, as described in the Reporting section of these release notes.

Note that data from Flexible Extraction is not used in model training unless the data is sampled for Transcription QA.

Updated

Table ID Supervision user experience (Semi-structured only) – We've made the following changes to Table ID Supervision tasks to increase usability and keyer efficiency:

  • We've removed the Column Tool, meaning that all Table ID Supervision tasks will be completed with the more flexible Template Tool.

  • We've made the column tags transparent, allowing you to see the column headers underneath.

  • We've made it easier to adjust the heights of rows.

  • Cells are not bound by rows, enhancing our support of skewed forms.

  • If the machine is not very confident in its prediction for a particular cell, that cell will be highlighted in yellow during Table ID Supervision.

  • You can now select and edit multiple cells. When you select multiple cells, their row indices will appear in a bolded font.

  • While we do not support the automation of nested tables or multiple table schemas in a document, you can manually process these tables by combining Machine Identification Blocks with Flexible Extraction, Manual Transcription, and Custom Code Blocks. Contact your Hyperscience representative for more information.

Cases 

New

Cases support – Previously, the only way to connect related documents in Hyperscience was to include them in the same submission. With the introduction of the Case data model and the Machine Collation Block, you can group documents, pages, and files together under a case ID. Case IDs may be manually assigned or assigned automatically within a Custom Code Block.

Machine Collation Block

In the "Document Processing (V31)" flow included in v31, the Machine Collation Block appears directly after the Input Block so that external case IDs specified during submission upload are added immediately upon submission creation.  

With Hyperscience's assistance, you can also place a Machine Collation Block anywhere in your flow where you want to assign external or Hyperscience-generated case IDs to documents. When assigning case IDs mid-flow, you will need to pair the Machine Collation Block with a Custom Code Block to specify the rules for adding documents to cases.
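
As an illustration of the kind of rule a Custom Code Block might express, the sketch below groups documents that share the value of one field under the same case ID. The document structure, the "CASE-" prefix, and the function name are all hypothetical and are not the actual Custom Code Block interface.

```python
from collections import defaultdict


def assign_case_ids(documents, key_field):
    """Group document IDs under a case ID derived from a shared field value."""
    cases = defaultdict(list)
    for doc in documents:
        key = doc["fields"].get(key_field)
        if key:  # Documents without the key field stay outside any case.
            cases[f"CASE-{key}"].append(doc["id"])
    return dict(cases)


documents = [
    {"id": 1, "fields": {"claim_number": "7781"}},
    {"id": 2, "fields": {"claim_number": "7781"}},
    {"id": 3, "fields": {"claim_number": "9030"}},
    {"id": 4, "fields": {}},
]
print(assign_case_ids(documents, "claim_number"))
# {'CASE-7781': [1, 2], 'CASE-9030': [3]}
```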

Case Details page

After you've created a case, you can view a summary of its contents by going to its Case Details page. This page lists the documents, unmatched pages, and fields contained in the case. From the Case Details page, you can remove documents and unmatched pages from the case, and you can add notes about the case for other knowledge workers to reference. 

The page also shows the status of each document. If the case has incomplete Supervision tasks, knowledge workers can click Perform tasks at the top of the page to complete the tasks. 

All Case Details pages can be found on the new Cases page in the Submissions section of the application.

API

We've added a Cases API endpoint to allow you to programmatically access a summary of all data in Hyperscience associated with a particular case ID. We've also added case IDs to the Submission data model. To learn more about these changes, see the API section of these release notes.

Machine Learning

New

Straight-through processing of tables (Semi-structured only) – We've introduced a trainable row model that enables the system to make cell-level location predictions. These predictions reduce the number of required Table ID Supervision tasks and make straight-through processing of tables possible for the first time. Note that automation will be lower for tables whose cells do not form a uniform grid. 

Table Model Validation Tasks (MVTs) and Quality Assurance (Semi-structured only) – We've expanded Table MVTs and Table Identification QA tasks to include all tables, regardless of whether their cells form a uniform grid. These changes allow keyers to create more ground-truth data for table models. 

While reporting for Table Identification QA is not available in the API in v31, data on automation, accuracy, and supervision volume can be found on the Reporting page of the platform.

Integrations

New

Bizagi – Mutual customers of Hyperscience and Bizagi, a leading business process modeling (BPM) solution, will be able to use our Bizagi connector to maximize their use of both platforms.

If your organization leverages Bizagi in its automation efforts, you can find the Hyperscience connector in the Bizagi Xchange and configure it in Bizagi. During that process, you map your Bizagi models to Hyperscience data models, and then you set up API connections to Hyperscience. These connections allow you to process Bizagi cases in Hyperscience and then send them back to Bizagi. If a document requires additional supervision after being processed through Hyperscience, Bizagi will generate a URL for the Supervision task and send it to the connector for handling.

As of August 17, 2021, the Bizagi connector is not yet available in the Bizagi Xchange. We will add a link to it here when it is released. 

Note that the Bizagi connector is not compatible with SaaS implementations of Hyperscience in v31.

Languages

New

Support for Italian submissions – We now support automation on Structured and Semi-structured documents written in Italian. The Italian language model allows our system to extract printed and handwritten data from Italian documents, and keyers can complete Transcription Supervision and Transcription QA tasks by entering Italian text.

User Experience

Updated

Accessibility improvements – To make our platform usable by as many people as possible, we have improved the user experience for users of assistive technologies.

Examples of our work in this area include:

  • Updating the following common components to be accessible: modals, search bars, radio buttons, checkboxes, and drop-down lists

  • Adding ARIA attributes to many elements in our platform

  • Improving our use of header styles (e.g., <h2>, <h3>)

  • Adding context to page titles to assist users of screen readers

Search improvements – We've enhanced the search experience on the Submissions and Documents tables, as well as the Task Queue. In addition to IDs, you can now search by file name in these areas of the platform.

Security

Updated

Security enhancements – We've improved the security of our platform in the following ways:

  • Prevented system admins from being able to add jobs in /admin

  • Removed the ability to view API tokens in /admin

    • System Admins can see the API token for any user in the application. All other users can only view their own API token.

  • Ensured session cookies are invalidated upon the closing of the user's browser

  • Prevented the creation of potentially insecure CSV files

  • Ensured users cannot bypass authentication when sending direct requests to API root URLs (e.g., /api/v5/)

Fixed

Reset Authentication – Previously, if a user was created through LDAP or SAML, resetting that user's account via the “Reset Authentication” option could permanently revoke the user's access to the application. In v31 and later, the “Reset Authentication” option resets a user's API token and logs them out of their active sessions, but they can log in again through their authentication provider. 

Permissions

New

Trainer API User permission group – Because we have API endpoints that are meant to be accessed only by the trainer, we've added a Trainer API User permission group that restricts access to these endpoints. This group contains only the Trainer API Access permission. 

Upon upgrading to v31, all users with the API Access permission will be added to the Trainer API User permission group to avoid potential disruptions in access. To maximize the security benefits of the new group, we recommend removing all users from the Trainer API User group other than the trainer's user. 

New permissions – We've added the following permissions:

  • Edit Secrets

  • Import Flows

  • View Cases

  • Edit API Accounts

  • View API Accounts

  • Trainer API Access

  • Complete Flexible Extraction

Updated

Supervision and QA permissions – We've made the following changes to our Supervision and QA permissions:

  • We've renamed the following permissions to match their task names:

    • Complete Classification QA is now Complete Document Classification QA.

    • Complete Document Organization is now Complete Document Classification.

  • To give you more control over which QA tasks your keyers can complete, we've replaced the generic Complete QA permission with new Complete Identification QA and Complete Transcription QA permissions. 

  • Because Field ID Supervision and Table ID Supervision tasks are part of the same overall Identification task type, we've combined the Complete Field ID and Complete Table Field ID permissions into a single Complete Identification permission.

Reporting

New

Flow-specific metrics – The following metrics on the Reporting page can now be filtered for individual flows:

  • Automation

  • Field Output Accuracy

  • System Throughput

  • Sampled Errors

  • Time to Completion

  • Manual Working Time

  • Machine Working Time

  • Supervision Volume

Flexible Extraction metrics – We’ve added the following Flexible Extraction metrics to our reports:

Keyer Projection Report

  • KeyerPerformance.csv

    • Flexible Extraction Time Spent (Seconds)

    • Flexible Extraction Fields Extracted

    • Flexible Extraction Field Characters Keyed

    • Flexible Extraction Table Cells Extracted

    • Flexible Extraction Table Cell Characters Keyed 

  • HourlyReportingTaskOverview.csv

    • "Flexible Extraction" added to valid Task Type values 

  • HourlyReportingSubmissionOverview.csv

    • Users Performing Flexible Extraction

    • Time Spent in Flexible Extraction (Seconds)

    • Flexible Extractions in Starting Work Queue

    • Flexible Extractions Added to Work Queue

    • Flexible Extractions Completed

    • Flexible Extractions in Ending Work Queue

Manual Working Time Report

  • Flexible Extraction - Active

  • Flexible Extraction - Waiting

Servers

Updated

Support for RHEL 7.9 – We officially support the use of RHEL 7.9 in both Hyperscience v30 and v31.

Deprecated Features

SQL Explorer – We have removed support for the SQL Explorer beta feature.

API

New

Client Library – If you are deploying Hyperscience with our SaaS offering, you will need to use our client library to authenticate your API requests. The library will be available in C#, Java, and Python. To use the library, download it from our public repository when it becomes available, and follow the instructions provided in the library's readme files.

We will post a link to the repository here when the library is released.  

Cases endpoint – We've added a Cases endpoint that allows you to retrieve information about the documents, pages, and files included in a case.

Client Secrets endpoint – We've created a Client Secrets endpoint that gives you the ability to manage client secrets for Input Blocks and Output Blocks via the API.

Updated

Submissions endpoint – Submission JSON files now contain the case IDs associated with the Submissions’ documents, allowing you to send case information to your downstream systems. Note that the Cases array is included in the standard Submission JSON output, but not yet in the JSON retrieved from the Transformed Submission endpoint.

Known Issues and Limitations

See Known Issues and Limitations in V31 for a list of known areas for improvement in v31.0.0. We expect to resolve the issues in future versions of Hyperscience.