V27 Release Notes

27.2.0

Machine Learning

Updated

Performance of checkbox models – We’ve made enhancements to our checkbox transcriptions, helping to reduce errors and increase automation for these fields.

27.1.0

Task routing

New

Automatic Routing for Supervision and QA Tasks – This new feature improves processing efficiency by automatically routing Supervision and Quality Assurance tasks to users in a specific permission group. Values extracted from one or more ‘routing’ fields on a layout determine which task restriction is applied to the submission during processing.
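The notes do not say how routing values map to a restriction. As a rough mental model only, the Python sketch below assumes a lookup table keyed on normalized routing-field values, where the first value with a match wins. `ROUTING_TABLE`, `resolve_restriction`, and the restriction names are hypothetical and not part of the product.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from routing-field value to task restriction name.
ROUTING_TABLE: Dict[str, str] = {
    "claims": "Claims Keyers",
    "underwriting": "Underwriting Keyers",
}


def resolve_restriction(routing_values: List[str], table: Dict[str, str]) -> Optional[str]:
    """Return the restriction for the first routing value that has a match."""
    for value in routing_values:
        key = value.strip().lower()  # normalize whitespace and case before lookup
        if key in table:
            return table[key]
    return None  # no match: no restriction is applied in this sketch
```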

27.0.8

Security

Fixed

User authentication and API access – This version introduces the following security measures:

Single authentication method – The system will verify that only one authentication method is enabled. You can use our local, built-in user-management feature or one of our supported external authentication providers. Authentication tokens created through other methods will be invalidated.
- To ensure that only one method is enabled, and to prevent your users from being logged out when you upgrade, follow the steps outlined in Upgrade Considerations and Known Issues.
If an external authentication provider is enabled:
- If you add the TRAINER_USER variable to your “.env” file, you can still use a local user’s credentials to connect the trainer to the application. For more information, see Installing the Trainer.
- You can enable automatic token revalidation by adding the TOKEN_REVALIDATION_ENABLED variable to your instance’s “.env” file and setting it to true. If you do, you may need to create a list of users who are exempt from token revalidation. To learn more, see External Authentication Methods and API Users.
- Authentication tokens for local users will be invalidated.

Automatic token revalidation – If you enable this feature, the system will revalidate users’ API tokens every 12 hours. You can also exempt specific API-only users from token revalidation if they need continuous access to the API without logging in through a browser.
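As an illustration of the revalidation behavior described above, the sketch below reads the TOKEN_REVALIDATION_ENABLED flag from “.env”-style text and decides whether a token is due for revalidation once the 12-hour interval has passed, skipping exempt users. The helper names, the KEY=value parsing rules, and the exemption set are assumptions; the product’s actual logic is not documented here.

```python
from datetime import datetime, timedelta
from typing import AbstractSet

# Interval taken from the release notes; everything else is illustrative.
REVALIDATION_INTERVAL = timedelta(hours=12)


def env_flag(env_text: str, name: str) -> bool:
    """Return True if `name=true` (case-insensitive) appears in .env-style text."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("\"'").lower() == "true"
    return False


def needs_revalidation(
    username: str,
    last_validated: datetime,
    now: datetime,
    exempt_users: AbstractSet[str],
    enabled: bool = True,
) -> bool:
    """Decide whether a user's API token is due for revalidation."""
    if not enabled or username in exempt_users:
        return False  # feature off, or an exempt API-only user
    return now - last_validated >= REVALIDATION_INTERVAL
```

For example, pass `enabled=env_flag(env_text, "TOKEN_REVALIDATION_ENABLED")` so that the check is a no-op when the variable is missing or not set to true.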

27.0.7

Submission Processing

Fixed

Minor rotation and MSSQL databases - We've fixed an issue that caused jobs to halt when the “Minor rotation” Image Correction setting was enabled in instances with MSSQL databases.

API

Fixed

Permissions for Test Snippet API endpoint - In order to prevent unauthorized access to the container running the Hyperscience application, we’ve added permission checks for requests sent to api/snippets/test_snippets.

27.0.6

Layout alignment

Updated

Improvements for pages with repeated layout sections - This update improves field alignment in documents where a section of a layout repeats on a page.

Connectors

Fixed

Processing jobs for large documents - We fixed an issue that caused processing jobs for very large documents to halt if the documents were ingested by a connector.

27.0.5

Data logging

Updated

Limit on data saved from the Field ID model - To reduce the risk of information leakage, we've limited the amount of Field ID model information that is saved in log files.

27.0.4

Database configuration

Updated

MSSQL OpenSSL options - We've made the OpenSSL cipher options for MSSQL Server databases more configurable.

27.0.3

Database configuration

Updated

Oracle SSL encryption - We added support for SSL/TLS encryption for Oracle databases. Note that SSL/TLS authentication (also known as two-way or mutual TLS) is not supported.

27.0.2

Transcription Supervision

Updated

Keyboard shortcuts - We added the option to change the keyboard shortcuts for adding a new line and for navigating to a previous field during Transcription Supervision. This setting, called “Legacy Supervision shortcuts,” can be configured in the Beta Features area and is disabled by default.

“New line” shortcut
- New shortcut - Shift + Enter
- Previous shortcut - Alt + \
“Previous field” shortcut
- New shortcut - Shift + Backspace
- Previous shortcut - Shift + Enter

API

New

Endpoint – We created an additional HTTP endpoint that enables users to download the registered and DPI-preserved page image.

27.0.1

Updated

Output connectors – Now you can configure output connectors to send a notification that a submission has been completed, without sending all of the associated extraction data.

API

Table data output in API v4 – Added support for table data output in API v4.
Field data type API output – The API now returns the user-visible FDT name for all layout types as standard behavior.

Fixed

Permissions – Fixed a bug where users were not able to complete Table ID tasks without the “View Submissions” permission.
Transcription automation training – Fixed a bug where the accuracy thresholds would be set to “N/A” after the period of records to use was changed.

27.0.0

Legacy ingestion and output notifications options removed

In version 25, we introduced the Hyperscience connectors framework and released a number of connectors to replace the legacy image ingestion and output notifications options that were configured via the .env file. As of Hyperscience Release 27, the legacy ingestion and output notification options have been removed. Any system that still uses the legacy options should plan a migration to connectors as part of its upgrade to Hyperscience Release 27.

Permissions

New

Complete Classification Model Validation - Allows users to view and complete Classification Model Validation tasks.

Removed

View QA Reports - Control access to QA records.

New

Table Identification Automation

Now you can train a model to automatically identify tables that follow a standard grid format. In the past, the table identification process was fully manual.

3-step workflow – Identify table column fields, split the fields into rows, and review the resultant grid before sending the table for Transcription.
Table ID model – Train a machine learning model to automatically identify the table column fields.
Model validation tasks – Boost model performance with a Supervision task that asks a human to correct errors.
Tables reporting – View an automation report that shows how many columns the machine predicted correctly, a throughput report showing how much table data is being processed, and a volume report showing how much data is passing through Supervision tasks.

Notes about this feature:

Table ID automation only applies to documents that contain one table and follow a standard grid format.
A standard grid format refers to tables where the data falls neatly within the boundaries of each cell, as defined by the rows and columns of the table, and the information contained has a 1:1 relationship with a corresponding column and row.
The “Automation row detection” setting has been removed, since row detection is now the default behavior in the updated workflow.
Viewing the transcription output of automated tables requires API v5.
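To make the “standard grid format” definition concrete, the sketch below checks that every text box falls entirely within exactly one column and one row, so each value has a 1:1 relationship with a cell. Boxes are assumed to be (x0, y0, x1, y1) tuples and columns and rows to be coordinate intervals. This is a simplified reading of the definition, not the Table ID model’s actual test.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]            # (start, end) along one axis
Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1) of one text box


def _containing(intervals: Sequence[Interval], lo: float, hi: float) -> List[int]:
    """Indices of the intervals that fully contain [lo, hi]."""
    return [i for i, (start, end) in enumerate(intervals) if start <= lo and hi <= end]


def cell_for(box: Box, columns: Sequence[Interval], rows: Sequence[Interval]) -> Optional[Tuple[int, int]]:
    """Return (row, column) if the box sits inside exactly one cell, else None."""
    x0, y0, x1, y1 = box
    cols = _containing(columns, x0, x1)
    rws = _containing(rows, y0, y1)
    if len(cols) == 1 and len(rws) == 1:
        return rws[0], cols[0]
    return None  # straddles a boundary or falls outside the grid


def is_standard_grid(boxes: Sequence[Box], columns: Sequence[Interval], rows: Sequence[Interval]) -> bool:
    """True if every box maps to exactly one row and one column."""
    return all(cell_for(box, columns, rows) is not None for box in boxes)
```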

Connectors

Routing of Extracted Data – Admins can now specify which downstream system receives extracted data by using source tags to filter status change notifications so that they reach the desired output connector.
FileNet Output Connector – Admins can now define any number of connections to save extracted data and images to the FileNet content platform.
Input and Output Scripting (Beta) – Write a custom script that executes at submission ingestion or just before downstream data output. An input or output script must be associated with the corresponding input or output connector and can be used to adjust submission parameters (e.g., change submission priority based on submission source).
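The scripting interface is not described in these notes. The sketch below only illustrates the two ideas in this section: filtering by source tag to pick output connectors, and an input script that changes priority based on the submission source. The function names, the `source` and `priority` keys, and the priority values are all hypothetical.

```python
from typing import Any, Dict, List, Set

# Hypothetical configuration; none of these names come from Hyperscience.
PRIORITY_BY_SOURCE: Dict[str, int] = {"email-intake": 10, "batch-upload": 1}
DEFAULT_PRIORITY = 5


def connectors_for(source_tag: str, connector_filters: Dict[str, Set[str]]) -> List[str]:
    """Names of output connectors whose tag filter accepts this source tag."""
    return sorted(name for name, tags in connector_filters.items() if source_tag in tags)


def adjust_submission(params: Dict[str, Any]) -> Dict[str, Any]:
    """Input-script sketch: set priority from the submission source."""
    adjusted = dict(params)  # copy so the caller's dict is not mutated
    fallback = adjusted.get("priority", DEFAULT_PRIORITY)
    adjusted["priority"] = PRIORITY_BY_SOURCE.get(adjusted.get("source"), fallback)
    return adjusted
```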

Reporting – Keyer Projection Report

You can now download individual keyer performance data as a CSV file.

Support for ZIP files

Submissions containing ZIP files (documents or images) will be processed along with any other supported file type.

Updated

Enhanced Support for Custom Field Data Types

List CFDT improvements:

Mutable list CFDTs – When you edit a list CFDT, the changes will propagate across all relevant layouts (including the live layouts).
Increased maximum list CFDT size – The maximum number of items allowed in a list CFDT has been raised from 1,500 to 200,000.
Audit log record – View the filename in the audit log whenever a user creates or updates a list CFDT by uploading a CSV file or via the UI.

Pattern CFDT improvements:

Normalization – Normalize pattern CFDTs using character stripping.
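Character stripping can be pictured as removing a configured set of characters before the value is checked against the pattern. The sketch assumes stripping happens before matching and uses an SSN-style pattern as an example; `normalize_pattern_value` is a hypothetical helper, not a Hyperscience API.

```python
import re
from typing import Optional


def normalize_pattern_value(value: str, strip_chars: str, pattern: str) -> Optional[str]:
    """Remove strip_chars, then return the result only if it fully matches pattern."""
    stripped = value.translate(str.maketrans("", "", strip_chars))  # delete each listed char
    return stripped if re.fullmatch(pattern, stripped) else None
```

For example, stripping "- " from "123-45-6789" yields "123456789", which matches `\d{9}`.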

General improvements:

Faster editing – Perform a “copy-to-new” action so that you can edit a list or pattern CFDT without needing to create an entirely new one from scratch.
Search & filters – Find CFDTs using the filters and search bar within the Data Types tab.

Layout Editor Redesign

We redesigned the user interface for the Structured and Semi-structured Layout Editors and added new functionality to improve the experience of creating layouts.

Improvements to the Structured Layout Editor

UX enhancements:

Autosave drafts – Layout drafts will save automatically and display the last saved date and time in the top-right corner of the screen.
Redaction tool – Apply “white-out” to a newly created layout to clean up the image before defining fields for extraction.
Bounding boxes – Hover over a bounding box to display a tooltip with its data type.
Bulk edit bounding boxes – Change the size and location of bounding boxes, in addition to the existing bulk editing functionality of mass changing data types, Supervision, and field configuration options.
Automatic field cloning [Beta] – Draw a bounding box and the system will automatically detect and draw additional bounding boxes around fields with similar geometry.

UI enhancements:

Enhanced sidebar – View all fields (highlighted in blue) and all layout IDs (highlighted in green) across all pages. Select an item in the sidebar to focus on the respective item on the layout image.
Search & filters – Within the sidebar, search for any data type or field name that has been defined in the layout and filter the results as desired.
Toolbar – Access all of the Layout Editor tools in the toolbar at the top of the screen.
Keyboard shortcuts – Click on the keyboard button in the toolbar to view the new keyboard shortcuts.

Improvements to the Semi-structured Layout Editor

UX improvements:

Simplified field definition – Define table column fields in the new “Tables” box and normal fields in the “Fields” box. This updated interface results in a more organized view of all of the defined fields.

UI improvements:

Field organization – Use the handle icon on the far right of each field to reorder fields, and use the trash can icon to delete a field.

Task Restrictions

Previously, Document Restrictions allowed admins to limit which users could perform tasks based on the page's layout match.
Now, Task Restrictions (renamed from Document Restrictions) expand this functionality by enabling admins to define restrictions based on a submission's source (e.g., route the submission to Data Keyer Group 1 if its source is the RabbitMQ input connector) or through the restriction API parameter, by providing the name of the restriction when the submission is created.
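A rough sketch of how the two restriction sources might combine, assuming that source-based rules and restriction names passed at submission creation are merged into one de-duplicated list. The rule table mirrors the example above; `restrictions_for` and the merge behavior are assumptions, not documented product logic.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical rule table mirroring the example in the notes.
SOURCE_RULES: Dict[str, str] = {"RabbitMQ input connector": "Data Keyer Group 1"}


def restrictions_for(source: Optional[str], requested: Iterable[str] = ()) -> List[str]:
    """Merge restriction names passed at creation with any source-based rule."""
    names = list(dict.fromkeys(requested))  # de-duplicate while keeping order
    rule = SOURCE_RULES.get(source) if source is not None else None
    if rule is not None and rule not in names:
        names.append(rule)
    return names
```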

Document Organization

Index multiple values – Enter multiple values for a single metadata field.
Revisit previous steps – Check your work or go back to a previous step to correct mistakes.

Submission Activity Log

Admin users can now view a log of all user activity related to a submission on the Submission Output Page.

Single Import/Export Interface

There is now a single interface to import/export user-configurable objects like settings, releases, and models.

Improvements:

One location for everything – Go to the Administration page to access the new Import/Export tab where you can manage Releases, Classification models, Field ID models, Table ID models, the Field Dictionary, and application settings.
Support for large downloads – Hyperscience will create a ZIP file when you download a large release and send a notification when the download is complete.
Export any release – Download a live release or any of the locked releases.

API Version 5

We are introducing a new version of the API.

Improvements:

Table Objects
- Table, Table Row, and Table Cell objects have been added to reflect our increased support for tables.
- Table Objects are now returned as a document_tables property of Documents and are described in detail in the API docs.
- Multiple tables in a single Document are returned as separate Table objects, each with its own list of rows.
- Tables can be iterated in row-order through their rows property.
- Along with this change, the field object row_index property is now removed. Fields relating to tables should be queried through Table objects.
Submission Objects
- The submission parameter document has been renamed to file.
- The submission parameter single_image_per_document has been renamed to single_document_per_page and its usage has been clarified in the API docs.
- Submission creation now has an optional restriction parameter that can be specified multiple times. A restriction is the name of a Task Restriction that should be applied to the submission. This was introduced in v26.1.
- State change notifications now contain the same set of keys, regardless of the submission's state.
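The parameter renames above can be applied mechanically when migrating client code. The sketch below rewrites v4 parameter names to their v5 equivalents and appends one `restriction` entry per Task Restriction, keeping the pairs as a list of tuples so the repeated key survives URL encoding. It only handles parameter names: check the API docs for the exact semantics of single_document_per_page, and note that uploading a file normally requires a multipart request, which is not shown. `to_v5_params` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode

# Renames listed in the API v5 notes.
V5_RENAMES = {"document": "file", "single_image_per_document": "single_document_per_page"}


def to_v5_params(v4_params: Iterable[Tuple[str, str]], restrictions: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Rename v4 keys and add one ("restriction", name) pair per restriction."""
    params = [(V5_RENAMES.get(key, key), value) for key, value in v4_params]
    params.extend(("restriction", name) for name in restrictions)
    return params


# A list of pairs keeps the repeated "restriction" key when encoded.
print(urlencode(to_v5_params([("document", "a.pdf")], ["Group A", "Group B"])))
# prints: file=a.pdf&restriction=Group+A&restriction=Group+B
```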

Document Objects
- fields and derived_fields, previously properties of Pages, are now moved under Documents. They have also been renamed to document_fields and derived_document_fields respectively.

Field Objects
- page_id has been added; use it to reference the Page Object to which the field belongs.
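Assuming a v5 Document is available as parsed JSON (a Python dict), the sketch below iterates each table’s rows in order through the rows property and groups document_fields by the new page_id. The contents of each row are left opaque because their keys are not described in these notes; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Document = Dict[str, Any]  # a parsed v5 Document object (assumed JSON dict)


def fields_by_page(document: Document) -> Dict[Any, List[Dict[str, Any]]]:
    """Group document_fields by the page_id each field now carries."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for field in document.get("document_fields", []):
        grouped[field.get("page_id")].append(field)
    return dict(grouped)


def table_rows(document: Document) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (table_index, row) for every table, in row order."""
    for table_index, table in enumerate(document.get("document_tables", [])):
        for row in table.get("rows", []):
            yield table_index, row
```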

To learn more, see the API documentation. All new changes are marked with the v27 tag.

Document Output Page Redesign

We have made it easier to understand document output data with a reimagined interface that is more intuitive and organized.

Improvements:

Selecting a field on the page image will now draw a line connecting to the respective item in the field list
Items in the field list have been logically rearranged to display page location and to enable easier filtering
Table data output is now displayed in a separate interface at the bottom of the page

Note that tables that were transcribed in version 26.0 of Hyperscience may output the table fields twice.

Model Management Redesign

We have made it easier to view model statuses and compatibility with future versions of Hyperscience. These changes affect the table of models shown on Library > Models and the Model Details page shown when selecting a model from the table.

Improvements:

Compare the projected automation for both current and candidate models (Field ID and Table ID)
Deploy candidate models if performance is superior to the current model
Assess model compatibility for future versions of the system

Search and Filters Redesign

We have made several usability enhancements to the search and filters functions for each of the tables across the Work Queue, Library (Layouts, Releases, Data Types, Field Dictionary, and Models tabs), and Submissions area (Submissions, Documents, No Layout Found tabs).

Improvements:

Filter interface is more comprehensive and easier to use
Easily accessible search bar at the top of each table
Faster ability to perform actions on multiple items within a table

Machine Learning Enhancements

Model training:

Supervision data in Field ID model training – Model training now includes Field ID Supervision data along with QA data.
CFDTs in Transcription automation training – Custom field data types are now included in transcription automation training.

Data types:

Separated currency – Support for currency formats that mandate two digits after the decimal separator, even when the space or printed vertical line on the underlying template is not a reliable indicator of that separation. If no decimals are written, the last two digits are assumed to be the decimals.
- Separated Currency – X,XXX XX
- Separated Currency – X.XXX XX
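The “last two digits are the decimals” rule can be illustrated with a small parser. The sketch assumes that every non-digit character (thousands separators, the space, or a printed vertical line) can be discarded and that the final two digits are always the decimals; it illustrates the stated rule, not the model’s actual post-processing.

```python
from decimal import Decimal


def parse_separated_currency(raw: str) -> Decimal:
    """Treat the last two digits as decimals, ignoring any separators."""
    digits = "".join(ch for ch in raw if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits in {raw!r}")
    digits = digits.rjust(3, "0")  # guarantee at least one integer digit
    return Decimal(f"{digits[:-2]}.{digits[-2:]}")
```

Under these assumptions, "1,234 56", "1.234 56", "1,234|56", and "123456" all parse to 1234.56.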

Model performance:

Checkbox model performance – Improved accuracy on printed X’s and added support for X’s that fall outside the boundaries of a checkbox.
De-skew performance – Faster manual table extraction process.
Non-blank performance – Enhanced automation on names, addresses, and emails.

Fixed

Date Normalization - Fixed a bug where future dates were normalized to the wrong century.
Input connectors – Fixed a bug for all input connectors where processing might cease upon reading an empty line.

Trainer

Fixed a bug that would cause the Trainer to fail when scheduling Classification jobs.
Fixed a bug that would cause the Trainer to fail when encountering an unexpected error format in a job.
Fixed a bug where training jobs for future versions of the system would be blocked by training jobs for the current version.

Reporting

Fixed a bug on the User Performance chart where data points with the same x-axis value but different y-axis value would display the same metrics.
Fixed a bug on the User Performance chart where the “Average Accuracy” column would appear empty when exporting the data in a CSV file.
Fixed a bug on the System Sampled Error chart where selecting a large date range would generate an error.