28.3.4
Security
Fixed
Removal of Apache Log4j 1 – This update removes Log4j 1 from our installations. In place of Log4j, the system now uses Java's Logging framework.
28.3.3
Submission Processing
Fixed
Halted jobs – We've resolved a database-connectivity issue that caused a large number of jobs to be halted in previous versions of v28.
In order to implement this fix, you will need to add the FORMS_DB_CONN_MAX_AGE variable to your ".env" file after upgrading to v28.3.3 or later. This variable sets the maximum length of time that database connections can be kept and reused. For more information, see Database Overview.
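As a quick post-upgrade check, the sketch below scans ".env"-style text and reports whether FORMS_DB_CONN_MAX_AGE is present and non-empty. The helper names and the sample value of 600 are illustrative assumptions; these notes do not state a recommended value or its unit, so consult Database Overview for the actual setting.

```python
def read_env(text):
    """Parse KEY=VALUE lines from ".env"-style text, skipping blanks and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_variables(text, required=("FORMS_DB_CONN_MAX_AGE",)):
    """Return the required variable names that are absent or empty."""
    settings = read_env(text)
    return [name for name in required if not settings.get(name)]


# The value 600 is a placeholder, not a recommended setting.
sample_env = "# database settings\nFORMS_DB_CONN_MAX_AGE=600\n"
print(missing_variables(sample_env))
```

An empty list means that the variable is set; any names in the list still need to be added to the file.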
Reporting
Updated
Threading and scheduling of data-aggregation tasks – We've improved how the system distributes and schedules data-aggregation tasks for reports. With these changes, report-creation background processes can now be scheduled outside of business hours, and report generation is now faster and more reliable.
Connectors
Updated
Deprecation of Connector scripting – We've removed the Connector scripting beta feature from the application.
Image and PII Data Deletion
Updated
Threading and scheduling of data-deletion tasks – We've improved how the system processes and synchronizes the deletion of images and PII data.
Database queries and image deletion – We've updated the queries involved in image deletion to use indices rather than scans, reducing deletion times.
Retroactive deletion of PII data – If you enable PII Data deletion in Settings, PII data from submissions that predate the date specified in your PII Deletion policy is deleted. Previously, the PII policy in place at the submission's creation time determined when its PII data was deleted.
Note that the PII Data deletion setting is disabled by default.
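The retroactive rule can be pictured as a single date comparison against the policy that is in place now. The sketch below is a simplified illustration, not the application's implementation; the function and parameter names are hypothetical, and it assumes the policy resolves to one cutoff date.

```python
from datetime import date


def pii_should_be_deleted(submission_date, policy_cutoff, deletion_enabled=False):
    """Return True if a submission's PII data falls under the current policy.

    deletion_enabled defaults to False, mirroring the disabled-by-default setting.
    """
    if not deletion_enabled:
        return False
    # Retroactive rule: compare against the cutoff in the policy that is in
    # place now, not the policy that existed when the submission was created.
    return submission_date < policy_cutoff


cutoff = date(2021, 6, 1)  # hypothetical cutoff date
print(pii_should_be_deleted(date(2021, 1, 15), cutoff, deletion_enabled=True))
```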
Infrastructure
Updated
Retrying scheduled tasks – If a scheduled task crashes or fails, it will not be retried immediately. Instead, the system will retry it at its next scheduled run. By ensuring that tasks are retried as scheduled, this update prevents issues from occurring upon application restart.
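The retry behavior described above can be sketched as a loop over scheduled run times in which a failure is recorded but never retried in place. This is a simplified model with hypothetical names, not the application's scheduler.

```python
def run_on_schedule(task, scheduled_times):
    """Run task once per scheduled time; a failure waits for the next slot."""
    outcomes = []
    for run_time in scheduled_times:
        try:
            task(run_time)
            outcomes.append((run_time, "ok"))
        except Exception:
            # No immediate retry: the next attempt is the next scheduled run.
            outcomes.append((run_time, "failed"))
    return outcomes


def flaky_task(run_time):
    """Hypothetical task that fails on its first scheduled run only."""
    if run_time == "01:00":
        raise RuntimeError("simulated crash")


print(run_on_schedule(flaky_task, ["01:00", "02:00", "03:00"]))
```

In this model, the failed 01:00 run is attempted again only at the 02:00 slot.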
Improved management of database sessions – We’ve improved database session management in instances with MSSQL databases. Specifically, the system will kill sessions that:
have been blocking other database sessions for a long time, or
have been idle for a long time.
Cron jobs polling – To increase performance and reduce dependency on Service Broker, we’ve updated the cron jobs channel to use polling instead of database-provided notification mechanisms (e.g., LISTEN/NOTIFY).
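For readers unfamiliar with the distinction, polling means that the consumer checks for pending work at a fixed interval instead of waiting for the database to push a notification. The sketch below illustrates the general pattern with an in-memory queue standing in for the database; all names and the interval are assumptions, not details of the application.

```python
import time
from collections import deque


def poll_channel(fetch_pending, handle, interval_seconds=1.0, cycles=3, sleep=time.sleep):
    """Check for pending jobs a fixed number of times, sleeping between checks."""
    handled = 0
    for cycle in range(cycles):
        for job in fetch_pending():
            handle(job)
            handled += 1
        if cycle < cycles - 1:
            sleep(interval_seconds)
    return handled


# In-memory stand-in for a table of pending cron jobs.
pending = deque(["aggregate-reports", "delete-images"])


def fetch_pending():
    """Drain and return whatever is currently queued."""
    jobs = list(pending)
    pending.clear()
    return jobs


processed = []
# The sleep function is replaced so that the example finishes immediately.
count = poll_channel(fetch_pending, processed.append, sleep=lambda seconds: None)
```

The trade-off is a small delay of up to one interval before new work is noticed, in exchange for not depending on the notification mechanism.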
Health check for database-provided notification mechanisms – We've added a check for database-provided notification mechanisms (e.g., LISTEN/NOTIFY) to our system health check. This check also sends an alert when Service Broker is disabled in instances with MSSQL databases.
TVE Instances
Updated
Memory usage – We've introduced memory-management changes in TVE instances to reduce total memory usage.
Databases
Updated
Changes to database-notification health checks and polling – To improve scalability, we’ve removed the health check for the database-provided notification mechanism in production instances, and we're enabling polling for job queues and block-process manager channels. The database-notification health checks remain enabled in TVE instances.
Fixed
Database backups and PII data deletion – We've addressed an issue related to PII data deletion that caused database backups to be unnecessarily large.
Saving layout changes on MSSQL databases – We’ve fixed an issue for MSSQL databases that caused saving changes to existing layouts to result in system slowness.
28.3.2
Security
Updated
Viewing API tokens – We've removed the ability to view API tokens in /admin. System Admins can see the API token for any user in the application, and all other users can only view their own API token.
Django REST Framework (DRF) Browsable API – We've disabled DRF's Browsable API feature for all versions of our API.
Fixed
Creating users via API – We've fixed an issue that allowed users not in a permission group to create users via the API without an API token.
Permissions
Updated
Task Restrictions for Submissions page tasks – Previously, Task Restrictions were applied to tasks accessed in the Work Queue, but not to tasks on the Submissions page. With this release, you can apply Task Restrictions to all tasks, regardless of whether users access them from the Work Queue or the Submissions page. To do so, set the APPLY_TASK_RESTRICTIONS_TO_SUBMISSIONS_VIEW variable in your “.env” file to true.
Infrastructure
Fixed
Memory consumption and trainer API tasks – We've fixed an issue that caused the expiration of trainer API tasks to consume an excessive amount of memory.
System task scheduling – We've resolved a task-scheduling issue that prevented routine system tasks, such as reporting-data aggregation and model training, from running when expected.
Please note that the system will not immediately repopulate reporting data after upgrading to v28.3.2.
SAML
Fixed
Synchronization of SAML groups – We've resolved an issue that prevented SAML groups from being synchronized with Hyperscience permission groups in certain instances, causing users to have permissions from SAML groups they no longer belonged to.
Submission Processing
Fixed
Supervision tasks for submissions with Additional documents – Previously, if a submission contained both an Additional document and a non-Additional document, and both required supervision on at least one field, the system would sometimes generate Supervision tasks for only one of the documents. The system would also halt the submission after that document's Supervision tasks were completed. A fix for these issues is included in v28.3.2 and later.
Releases
Fixed
Slowness when copying to a new release – We've resolved an issue that caused system slowness after Copy to New Release was clicked on a Release Details page.
28.3.1
Reporting
Fixed
Updates to reporting data – We've fixed an issue in v28.3.0 that prevented reporting tables from being updated, causing incorrect data to be shown on some Work Queue Overview cards.
28.3.0
Submission Processing
Fixed
Scalability of task processing and retrieval – Previously, customers with a high volume of submissions experienced delays in task processing. To resolve this issue, we've made our task-retrieval methods more efficient, and we've reduced the number of record updates required to process tasks.
Transcriptions
Fixed
Stability of transcription engine – We've resolved issues in the transcription engine that caused it to crash in certain rare circumstances.
28.2.4
Security
Updated
Migrating API users to a single authentication method – In v28.2.3, we introduced security measures that enforced the use of a single authentication method. As part of that update, we provided customers with a SQL script that would migrate their users to the authentication method enabled for their instance in order to prevent users from being logged out after upgrading. This hotfix automatically migrates API users with token-based authentication, eliminating the need for customers to run that SQL script to prevent these users from being logged out. Users accessing Hyperscience via a browser will be migrated when they log in for the first time after the upgrade.
Connectors
Fixed
Configuring Task Restrictions for input connections – We’ve fixed an issue that prevented users from selecting a user group from an input connection's "Task Restrictions" drop-down list.
Deselecting options in Submission Processing Settings – In previous versions, users could not save a connection after deselecting a boolean option, or checkbox, in its Submission Processing Settings. A fix for this issue is contained in v28.2.4.
Jobs
Fixed
Image-deletion jobs – We've fixed a task-sequencing issue that caused image-deletion jobs to halt, even though the images had already been deleted.
Classification
Fixed
Layout matching resulting in multiple documents – Previously, if pages in a submission were incorrectly matched to a layout's pages, the system would sometimes create two documents with the same layout when it should have created only one. In these cases, each of the resulting documents contained only a portion of the submission's pages. A fix for this issue is contained in v28.2.4+.
API
Fixed
Downloading Submission files – We've resolved an issue that prevented customers from downloading Submission files via the API.
28.2.3
Security
Fixed
User authentication and API access – This version introduces the following security measures:
Single authentication method – The system will verify that only one authentication method is enabled. You can use our local, built-in user-management feature or one of our supported external authentication providers. Authentication tokens created through other methods will be invalidated.
To ensure that you have only one method enabled—and prevent your users from being logged out upon upgrading—follow the steps outlined in Upgrade Considerations and Known Issues.
If an external authentication provider is enabled:
If you add the TRAINER_USER variable to your “.env” file, you can still use a local user’s credentials to connect the trainer to the application. For more information, see Trainer Installation.
You can enable automatic token invalidation by adding the TOKEN_REVALIDATION_ENABLED variable to your instance’s “.env” file and setting it to true.
If this variable is set to true, you may need to create a list of users who are exempt from token invalidation. To learn more, see External Authentication Methods and API Users.
Authentication tokens for local users will be invalidated.
Automatic token invalidation – If you enable this feature, the system will invalidate API tokens for users every 12 hours. You can also choose to exempt specific API-only users from token invalidation if those users need continuous access to the API without logging in via a browser.
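The interaction between the 12-hour interval and the exemption list can be sketched as follows. This is a simplified model that assumes invalidation runs as a periodic pass over all users; the function names and usernames are hypothetical, and the actual configuration is described in External Authentication Methods and API Users.

```python
from datetime import datetime, timedelta

# Interval stated in the release notes.
INVALIDATION_INTERVAL = timedelta(hours=12)


def next_invalidation(last_pass):
    """Return when the next invalidation pass would occur."""
    return last_pass + INVALIDATION_INTERVAL


def tokens_to_invalidate(usernames, exempt_usernames, enabled):
    """Return the users whose API tokens one invalidation pass would revoke."""
    if not enabled:
        return []
    exempt = set(exempt_usernames)
    return [name for name in usernames if name not in exempt]


users = ["alice", "bob", "svc-export"]  # hypothetical usernames
print(tokens_to_invalidate(users, exempt_usernames=["svc-export"], enabled=True))
print(next_invalidation(datetime(2022, 1, 1, 0, 0)))
```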
Trainer
Fixed
Trainer container space – We’ve corrected an issue that caused core dumps in a trainer's Docker container to fill the container space, resulting in failed training jobs.
Transcriptions
Fixed
Transcription of layout identifiers – We've resolved an issue that caused some layout identifiers to be transcribed with incorrect ML configurations, resulting in blank transcriptions.
Submission Processing
Fixed
MSSQL and resources for large submissions – We've fixed an issue in instances with MSSQL databases that caused submissions to halt if they contained a large number of documents.
Releases
Fixed
Releases and Oracle databases – We've corrected an issue that prevented releases with Structured and Semi-structured layouts from being created in instances with Oracle databases.
Upgrades
Fixed
Upgrading from v28.0.x – We've resolved an issue that prevented upgrades from v28.0.x to v28.2.x in instances with scheduled trainer tasks for recalibration, finetuning, or auto-thresholding. To benefit from this fix, all customers upgrading from v28.0.x to v28.2.x should upgrade to v28.2.3 or later.
28.2.2
Layouts
Fixed
Maintaining field dictionary links in new instances – In v28.2.0 and v28.2.1, there was no automated method to maintain links between fields and field dictionary entries when importing layouts and dictionaries to a new instance. This version allows you to run a script on the new instance to link fields to matching dictionary entries. You can run the script on any VM running the Hyperscience application.
Layout Management
Fixed
Copying to a new release – Previously, the "Copy to New Release" functionality did not work correctly in some cases. The issues caused some new releases to have inconsistent data, preventing them from being used or edited. Fixes for these issues are contained in v28.2.2.
Configuration
Fixed
Compatibility with Docker cgroup drivers – Hyperscience v28.2.1 and later use Docker's --cgroup-parent container option. If Docker is configured to use systemd as the cgroup driver, some Hyperscience application containers may fail to start. This version contains a fix for the issue.
28.2.1
Supervision
Fixed
Processing of submissions with blank pages – We've resolved the following issues related to submissions with blank pages:
An issue that prevented users from performing Field Identification tasks for documents with blank pages
An issue that caused predicted Table Identification bounding boxes to have undefined coordinates
Jobs with rejected documents – We've fixed an issue that caused jobs to halt if they contained documents that were rejected during Transcription Supervision.
Submission Processing
Fixed
Tracking of submission pages – We've enabled the tracking of submission pages in bulk, which increases system performance for submissions with a large number of pages.
Submission Imaging
Fixed
High-resolution pages in multipage submissions – We've resolved an issue that caused a submission’s images to become blurred if its PDF contained both high-resolution pages (e.g., with photos) and lower-resolution pages. With this fix, the system determines the correct resolution for each individual page rather than the submission as a whole.
Machine Learning
Fixed
Training with documents containing blank pages – We've resolved an issue that caused training to fail with some multipage documents that contained at least one blank page.
Classification training with incomplete submissions – We've fixed an issue that caused jobs to halt if a page used for Classification training had completed processing, but other documents in its submission were still in progress.
Reporting
Fixed
Automation Rate calculation – We've resolved an issue that caused the Automation Rate metric in Identification Reports to be greater than 100% in some instances.
Upgrading
Fixed
Automation rates after recalibration – Models initially trained with v28.2.0 of the application underperformed due to an issue in the application's recalibration component. This issue caused a drop in automation after the models were used in a new instance of the application. We've fixed the issue in this version, and we encourage all customers using affected versions to upgrade to v28.2.1 before upgrading to v29.
Configuration
Updated
Maximum CPU usage – We've added support for the HS_MAX_CPU_PERC environment variable, which allows you to set Hyperscience's maximum CPU usage. The value represents a percentage of CPU resources and can be an integer less than 100.
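A value for HS_MAX_CPU_PERC can be sanity-checked before it is added to your configuration. The sketch below is a hypothetical helper, not part of Hyperscience; it enforces the "integer less than 100" rule stated above and additionally assumes that the value must be at least 1, which these notes do not specify.

```python
def parse_max_cpu_perc(raw_value):
    """Parse an HS_MAX_CPU_PERC value and check that it is an integer below 100."""
    value = int(raw_value)  # raises ValueError for non-integer text such as "75.5"
    # The lower bound of 1 is an assumption; the notes only state "less than 100".
    if not 1 <= value < 100:
        raise ValueError(f"HS_MAX_CPU_PERC must be between 1 and 99, got {value}")
    return value


print(parse_max_cpu_perc("75"))
```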
Layout Management
Fixed
Bounding boxes and beta features – Prior to this release, if the “Automatic field cloning” and “Bounding box one-click mode” beta features were enabled, drawing a bounding box could result in an error. Specifically, the error would occur after the field was created, made inactive, and then made active. We've included a fix for this issue in v28.2.1.
"Commit Changes" button in the Layout Editor – We've resolved an issue that caused the “Commit Changes” button to be disabled after a field was added without a name, made inactive, and then deleted.
Bounding boxes with “Bounding box one-click mode” enabled – We've fixed an issue that prevented a one-click bounding box from being added after a field was added, made inactive, and then made active.
Linked Field Dictionary fields in restored versions – We've resolved an issue that caused fields in restored versions to remain linked to their Field Dictionary entries after their properties no longer matched.
Loading time of Field Dictionary Usage data – We've fixed an issue that caused Usage information for a Field Dictionary entry to be slow to load in accounts with many layouts.
Adding versions and clearing comments – We’ve resolved an issue that caused the Comments field for a new version to be pre-populated with comments from the last-committed version.
Field customization information for imported releases – We've added information about applied field customizations to the Release details pages for imported releases.
"Currently Editing" card and imported versions – We've fixed an issue that caused a layout's "Currently Editing" card to remain active after a different version of the layout was imported.
28.2.0
Layouts
New
Layout Management – Layout creation and management determine how quickly you can scale your implementation of Hyperscience. If your organization has many iterations of the same form that are created frequently, the ability to easily manage and create layouts becomes increasingly important. Over time, your layout library can easily contain thousands of layouts, especially if you are implementing Hyperscience to process documents for multiple lines of business. Issues like competing changes when editing layouts, difficulty tracking changes made to layouts, and the need to create a new layout version whenever edits are made further compound the cost of ownership of layouts.
By making it easier to create layouts for similar-looking forms and organizing them more intuitively, the new Layout Management system in Hyperscience greatly reduces the cost of ownership associated with managing layouts, allowing customers with different operational models and reasons for complexity to all scale effectively.
To make these improvements possible, Layout Management introduces the following key features:
Layout variations – You can add variations to a Structured layout, eliminating the need to create a new layout and draw bounding boxes for forms that are very similar to ones you've already created layouts for.
Grouping of layout variations – The system will automatically group all variations of a Structured layout together, making it easier to find and organize your layouts on the Layouts page.
Shared field list – All fields created across a Structured layout's variations will be added to the layout's shared field list. When creating new variations, this list will be available to you in the Layout Editor.
Field customizations – You can customize certain field settings, like Data Type and Supervision, for individual releases by creating field customizations.
Easier editing – With the introduction of "working versions," you no longer need to create a draft for a Structured or Semi-structured layout you would like to edit. This change helps to streamline the layout-editing process and simplify versioning.
Visibility into open changes – We've introduced change-management features to help you understand who has made changes to your layout and how those changes affect layout variations and field customizations.
Updated
Document Type filter for layouts – We’ve added a Document Type filter to the available filters in the Layout Library, allowing you to filter for Structured, Semi-structured, or Additional layouts.
28.1.2
Machine Learning
Fixed
Classification models after upgrade – We've resolved an issue that prevented Classification models trained on a v28 trainer from recognizing documents after the application was upgraded to v28.
Submission Processing
Fixed
Minor rotation and MSSQL databases – We've fixed an issue that caused jobs to halt when the “Minor rotation” Image Correction setting was enabled in instances with MSSQL databases.
Fallback Layout UUID and Structured documents – We've resolved an issue that caused submissions to halt if they had a fallback_layout_uuid specified and were automatically matched to a Structured layout.
API
Fixed
Permissions for Test Snippet endpoint – In order to prevent unauthorized access to the container running the Hyperscience application, we’ve added permission checks for requests sent to api/snippets/test_snippets.
28.1.1
Layout Editor
Fixed
Checkbox alignment – We've fixed the alignment of the Layout Editor's Duplicate checkbox when viewed in Internet Explorer 11.
28.1.0
Layout Editor
New
"Duplicate" field option in Structured layouts – With this release, you can now designate fields as Duplicates if you expect them to appear on multiple pages of a Structured document. If a field has been marked as a Duplicate, the system will extract only the first instance of that field in the document and ignore subsequent instances. Because it results in a single extraction, enabling this feature prevents the creation of repetitive Transcription tasks.
After the document has been processed, the "Duplicate" field label will be shown on the Document Output page, and you will be able to find Duplicate fields by using the filter in the right-hand sidebar. The "Duplicate" label will also appear in JSON outputs, which will only contain one instance of each Duplicate field.
Currently, the Duplicate option is available only in single-page Structured layouts. When one of these layouts matches to multiple pages in a submission, the system will extract the field instance that appears on the first submission page that matches to the layout.
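The extraction rule for Duplicate fields can be sketched as "keep the first instance in page order, drop the rest." The example below is a simplified model of that rule, with a hypothetical data shape of (page number, field name, value) tuples; it is not the application's extraction code.

```python
def extract_fields(instances, duplicate_fields):
    """Keep every field instance, except repeats of fields marked as Duplicate."""
    seen = set()
    output = []
    # Process instances in page order so that the first matching page wins.
    for page, name, value in sorted(instances, key=lambda item: item[0]):
        if name in duplicate_fields:
            if name in seen:
                continue  # later instances of a Duplicate field are ignored
            seen.add(name)
        output.append((page, name, value))
    return output


instances = [
    (1, "Name", "Ann"),
    (1, "Policy", "P-1"),
    (2, "Policy", "P-1"),
    (2, "Amount", "10"),
]
print(extract_fields(instances, duplicate_fields={"Policy"}))
```

Because only one instance of "Policy" survives, only one Transcription task would be needed for it in this model.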
28.0.11
Security
Updated
Migrating API users to a single authentication method – In v28.0.10, we introduced security measures that enforced the use of a single authentication method. As part of that update, we provided customers with a SQL script that would migrate their users to the authentication method enabled for their instance in order to prevent users from being logged out after upgrading. This hotfix automatically migrates API users with token-based authentication, eliminating the need for customers to run that SQL script to prevent these users from being logged out. Users accessing Hyperscience via a browser will be migrated when they log in for the first time after the upgrade.
Connectors
Fixed
Configuring Task Restrictions for input connections – We’ve fixed an issue that prevented users from selecting a user group from an input connection's "Task Restrictions" drop-down list.
Deselecting options in Submission Processing Settings – In previous versions, users could not save a connection after deselecting a boolean option, or checkbox, in its Submission Processing Settings. A fix for this issue is contained in v28.0.11+.
Jobs
Fixed
Image-deletion jobs – We've fixed a task-sequencing issue that caused image-deletion jobs to halt, even though the images had already been deleted.
28.0.10
Security
Fixed
User authentication and API access – This version introduces the following security measures:
Single authentication method – The system will verify that only one authentication method is enabled. You can use our local, built-in user-management feature or one of our supported external authentication providers. Authentication tokens created through other methods will be invalidated.
To ensure that you have only one method enabled—and prevent your users from being logged out upon upgrading—follow the steps outlined in Upgrade Considerations and Known Issues.
If an external authentication provider is enabled:
If you add the TRAINER_USER variable to your “.env” file, you can still use a local user’s credentials to connect the trainer to the application. For more information, see Trainer Installation.
You can enable automatic token invalidation by adding the TOKEN_REVALIDATION_ENABLED variable to your instance’s “.env” file and setting it to true.
If this variable is set to true, you may need to create a list of users who are exempt from token invalidation. To learn more, see External Authentication Methods and API Users.
Authentication tokens for local users will be invalidated.
Automatic token invalidation – If you enable this feature, the system will invalidate API tokens for users every 12 hours. You can also choose to exempt specific API-only users from token invalidation if those users need continuous access to the API without logging in via a browser.
Trainer
Fixed
Trainer container space – We’ve corrected an issue that caused core dumps in a trainer's Docker container to fill the container space, resulting in failed training jobs.
Transcriptions
Fixed
Transcription of layout identifiers – We've resolved an issue that caused some layout identifiers to be transcribed with incorrect ML configurations, resulting in blank transcriptions.
Submission Processing
Fixed
MSSQL and resources for large submissions – We've fixed an issue in instances with MSSQL databases that caused submissions to halt if they contained a large number of documents.
28.0.9
Machine Learning
Fixed
Predicted bounding boxes from word-level models – We've fixed an issue that caused predicted bounding boxes generated from word-level models to omit portions of their text's final letters.
Configuration
Fixed
Compatibility with Docker cgroup drivers – Hyperscience v28.0.8 and later use Docker's --cgroup-parent container option. If Docker is configured to use systemd as the cgroup driver, some Hyperscience application containers may fail to start. This version contains a fix for the issue.
Layouts
Fixed
Maintaining field dictionary links in new instances – In previous versions of Hyperscience, there was no automated method to maintain links between fields and field dictionary entries when importing layouts and dictionaries to a new instance. This version allows you to run a script on the new instance to link fields to matching dictionary entries. You can run the script on any VM running the Hyperscience application.
28.0.8
Supervision
Fixed
Processing of submissions with blank pages – We've resolved the following issues related to submissions with blank pages:
An issue that prevented users from performing Field Identification tasks for documents with blank pages
An issue that caused predicted Table Identification bounding boxes to have undefined coordinates
Jobs with rejected documents – We've fixed an issue that caused jobs to halt if they contained documents that were rejected during Transcription Supervision.
Submission Processing
Fixed
Tracking of submission pages – We've enabled the tracking of submission pages in bulk, which increases system performance for submissions with a large number of pages.
Submission Imaging
Fixed
High-resolution pages in multipage submissions – We've resolved an issue that caused a submission’s images to become blurred if its PDF contained both high-resolution pages (e.g., with photos) and lower-resolution pages. With this fix, the system determines the correct resolution for each individual page rather than the submission as a whole.
Machine Learning
Fixed
Training with documents containing blank pages – We've resolved an issue that caused training to fail with some multipage documents that contained at least one blank page.
Classification training with incomplete submissions – We've fixed an issue that caused jobs to halt if a page used for Classification training had completed processing, but other documents in its submission were still in progress.
Reporting
Fixed
Automation Rate calculation – We've resolved an issue that caused the Automation Rate metric in Identification Reports to be greater than 100% in some instances.
Upgrading
Fixed
Automation rates after recalibration – Models initially trained with v28.0.0-28.0.7 of the application underperformed due to an issue in the application's recalibration component. This issue caused a drop in automation after the models were used in a new instance of the application. We've fixed the issue in v28.0.8, and we encourage all customers using affected versions to upgrade to v28.0.8 before upgrading to v29.
Configuration
Updated
Maximum CPU usage – We've added support for the HS_MAX_CPU_PERC environment variable, which allows you to set Hyperscience's maximum CPU usage. The value represents a percentage of CPU resources and can be an integer less than 100.
28.0.7
Submission Processing
Fixed
Fallback Layout UUID and Structured documents – We've resolved an issue that caused submissions to halt if they had a fallback_layout_uuid specified and were automatically matched to a Structured layout.
Machine Learning
Fixed
Character-recognition improvements – We've improved our character-recognition capabilities in the following ways:
We've fixed an issue that caused Field ID bounding boxes to omit the beginning of initial characters, which affected transcription quality.
We've changed how we determine character borders to ensure that the edges of characters are included.
We've improved our ability to recognize text on gray backgrounds.
API
Fixed
Permissions for Test Snippet endpoint – In order to prevent unauthorized access to the container running the Hyperscience application, we’ve added permission checks for requests sent to api/snippets/test_snippets.
28.0.6
Machine Learning
Fixed
Classification models after upgrade – We've resolved an issue that prevented Classification models trained on a v28 trainer from recognizing documents after the application was upgraded to v28.
Submission Processing
Fixed
Minor rotation and MSSQL databases – We've fixed an issue that caused jobs to halt when the “Minor rotation” Image Correction setting was enabled in instances with MSSQL databases.
Authentication
Fixed
Microsoft Azure user creation and SAML authentication – We've resolved the following issues related to our Azure integration:
An issue that prevented SAML authentication of users created through Azure Active Directory
An issue that caused the usernames and names of Azure users to be stored as strings of random characters in the Hyperscience application
28.0.5
Supervision
Fixed
Submissions locked in Supervision state – We've resolved a task-sequencing issue that prevented Submissions from moving out of the Supervision state. Specifically, entries for certain documents in the Work Queue did not have a “Complete Task” link, locking the user out of the Supervision process.
Machine Learning
Fixed
Model performance affected by sample rates – We've fixed an issue that caused a reduction in model performance when the Field ID QA sample rate was set to 0 and the Transcription QA Sample Rate was set to a value greater than 0.
Trainer failure with Field Locator 1.0 – We've resolved an issue that caused the trainer to fail when training a Field Locator 1.0 model. In those instances, the trainer would fail if certain types of content were not present in a document.
28.0.4
API
Updated
New Submission Creation parameter – We've added the source_routing_tag parameter to our Submission Creation endpoint. When used with the has_source_routing_tag function, this parameter allows you to use source tags in output connections' notification filters when creating Submissions through our API.
Submission Processing
Updated
New functions available in notification filters – We've made the following functions available when creating notification filters for connectors' output connections:
has_metadata_dict_key(<key>) – Returns true if the submission's metadata is a dictionary and contains the given key.
has_metadata_dict_value(<key>, <value>) – Returns true if the submission's metadata is a dictionary and contains the given key-value pair.
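The semantics of these two functions can be illustrated with plain Python stand-ins. Unlike the notification-filter functions, which operate on the submission being evaluated, the versions below take the metadata as an explicit first argument; that parameter and the sample values are assumptions made for illustration.

```python
def has_metadata_dict_key(metadata, key):
    """True if metadata is a dictionary that contains the given key."""
    return isinstance(metadata, dict) and key in metadata


def has_metadata_dict_value(metadata, key, value):
    """True if metadata is a dictionary that contains the given key-value pair."""
    return isinstance(metadata, dict) and key in metadata and metadata[key] == value


metadata = {"region": "emea", "priority": "high"}  # hypothetical metadata
print(has_metadata_dict_key(metadata, "region"))
print(has_metadata_dict_value(metadata, "priority", "low"))
```

Note that both checks return false when the metadata is not a dictionary, for example when it is a string or a list.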
Releases
Fixed
Downloading releases – We've fixed an issue that prevented users from downloading a release from the release’s page in the application. If you are hosting your file store on Amazon S3, the fix requires you to add the FILE_STORE_S3_REGION variable to your application's ".env" file. The value of this variable must indicate the Region where your file store's S3 bucket is located (e.g., FILE_STORE_S3_REGION=us-east-1). See Amazon's documentation for a list of valid Regions.
Layout Editor
Fixed
Locking layouts with empty tables – We've resolved an issue that allowed users to lock a layout version that had tables with no columns.
Delete icons for Tables – We've restored the Delete icons in the Tables list for Semi-structured layouts.
28.0.3
API
New
Endpoints to generate Submission and Reporting data CSVs – We've added endpoints that make it possible to automate the generation of submission-related CSVs, along with many of the reports available on the Reporting page of our application. More information can be found in our API documentation.
Machine Learning
Updated
Data type for American English characters – We've added a new default data type, Freeform Characters (American English), which contains only ASCII printable characters and does not include a language model. Its ML configuration, freeform_nolm_restricted_to_ascii, can be used to create custom data types.
Fixed
Model management after hotfixes – We fixed an issue that prevented candidate locator models from older versions from being shown in the application after a hotfix. This issue also made it possible to create a new candidate model before the candidate from the previous version was deployed or dismissed.
Reporting
Updated
Reduced memory usage – We reduced the amount of memory required to generate Supervision Transcription data for Usage Reports.
Fixed
Accuracy of "Manual Working Time" and "Machine Working Time" – We fixed an issue that caused the times in "Manual Working Time" and "Machine Working Time" to be less than the times recorded by our system. This fix affects both displayed and downloadable data, and it retroactively corrects historical data for these reports.
Configuration
Updated
Automated resource-allocation variables based on instance hardware – We now use the number of cores available in your system to automatically calculate the values of resource-allocation variables. You can override the calculated values, if needed.
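As a rough sketch of the "calculate from core count, allow an override" pattern described above: the variable name EXAMPLE_WORKER_COUNT and the one-worker-per-core rule are assumptions made for illustration, not the application's actual variables or formula.

```python
import os
from typing import Mapping, Optional


def resolve_worker_count(env: Mapping[str, str], cores: Optional[int] = None) -> int:
    """Return an explicit override if one is set; otherwise derive a value from the core count."""
    # EXAMPLE_WORKER_COUNT is a hypothetical variable name used only in this sketch.
    override = env.get("EXAMPLE_WORKER_COUNT")
    if override:
        return int(override)
    # Assumed rule: one worker per available core, never fewer than one.
    detected = cores if cores is not None else (os.cpu_count() or 1)
    return max(1, detected)


print(resolve_worker_count({}, cores=8))
print(resolve_worker_count({"EXAMPLE_WORKER_COUNT": "3"}, cores=8))
```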
Integrations
Fixed
Reduced loading time of "Connectors" tab – We resolved an issue that extended the loading time of the "Connectors" tab on the Administration page.
28.0.2
Submissions
Fixed
Failed submissions with Additional documents – We fixed an issue that caused submissions to fail if they contained at least one document manually matched to an Additional layout with a Manual Extract workflow and any non-Additional documents.
Incorrect Transcription Automation percentage – We resolved an issue that caused the "Machine Transcribed" percentage on the Document Output page to be incorrect for submissions containing a Signature field.
Supervision
Fixed
Clearing of field contents when keying "Ö" – We resolved an issue that caused the contents of a field to be cleared when the "Ö" character was keyed during Supervision.
Row boundaries and bounding boxes in the Template Tool – We fixed some UI issues that occurred when editing a table’s rows with the Template Tool, including the deletion of bounding boxes and the unintentional movement of row boundaries.
Transcription
Fixed
Keyboard shortcuts for signatures – We resolved an issue that prevented keyers' "1" or "0" entries for signature transcriptions from being shown in the UI.
Reporting
Updated
Reorganized columns – We moved the Usage Report columns we introduced in v28.0.0 to the end of the report.
28.0.1
Application Installation
Fixed
Database access – We fixed an issue that prevented database access when using a Postgres schema other than public.
Permissions
Fixed
Work Queue loading times – We resolved an issue that increased Work Queue loading times for users with only Supervision Transcription permissions. The issue only affected clients with PostgreSQL application databases.
Layout Editor
Fixed
Delete icons for fields – We restored the Delete icons in the Fields list for Semi-structured layouts.
Submissions
Fixed
Status icons for finished jobs – We restored the Status icons for finished jobs in the "Potential Layouts" tab on the Submissions page.
Trainer
Fixed
Recalibration training with previous versions – We fixed an issue that caused the first Transcription Automation Training to partially fail after attaching a v28 trainer to a previous version of the application.
28.0.0
Please note that if you are using Field ID or Table ID models running on Field Locator 1.0, you will need to manually configure the input limits in order to achieve optimum performance.
Layouts
Updated
Layout management
We made several improvements to layout management, specifically around drafts and versioning.
Last edited – See when a draft was last edited and by whom.
Version notes – Add notes to “draft” and “locked” layouts to specify any important changes that were made between versions.
Layout source indicator – In the Layout Editor, see the name of the source layout for layout drafts that were created using the “copy-to-draft” feature. Note that this source information only appears for “draft” layouts, not for “locked” layouts.
Archiving layout versions – Archive layout versions from the Layout Detail page.
Layout alignment
Improvements for pages with repeated layout sections – This update improves field placement in documents where a section of a layout is repeated on a page.
Submissions Page
Updated
Table Data in Submission CSVs
We've added data about tables and table cells to our Submission CSVs.
New Table Cell-level Data CSV
The Table Cell-level Data CSV offers visibility into the transcription of individual table cells, which you can use to optimize transcription performance.
This CSV contains metadata about each cell in a table, along with the source and status of its transcription.
Updates to Page-level Data CSVs
To help you compare actual and expected automation rates, we've added columns to the Page-level Data CSV and updated the contents of one of its columns.
New columns:
Number of Table Rows
Number of Table Columns
Number of Table Cells
Number of Machine Only Table Cells Transcribed
Updated columns:
Number of Machine Only Fields Transcribed – This column now contains information about non-table fields only.
Updates to Submission-level Data CSVs
We’ve added the following columns to the Submission-level Data CSV. When used in conjunction with the CSVs listed above, this new data allows you to estimate the manual transcription work required by your team.
New columns:
Number of Tables
Number of Table Cells
Supervision
New
Automatic Routing for Supervision and QA Tasks
This new feature increases efficiency during processing by automatically routing Supervision and Quality Assurance tasks to users within a certain permission group. Extractions from a ‘routing’ field, or fields, on a layout are used to determine the task restriction that should be applied to the submission during processing.
Updated
Automatic Document Classification
With release 28, Automatic Document Classification has been moved out of beta and is now a production-ready feature in Hyperscience.
IMPORTANT: Reclassification Tasks Will Be Deleted When Upgrading to V28
In releases 26 and 27, before QA and Supervision were built for Automatic Document Classification, the classification model generated “Reclassification Tasks”, which were used to generate new training data for the model.
Customers should complete these Reclassification Tasks and the Model Validation Tasks generated by them before upgrading to 28 if they want to generate training data to improve model performance. Otherwise, no new training data will be generated until submissions are processed in 28 and Classification QA is completed.
We have built several enhancements to Classification model management and expanded the ability to configure this feature to address your business’s goals.
Enhanced model management:
Release Detail page – Learn about how Classification models work, view model status, and see model performance stats in the new Automatic Document Classification card at the top of the page.
Model Detail page – Analyze model performance stats, define model accuracy requirements, upload training documents and run training, investigate historical model activity, and view model compatibility between releases and application versions.
Models tab – Now you can view Classification model information from the “Models” tab within the “Library” section. Filter by model types (Classification, Identification) and then filter by release name once you have selected “Classification models”, view the number of compatible releases, see model status, and check training status.
Page Sorting settings – We have added a new subsection in the “Page Sorting” settings to configure ADC: “Additional and Semi-Structured Documents”. From here you can enable/disable ADC, define application-wide target accuracy, define the QA sample rate, and define document grouping logic.
Enhanced reporting:
Document Classification QA tasks – Verify whether or not page classifications made by humans or the machine were correct. This data will be used to report accuracy and automation metrics.
Automation reporting – View automation metrics related to Classification: the automation percentage, number of machine classifications, and total number of pages that went through Classification.
Manual Accuracy vs. Machine Accuracy reporting – View accuracy metrics related to Classification: the machine accuracy percentage, the human accuracy percentage, and the total calculation points for each stat.
Work Queue Prioritization
We have improved our prioritization functionality by focusing on the goal completion time for a submission. Previously, manual work was prioritized according to whether it was classified as “high”, “medium”, or “low” at the layout level.
Note that this feature replaces “Layout Priority” with “Submission Processing Deadline”.
Submission processing deadline – Define the goal completion time for a submission. You can define this parameter at the system, layout, and/or connection level.
Default and custom processing deadline settings – Use the default settings (process within 24 hours of the time of submission) or define custom rules for prioritization.
View and modify deadline in Work Queue – See how much time is left to complete the submission according to the defined deadline and adjust it as necessary.
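The deadline arithmetic described above can be illustrated with a short Python sketch. The function names are hypothetical, and ordering work by earliest deadline is an assumption about how deadline-based prioritization might behave, not a description of the application's scheduler.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Default stated above: process within 24 hours of the time of submission.
DEFAULT_DEADLINE = timedelta(hours=24)


def processing_deadline(submitted_at: datetime, custom: Optional[timedelta] = None) -> datetime:
    """Goal completion time: submission time plus the default or a custom interval."""
    return submitted_at + (custom if custom is not None else DEFAULT_DEADLINE)


def time_remaining(deadline: datetime, now: datetime) -> timedelta:
    """Time left before the deadline, floored at zero once it has passed."""
    return max(deadline - now, timedelta(0))


def order_by_deadline(items: List[Tuple[str, datetime]]) -> List[Tuple[str, datetime]]:
    """Assumed ordering: submissions with the earliest deadline come first."""
    return sorted(items, key=lambda item: item[1])


submitted = datetime(2021, 1, 1, 9, 0, tzinfo=timezone.utc)
deadline = processing_deadline(submitted)
print(time_remaining(deadline, submitted + timedelta(hours=20)))
```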
Reject Documents during Transcription Supervision
We have introduced a “Reject” button so that you can stop processing documents that are identified as not in good order. Note that this button will only appear during Transcription Supervision for Structured documents.
To reject a document:
Go to the right-hand panel during Transcription Supervision and click on the “Document Details” dropdown.
Click on the “Reject” button and select a reason for rejection.
The default rejection reason is: “000 - Not in good order”.
Additional information about this feature:
Rejected documents will be indicated on the respective Submission Output page. The API output will also reflect rejection information.
You can configure custom rejection reasons.
Updated Identification Supervision Tasks
Prior to this release, data keyers performing Identification Supervision tasks on Semi-structured documents needed to draw bounding boxes around fields themselves. The introduction of One-Click Bounding Boxes reduces the manual work required to draw bounding boxes during Identification Supervision tasks. These bounding boxes, combined with our new field predictions, can increase keyer speed and accuracy in Identification Supervision.
From the beginning of the task, we show the machine's predictions in the right-hand sidebar to help keyers focus on identifying the remaining fields. If we don't have a prediction for a field, the keyer can click on the field's content, which will create a bounding box around that text fragment. If necessary, the keyer can adjust the edges of the box until the box contains all of the field's text.
In the unlikely event that a prediction needs to be corrected or adjusted, the keyer can do one of the following:
If a bounding box doesn't contain all of the text to be transcribed (e.g., the top parts of some letters are outside of the box), the keyer can adjust the edges of the box until the box contains all of the field's text.
If the prediction was wrong, or if the keyer needs to combine the boxed content with neighboring text fragments, the keyer can override the prediction by pressing spacebar and creating a box manually, as they've done in prior releases.
To further reduce keyer fatigue, we've also updated the keyboard shortcuts used to navigate between fields and end the task. Most significantly, pressing Enter will submit all of the fields rather than advance the keyer to the next field.
Because the updates in this release bring major changes to the user experience, we recommend training your keyers on the new Supervision steps before upgrading to V28.
Note that One-Click Bounding Boxes can also be used with our new Template Tool in Table ID Supervision tasks, described below.
Template Tool in Table ID Supervision Tasks
In this release, we’ve expanded on the Table Identification Automation we introduced in Release 27 with our new Template Tool. With this tool, you can reduce the amount of time it takes to annotate tables that don’t follow a standard grid format.
To use the tool in Table ID Supervision tasks:
Keyers can select the “Template Tool” option above the task’s document. They can toggle between the Template Tool and the Column Tool, introduced in our last release, at any time.
Keyers then identify the columns in the table.
Next, in a row that contains all of the fields to be transcribed, the keyer uses the One-Click Bounding Boxes feature described above to identify the content of each field.
Once the machine has mapped those fields to other rows in the table, the keyer can make adjustments to bounding boxes, rows, or columns as needed.
Note that keyers can also use the Template Tool with tables that follow a standard grid format; when working with those tables, they should be encouraged to use the tool that will allow them to more accurately capture the data to be transcribed.
Keyboard shortcuts for Transcription Supervision
We added the option to change the keyboard shortcuts for adding a new line and for navigating to a previous field during Transcription Supervision. This setting (called “Legacy Supervision shortcuts”) can be configured in the Beta Features area and will be disabled by default.
“New line” shortcut
New shortcut – Shift + Enter
Previous shortcut – Alt + \
“Previous field” shortcut
New shortcut – Shift + Backspace
Previous shortcut – Shift + Enter
"Field Character Limit Reached" Field exception
If a transcription contains more than 2000 characters, we will truncate it to meet that limit, and we will indicate that we've done so through this exception.
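A minimal Python sketch of this truncate-and-flag behavior is shown below. The function name and the boolean flag are hypothetical; only the 2000-character limit comes from the note above.

```python
from typing import Tuple

# Limit stated in the release note.
CHARACTER_LIMIT = 2000


def apply_character_limit(text: str, limit: int = CHARACTER_LIMIT) -> Tuple[str, bool]:
    """Return the (possibly truncated) text and whether the limit exception applies."""
    if len(text) > limit:
        # Keep only the first `limit` characters and signal the exception.
        return text[:limit], True
    return text, False


truncated, limit_reached = apply_character_limit("x" * 2500)
print(len(truncated), limit_reached)
```

A transcription of exactly 2000 characters is left unchanged and is not flagged, since the exception applies only when the limit is exceeded.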
Reporting
Updated
Enhanced Metrics for Tables
We have separated reporting data for documents with Tables, allowing for more granular metrics for documents that feature both Tables and (non-Table) Fields.
Updated Reporting section – The Manual Working Time chart has been updated to distinguish Table-related data.
Updated charts outside of the Reporting section – The Work Queue Overview page has been updated to include Table-specific data.
Hourly Reporting CSVs in Keyer Projection Report
To give you more insight into your data keyers' hourly performance, at both the submission and field levels, we've added the following CSVs to the Keyer Projection Report:
HourlyReportingSubmissionsOverview – Gives data about submissions and Field ID, transcription, and QA tasks for each hour in the report, including the number of tasks in the starting and ending work queues, the number of tasks added to the work queue, and related data points.
HourlyReportingTaskOverview – For each hour in the report, shows the number of tasks in the starting and ending work queues, the number of tasks added to the work queue, the total time spent on each type of task, and more.
The report allows you to view data for up to one year of work. With this data, you can better anticipate your workloads and staffing needs on an hourly basis.
Machine Learning
Updated
Data types
To enhance the automation of fields containing height or weight information, we've introduced the following data types. These data types should be used when adding height or weight fields to layouts.
Weight
Weight - X,XXX.XX
Weight - X.XXX,XX
Length – in addition to units written in letters, Length also supports the ' and " designations for feet and inches, respectively.
Length - X,XXX.XX
Length - X.XXX,XX
For these data types, the normalized values will include a space between numbers and units (e.g., “10 kg”), if units are present. If no unit is given for a measurement, a space will not be added after the number.
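The spacing rule can be illustrated with a small Python sketch. The regular expression and function name are assumptions made for this example; it handles only units written in letters and does not attempt the ' and " designations or any other normalization the data types may perform.

```python
import re

# A number (digits with optional . or , separators) followed by an optional letter-only unit.
_MEASUREMENT = re.compile(r"^\s*([0-9][0-9.,]*)\s*([A-Za-z]+)?\s*$")


def normalize_measurement(raw: str) -> str:
    """Put exactly one space between number and unit; add no space when no unit is given."""
    match = _MEASUREMENT.match(raw)
    if match is None:
        # Input outside the scope of this sketch is returned unchanged (apart from trimming).
        return raw.strip()
    number, unit = match.groups()
    return f"{number} {unit}" if unit else number


for sample in ("10kg", "1,234.56 lb", "72"):
    print(normalize_measurement(sample))
```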
Model performance
We've improved the underlying architecture for our Field ID models and increased the recommended number of training documents from 120 to 400. If you require fewer than 400 documents, please reach out to your Customer Success Manager.
Performance for Field ID, signature, and checkbox models – We’ve enhanced our Field Identification model and our transcriptions of signatures and checkboxes, reducing errors and driving more automation for these tasks. In particular, we've improved the accuracy of checkbox transcriptions in dental claims.
Performance on fields with few characters – Improved transcription of fields containing one or two characters, reducing the likelihood of these fields being read as blanks.
Layout-matching improvements for Structured documents – Enhanced layout-matching capabilities for Structured documents, which will help reduce errors associated with incorrect matches.
Application Settings
New
SAML support – SAML is now supported as an authentication backend.
Google OIDC support – Since Google does not natively support groups scope via OIDC, we built a custom solution. This workflow is outside the OIDC protocol.
Updated
Metadata deletion
To help you manage the size of your database, we've introduced the following metadata-deletion features:
Activity Log data deletion – You can now specify the length of time Activity Log entries will be stored in your database before the system deletes them. The feature also allows you to limit the runtime of data-deletion jobs. If a job does not delete all of the data scheduled for deletion, the data will be deleted the next time the job runs.
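The combination of a retention period and a runtime limit can be sketched as follows. This is an illustrative Python model over an in-memory list, with hypothetical names; the application's actual job operates on the database and may differ in every detail.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, List


def delete_expired(
    entries: List[datetime],
    now: datetime,
    retention: timedelta,
    max_runtime_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Delete entries older than the retention period, in place, until the time budget runs out.

    Returns the number of entries deleted. Anything left over stays in the
    list and is picked up the next time the function is called.
    """
    cutoff = now - retention
    started = clock()
    deleted = 0
    index = 0
    while index < len(entries):
        if clock() - started >= max_runtime_s:
            break  # Runtime limit reached; remaining entries wait for the next run.
        if entries[index] < cutoff:
            del entries[index]
            deleted += 1
        else:
            index += 1
    return deleted


now = datetime(2021, 6, 1)
log = [now - timedelta(days=d) for d in (100, 90, 5)]
print(delete_expired(log, now, timedelta(days=30), max_runtime_s=60))
```

The clock parameter exists only so the time budget can be simulated in tests. In normal use the default monotonic clock applies.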
Submission Log data deletion – When you delete a submission, the metadata for that submission will be automatically deleted from the Submissions Log. We've added this data deletion to the system’s default behavior, and it cannot be disabled.
Data logging
Limit on data saved from the Field ID model – In order to avoid a potential leakage of information, we've limited the amount of information from our Field ID model that is saved in log files.
Database configuration
MSSQL OpenSSL options – We made the OpenSSL cipher options for MSSQL server databases more configurable.
Oracle SSL encryption – We added support for SSL/TLS encryption. Please note that SSL/TLS authentication (also known as two-way or mutual TLS) is not supported.
Support for Postgres 12.4 – Starting with v28, the use of Postgres 12.4 is supported in application databases.
Fixed
User permissions – Fixed a bug where users were not able to complete Table ID tasks without the “View Submissions” permission.
Transcription automation training – Fixed a bug where the accuracy thresholds would be set to “N/A” after making changes to the period of records to use.
Integrations
Updated
Submission completion notification – Now you can configure the Amazon SQS and UiPath connectors to send a notification that a submission has been completed, without sending all of the associated extraction data.
Fixed
Processing jobs for large documents – We fixed an issue that caused processing jobs for very large documents to halt if the documents were ingested by a connector.
API
New
Endpoint – We created an additional HTTP endpoint that enables users to download the registered and DPI-preserved page image.
Updated
API output
Table data – Added support for table data output in API v4.
Field data type – Now the API will return the user-visible FDT name for all layout types as a standard behavior.
"Supervision Required" exception – To indicate whether a Submission's machine_only parameter caused its Supervision tasks to be suppressed, we've added the supervision_required exception at the Field level. Note that this exception replaces the return_confidence Submission Retrieval parameter, which will be removed in a future release.
"Character Limit Reached" exception – We've added this Field-level exception to indicate when a transcription exceeds 2000 characters. In these instances, we will also truncate the content to meet that limit.