V38 Release Notes

Versions v38.1.x are available to SaaS customers only.

38.1.17 (26 Sept 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.1.16 (14 Sept 2024)

Quality Assurance

Fixed

Actions available during Quality Assurance – We've fixed an issue that allowed the Reject Document link to be shown in the user interface for QA tasks. We've also edited the text in the confirmation dialog boxes for Mark Layout Variation Incorrect to remove references to document reprocessing.

38.1.15 (28 Aug 2024)

Reporting

Fixed

Availability of data on the Overview and User Performance pages – We've fixed an issue that prevented data from appearing on the Overview (Reporting > Overview) and User Performance (Reporting > User Performance) pages of the application. The issue was caused by a time-zone miscalculation in Oracle databases and prevented data-aggregation jobs from running in some situations.

38.1.14 (16 Aug 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.1.13 (1 Aug 2024)

Connections

Fixed

CURL_CA_BUNDLE and ActiveMQ connections – We've fixed an issue that caused the ActiveMQ Message Queue Listener and Notifier Output Blocks to fail when the CURL_CA_BUNDLE ".env" file variable did not have a value.

Informative error messages from UiPath Notifier Output Blocks – We've resolved an exception-handling issue In UiPath authentication that made it more difficult to troubleshoot failures in the UiPath Notifier Output Block.

38.1.12 (17 Jul 2024)

Machine Identification

Fixed

Detecting text in Semi-structured documents – We've fixed an issue that prevented the machine from both detecting text and from generating Identification Supervision tasks in certain situations. Instead, submissions would halt in the flow's Machine Identification Block.

38.1.8 (5 Mar 2024)

Layouts

Fixed

"Latest version is not live" message for Semi-structured layouts – We've fixed an issue that caused a "Latest version is not live" warning message to appear on the layout details page for Semi-structured layouts, even though the latest locked version of the layout was live.

Submissions

Updated

Support for EML files and their attachments – You can now extract data from EML files and their attachments. When an EML file is ingested, the system creates a PDF file from the email's body and processes each of the file's attachments as a separate document in the submission.

More information about supported file types can be found in our What is a Submission? article.

Training Data Management

Updated

Processing table cells in Training Data Management – To improve keyers’ experience and create more alignment with the Table ID Supervision interface, we’ve enhanced the processing of table cells based on a template row in Training Data Management. To learn more about template rows, see Table Identification.

Transcription Models

Fixed

“Training in Progress” status for finetuning models – We’ve resolved an issue where initiating a finetuning-model training for fields affected the status of tables’ finetuning training, and vice versa. A “Training in Progress” status was shown for both models, even if finetuning was initiated only for fields or tables.

Importing finetuning models trained on manually classified Structured documents – We’ve fixed an issue where users were not able to import finetuning models trained on manually classified Structured documents. This issue was caused by the inclusion of internal configuration information in finetuning metadata.

Machine Classification

Updated

Batch loading of release data – We’ve enhanced worker initialization for Machine Classification by introducing batch loading of release data. This improvement boosts efficiency, scalability, and responsiveness by reducing data retrieval overhead during task startup.

Quality Assurance

Fixed

Transcription QA tasks not generated for cells – We’ve fixed an issue related to the internal calculations for Transcription QA for tables. Certain cells were incorrectly recorded as having values that were agreed upon by consensus. As a result, no QA tasks were generated for them.

PII Data Deletion

Updated

Concurrent deletion of PII data – We’ve improved the concurrent deletion of PII data to prevent errors caused by simultaneous deletion of records.

Connections

Fixed

Retry mechanism for Email Listeners – We've implemented a retry mechanism for Email Listeners, helping to mitigate potential failures caused by temporary errors in external services.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated cryptography to 42.0.4 and aiohttp to 3.9.3.

38.1.7 (14 Feb 2024)

Training

Fixed

Performance of Table Identification models and select Field Identification models – We've fixed an issue that caused the performance of Table Identification models and Multiple Occurrence and Generic Freeform Text models for Field Identification to decrease. As part of this update, a feature that stopped training when the system determined that additional training would not improve model performance has been disabled by default.

Training Data Management

Fixed

One-click bounding boxes on pages after the first page – We've fixed an issue that prevented one-click bounding boxes from appearing on pages following the first page of a document.

38.1.6 (6 Feb 2024)

Layouts

Fixed

Submissions with many fields on a page in instances with MSSQL databases – We've resolved an issue that caused submissions to halt when their documents' layouts contained more than 450 fields on a page. The issue affected instances with MSSQL databases.

Model Training

Fixed

Applications and trainers with different patch versions – We’ve fixed an issue where attempting to deploy a model resulted in an app_version is invalid. Must be current app version error. This error occurred when the patch version of the application was not exactly the same as that of the trainer. You can now activate candidate models that were trained by trainers from different patch versions.

Flows

Fixed

Availability of v37 blocks – We've resolved an issue that prevented v37 flow blocks from being included in v38 of the application.

Kafka Listener

Fixed

Handling malformed messages – We’ve fixed an issue that prevented users from specifying how the Kafka Listener should handle the malformed messages it receives. As part of this update, we’ve added support for the DISCARD_MALFORMED_RESOURCES “.env” file variable. By default, this variable is set to true, and the system silently commits the offset of any malformed messages it receives in Kafka without processing those messages. Setting this variable to false allows you to use a dead-letter queue to hold any malformed messages for downstream processing.

Llama Block

Fixed

Deploying the Llama Block – We’ve resolved an issue related to the deployment of the third-party large language model (LLM) Llama Block. You can now expect improved stability and performance when using this block in your flows.

38.1.5 (26 Jan 2024)

Flows

Fixed

Completing submissions without documents – We've fixed an issue that caused the Complete Block to fail—and submissions to halt—when submissions did not contain documents. As part of this update, the Complete Block can finish its tasks successfully regardless of a submission's structure.

Security

Fixed

Disabling TLS verification – We've resolved an issue related to the requests library that prevented TLS verification from being disabled when HS_TLS_VERIFY_ENABLED was set to false in the ".env" file.

38.1.4 (19 Jan 2024)

Training

Fixed

Training Identification models for non-English languages – We've fixed an issue that caused the training of Identification models trained on non-English documents to fail.

Flows

Updated

More submission data in output of on-error flows – If an on-error flow is run during the processing of a submission, the output of that flow contains the UUID of the halted submission, along with details on why the submission halted. Exposing this information enabled flow developers to automate remedial actions, and it allows for faster identification of halted submissions.

To learn more about on-error flows, see On-Error Flows.

Fixed

Queuing of tasks after block shutdown – We've resolved a task-queuing issue that caused tasks to time out or be delayed when blocks were shut down.

Output Connections

Fixed

Exporting transformed submission data – We've fixed an issue that caused submissions to halt when API Version was set to transformed in Output Block connection settings.

API

Updated

correlation_id query parameter for Listing Submissions – We've added a correlation_id query parameter to the Listing Submissions endpoint. This parameter allows you to filter for the Submission that was processed in Flow Runs with the specified correlation_id.

38.1.3 (12 Jan 2024)

Input Connections

Fixed

Submitting Base64-encoded data via Message Queue (MQ) Listener connections – We've fixed an issue that prevented Base64-encoded data from being ingested through MQ Listener connections.

38.1.2 (10 Jan 2024)

Secrets Management

Fixed

Retrieving secrets from One Identity Safeguard – We've fixed an issue that prevented instances from retrieving secrets from One Identity Safeguard. As part of this fix, we've restored a file that was removed in a prior update.

38.1.1 (1 Dec 2023)

Training Data Management

Fixed

Navigating through documents in the document viewer – We've fixed an issue that caused crashes and other unexpected behavior when navigating through documents by using the forward and back buttons at the top of the document viewer.

38.1.0 (28 Nov 2023)

Layout Identifiers

Updated

Maximum number of Layout IDs – We’ve added a limit of 2 layout identifiers in for each Structured layout. Users now see a warning if they exceed the maximum number of identifiers.

Training Data Management

Updated

Date Modified after running analysis – We’ve improved the logic that determines when a training document’s modified date is updated. Running analysis no longer updates values in the Date Modified column.

Fixed

Saving annotations in other documents – We’ve fixed a UI issue where bounding boxes were transmitted across documents when using the navigation in Training Data Management.

Submissions

Fixed

Responsiveness when viewing documents – We’ve fixed a query-plan issue in deployments using MSSQL databases that caused delays when opening the document viewer in some instances.

Document Classification

New

“Mark Layout Incorrect” and “Reject Document” options – We’ve added Reprocessing (Mark Layout Incorrect) and Reject Document options for both Structured and Semi-structured documents. The options are available in all Supervision tasks except for Table Transcription and Custom Supervision.

You can find these options in the Document Details section in the right-hand sidebar during Supervision. Hover over the tooltips to see more details for each option.

Table Identification

Fixed

Row numbers in Empty Row dialog – We’ve addressed an issue with the Empty Row dialog box where the row numbers of empty rows that spanned multiple pages were listed as many times as the number of pages they spanned. The dialog now lists each empty row’s number once.

Rows deleted by clicking “Delete all rows below” – We’ve fixed an issue where clicking the Delete all rows below button was deleting all annotations of the selected column. Clicking the button now deletes all rows below the selected one.

Multiple tables and deadlocking in MSSQL databases – We’ve fixed an issue in instances with MSSQL databases where bulk multiple-tables queries with larger batch sizes resulted in deadlocking.

Transcription Supervision

Fixed

ResizeObserver loop exceeded error in Chrome on Macs – We've fixed an issue that caused ResizeObserver loop exceeded errors to occur during Transcription Supervision in Mac Chrome browsers in some instances.

Reprocessing

Fixed

Reclassification in documents processed through Flexible Extraction – We’ve fixed a version-UUID issue that prevented documents processed through Flexible Extraction from having their pages reclassified.

Input Connections

Updated

Number of subfolders scanned by Email Listener for Microsoft 365 Outlook – We've updated the number of subfolders scanned by the Email Listener in Microsoft 365 Outlook accounts from 10 to 100.

Output Connections

New

Kafka Notifier – With the Kafka Notifier Output Block, you can now send notifications and extracted data to the Kafka topic of your choice in JSON format. To allow you to configure your Kafka producer options to meet your needs, we've introduced several ".env" file variables. You can also choose to have the Kafka Notifier send notifications synchronously or asynchronously.

More information about the Kafka Notifier can be found in Kafka Notifier.

Fixed

First submission processed through Box Notifier Output Block – We've fixed an authentication-token issue that caused an error to occur when a submission was processed through the Box Notifier Output Block after the block's initial setup.

Artifacts

Updated

Improving import/export performance for artifacts in S3 buckets – We’ve made optimizations to increase the speed of reading files in and writing files to remote S3 buckets.

Audit Log

Updated

Failed login attempts – We’ve added failed login attempts to the audit log as LOGIN_FAILED activity.

Memory Management

Fixed

Slide selector for automatically assigned machines – We've fixed an issue that prevented users from moving the slider to select a number of machines under "Automatically Assigned Machines" on the Memory Management page.

API

Fixed

external_id in Submissions OpenAPI schema – We've updated the data type of external_id in the FlatSubmission and NestedSubmission objects from integer to string.

38.0.37 (21 May 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.36 (7 May 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.35 (23 Apr 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.34 (11 Apr 2025)

Version 38.0.33 was not released and is not supported.

Updates

This version includes a number of updates that optimize our internal testing and deployment processes

38.0.32 (17 Mar 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.31 (6 Mar 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.30 (18 Feb 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.29 (4 Feb 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.28 (30 Jan 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.27 (17 Jan 2025)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.26 (5 Dec 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.25 (26 Nov 2024)

Flow Blocks

Updated

"Scope" setting for HTTP REST Blocks – We've added a Scope setting to HTTP REST Blocks, which allows you to specify a scope for requests authorized with OAuth 2.0. This setting is available only if the block's Authorization Type is set to OAuth 2.0 Client Credentials.

38.0.24 (21 Nov 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.23 (7 Nov 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.22 (24 Oct 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.21 (9 Oct 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.20 (26 Sept 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.19 (28 Aug 2028)

Submissions

Fixed

Months shown in date selector for users outside of the system's time zone – We've resolved an issue that caused incorrect month names to appear in the date filter's calendars on the Submissions page. This issue affected users working on machines whose time zones did not match the time zone used in the application.

Reporting

Fixed

38.0.18 (16 Aug 2024)

Updates

This version includes a number of updates that optimize our internal testing and deployment processes.

38.0.17 (1 Aug 2024)

Connections

Fixed

38.0.16 (17 Jul 2024)

Machine Identification

Fixed

38.0.15 (3 Jul 2024)

Models

Fixed

Using undeployed Identification models – We've fixed an issue that resulted in the continued use of undeployed Identification models in submission processing in some circumstances.

38.0.14 (20 Jun 2024)

Flexible Extraction

Fixed

Transcribing fields in manually reclassified Structured documents – We've resolved an issue that prevented fields from appearing in Flexible Extraction tasks for Structured documents that had been manually reclassified.

Security

Fixed

Addressing security vulnerabilities – To increase the functionality and security of your system, we've upgraded:

requests to 2.32.2,
docker to 7.1.0,
types-requests to 2.31.0.6, and
idna to 3.7.

38.0.13 (6 Jun 2024)

Machine Classification

Fixed

Image Correction for documents with large, dark areas – We've fixed an issue that prevented Image Correction from detecting the incorrect orientation of documents that contained large, dark areas (e.g., images of checks on a dark background).

38.0.12 (23 May 2024)

Training Data Management

Fixed

Showing anomalies in table annotations – We've fixed an issue that prevented detected anomalies in table annotations from being shown in the application in some situations.

Flows

Fixed

Steps in importing flows – We've resolved an issue that resulted in 400 errors at various points in the flow-import process in some instances. The issue was caused by a mismatch between the next step in the process and the step associated with the passed transaction ID.

Manual Classification

Fixed

Rotating images in flows created in v36 – We've fixed an issue that caused submissions in v36 flows to halt when any of their page images were rotated during Manual Classification.

38.0.11 (8 May 2024)

Flow Blocks

Fixed

Completeness of field transcriptions from the Full Page Transcription Block – We’ve fixed an issue where certain fields, like purchase-order numbers, were truncated when they were processed by the Full Page Transcription Block.

Text Classification

Fixed

Uploading training data from Windows machines – We've resolved an issue that prevented Windows users from uploading ZIP files containing training data for Text Classification models.

Machine Identification

Fixed

Asterisk as custom character for splitting segments – We've resolved an issue with text-segment detection that resulted in IndexError: list index out of range during Machine Identification. The issue occurred when custom_char_for_splitting_segments was set to * in /admin.

Table Identification predictions – We’ve resolved an issue where multiple rows were incorrectly labeled with the same annotation during Table Identification. With this update, rather than assigning the same annotation to multiple rows, the system retains the row prediction that has the highest confidence level, and it generates a Supervision task for the remaining rows.

Security

Updated

Improvements related to URL redirections – We’ve enhanced security by preventing unauthorized URL redirections, ensuring users are only directed to safe, trusted locations. This update improves overall security and protects against potential phishing and social-engineering attacks.

Installation

Updated

Increased post-install timeout – We’ve increased the post-install timeout from 3 to 4 minutes, providing more time for the system to load. Users will now see the following error message if the system’s timeout is reached before the application starts:

“The application does not appear to have started within 240 seconds, however startup times can vary due to environmental factors such as proxies and firewalls. If you are unable to access the login screen after an additional 2-3 minutes, please reach out to Hyperscience Support for additional troubleshooting assistance.”

38.0.10 (19 Apr 2024)

User Experience

Fixed

Month labels for calendars in date filters – We’ve addressed an issue where incorrect month labels for calendars in date filters were shown to users in time zones that were different from the application’s time zone. This issue was caused by inconsistencies in timezone handling.

Table Identification

Updated

Graceful error handling for tasks from deleted documents – The system now shows a dialog box containing an error message when a keyer attempts to submit a Table Identification task for a document that has been deleted. In previous versions, keyers were redirected to an "Unexpected error" page when the error occurred.

PII Data Deletion

Fixed

PII data deletion for QA records – We’ve resolved an issue where field QA records were deleted during the PII-deletion process, while QA records for table cells were not. In 38.0.10 and later, both field and table-cell QA records are included in the PII-deletion process.

38.0.9 (18 Apr 2024)

Languages

Fixed

Machine transcription of Korean multiline fields – We’ve fixed an issue that caused machine transcription to fail on multiline Korean fields. Examples of incorrect transcriptions and their corrected versions appear below.

Incorrect: 원전세배당소득 | Correct: 이자.배당소득 원천세
Incorrect: 이장소백당소득 | Correct: 이자.배당소득 지방소득세

Transcription Models

Fixed

Setting thresholds for transcription models – We've fixed an issue that prevented users from modifying threshold settings for transcription types, even when the Transcription models for those transcription types were disabled. With this update, after running training and disabling Transcription models, you can freely adjust thresholds for the models’ transcription types in flow settings as expected.

Flows

Updated

Floating-point values for SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS – In addition to integer values, you can also enter values of type float for the SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS ".env" file variables. This update gives you flexibility when customizing your submission-processing latency, particularly if low latency levels are desired.

Quality Assurance

Fixed

Logic for automatic QA sampling rates – We’ve fixed a dereferencing issue that caused silent failures for flows with specific default settings. As part of this update, the logic accurately handles dereferencing, ensuring proper handling of affected flows. The issue affected flows created in Hyperscience versions that preceded the application version.

Reporting

Fixed

Filtering the Transcription Sampled Errors report by user – We’ve resolved an issue in the Transcription Sampled Errors report that caused all errors for a specific field to be displayed, even though the report was filtered to show errors made by a specific user only.

Audit Log

Fixed

Filtering by multiple users or activities – We’ve fixed an issue where filtering the Audit Log by multiple users or activities showed zero results, despite the presence of matching records. With this update, the Audit Log’s filters correctly display results when multiple users or activities are selected.

File Storage

Fixed

Folder creation and inode depletion – We’ve fixed an issue where the file store created folders unnecessarily, which depleted inodes and caused disk-space errors. We've divided files into leaf folders for easier management, improving the file system's structure, preventing overcrowding, and ensuring a more balanced file distribution. As a result, this update minimizes the risk of performance issues.

Security

Fixed

Upgrading requests – To ensure that TLS verification is disabled when the HS_TLS_VERIFY_ENABLED ".env" file variable is set to false, we've upgraded requests to 2.31.0.post0.

API

Updated

38.0.8 (14 Feb 2024)

Training

Fixed

Machine Classification

Fixed

Speed of Machine Classification of Structured documents – We've reduced the time required to retrieve metadata about each layout variation in a release, making Machine Classification more efficient for Structured documents.

38.0.7 (6 Feb 2024)

Training Data Management

Fixed

One-click bounding boxes on pages after the first page – We've fixed an issue that prevented one-click bounding boxes from appearing on pages following the first page of a document.

Flows

Fixed

Availability of v37 blocks – We've resolved an issue that prevented v37 flow blocks from being included in v38 of the application.

38.0.6 (2 Feb 2024)

Layouts

Fixed

Machine Transcription

Fixed

Caching predictions for handwritten text – We've fixed a race condition that caused a FileNotFound error to occur when the system was caching its predictions for handwritten text.

Flows

Fixed

Completing submissions without documents – We've resolved an issue that caused the Complete Block to fail—and submissions to halt—when submissions did not contain documents. The issue also affected submissions that contained documents without a data_types value in the Complete Block’s input. As part of this update, the Complete Block can finish its tasks successfully regardless of a submission's structure.

Queuing of tasks after block shutdown – We've fixed a task-queuing issue that caused tasks to time out or be delayed when blocks were shut down.

38.0.5 (15 Jan 2024)

Training

Fixed

Training Identification models for non-English languages – We've fixed an issue that caused the training of Identification models trained on non-English documents to fail.

38.0.4 (5 Jan 2024)

Training Data Management

Fixed

Reporting

Fixed

Calculating the time taken to complete Flexible Extraction and Custom Supervision tasks – We've fixed an issue that caused completion-time calculations for Flexible Extraction and Custom Supervision tasks to depend on the contents of the tasks (e.g., fields, tables, decisions).

Secrets Management

Fixed

Retrieving secrets from One Identity Safeguard in Podman-based instances – We've resolved an issue that prevented secrets from being retrieved from One Identity Safeguard in instances running on Podman. With this update, hard-coded references to Docker have been removed, allowing both Docker- and Podman-based instances to retrieve secrets from Safeguard.

38.0.3 (15 Dec 2023)

Training Data Management

Updated

Resource management – We’ve made some optimizations for uploading documents, as well as for saving and editing annotations on existing training documents. Memory usage should be lower during these tasks, and a small increase in speed can be observed, especially when working with larger documents.

Fixed

Task Queue

Fixed

Filtering by specific flow – We’ve fixed an issue where filtering by a specific flow caused more tasks than expected to be displayed in the Task Queue.

Reporting

Updated

Reporting for Custom Supervision – We’ve added Custom Supervision metrics to the Keyer Projection Report. You can download the report from the User Performance page (Reporting > User Performance).

The following metrics have been added to the report:

HistoricalProcessing.csv
- Fields Updated in Custom Supervision
- Cells Updated in Custom Supervision
- Field Characters Keyed in Custom Supervision
- Table Cells Keyed in Custom Supervision
- Decisions Changed in Custom Supervision
- Decisions Completed in Custom Supervision
HourlyReportingSubmissionsOverview.csv
- Users Performing Custom Supervision
- Time Spent in Custom Supervision (Seconds)
- Custom Supervision Fields and Cells in Starting Work Queue
- Custom Supervision Fields and Cells Added to Work Queue
- Custom Supervision Fields and Cells Completed
- Custom Supervision Fields and Cells in Ending Work Queue
- Custom Supervision Decisions in Starting Work Queue
- Custom Supervision Decisions Added to Work Queue
- Custom Supervision Decisions Completed
- Custom Supervision Decisions in Ending Work Queue
Keyer Performance.csv
- Custom Supervision Decisions Changed
- Custom Supervision Decisions Completed

You can also see metrics for Custom Supervision in the HourlyReportingTaskOverview.csv in the Keyer Projection Report.

The report’s metrics can also be retrieved by sending requests to the Historical Processing Report, Hourly Submission Overview Report, Keyer Performance Report, and Hourly Task Overview Report endpoints of our API.

OpenID Connect (OIDC)

New

Redirecting users during ID token renewal – By default, when renewing OIDC ID tokens, the application no longer redirects users to the identity provider’s token endpoint. To allow this step to be bypassed, we have introduced the HS_OIDC_RENEW_ID_TOKEN_WITH_REFRESH_TOKEN “.env” file variable. When this variable is set to true, the renewal transaction occurs without redirecting users out of the application, enhancing the overall user experience. See OpenID Connect (OIDC) for more details.

Fixed

Renewing ID tokens – We’ve fixed an issue with the HS_OIDC_RENEW_ID_TOKEN_EXPIRY_SECONDS “.env” file variable that prevented the OIDC ID token from being renewed. Now, the ID token is renewed as intended.

Security

Fixed

Updating cryptography – To address security vulnerabilities, we've updated cryptography to 41.0.6.

38.0.2 (1 Dec 2023)

Languages

Fixed

Processing text in Turkish submissions – We've resolved a string-parsing issue that caused Turkish submissions to halt if they contained a colon (":").

Flows

Updated

Default timeout for blocks – We've increased the default timeout for block requests from 60 seconds to 180 seconds.

Log events for failed flows – We’ve reintroduced a log event that indicates when a flow has failed, even if the retry attempts for the flow have not yet been exhausted.

Fixed

Clicking on blocks with invalid identifiers – We’ve fixed an issue where clicking on a block with an invalid identifier in an imported flow caused an “Unexpected error” to occur.

Submissions

Fixed

Responsiveness when viewing documents – We’ve fixed a query-plan issue in deployments using MSSQL databases that caused delays when opening the document viewer in some instances.

Submission Processing

Updated

Windows-1252 encoding and HTML_SUPPORTED_ENCODING_TYPES – You can now specify the HTML encoding types supported in your instance or alter their preferred order by using the HTML_SUPPORTED_ENCODING_TYPES “.env” file variable. This variable lists the order in which encoding types should be used when processing HTML files. Hyperscience supports UTF-8 and Windows-1252 encoding types, with HTML_SUPPORTED_ENCODING_TYPES having a default value of utf-8, windows-1252.

Training Data Management

Updated

Classification

Fixed

Submission status after classifying documents as Additional Form Pages – We’ve fixed an issue where classifying a document as an Additional Form Page (No Layout Found) did not update the submission’s status on the Submission Overview page.

Progress of rejected documents – We’ve addressed an issue where rejected documents were being sent to other manual and machine tasks after Document Organization. In the case of reprocessing a submission with two documents (one rejected, one marked for reprocessing), the Document Organization task now processes the rejected document unchanged, excluding it from the Classification Supervision task. The document retains its rejected status through the rest of the flow. Meanwhile, the reprocessed submission follows the standard workflow and enters Classification Supervision.

Flexible Extraction

Updated

Processing documents with duplicate pages – Users can now manually select a layout and extract data for documents with duplicate pages during Flexible Extraction. In these documents, identical fields that appear on identical pages are treated as unique fields by the system. Note that Machine Classification for documents with duplicate pages is not supported.

Task Queue

Fixed

Effects of changing filters – We’ve fixed an issue where changing the filters in the Task Queue would only apply the changes and deselect any filters originally applied. This issue led to incorrect filtering when choosing both a date range and a filter from the Filters list.

Input Connections

Updated

Number of subfolders scanned by Email Listener for Microsoft 365 Outlook – We've updated the number of subfolders scanned by the Email Listener in Microsoft 365 Outlook accounts from 10 to 100.

OpenTelemetry

Updated

Emitting metrics after timeouts – The system now emits OpenTelemetry metrics for tasks after they exhaust their allotted retry attempts and time out.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated urllib3 to 1.26.18 and pygments to 2.15.0.

38.0.1 (26 Oct 2023)

Version 38.0.0 was not released and is not supported.

Languages

Updated

Enhancements to extraction of Korean-English text – This version of Hyperscience includes the following optimizations for the extraction of Korean-English text:

increased overall accuracy of Korean transcriptions
enhanced performance on medical prescriptions and addresses
improved language modeling

More information on Korean-English and our other supported languages can be found in Supported Languages.

Data Types

New

Capitalized Names – We've added a Capitalized Names data type that expects names that have the first letter of each name (e.g., first name and last name) capitalized.

For more details on Capitalized Names and other default data types, see Supported Characters and Default Data Types.

Layouts

New

Multiple tables in Semi-structured layouts – Semi-structured layouts can now contain more than one table definition, allowing you to extract data from multiple tables in a single document without the use of a Custom Code Block.

The system generates separate Supervision tasks for each table. During these tasks, if another table in the document has been annotated, keyers can see which rows have been identified in that table.

Each table has its own Identification model. You can monitor the performance of each table's model on the Layout Details page. Keyers can also annotate all of a document's tables in Training Data Management.

To learn more about working with multiple tables, see Table Identification.

Submission Processing

New

Knowledge Store – The Knowledge Store allows you to store structured business information that you can use to enrich, validate, and transform unstructured data in your submissions. This feature, along with related updates to Custom Supervision, eliminates the need to complete these tasks in other systems after submissions have been processed in Hyperscience. For example, if being able to retrieve information on client accounts in Hyperscience would enable your keyers to validate key data points in a submission, you could create a definition for an item of type Client Account and create a collection of Client Account items. You could then create a Custom Supervision task where keyers can validate the relevant information about client accounts. Alternatively, Custom Code Blocks can also be configured to make use of Knowledge Store data at any point in a flow.

You can add, edit, and view Knowledge Store items in the Storage section of the application, and those items can be imported and exported in CSV format. You can store up to 1 million items with 100 properties each.

To learn more, see Knowledge Store.

Flows

New

On-error flows – In previous versions of Hyperscience, if a flow execution failed, the related submission would halt without anyone being notified of the failure, leaving the submission unfinished until someone noticed it and took corrective action. With the introduction of error-handling flows in v38, you can assign an on-error flow to your flows. This on-error flow runs if the flow it is assigned to has exhausted all of its allotted retry attempts. If a submission halts in a flow that has an on-error flow assigned to it, a halted-submission notification is sent via an Output Block.

Just as you can with other flows, you can create custom on-error flows. For example, you can configure your message-queue system to create alerts from the on-error flow’s notifications, which can be sent as emails or messages in applications like Slack or Microsoft Teams. Then, your team can troubleshoot and correct the issue, helping to ensure that your organization's SLAs are met.

To maximize the benefits of this feature, we encourage you to apply an on-error flow to each flow you have deployed in production.

For information about these error-handling flows, see On-Error Flows.

Set Transformed Output Block – We've added a Set Transformed Output Block, which you can include in custom flows. This block generates the transformed output for submissions that have post-processing customization applied to them.

Updated

Improvements to flow groups – We've made the following enhancements to flow groups in v38:

Performing actions on flow groups — You can now import, export, archive, and unarchive entire flow groups instead of on single flows.
"Document Processing" flow group — Previous versions of Hyperscience included a single "Document Processing" flow, but in v38, "Document Processing" is a flow group.
- The flow group has a subflow that contains the blocks in the standard document-processing flow with the exception of the Input Block and Output Block. Those blocks are included in the top-level flow of the flow group.
- The contents of the Document Processing subflow cannot be edited. You cannot add or remove blocks in the subflow, but you can change the settings of the subflow and its individual blocks. When editing the flow's blocks, you can select the block you want to edit from the drop-down menu in the right-hand sidebar. This change makes it easier to find the settings you're looking for in Flow Studio.
- The separation of "Document Processing" into multiple flows allows you to reuse the Document Processing subflow in other flow groups and to ensure that the core document-processing steps remain consistent across those flow groups.
Separate pages for top-level flows and all flows — We've separated the lists of top-level flows (a.k.a. main flows) and all flows in your instance: top-level flows (a.k.a. main flows) appear on the Top-level Flows page (Flows > Top-level Flows), and all flows are listed on the All Flows page (Flows > All Flows).

For more details on flow groups, see Managing Flows and Document Processing Flow in V38.

Fixed

Filtering by multiple statuses – We’ve fixed the Flows Status drop-down filter to allow users to select more than one status at a time. We’ve also made adjustments to the Clear All button’s functionality to ensure that all selected statuses are cleared when it is clicked.

Flow Blocks

New

GPT Block – If your use case could benefit from the use of AI—such as in message generation, data validation, translation, or document analysis—you can add a GPT Block to your flow to connect to one of OpenAI's Generative Pre-Trained Transformer (GPT) models. As a result, you can leverage the potential of GPT while your documents are being processed in Hyperscience, eliminating the need for additional AI-powered processing downstream.

You can place the GPT Block at the point in your flow where you would like to make use of GPT's capabilities. The block sends requests to the GPT API with the prompts you specify. Note that the "conversational" aspect of ChatGPT is not supported by the GPT Block; each request results in a single message and response.

In order to ensure the accuracy of the GPT Block's output, place a Custom Supervision Block after the GPT Block in your flow.

Note that the GPT Block requires an OpenAI account, which you need to manage and pay for independently of Hyperscience.

Llama Block (Beta) – With our new Llama Block, you can leverage the capabilities of Meta's Llama 2 13 billion parameter (13B) large language model (LLM) to enhance the output of your submissions. You can include the block at the point in your flow where you would like the model to perform the post-extraction or text-generation task of your choosing. Example uses of the Llama Block include spelling checks and corrections, message creation based on extracted data, data validation, and data comparison.

To ensure the accuracy of the block's output, include a Custom Supervision Block after the Llama Block in your flow.

You can install the Llama Block in both SaaS and on-premise deployments, and it is designed for offline use. The block is not included in the Hyperscience installation bundle and is available for download separately.

Input Connections

New

Kafka Listener – When you add a Kafka Listener connection to a flow's Input Block, you can ingest, send, and schedule submissions from a Kafka queue for processing in Hyperscience. Except for the specifications of file URLs, Kafka messages must be formatted in JSON in the same structure as Submission Creation API requests. You can configure the block's output to be sent to a Kafka topic of your choosing via an HTTP connection.

To allow you to configure your Kafka consumer options to meet your needs, we've introduced several ".env" file variables. You can also scale the number of consumers in your instance.

For more information on the Kafka Listener and its configuration options, see Kafka Listener.

Manual Classification

New

Reclassification and reprocessing – Reclassification allows keyers to flag misclassified documents during Supervision tasks, which then sends the document's submission to Manual Classification. This feature prevents submissions from reaching a "dead end" in Hyperscience, reducing the need to resubmit submissions or process them in other systems. The submission's ID remains unchanged during the entire reclassification process.

If any of a submission's documents are flagged as misclassified, the new Reprocessing Block, placed directly before the Complete Block, sends the submission to Manual Classification, even if it didn't go through the Manual Classification Block before. Any tasks completed on the submission during its initial processing before the Manual Classification Block are skipped. During reclassification, all pages of the submission are made available to the keyer. Any documents not affected by the reclassification skip the rest of the blocks. If a document is changed by reclassification, it is processed in the remainder of the flow, and it cannot be flagged as misclassified again.

To learn more about reclassification, see Reprocessing.

Updated

Manual Rotation – If an unmatched page in a Document Classification task needs to be rotated in order to have the correct orientation, a keyer can rotate the page by right-clicking on it and clicking Rotate page 90° clockwise. The keyer can click on this option as many times as needed to achieve the correct orientation. By allowing keyers to adjust the orientation of pages, this feature eliminates the need for these pages' submissions to be resubmitted.

To learn more about Manual Rotation, see Manual Rotation.

Using the original versions of page images – When submission pages are ingested into Hyperscience, the system creates images of them to use throughout the submission's processing. It then makes adjustments to those images to make them more readable by the machine and by humans. Occasionally, it may be beneficial to use the original image of the page rather than the machine-adjusted version. To do so, keyers can right-click on the page's thumbnail in the left-hand sidebar of the Document Classification interface and then deselect the Machine Adjusted Image checkbox.

More details about this feature can be found in Manual Rotation.

Field Identification

Updated

Enhancements to Multiple Occurrences – To make the Multiple Occurrences feature easier to use, we've made the following updates:

“Multiple occurrences” checkbox for layouts — We've added a Multiple occurrence checkbox for fields in the Semi-structured Layout Editor. The checkbox is unselected by default.
- When this checkbox is selected, keyers can annotate multiple occurrences of the field in Training Data Management, and the system uses a Multiple Occurrence model to make predictions for the field.
- When the checkbox is not selected, keyers can annotate multiple occurrences of the field during Supervision and QA tasks, but they won't be able to annotate them in Training Data Management.
- Note that you cannot select the Multiple occurrences checkbox in layouts that contain at least one field of the Clause Extraction data type.
Visual distinctions between multiple occurrences and multiple bounding boxes during annotation — It's easy to confuse Multiple Occurrences and Multiple Bounding Boxes when annotating Semi-structured documents. To make it clearer to keyers whether they're adding occurrences or bounding boxes, we've updated the annotation interface to indicate what kind of entry is being added. These updates are available in Supervision and QA tasks as well as in Training Data Management.

For more information on Multiple Occurrences, see Field Identification.

Manual Transcription

Updated

Pre-filling Transcription tasks with machine transcriptions – When the Supervision Pre-fill Best Guess Transcription setting is enabled and the machine has low confidence in its transcription for a field, the system populates the field's Transcription textbox with its best guess for the field's value. Then, the keyer can either submit the transcription as-is or make edits to it before submitting it, saving them the time it would take to enter the initial transcription themselves.

This option is disabled by default in all flows. To enable this option, contact your Hyperscience representative.

For more information, see Transcription Settings.

Reviewing blank table cells during Transcription Supervision tasks – If the machine has high confidence that a blank cell in a table should remain blank, it will not be flagged for a keyer to review during the Transcription Supervision task for the table. This update helps keyers focus on transcribing cells rather than reviewing cells that have no text, leading to faster task completion.

To learn more, see Best Practices for Table Identification and Table Transcription.

Unstructured Extraction

Updated

Clause Extraction – With the updates in v38, you can extract text that contains more than 2000 characters from Unstructured documents. There is no limit to the number of characters you can extract for each data point, allowing you to extract text that is several pages in length. The same functionality available to data points containing 2000 characters or less is also available to longer data points, including location predictions, machine transcriptions, and Custom Supervision.

To learn more, see Clause Extraction.

Unstructured Extraction for on-premise Kubernetes deployments – You can now extract data points from unstructured documents in on-premise Kubernetes deployments of Hyperscience. To do so, your trainer machine needs to have both a GPU (graphics processing unit) and a CPU (central processing unit), as training Unstructured models requires additional processing resources.

For more information about the technical requirements for Unstructured Extraction and setting up a trainer with a GPU, see Enabling Trainers with GPUs in On-Premise Kubernetes Deployments.

Custom Supervision

Updated

Adding and editing case notes – Keyers can now add and edit case notes during Custom Supervision tasks. When this feature is enabled, keyers can modify case notes directly from the Custom Supervision interface, eliminating the need to find the relevant Case Details pages in the application and change notes there. Each case note can contain up to 2000 characters.

To use this feature, a Custom Supervision Block must be followed by a Machine Collation Block.

Note that if the page or document being reviewed during Custom Supervision is part of multiple cases, any additions or edits to case notes will be applied to all of the cases the page or document is assigned to.

To learn how to enable this feature in your Custom Supervision tasks, see our Flows SDK documentation.

Reporting

New

Operational value reporting – To give you insight into how Hyperscience makes your document processing more efficient, we've redesigned Reporting's Overview page (Reporting > Overview) to highlight the following key metrics:

Throughput
Average human handling time
Average lead time

You can filter the data by layout, flow, unit (submission or document), and date range. You can also download the data presented on the page to a CSV file.

As part of this update, we've changed the locations of the reports that were previously on the Overview page.

System Throughput, Automation, and Automation Training Result reports are now on the Processing Time page (Reporting > Processing Time).
Document Output Accuracy is now on the Accuracy page (Reporting > Accuracy).

More information about operational value reporting can be found in Operational Value Reporting.

SaaS

New

OpenID Connect and SAML configuration – Previously, if you wanted to integrate a SAML or OIDC identity provider (IdP) with a SaaS deployment of Hyperscience, you needed to send the connection details to our Support team, who would configure the integration on your behalf. In new deployments of v38, you can manage your SAML or OIDC integration on your own in the application. After you've entered the required details for the integration, you can test the connection to your IdP.

For more information, see OpenID Connect (OIDC) and SAML.

Updated

Enhancements to autoscaling – We've expanded the autoscaling capabilities in SaaS deployments to include the scaling of the workflow engine and the horizontal scaling of the database. These improvements help to increase submission throughput and maximize the potential of the infrastructure found in SaaS deployments.

To learn more, see Scaling Hyperscience.

API

New

Knowledge Store endpoints – We've created endpoints that allow you to retrieve and manage your Knowledge Store data via our API.

For more information, see our API documentation.

Operational value reporting endpoints – We've added System Throughput and Time to Completion endpoints to the API, allowing you to retrieve data for these business-value metrics programmatically.

To learn more about these endpoints, see System Throughput Report and Time to Completion Report our API documentation.