With the introduction of flows, we have created the following types of blocks to help you customize your flows to meet your teams’ needs. To learn more about any of these blocks, or for assistance in adding them to your flows, contact your Hyperscience representative.
Block types and settings
Application-level and flow-level settings
While some blocks have settings that allow you to customize how they work, many blocks are affected by application-level or flow-level settings. To learn more about these settings, see Application Settings Overview and Flow Settings.
Input Blocks
Formerly known as “Input Connectors,” Input Blocks allow you to integrate your organization’s data sources into our system. Through these blocks, you can process documents from a variety of sources, such as inboxes, message queues, or folders on the network.
A full list of the Input Blocks we currently support, along with more details about each, can be found in Input Blocks.
Settings
No matter how many Input blocks you choose to enable, information about those blocks is contained in a main Input block for your flow. This block has the settings described below.
Name | Required? | Description |
---|---|---|
Allow API submissions | No | Indicates whether the flow accepts submissions sent via the API. |
Allow manual submissions | No | Indicates whether the flow accepts manually uploaded submissions. |
Submission Initialization
The Submission Initialization Block contains settings that connect your flow to your:
AWS S3 submission retrieval store,
OCS submission retrieval store,
Generic web storage (HTTP/HTTPS) submission retrieval store, or
Azure Blob Storage retrieval store.
Settings
You can customize the functionality of your block by editing the settings described below.
AWS S3
S3 Submission Retrieval Store
If you are using an S3 bucket as your submission retrieval store and you are not authenticating through IAM roles, provide your AWS access key ID and secret access key in the S3 Submission Retrieval Store field.
To enter your credentials:
Click Edit value.
Enter your credentials in JSON format:
{ "aws_access_key_id": "<your_access_key_id>", "aws_secret_access_key": "<your_secret_key>" }
You can authenticate requests using AWS Signature Version 2 (SigV2). To use AWS Signature Version 2, add the following variable and value to the S3 Submission Retrieval Store field:
"s3_signature_version":"s3"
Click Done.
Click Save in the upper-right corner of the page.
In the dialog box that appears, click Save & Deploy.
For more information about AWS access key IDs and secret access keys, see Amazon's Understanding and getting your AWS credentials.
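As a minimal sketch, the JSON value for the S3 Submission Retrieval Store field could be assembled and checked with a short Python script before you paste it in. The helper name `build_s3_store_value` is hypothetical; only the three key names and the `"s3"` value come from the steps above.

```python
import json


def build_s3_store_value(access_key_id, secret_access_key, use_sigv2=False):
    """Return JSON text for the S3 Submission Retrieval Store field.

    Hypothetical helper: only the key names are taken from the documentation.
    """
    value = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if use_sigv2:
        # "s3" selects AWS Signature Version 2, as described above.
        value["s3_signature_version"] = "s3"
    return json.dumps(value)


print(build_s3_store_value("<your_access_key_id>", "<your_secret_key>", use_sigv2=True))
```

Serializing with `json.dumps` rather than typing the value by hand avoids quoting mistakes, such as a missing comma before the optional signature-version key.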
S3 Submission Retrieval Endpoint URL
If your submission retrieval store is not in the public cloud (i.e., its URL does not point to s3.amazonaws.com, as with a government cloud or an S3-compatible internal setup), enter its URL in the S3 Submission Retrieval Endpoint URL field. You do not need to edit your “.env” file to update this URL.
To edit the endpoint URL for your S3 submission retrieval store:
Enter the URL in the S3 Submission Retrieval Endpoint URL field or edit its contents.
Click Save in the upper-right corner of the page.
In the dialog box that appears, click Save & Deploy.
If the bucket you’re using as your submission retrieval store is in a public cloud (as opposed to a government cloud or an S3-compatible internal setup), leave this field blank.
OCS
OCS Configuration
If you are using an OCS submission file store, enter the configuration details for your file store in this field.
To enter your configuration details:
Click Edit value.
Enter the configuration details in JSON format:
{ "host_url": "<your_host_url>", "username": "<your_username>", "password": "<your_password>", "ssl_cert": "<CA_bundle_filename_OR_SKIP>" }
The value of ssl_cert should match the CA bundle filename inside the $HS_PATH/certs directory. To disable certificate validation, set this value to SKIP.
Click Done.
Click Save in the upper-right corner of the page.
In the dialog box that appears, click Save & Deploy.
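The OCS configuration described above can be prepared with a small script that also checks the `ssl_cert` value before you enter it. This is a sketch under assumptions: `build_ocs_config` is a hypothetical helper, and the check simply looks for the named file in `$HS_PATH/certs` when the `HS_PATH` environment variable is set on the machine where the script runs.

```python
import json
import os


def build_ocs_config(host_url, username, password, ssl_cert="SKIP", certs_dir=None):
    """Return JSON text for the OCS Configuration field (hypothetical helper)."""
    if certs_dir is None and os.environ.get("HS_PATH"):
        # Default to the certs directory named in the documentation.
        certs_dir = os.path.join(os.environ["HS_PATH"], "certs")
    if ssl_cert != "SKIP" and certs_dir is not None:
        # ssl_cert should match a CA bundle filename inside the certs directory.
        if not os.path.isfile(os.path.join(certs_dir, ssl_cert)):
            raise FileNotFoundError(f"{ssl_cert} not found in {certs_dir}")
    return json.dumps({
        "host_url": host_url,
        "username": username,
        "password": password,
        "ssl_cert": ssl_cert,
    })


print(build_ocs_config("<your_host_url>", "<your_username>", "<your_password>"))
```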
Generic Web Storage (HTTP/HTTPS)
Generic Web Storage (HTTP/HTTPS) Configuration
If you are using a generic web storage submission file store, enter the configuration details for your file store in this field.
We use Basic Authentication for Generic Web Storage Configuration.
To enter your configuration details:
Click Edit value.
Enter the configuration details in JSON format:
{ "username": "<your_username>", "password": "<your_password>", "ssl_cert": "<CA_bundle_filename_OR_SKIP>" }
The value of ssl_cert should match the CA bundle filename inside the $HS_PATH/certs directory. To disable certificate validation, set this value to SKIP.
Click Done.
Click Save in the upper-right corner of the page.
In the dialog box that appears, click Save & Deploy.
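For context, Basic Authentication (RFC 7617) sends the username and password as a Base64-encoded `Authorization` header. The sketch below shows the standard scheme only; how the system builds its requests internally is not documented here, and `basic_auth_header` is a hypothetical name.

```python
import base64


def basic_auth_header(username, password):
    """Build a standard HTTP Basic Authentication header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("<your_username>", "<your_password>"))
```

Because Base64 is an encoding rather than encryption, the credentials are only protected in transit when the store is reached over HTTPS with certificate validation enabled.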
Azure Blob Storage
The Azure Blob Storage option for submission retrieval storage is available in v39.2 and later.
If you are using Azure Blob Storage as your submission retrieval store, you can use the fields described below to configure the system’s connection to the blob.
Azure Blob Storage Authentication Type
From the Azure Blob Storage Authentication Type drop-down list, select the authentication type the system should use to access the blob:
SAS Token Only
Service Principal
Managed Identity
Account Key
When you select an authentication type, additional settings appear.
Settings for SAS Token Only authentication
Name | Required? | Description |
---|---|---|
Azure Blob Storage Account URL | Yes | The URL of the storage account (e.g., https://<account_name>.blob.core.windows.net) |
Settings for Service Principal authentication
Name | Required? | Description |
---|---|---|
Azure Blob Storage Account URL | Yes | The URL of the storage account (e.g., https://<account_name>.blob.core.windows.net) |
Azure Blob Storage Tenant ID | No | The tenant ID of the service principal |
Azure Blob Storage Client ID | No | The client ID of the service principal. If multiple client IDs exist for the service principal, and Azure Blob Storage Client ID is left blank, the default client ID will be used. |
Azure Blob Storage Client Secret | No | The client secret for the service principal |
Azure Blob Storage Authority Host | No | The host of the Microsoft Entra authority for the storage account. If omitted, the host of the Azure Public Cloud authority (login.microsoftonline.com) is used. For a list of valid values, see Microsoft’s azure.identity.AzureAuthorityHosts class. |
Settings for Managed Identity authentication
Name | Required? | Description |
---|---|---|
Azure Blob Storage Account URL | Yes | The URL of the storage account (e.g., https://<account_name>.blob.core.windows.net) |
Azure Blob Storage Client ID | No | The client ID of the managed identity. If multiple client IDs exist for the managed identity, and Azure Blob Storage Client ID is left blank, the default client ID will be used. |
Settings for Account Key authentication
Name | Required? | Description |
---|---|---|
Azure Blob Storage Account URL | Yes | The URL of the storage account (e.g., https://<account_name>.blob.core.windows.net) |
Azure Blob Storage Account Key | No | The access key for the storage account |
Azure Blob Storage Account Name | No | The name of the storage account |
If incorrect authentication information is entered, the flow runs for the affected file-ingestion attempts will fail. The flow runs’ output will contain the error messages that Azure passed to the system.
For more information about troubleshooting flow runs, see Testing and Debugging Flows.
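The settings tables above can be summarized as a lookup keyed by authentication type. The following sketch is illustrative only: `AZURE_SETTINGS` and `check_settings` are hypothetical names, and the required flags are transcribed from the "Required?" columns.

```python
# Settings per authentication type, transcribed from the tables above.
# True means the table marks the setting as required.
AZURE_SETTINGS = {
    "SAS Token Only": {
        "Azure Blob Storage Account URL": True,
    },
    "Service Principal": {
        "Azure Blob Storage Account URL": True,
        "Azure Blob Storage Tenant ID": False,
        "Azure Blob Storage Client ID": False,
        "Azure Blob Storage Client Secret": False,
        "Azure Blob Storage Authority Host": False,
    },
    "Managed Identity": {
        "Azure Blob Storage Account URL": True,
        "Azure Blob Storage Client ID": False,
    },
    "Account Key": {
        "Azure Blob Storage Account URL": True,
        "Azure Blob Storage Account Key": False,
        "Azure Blob Storage Account Name": False,
    },
}


def check_settings(auth_type, provided):
    """Return (missing_required, unknown) setting names for an auth type."""
    known = AZURE_SETTINGS[auth_type]
    missing = [name for name, required in known.items()
               if required and not provided.get(name)]
    unknown = [name for name in provided if name not in known]
    return missing, unknown


print(check_settings("Managed Identity", {"Azure Blob Storage Client ID": "abc"}))
```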
Classification Blocks
We’ve divided our Classification function into two blocks: one for Machine Classification and another for Manual Classification.
Machine Classification
With machine classification, Hyperscience can automatically match your submissions to Structured, Semi-structured, or Additional layouts. Machine classification requires training to recognize the kinds of submissions you process through Hyperscience.
Settings
Name | Required? | Description |
---|---|---|
Image Correction | No | Identifies and corrects the orientation of Semi-structured images by rotating them. Cannot be enabled if Faster PDF Transcription is enabled. |
Faster PDF Transcription | No | If enabled, the system processes pages in PDF files in their native format, allowing for faster transcription. If disabled, the system processes PDF pages by creating images of them and extracting data from those images. To ensure that this feature works as intended, only enable Faster PDF Transcription when submitting PDFs whose pages are correctly oriented and do not require rotation before processing. Cannot be enabled if Image Correction is enabled. If you are processing PDFs and other file types in your flow, consider creating a custom flow that routes PDFs to a Machine Classification Block that has Faster PDF Transcription enabled. |
Captured Image Enhancement | No | Improves machine readability of Semi-structured documents captured by mobile devices. To rotate and properly process Semi-structured documents captured by mobile devices, we recommend enabling both Captured Image Enhancement and Image Correction. Before enabling Captured Image Enhancement, make sure that the majority of the pages you will be processing are captured by mobile devices. Contact your Hyperscience representative for more information. |
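The constraints in the table above can be expressed as a short pre-deployment check. This is a sketch, not product code: the function and its arguments are hypothetical, and it encodes only the one hard constraint (Image Correction and Faster PDF Transcription are mutually exclusive) and the one recommendation stated in the table.

```python
def check_machine_classification(image_correction, faster_pdf_transcription,
                                 captured_image_enhancement):
    """Return (errors, suggestions) for a proposed combination of settings."""
    errors, suggestions = [], []
    if image_correction and faster_pdf_transcription:
        # Each setting's description says it cannot be enabled with the other.
        errors.append("Image Correction and Faster PDF Transcription "
                      "cannot both be enabled.")
    if captured_image_enhancement and not image_correction:
        # Recommendation from the Captured Image Enhancement description.
        suggestions.append("Consider enabling Image Correction together with "
                           "Captured Image Enhancement.")
    return errors, suggestions


print(check_machine_classification(True, True, False))
```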
Manual Classification
Manual Classification, or Classification Supervision, allows your keyers to manually match submissions to their layouts. Depending on your flow, keyers may perform Classification Supervision if the system cannot match a submission to a layout with high confidence.
Settings
You can customize the functionality of your Manual Classification Block by adding Task Restrictions.
Name | Required? | Description |
---|---|---|
Default task restrictions | No | Select the task restrictions that should be applied to tasks created by this block. See Task Restrictions Overview for more information. |
Identification Blocks
We’ve created two Identification Blocks to cover both our Machine Identification and Identification Supervision capabilities.
Machine Identification
With Machine Identification, you can automate the identification of fields and tables in your submissions.
Settings
Other than the settings under “Block Details,” Machine Identification Blocks have no block-specific settings.
Manual Identification
Manual Identification allows your keyers to complete Field ID Supervision or Table ID Supervision tasks, where they draw bounding boxes around the contents of certain fields, table columns, or table rows. This identification process ensures that the system transcribes the correct content in the Transcription steps of the data-extraction process.
Settings
You can customize the functionality of your Manual Identification Block by adding Task Restrictions.
Name | Required? | Description |
---|---|---|
Default task restrictions | No | Select the task restrictions that should be applied to tasks created by this block. See Task Restrictions Overview for more information. |
Transcription Blocks
Just as we did with Classification and Identification, we’ve divided our Transcription capabilities into Machine Transcription and Manual Transcription Blocks.
Machine Transcription
In the Machine Transcription Block of your flow, Hyperscience automatically transcribes the content of your submissions, whether the text is handwritten or machine-printed.
Settings
Other than the settings under “Block Details,” Machine Transcription Blocks have no block-specific settings.
Manual Transcription
Manual Transcription, or Transcription Supervision, lets your keyers manually enter the text found in fields or tables. Depending on your settings, your keyers may manually transcribe certain pre-selected fields or fields that the machine could not transcribe with high confidence.
Settings
You can customize the functionality of your block by editing the settings described below.
Task Restrictions
Name | Required? | Description |
---|---|---|
Default task restrictions | No | Select the task restrictions that should be applied to tasks created by this block. See Task Restrictions Overview for more information. |
Supervision
Name | Required? | Description |
---|---|---|
Supervision Transcription masking | No | Prevents users from inputting invalid characters during Supervision Transcription tasks. |
Table output manual review | No | Generates a Table Transcription task if any table cells are identified, during which the keyer will complete a full manual review of both the transcribed data and its bounding boxes. If disabled, a Table Transcription task will only be generated if one or more cells have transcribed values below the accuracy thresholds defined in Application > Settings. |
Create Manual Transcription Task for Tables with Blank Cells | No | Always sends blank cells to Manual Transcription, regardless of machine confidence. Enabled by default. If disabled, a Table Transcription task will only be generated if one or more cells in the table have transcribed values below the accuracy thresholds defined in Application > Settings. |
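One way to read the two table-related settings above is as a decision rule for whether a Table Transcription task is generated. The sketch below is an interpretation, not the system's actual logic: the cell structure, the single `threshold` value, and the comparison of a per-cell confidence against it are all assumptions made for illustration.

```python
def needs_table_task(cells, threshold, manual_review, blank_cells_task=True):
    """Decide whether a Table Transcription task would be generated.

    cells: list of dicts with hypothetical "text" and "confidence" keys.
    """
    if not cells:
        return False  # no table cells were identified
    if manual_review:
        return True  # Table output manual review: always review the table
    if blank_cells_task and any(cell["text"] == "" for cell in cells):
        return True  # blank cells go to Manual Transcription regardless of confidence
    # Otherwise, only cells below the threshold trigger a task.
    return any(cell["confidence"] < threshold for cell in cells)


cells = [{"text": "120.00", "confidence": 0.99}, {"text": "", "confidence": 0.99}]
print(needs_table_task(cells, threshold=0.95, manual_review=False))
```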
Flexible Extraction Block
Depending on your flow’s configuration, Flexible Extraction tasks allow your keyers to:
validate transcriptions, or
add transcriptions to manually categorized Structured pages, which did not go through regular Transcription Supervision.
To use Flexible Extraction as a data-validation method, you need a Custom Code Block. The rules in that block determine when a document should be sent to Flexible Extraction, as well as whether the entire document or particular fields should be validated. For information on setting up Custom Code Blocks and Flexible Extraction Blocks in this way, contact your Hyperscience representative.
Settings
You can customize the functionality of your block by editing the settings described below.
Task Restrictions
Name | Required? | Description |
---|---|---|
Default task restrictions | No | Select the task restrictions that should be applied to tasks created by this block. See Task Restrictions Overview for more information. |
Supervision
Name | Required? | Description |
---|---|---|
Flexible Extraction Transcription masking | No | Prevents users from inputting invalid characters during Flexible Extraction tasks. |
Collation Block
We’ve created a Collation Block to allow the grouping of files, documents, and pages into cases. To learn more about Case Collation, see Case Collation.
Settings
You can customize the functionality of your block by editing the settings described below.
Name | Required? | Description |
---|---|---|
Replace case data from duplicate file names | No | Replaces case data from repeated file names within the same case. For example, you can enable this setting if you want to resubmit a file containing new data and remove the old, duplicated data from the case. Note that this setting does not delete the old data; it just removes it from the case. |
Custom Supervision Block
To enable the tailoring of a Supervision task’s interface to a specific business process, we’ve created a Custom Supervision Block.
To use Custom Supervision, you need a custom flow with a Custom Code Block:
A custom flow is required because the Custom Supervision Block is not included in the default Document Processing flow.
A Custom Code Block is required to define and format the data input that the Custom Supervision Block needs to show a task.
A Routing Block is not required, but it controls whether a submission is sent to Custom Supervision or not. Without a Routing Block, every submission to your custom flow will go to Custom Supervision.
Settings
Name | Required? | Description |
---|---|---|
Task purpose | Yes | The custom task name given to the Custom Supervision task in the Task Queue. To learn how to make this custom task name visible in the Task Queue, see Navigating the Task Queue. |
Default task restrictions | No | Select the task restrictions that should be applied to tasks created by this block. See Task Restrictions Overview for more information. |
Custom Supervision transcription masking | No | Prevents users from inputting invalid characters during Custom Supervision Transcription tasks. |
Database Blocks
Database Blocks allow you to make queries from Hyperscience to databases, which increases the overall speed of your flows by minimizing the need for manual transcription.
To learn more about Database Blocks, see Database Blocks.
Custom Code Blocks
Custom Code Blocks enable you to transform and validate extracted submission data before Hyperscience sends it to your downstream systems. The table below lists the kinds of post-processing rules you can implement with Custom Code Blocks.
Rule Type | Description | Examples |
---|---|---|
Field normalization / Data transformation | Change the formatting of data for compatibility with downstream systems | |
Data validation | Perform an external data lookup or check data within the submission to make sure the data is valid, and flag the submission as NIGO if it is not | |
Data augmentation | Add data to a submission's JSON to prevent processing issues or to route data to specific downstream systems | |
You cannot add Custom Code Blocks to your flow through the platform. To learn more about Custom Code Blocks, or to add them to your flows, contact your Hyperscience representative.
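To make the three rule types concrete, here is a minimal Python sketch of each. It is illustrative only and does not use the Hyperscience SDK: the field names (`date_of_birth`, `policy_number`), the `nigo` flag, and the `routing_queue` key are hypothetical.

```python
from datetime import datetime


def normalize_date(value):
    """Field normalization: convert MM/DD/YYYY to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


def apply_rules(fields):
    """Apply one rule of each type to a dict of extracted field values."""
    out = dict(fields)
    # Field normalization / data transformation
    out["date_of_birth"] = normalize_date(fields["date_of_birth"])
    # Data validation: flag the submission as NIGO if the value is not all digits
    out["nigo"] = not fields.get("policy_number", "").isdigit()
    # Data augmentation: add a key that a downstream system could route on
    out["routing_queue"] = "review" if out["nigo"] else "straight_through"
    return out


print(apply_rules({"date_of_birth": "03/07/1990", "policy_number": "12345"}))
```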
Settings
If your business needs change and you need to modify a Custom Code Block’s code (e.g., add a keyword to a keyword search), you can do so with guidance from Hyperscience. To learn more, see Modifying Custom Code Blocks.
Named Entity Recognition Block
The Named Entity Recognition Block allows you to:
detect key PII entities such as:
Names
Addresses
Locations
Organizations
Companies
enhance full-page transcription output with information about detected entities.
You need to use Named Entity Recognition Blocks in conjunction with Full Page Transcription Blocks. For example, you can build a redaction flow that processes documents through full-page transcription, then detects all personal names, and at the end uses a Custom Code Block to put black boxes over the detected names.
Custom Entity Detection Block (Beta)
The Custom Entity Detection Block allows you to locate and identify:
single words, and
word patterns that can be described with a combination of regular expressions and keywords
You need to use Custom Entity Detection Blocks in conjunction with a block, such as the Full Page Transcription Block, that returns a collection of text segments. For example, you can build a redaction flow that processes documents through full-page transcription, then detects all phone numbers, addresses, and names, and at the end uses a Custom Code Block to place black boxes over the detected text segments.
The Custom Entity Detection Block is a beta feature that is not yet part of our Flows SDK. For information on setting up Custom Entity Detection Blocks, contact your Hyperscience representative.
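The idea of combining regular expressions with keywords can be sketched in plain Python. This does not reflect the block's actual configuration format: the phone-number pattern, the keyword set, and the `detect_entities` function are hypothetical, and the input is assumed to be a list of text segments such as those a full-page transcription might return.

```python
import re

# Hypothetical pattern for US-style phone numbers, e.g. (555) 123-4567.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")
# Hypothetical single-word keywords to locate.
KEYWORDS = {"confidential"}


def detect_entities(segments):
    """Return (segment_index, entity_type, matched_text) tuples."""
    hits = []
    for index, text in enumerate(segments):
        for match in PHONE_PATTERN.finditer(text):
            hits.append((index, "phone_number", match.group()))
        for word in re.findall(r"\w+", text):
            if word.lower() in KEYWORDS:
                hits.append((index, "keyword", word))
    return hits


print(detect_entities(["Call (555) 123-4567 today", "Confidential memo"]))
```

A downstream step, such as a Custom Code Block, could then use the segment indexes to decide which text segments to cover with black boxes.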
Routing Blocks
Routing Blocks let you send submission data to different destinations based on the criteria you specify. In this way, Routing Blocks create branches in your flow.
You cannot add or configure Routing Blocks in the application. For assistance, contact your Hyperscience representative.
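Conceptually, a routing rule maps submission data to a branch name. The sketch below shows the idea only; the `layout_name` and `page_count` keys and the branch names are hypothetical and do not describe how Routing Blocks are actually configured.

```python
def route(submission):
    """Pick a branch for a submission based on simple criteria."""
    if submission.get("layout_name") == "Invoice":
        return "invoice_branch"
    if submission.get("page_count", 0) > 50:
        return "large_document_branch"
    return "default_branch"


print(route({"layout_name": "Invoice", "page_count": 3}))
```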
Settings
Other than the settings under “Block Details,” Routing Blocks have no block-specific settings.
API Blocks
API Blocks allow you to connect to other data sources in your organization in order to augment or verify extracted data. You can work with your Hyperscience representative to configure these blocks and place them anywhere in your flow after data extraction. API Blocks do not contain business logic; that logic lives in subsequent flow blocks.
We offer two types of API Blocks: HTTP REST and SOAP.
For more information about API Blocks, see API Blocks.
Document Renderer
Hyperscience converts files into images before processing begins. To convert this data into a format that’s easier to process downstream, we’ve introduced the Document Renderer Block. This block allows you to download a PDF file from submissions that have gone through Machine or Manual Classification.
The Document Renderer block is included in Document Processing Subflow V40.
To configure it:
In the left-hand sidebar, click Flows, and click on the name of the flow that contains Document Processing Subflow V40 (e.g., Document Processing).
Click Edit Flows.
In Flow Studio, click Start Document Processing Subflow.
In the Settings Type drop-down list, click on Document Rendering.
Select the Document Rendering Enabled setting.
Enter your desired size and quality settings for rendered documents:
Adjust the page size (in inches or millimeters), width, and height.
Specify the quality of the images. By default, the quality is set to 50%, which balances image clarity and file size. We recommend using this default setting for best results. Lowering the quality reduces the file size but may make images less clear, while increasing the quality creates larger files with sharper images. For example, a document originally 1 MB in size can grow to 40 MB when rendered in high resolution.
Click Save.
Download a document
After you’ve completed your submission, a download URL is available in the submission’s JSON output. To download the documents:
Go to the Submissions page.
Open the submission whose documents you want to download.
Click Actions, and then click View JSON Output.
Use your browser’s search function to locate download_url.
Copy the URL and append it to your environment’s URL (e.g., example.hyperscience.com/api/<URL>).
Choose a folder on your local machine to save the file, and the download will begin.
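If you have saved the submission's JSON output to a file, the lookup in the steps above could be scripted. This is a sketch under assumptions: the position of `download_url` within the JSON is not specified here, so the code searches the whole structure; the `https://` scheme is assumed; and the sample path `image/abc` is made up.

```python
def find_download_urls(node):
    """Recursively collect every "download_url" value in parsed JSON output."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "download_url" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_download_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_download_urls(item))
    return found


def full_download_url(environment_host, download_url):
    """Append the download URL to the environment's /api/ base."""
    return f"https://{environment_host}/api/{download_url.lstrip('/')}"


output = {"documents": [{"id": 1, "download_url": "/image/abc"}]}
for url in find_download_urls(output):
    print(full_download_url("example.hyperscience.com", url))
```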
Complete Blocks
Every flow needs a Complete Block. This block initiates Quality Assurance tasks and changes the submission’s status to “Complete.”
Settings
Other than the settings under “Block Details,” Complete Blocks have no block-specific settings.
Output Blocks
Output Blocks were called “Output Connectors” in previous versions of Hyperscience. With Output Blocks, you can send data extracted by Hyperscience to other systems for downstream processing. If you want your flow to send notifications for submission statuses other than “Complete,” you will need to work with your Hyperscience representative to set up a separate Notification flow.
A full list of the Output Blocks we currently support, along with more details about each, can be found in Output Blocks.
You can control which Output Blocks are enabled in your flow at any time by selecting or deselecting the Enabled option in each Output Block.
Settings shared by all block types
All blocks have the following settings under “Block Details”:
Name | Required? | Description |
---|---|---|
Display Name | Yes | The block's name in Flow Studio. You can change the name for each of your flows. |
Description | No | The block's description in Flow Studio. You can change the description for each of your flows. |