Hyperscience extracts data from documents and converts it into a machine-readable format. We support Structured, Semi-structured, and Additional documents. To learn how to differentiate between the document types, see Understanding Document Types.
Structured use cases
Use Structured layouts for documents that follow a consistent, predefined format. The pages in these documents have a clear visual design with fixed fields and standardized, repetitive patterns.
Key aspects of Structured layouts
Fixed field locations - The fields are always in the same place on the document.
Standardized formats - The format of the forms is consistent across all documents.
Predictable patterns - The structure of the document remains consistent.
To learn more about layout types, see Determining Layout Type.
In this article, you’ll learn how to build and evaluate your Structured models in Hyperscience for efficient document processing.
Step 1 - Sample your documents
Reviewing your documents is the first step in creating a robust Structured model.
Gather document samples: Collect examples of the document type you want to process. Include different variations if available. Learn more in the Layout Variations section of this article.
Review field data: Analyze the information in each field. Ensure the data fits the expected format and doesn’t include unusual or unsupported characters. Learn more in our Supported Characters and Default Data Types article.
Example: Check if the field “Date” follows the same format throughout your dataset.
Check for patterns and consistency: Consistency across documents is crucial for the performance of your Structured model. Ensure all fields across the documents have the same position and format. That way, you can set the proper data type for each field when creating the Structured layout. Learn more about data types in What is a Data Type? and Choosing a Data Type.
Document quality: Remove documents that would reduce model performance (e.g., documents containing unrelated information or highly distorted, skewed, noisy, pixelated, or duplicated pages).
Edge cases: Check for documents that don’t fit your sample. Examples of edge cases are documents with unexpected formats.
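The "Date" check mentioned in the list above can be done by hand, but a small script may help with larger samples. The following is a minimal sketch that runs outside of Hyperscience. It assumes you have typed a few sample values into a list, and the candidate formats and sample values are hypothetical, so adjust them to your own documents.

```python
from collections import Counter
from datetime import datetime

# Candidate formats to test against (an assumption; extend for your documents).
CANDIDATE_FORMATS = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
    "DD.MM.YYYY": "%d.%m.%Y",
}


def detect_format(value: str) -> str:
    """Return the label of the first candidate format that parses the value."""
    for label, pattern in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(value.strip(), pattern)
            return label
        except ValueError:
            continue
    return "unrecognized"


def summarize_formats(values: list[str]) -> Counter:
    """Count how many sample values fall under each detected format."""
    return Counter(detect_format(v) for v in values)


# Hypothetical "Date" values copied from a handful of sample documents.
sample_dates = ["01/15/2024", "02/03/2024", "2024-03-09", "March 4, 2024"]
summary = summarize_formats(sample_dates)
print(summary)
```

If more than one format shows up in the summary, the field is a candidate for closer review before you choose its data type.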
Layout variations
Sometimes, your documents have small differences in their visual design. In Hyperscience, these differences are handled seamlessly by using layout variations within Structured layouts.
Layout variations
Layout variations occur when documents of the same type (such as HCFA or W-8 forms in the US) have the same key information but differ in how that information is arranged on the page.
Example
The position of fields (like names, dates, or totals) might be different from one document to another, or some fields might be missing in one document and present in another.
In the two documents below, notice how the “Home Phone” and “Cell Phone” fields remain the same, but additional fields like “Have you ever been convicted of a felony?” appear in one document and not the other.
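One way to reason about the example above is to list the fields of each variation and compare them as sets. This is only an illustrative sketch: the variation names are hypothetical, and the field lists are typed in by hand rather than read from Hyperscience.

```python
# Hypothetical field lists for two variations of the same form.
variation_a = {
    "Home Phone",
    "Cell Phone",
    "Have you ever been convicted of a felony?",
}
variation_b = {"Home Phone", "Cell Phone"}

# Fields present in both variations.
shared = variation_a & variation_b
# Fields that appear in only one of the two variations.
only_in_a = variation_a - variation_b
only_in_b = variation_b - variation_a

print("Shared fields:", sorted(shared))
print("Only in variation A:", sorted(only_in_a))
print("Only in variation B:", sorted(only_in_b))
```

Fields that appear in only one list are the ones to watch when you later adjust each variation in the Layout Editor.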
Step 2 - Upload a blank form
To process filled documents with the same visual design through Hyperscience, you need to create a Structured layout based on a non-filled (blank) form.
Upload a blank form
Blank forms
Upload a blank form to create a Structured layout for the documents you want to process.
To upload a blank form:
Go to Library > Layouts.
Click Add Layout.
Click Structured Layout, and then click Next.
Upload a PDF, TIFF, JPG, or PNG file in one of the following ways:
Drag and drop the image file in the dialog box, or
click Choose File to upload from your machine.
If you upload a multi-page TIFF or PDF file, a single layout variation will be created from that file.
Upload additional files by clicking Add More Files. When uploading multiple JPG or PNG files, the order of the layout variations’ pages will match the order of the files shown during the upload process.
Click Next.
Enter a name for your layout in the Layout Name field.
Choose the language you expect people to use when filling out the documents from the Language drop-down menu.
Fields language
Some fields may use a different language than the rest of the document. You can set the language for each field in the Layout Editor.
Click Create.
Once the blank form is added, you need to create a Structured layout in the Layout Editor. The Layout Editor is where you define what information Hyperscience should extract from the document. Learn more in the Step 3 - Create a Structured layout section of this article.
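Before you start the upload, you may want to confirm locally that your blank-form files are of the types listed in the steps above (PDF, TIFF, JPG, or PNG). The sketch below is a hypothetical pre-check and not part of Hyperscience. Treating .tif and .jpeg as equivalent spellings is an assumption, and the file names are made up.

```python
from pathlib import Path

# Extensions for the file types named in the upload step.
# Accepting the alternate spellings .tif and .jpeg is an assumption.
ACCEPTED_EXTENSIONS = {".pdf", ".tiff", ".tif", ".jpg", ".jpeg", ".png"}


def split_by_type(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate file names into accepted and rejected lists, keeping their order."""
    accepted, rejected = [], []
    for name in filenames:
        if Path(name).suffix.lower() in ACCEPTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


# Hypothetical file names for a two-page blank form plus one stray file.
accepted, rejected = split_by_type(
    ["blank_form_page1.PNG", "blank_form_page2.png", "notes.docx"]
)
print("Ready to upload:", accepted)
print("Convert or remove:", rejected)
```

The function preserves the input order, which is convenient because the order of JPG or PNG files during upload determines the page order of the layout variation.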
Step 3 - Create a Structured layout
Field types
In the Layout Editor, create a layout that shows Hyperscience where to find the data you need. The layout maps fields on the form to the information you want to extract. You can extract data from the following data points:
Fields - The individual elements of a document that contain key information, such as names, dates, addresses, or amounts. For example, in an HCFA form, fields might include the “Name,” “Birthdate,” or “Address.” Hyperscience identifies and extracts data from these fields to automate document processing, making it easier to handle large volumes of documents accurately.
Checkboxes - A checkbox is used to capture two-option answers, like “Yes”/“No” or “True”/“False.” For example, a form might ask, “Are you legally authorized to work in the US?” with checkboxes for “Yes” or “No.” Hyperscience can detect whether a checkbox is checked or not and extract that information. To process checkboxes, you need to define the data type of this field as Checkbox.
Signatures - This field shows where a handwritten or digital signature is required or present. Hyperscience can detect if a signature is present or missing, but it doesn’t read what the signature says — only if it exists or not. To process signatures, you need to define the data type of this field as Signature. Learn more about Data Types in the section below.
Text segmentation and signatures
Signature segmentation is trained as a standalone model, separate from text and checkbox segmentation, allowing it to focus exclusively on identifying signatures in documents. That way, the model produces more reliable and complete bounding boxes around signatures.
The bounding boxes from text segmentation are used to refine the segmentation of signatures, ensuring that fragmented or partial signatures are consolidated for better accuracy. To learn more, see our Text Segmentation article.
Data types
Data types help the system understand what kind of information to expect in a specific field. They allow the system to process your documents more accurately. For example:
The Numeric data type is used for fields that contain only numbers.
The Generic Text data type is for fields with sentences or general text.
Some data types like Date, Currency Amount, or Email Address expect specific formats or lengths.
To process Signatures or Checkboxes, make sure to use the Signature and Checkbox data types when creating your Structured layout.
To learn more about data types, see our What is a Data Type? article. You can also go to Default Data Types to see a list of all default data types in the system.
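To make the idea of "expecting a format" more concrete, the sketch below checks sample values against three simplified expectations. It is an illustration only and does not reflect how Hyperscience implements or validates data types. The function names, the MM/DD/YYYY date pattern, and the email pattern are all assumptions made for this example.

```python
import re
from datetime import datetime


def looks_numeric(value: str) -> bool:
    """Simplified Numeric check: the value contains only the digits 0-9."""
    return re.fullmatch(r"[0-9]+", value) is not None


def looks_like_date(value: str) -> bool:
    """Simplified Date check: the value parses as MM/DD/YYYY (assumed format)."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def looks_like_email(value: str) -> bool:
    """Simplified Email Address check: something@something.something."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


# Map each illustrative data type name to its check.
CHECKS = {
    "Numeric": looks_numeric,
    "Date": looks_like_date,
    "Email Address": looks_like_email,
}


def fits(data_type: str, value: str) -> bool:
    """Return True if the value matches the simplified expectation."""
    return CHECKS[data_type](value)


print(fits("Numeric", "12345"))
print(fits("Date", "2024-01-15"))
print(fits("Email Address", "name@example.com"))
```

Running sample values from your documents through checks like these can help you decide which data type to assign to each field.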
Creating a Structured layout
To access the Layout Editor:
Go to Library > Layouts and click on the name of your layout.
Once in the Layout Variations tab, click on the name of the variation.
Applying changes to a layout
You’re working on a draft version of the layout in the Layout Editor. To save or apply your changes, follow the steps in the Commit changes and deploy section of this article.
Layout Editor best practices
Drawing bounding boxes
Use your cursor to draw bounding boxes around each field you want to extract.
Ensure the bounding boxes are precise to avoid cutting off text, which could reduce machine confidence during extraction.
Use the Layout Editor’s zoom functionality for fields like checkboxes.
Use the arrow keys on your keyboard to adjust the position of the bounding box more precisely.
Copy and paste a bounding box with a similar size to avoid drawing it manually. Make sure the name and the data type match the field you’re working with. Alternatively, use the Duplicate button.
Delete a field by pressing the Backspace key on your keyboard.
When drawing bounding boxes, include the field’s label to ensure the system will capture all the information filled within the field, ignoring the pre-printed text.
Duplicating layouts
While you cannot duplicate layouts from the application, you can copy all the fields of your layout and paste them into a new one.
Field names and data types
Assign a clear and descriptive name for each field.
Select the appropriate data type (e.g., Numeric, Generic Text, Date) based on the field’s content.
Configure field settings
Set the field properties, such as Transcription Supervision, Output Names, or specific data validations for each field.
You can also bulk-edit field settings by selecting multiple fields and applying changes simultaneously.
Setting | Description | Example |
---|---|---|
Field Name | This name is used to label the field in the system and should be easy to read. It also appears in the output when a submitted page matches the layout. | If the form label says Name (Last, First), you might name the field Applicant Name (Last, First) or Name_LastFirst in the Layout Editor. The goal is to make it easy to understand what the field contains when looking at the extracted data. |
Data Type | Defines the kind of data the field should contain. | If the field is meant to capture a date of birth, you should select Date as a data type. Doing so tells the system to expect a date format like MM/DD/YYYY in that field. Learn more in Data Types. |
Output Name | Defines a programmatic name for each field, in addition to the human-readable display name. This name is included in the output for submitted pages matched to the layout. | If the display name is Applicant Name (Last, First), the output name might be applicant_name_last_first. This version is machine-friendly and often used in exported data or API responses. |
Transcription Supervision | Specifies the way the system handles the transcription of the field. Select one option from the drop-down list. | |
Multiline | This setting allows the system to process fields with more than one line of text. It will improve the machine’s processing of these fields. For most new fields, this checkbox will initially be unchecked; however, it will be automatically checked for bounding boxes greater than a certain height. | |
Dropout | Indicates whether to ignore background text in a field, like pre-printed labels or symbols. When this setting is enabled, the system compares the uploaded blank with the submitted page and removes any pre-printed text, keeping only the new or handwritten content. If the checkbox is not selected, the entire field—including any pre-printed text—will be considered for transcription. This setting is enabled by default. | A form might have a pre-printed dollar sign or “.00” in an Amount field. If you check the Dropout checkbox, Hyperscience will ignore this fixed text and only capture what’s filled in the submitted document. |
Required | When a field is marked as “Required,” the system will apply special logic to the processing of submitted pages matched to that layout. | If the transcription of a required field is determined to be blank, or if the field is marked illegible, an exception will be generated stating that the value of the required field was missing. |
Duplicate | This setting allows you to configure a field to be extracted only once, even if it appears multiple times in a document. Only the first occurrence of the field is included in the output, saving time and avoiding redundant data processing. This option is available only for single-page Structured layouts. For more information, please contact your Hyperscience representative. | Use this setting when a single-page layout might match multiple pages in a submission, and you only want to extract the field once. Enabling this setting is helpful when the same page is repeated multiple times and the field value stays the same. For example, when you enable this setting for the Payer ID field, the system will extract it from the first HCFA page in the submission and ignore it on the rest. |
Not in Language | Use this setting if you expect a field to contain text in a different language than the one set for the layout. To learn more, see Supported Languages. | If your layout language is Korean, but a specific field will contain English, select Not in Korean and choose English from the drop-down list. |
Beta features | | |
Automatic field cloning | Automatically detects shapes that are geometrically similar to the currently selected field and allows you to convert them into actual fields. | Useful for creating multiple checkboxes (and other similar repeating fields) by manually drawing only one of them. |
Bounding box one-click mode | If enabled, the system automatically predicts field bounding boxes in the Layout Editor for Structured documents. | Click once inside a field to automatically draw the predicted bounding box. |
PDF extraction | Creates bounding boxes and determines field names on layout variations by reading PDF-field metadata. | For example, if you are creating the first layout variation of an HCFA form with embedded PDF-field metadata, the system will automatically create bounding boxes and assign field names. |
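The Output Name example in the table above (Applicant Name (Last, First) becoming applicant_name_last_first) follows a common naming pattern. If you want to apply the same pattern consistently across many fields, a helper like the one below may be useful. It is a sketch of one possible convention, not a rule enforced by Hyperscience, and the function name is hypothetical.

```python
import re


def to_output_name(display_name: str) -> str:
    """Turn a human-readable field name into a lowercase, underscore-separated name."""
    lowered = display_name.lower()
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    collapsed = re.sub(r"[^a-z0-9]+", "_", lowered)
    # Drop any underscore left at the start or end.
    return collapsed.strip("_")


print(to_output_name("Applicant Name (Last, First)"))  # applicant_name_last_first
```

Generating output names from display names in one consistent way can make exported data easier to work with downstream.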
Step 4 - Adjust your layout variations
When creating a new variation in the Layout Editor, you can choose to start from an existing variation. When you use an existing variation, the system copies over all bounding boxes and field settings, allowing you to make only the necessary adjustments. This approach helps:
Save time by reusing existing work.
Maintain consistency across similar layouts.
Avoid manual reconfiguration of fields.
Use this option when you're working with documents that have minor layout differences, such as:
Slightly repositioned fields.
Additional or missing sections.
Small formatting updates between document versions.
Using layout variations
You must have at least one existing variation in the layout. If no variations are available, you'll need to create a new layout from scratch.
Start from an existing variation
Click the Start From Variation drop-down list to select an existing variation as a base. The system then copies the bounding boxes and field settings from the selected variation to the new one.
Shared fields
Shared fields are the ones included in all of a layout’s variations.
When you rename a shared field, the new name will appear in all variations where the field exists.
Deleting a shared field will remove it entirely from the layout and all variations. A system warning is displayed when you attempt to delete a shared field.
Review the fields in your layout
Before making changes, review how the field is used in all variations.
Updating the data type of a shared field (e.g., from Numeric to Generic Text) will affect all variations.
Active and Inactive Items
If you want to remove a field from one variation but keep it in others, you should deactivate the field instead of deleting it. Doing so ensures that the field is preserved in other variations while being hidden or ignored in the current variation.
To deactivate a field:
Select the field you want to hide in your current variation:
In the Layout Editor, click on the field you want to deactivate.
Deactivate the field:
Move the field to the Inactive Items list by clicking the Deactivate Fields button. This action removes it from the variation but keeps it available in the layout. You can find it in the Inactive Items list in the Layout Editor.
Restore an inactive field:
To reactivate the field in a variation, go to the Inactive Items list and click the button that moves it back to the Active Items list.
Using Active and Inactive Items
This feature prevents accidental deletion of fields used in other variations and helps you to maintain consistency, ensuring the layout remains adaptable for future needs. Below are some example scenarios:
Scenario 1: Two variations with shared fields
Variation A and Variation B both have a shared field, "Customer Name."
If you change the data type of "Customer Name" in Variation A from Generic Text to Numeric, this change will also apply to Variation B.
Scenario 2: Removing a field from a single variation
You need to remove the "Invoice Number" field from Variation B but want to keep it in Variation A. Instead of deleting the field, deactivate it in Variation B. Doing so ensures it remains active in Variation A.
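The two scenarios above can be modeled with a small data structure in which field definitions belong to the layout and each variation only tracks which fields are active. This is a conceptual sketch to illustrate the behavior described in this section. The class and method names are hypothetical and do not represent Hyperscience's internal data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Layout:
    # Field definitions live on the layout, so all variations share one data type.
    data_types: dict[str, str] = field(default_factory=dict)
    # Each variation records only which of the layout's fields are active in it.
    active: dict[str, set[str]] = field(default_factory=dict)

    def add_field(self, name: str, data_type: str, variations: list[str]) -> None:
        self.data_types[name] = data_type
        for variation in variations:
            self.active.setdefault(variation, set()).add(name)

    def set_data_type(self, name: str, data_type: str) -> None:
        # A data type change applies wherever the field is used.
        self.data_types[name] = data_type

    def deactivate(self, variation: str, name: str) -> None:
        # Hides the field in one variation; other variations are untouched.
        self.active[variation].discard(name)

    def fields_for(self, variation: str) -> dict[str, str]:
        return {n: self.data_types[n] for n in sorted(self.active[variation])}


layout = Layout()
layout.add_field("Customer Name", "Generic Text", ["A", "B"])
layout.add_field("Invoice Number", "Numeric", ["A", "B"])

layout.set_data_type("Customer Name", "Numeric")  # Scenario 1
layout.deactivate("B", "Invoice Number")          # Scenario 2

print(layout.fields_for("A"))
print(layout.fields_for("B"))
```

In this model, the data type change shows up in both variations, while the deactivated field disappears only from Variation B.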
Step 5 - Commit changes and deploy
Auto-save
All layout drafts are automatically saved, so there is no need to manually save your changes.
To commit your changes to the draft version of the layout variation, click Commit Changes on the right side of the page.
To leave the Layout Editor without committing your changes, click the X in the upper-right corner of the page.
You will still be able to commit your changes later by clicking Commit Changes on the layout’s details page.
To match submitted pages to a layout, the layout must have at least one committed version, and the layout must be deployed in a live release. For more information about layout versions and creating releases, see our What is a Layout Version?, Editing and Finalizing a Layout Version, and What is a Release? articles.
Deploy your release by assigning it to a flow. Learn more in Assigning a Release to a Flow.
Step 6 - Evaluate Structured models
After you’ve committed changes and deployed your Structured layout, it is Live and ready to process documents. The next step is to upload your filled documents and monitor the model’s performance. In this section, you will learn more about evaluating Structured models.
Upload your completed documents
Upload your documents as submissions by following the steps below.
Go to Submissions.
Click Create Submission.
Upload the filled documents. If you’re uploading multiple documents at once, select One Submission per file to evaluate the performance of each document.
Click Next.
Select the flow you’re using for the model from the Flow drop-down list.
Select the layout used for the model from the Layout drop-down list.
Click Upload.
Learn more about submissions in our How a File Becomes a Submission article.
Using QA to improve accuracy
Use the Documents tab to review the extracted data and identify potential issues. This step is key to ensuring the quality of your model's predictions. For Structured and Semi-structured documents, Hyperscience provides Field Transcription QA, which focuses on validating field-level data to refine accuracy. Learn more in Transcription Quality Assurance and Accuracy.
Field Transcription QA allows you to sample and review individual fields from submitted documents. For instance, if a document has 10 fields and the QA sampling rate is set to 10%, each field has a 10% chance of being selected for review. This means that over time, about 10% of all fields will be reviewed, but for any single document, the number of reviewed fields can vary—it may be none, one, or more than one. This targeted approach ensures efficiency while maintaining high-quality validation. Learn more about scoring in Transcription Accuracy and Automation.
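The sampling example above can be worked through numerically. The sketch below assumes that each field is selected independently with the same probability, which is a simplifying assumption for illustration and may not match exactly how the system samples. Under that assumption, the number of reviewed fields in a document follows a binomial distribution.

```python
from math import comb


def reviewed_count_distribution(num_fields: int, rate: float) -> list[float]:
    """Probability that exactly k fields are sampled, for k = 0..num_fields."""
    return [
        comb(num_fields, k) * rate**k * (1 - rate) ** (num_fields - k)
        for k in range(num_fields + 1)
    ]


# The example from the text: 10 fields and a 10% sampling rate.
dist = reviewed_count_distribution(10, 0.10)
expected = sum(k * p for k, p in enumerate(dist))

print(f"P(no fields reviewed)    = {dist[0]:.3f}")
print(f"P(exactly one reviewed)  = {dist[1]:.3f}")
print(f"P(more than one)         = {1 - dist[0] - dist[1]:.3f}")
print(f"Expected reviewed fields = {expected:.2f}")
```

With these inputs, a single document has roughly a 35% chance of having no fields reviewed, about a 39% chance of exactly one, and about a 26% chance of more than one, while the average across many documents is one reviewed field per document.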
To enable Field Transcription QA:
Go to Flows and click on the flow you want to enable QA for.
Click Edit Flows to access the Flow Studio.
Click Start Document Processing Subflow and choose General Transcription from the Settings Type drop-down menu.
Select the Transcription Quality Assurance checkbox.
Click Save to confirm the changes.
Once enabled, Field Transcription QA helps ensure that the data extracted from your documents aligns with the ground truth, offering a feedback loop that continuously improves model performance. Learn more about improving the performance of your Structured layouts in our Layout Performance Reports and Improving Layout Performance articles.