Training a Structured Model

Hyperscience extracts data from documents and converts it into a machine-readable format. We support Structured, Semi-structured, and Additional documents. To learn how to differentiate between the document types, see Understanding Document Types.

Structured use cases

Use Structured layouts for documents that follow a consistent, predefined format. The pages in these documents have a clear visual design with fixed fields and standardized, repetitive patterns.

Key aspects of Structured layouts

  • Fixed field locations - The fields are always in the same place on the document.

  • Standardized formats - The format of the forms is consistent across all documents.

  • Predictable patterns - The structure of the document remains consistent.

To learn more about layout types, see Determining Layout Type.

In this article, you’ll learn how to build and evaluate your Structured models in Hyperscience for efficient document processing.

Step 1 - Sample your documents

Reviewing your documents is the first step in creating a robust Structured model.

  • Gather document samples: Collect examples of the document type you want to process. Include different variations if available. Learn more in the Layout Variations section of this article.

  • Review field data: Analyze the information in each field. Ensure the data fits the expected format and doesn’t include unusual or unsupported characters. Learn more in our Supported Characters and Default Data Types article.

    • Example: Check if the field “Date” follows the same format throughout your dataset.  

  • Check for patterns and consistency: Consistent documents are crucial for your Structured model’s performance. Ensure all fields across the documents have the same position and format. That way, you can set the proper data type for each field when creating the Structured layout. Learn more about data types in What is a Data Type? and Choosing a Data Type.

  • Document quality: Remove documents that would reduce model performance (e.g., documents containing unrelated information or highly distorted, skewed, noisy, pixelated, or duplicated pages).

  • Edge cases: Check for documents that don’t fit your sample. Examples of edge cases are documents with unexpected formats.
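The "Review field data" check above can be partly automated before you build the layout. The following is a minimal sketch, assuming you have already typed or exported a few sample values for a "Date" field and that the expected format is MM/DD/YYYY. The helper name `inconsistent_dates` and the sample values are hypothetical and are not part of Hyperscience.

```python
from datetime import datetime


def inconsistent_dates(values, fmt="%m/%d/%Y"):
    """Return the sample values that do not parse with the expected date format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value.strip(), fmt)
        except ValueError:
            # Wrong separator, wrong field order, or an impossible date.
            bad.append(value)
    return bad


# Hypothetical values copied from sample documents.
samples = ["01/31/2024", "02/15/2024", "2024-03-01", "13/01/2024"]
print(inconsistent_dates(samples))  # ['2024-03-01', '13/01/2024']
```

Values flagged this way are candidates for the edge-case review described above, or a sign that a less strict data type may be a better fit for the field.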

Layout variations

Sometimes, your documents have small differences in their visual design. In Hyperscience, these differences are handled seamlessly by using layout variations within Structured layouts.

Layout variations occur when documents of the same type (such as HCFA or W-8 forms in the US) have the same key information but differ in how that information is arranged on the page.

Example

  • The position of fields (like names, dates, or totals) might be different from one document to another, or some fields might be missing in one document and present in another.

  • In the two documents below, notice how the “Home Phone” and “Cell Phone” fields remain the same, but additional fields like “Have you ever been convicted of a felony?” appear in one document and not the other.

Step 2 - Upload a blank form

To process filled documents with the same visual design through Hyperscience, you need to create a Structured layout based on a non-filled (blank) form.

Upload a blank form

Blank forms

Upload a blank form to create a Structured layout for the documents you want to process.

To upload a blank form:

  1. Go to Library > Layouts.

  2. Click Add Layout.

  3. Click Structured Layout, and then click Next.

  4. Upload a PDF, TIFF, JPG, or PNG file in one of the following ways:

    • Drag and drop the file into the dialog box, or

    • Click Choose File to upload from your machine.

      • If you upload a multi-page TIFF or PDF file, a single layout variation will be created from that file.

      • Upload additional files by clicking Add More Files. When uploading multiple JPG or PNG files, the order of the layout variations’ pages will match the order of the files shown during the upload process.

  5. Click Next.

  6. Enter a name for your layout in the Layout Name field.

  7. Choose the language you expect people to use when filling out the documents from the Language drop-down menu.

    Fields language

    Some fields may use a different language than the rest of the document. You can set the language for each field in the Layout Editor.  

  8. Click Create.

Once the blank form is added, you need to create a Structured layout in the Layout Editor. The Layout Editor is where you define what information Hyperscience should extract from the document. Learn more in the Step 3 - Create a Structured layout section of this article.

Step 3 - Create a Structured layout

Field types

In the Layout Editor, create a layout that shows Hyperscience where to find the data you need. The Layout Editor helps you map fields on the form to the information you want to extract. You can extract data from the following data points:

  • Fields - The individual elements of a document that contain key information, such as names, dates, addresses, or amounts. For example, in an HCFA form, fields might include the “Name,” “Birthdate,” or “Address.” Hyperscience identifies and extracts data from these fields to automate document processing, making it easier to handle large volumes of documents accurately.

  • Checkboxes - A checkbox is used to capture two-option answers, like “Yes”/“No” or “True”/“False.” For example, a form might ask, “Are you legally authorized to work in the US?” with checkboxes for “Yes” or “No.” Hyperscience can detect whether a checkbox is checked or not and extract that information. To process checkboxes, you need to define the data type of this field as Checkbox.

  • Signatures - This field shows where a handwritten or digital signature is required or present. Hyperscience can detect if a signature is present or missing, but it doesn’t read what the signature says — only if it exists or not. To process signatures, you need to define the data type of this field as Signature. Learn more about Data Types in the section below.

Text segmentation and signatures

Signature segmentation is trained as a standalone model, separate from text and checkbox segmentation, allowing it to focus exclusively on identifying signatures in documents. That way, the model produces more reliable and complete bounding boxes around signatures.

The bounding boxes from text segmentation are used to refine the segmentation of signatures, ensuring that fragmented or partial signatures are consolidated for better accuracy. To learn more, see our Text Segmentation article.

Data types

Data types tell the system what kind of information to expect in a specific field, which allows it to process your documents more accurately. For example:

  • The Numeric data type is used for fields that contain only numbers.

  • The Generic Text data type is for fields with sentences or general text.

  • Some data types like Date, Currency Amount, or Email Address expect specific formats or lengths.

  • To process Signatures or Checkboxes, make sure to use the Signature and Checkbox data types when creating your Structured layout.

To learn more about data types, see our What is a Data Type? article. You can also go to Default Data Types to see a list of all default data types in the system.

Creating a Structured layout

To access the Layout Editor:

  1. Go to Library > Layouts and click on the name of your layout.

  2. Once in the Layout Variations tab, click on the name of the variation.

Applying changes to a layout

You’re working on a draft version of the layout in the Layout Editor. To save or apply your changes, follow the steps in the Commit changes and deploy section of this article.

Layout Editor best practices

Drawing bounding boxes

  • Use your cursor to draw bounding boxes around each field you want to extract.

  • Ensure the bounding boxes are precise to avoid cutting off text, which could reduce machine confidence during extraction.

    • Use the Layout Editor’s zoom functionality for fields like checkboxes.

    • Use the arrow keys on your keyboard to adjust the position of the bounding box more precisely.

    • Copy and paste a bounding box with a similar size to avoid drawing it manually. Make sure the name and the data type match the field you’re working with. Alternatively, use the Duplicate button.

  • Delete a field by pressing the Backspace key on your keyboard.

  • When drawing bounding boxes, include the field’s label to ensure the system will capture all the information filled within the field, ignoring the pre-printed text.  

Duplicating layouts

While you cannot duplicate layouts from the application, you can copy all the fields of your layout and paste them into a new one.

Field names and data types

  • Assign a clear and descriptive name for each field.

  • Select the appropriate data type (e.g., Numeric, Generic Text, Date) based on the field’s content.

Configure field settings

  • Set the field properties, such as Transcription Supervision, Output Names, or specific data validations for each field.

  • You can also bulk-edit field settings by selecting multiple fields and applying changes simultaneously.

Setting

Description

Example

Field Name

This name is used to label the field in the system and should be easy to read. It also appears in the output when a submitted page matches the layout.

If the form label says Name (Last, First), you might name the field Applicant Name (Last, First) or Name_LastFirst in the Layout Editor. The goal is to make it easy to understand what the field contains when looking at the extracted data.

Data Type

Defines the kind of data the field should contain.

If the field is meant to capture a date of birth, you should select Date as the data type. Doing so tells the system to expect a date format like MM/DD/YYYY in that field. Learn more in Data Types.

Output Name

Defines a programmatic name for each field, in addition to the human-readable display name. This name is included in the output for submitted pages matched to the layout.

If the display name is Applicant Name (Last, First), the output name might be applicant_name_last_first. This version is machine-friendly and often used in exported data or API responses.
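To keep output names consistent across many fields, you can derive them from the display names with a simple rule. The sketch below shows one possible convention (lowercase, with every run of non-alphanumeric characters replaced by an underscore). The function `to_output_name` is hypothetical. Hyperscience does not require this convention, and you still enter the output name yourself in the Layout Editor.

```python
import re


def to_output_name(display_name):
    """Lowercase the name and collapse runs of non-alphanumerics into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", display_name.lower())
    # Drop the underscores left behind by leading or trailing punctuation.
    return slug.strip("_")


print(to_output_name("Applicant Name (Last, First)"))  # applicant_name_last_first
```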

Transcription Supervision

Specifies the way the system handles the transcription of the field. Select one option from the drop-down list:

  • Autotranscribe - The field will not be sent for manual review. Instead, the system will use the machine’s best guess as the final value. If the machine’s confidence in its transcription is below the set threshold, an illegible field exception will be output instead of the machine’s transcription value. This setting is appropriate for scenarios where the accuracy of a given field’s data is not sufficiently critical to warrant human review, but the machine’s best guess would be helpful to output.

  • Default - The field will be sent for machine transcription. If the machine’s confidence level on its transcription is above the set threshold, the machine’s transcription value will be output. If the confidence level is below the threshold, the field will be sent to Supervision as a Manual Transcription task. On a new field, the Transcription Supervision setting will initially be set to Default and can subsequently be changed by the user.

  • Always - The system will always send the field to Supervision as a Manual Transcription task, regardless of the machine’s confidence in its transcription. Use this setting to guarantee a manual review of the field, for example, when a field is often cropped, distorted, or poorly written.

  • Consensus - The system does not record a value for the field until it receives the same post-normalization transcription value twice. When this option is selected, at least one manual transcription of the field will be required, regardless of the machine’s confidence in its transcription. Select Consensus when an accurate transcription of a given field is particularly important. Learn more in Transcription Supervision Consensus.
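The Transcription Supervision options can be read as a small decision rule on the machine’s confidence. The sketch below restates the descriptions above as code. It is an illustration only, not the Hyperscience implementation: the function name, the string labels, and the choice to treat a confidence equal to the threshold as passing are all assumptions.

```python
def route_field(setting, confidence, threshold):
    """Describe what happens to a field under each Transcription Supervision setting."""
    # Assumption: a confidence exactly at the threshold counts as passing.
    confident = confidence >= threshold
    if setting == "Autotranscribe":
        # Never reviewed manually; low confidence yields an exception instead.
        return "output machine value" if confident else "illegible field exception"
    if setting == "Default":
        return "output machine value" if confident else "Manual Transcription task"
    if setting in ("Always", "Consensus"):
        # Both require at least one manual transcription, whatever the confidence.
        return "Manual Transcription task"
    raise ValueError(f"unknown setting: {setting}")


for setting in ("Autotranscribe", "Default", "Always", "Consensus"):
    print(setting, "->", route_field(setting, confidence=0.62, threshold=0.90))
```

With a low-confidence transcription, only Autotranscribe avoids a manual task, at the cost of an illegible field exception in the output.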

Multiline

This setting allows the system to process fields with more than one line of text. It will improve the machine’s processing of these fields. For most new fields, this checkbox will initially be unchecked; however, it will be automatically checked for bounding boxes greater than a certain height.

  • Use multiline for fields like Address, Description, or any field where more than one line of text is expected.

Dropout

Indicates whether to ignore background text in a field, like pre-printed labels or symbols. When this setting is enabled, the system compares the uploaded blank with the submitted page and removes any pre-printed text, keeping only the new or handwritten content. If the checkbox is not selected, the entire field—including any pre-printed text—will be considered for transcription. This setting is enabled by default.

A form might have a pre-printed dollar sign or “.00” in an Amount field. If you check the Dropout checkbox, Hyperscience will ignore this fixed text and only capture what’s filled in the submitted document.

Required

When a field is marked as “Required,” the system will apply special logic to the processing of submitted pages matched to that layout.

If the transcription of a required field is determined to be blank, or if the field is marked illegible, an exception will be generated stating that the value of the required field was missing.
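As a rough illustration of the Required logic, the sketch below returns an exception message when a required field is blank or marked illegible. The function name and the message wording are hypothetical. The actual exception format in Hyperscience output may differ.

```python
def required_field_exception(name, value, illegible=False):
    """Return an exception message for a required field, or None if the value is usable."""
    if illegible or value is None or value.strip() == "":
        return f"Required field '{name}' is missing a value"
    return None


print(required_field_exception("Payer ID", "   "))    # blank transcription
print(required_field_exception("Payer ID", "12345"))  # None
```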

Duplicate

This setting allows you to configure a field to be extracted only once, even if it appears multiple times in a document. Only the first occurrence of the field is included in the output, saving time and avoiding redundant data processing.

This option is available only for single-page Structured layouts. For more information, please contact your Hyperscience representative.

Use this setting when a single-page layout might match multiple pages in a submission, and you only want to extract the field once. Enabling this setting is helpful when the same page is repeated multiple times and the field value stays the same:

  • You’ve created a single-page layout for the HCFA-1500 form.

  • A submission includes 5 filled HCFA pages, one for each patient visit, but each page contains the same Payer ID.

  • Because the Payer ID is the same on each page, there’s no need to extract it multiple times.

When you enable this setting for the Payer ID field, the system will extract it from the first HCFA page in the submission and ignore it on the rest.
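The effect of the Duplicate setting on the HCFA example can be sketched as follows. This is a simplified model under stated assumptions: each matched page is represented as a dictionary of field names to values, and `extract_once` is a hypothetical set of the fields that have the setting enabled. It does not reflect the real output format.

```python
def extract_fields(pages, extract_once):
    """Keep fields in `extract_once` only on the first page where they appear."""
    seen = set()
    output = []
    for page_number, page in enumerate(pages, start=1):
        kept = {}
        for name, value in page.items():
            if name in extract_once:
                if name in seen:
                    continue  # already extracted from an earlier page
                seen.add(name)
            kept[name] = value
        output.append({"page": page_number, "fields": kept})
    return output


# Three filled pages matched to the same single-page layout.
pages = [
    {"Payer ID": "12345", "Visit Date": "01/05/2024"},
    {"Payer ID": "12345", "Visit Date": "02/09/2024"},
    {"Payer ID": "12345", "Visit Date": "03/14/2024"},
]
for entry in extract_fields(pages, extract_once={"Payer ID"}):
    print(entry)
```

Only the first page keeps "Payer ID", while "Visit Date" is still extracted from every page.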

Not in Language

Use this setting if you expect a field to contain text in a different language than the one set for the layout.

  • You can choose any supported language, even if it’s from a different language family.

  • Only one language can be selected per field.

To learn more, see Supported Languages.

If your layout language is Korean, but a specific field will contain English, select Not in Korean and choose English from the drop-down list.

Beta features

Automatic field cloning

Automatically detects shapes that are geometrically similar to the currently selected field and allows you to convert them into actual fields.

Useful for creating multiple checkboxes (and other similar repeating fields) by manually drawing only one of them.

Bounding box one-click mode

If enabled, the system automatically predicts field bounding boxes in the Layout Editor for Structured documents.

Click once inside a field to automatically draw the predicted bounding box.

PDF extraction

Creates bounding boxes and determines field names on layout variations by reading PDF-field metadata.

  • PDF Extraction can be used only when creating the first variation of a layout. It cannot be used when creating subsequent layout variations.

For example, if you are creating the first layout variation of an HCFA form with embedded PDF-field metadata, the system will automatically create bounding boxes and assign field names.

  • Don’t use PDF Extraction if you are adding new layout variations.

Step 4 - Adjust your layout variations

When creating a new variation in the Layout Editor, you can choose to start from an existing variation. When you use an existing variation, the system copies over all bounding boxes and field settings, allowing you to make only the necessary adjustments. This approach helps:

  • Save time by reusing existing work.

  • Maintain consistency across similar layouts.

  • Avoid manual reconfiguration of fields.

Use this option when you're working with documents that have minor layout differences, such as:

  • Slightly repositioned fields.

  • Additional or missing sections.

  • Small formatting updates between document versions.

Using layout variations

You must have at least one existing variation in the layout. If no variations are available, you'll need to create a new layout from scratch.

Start from an existing variation

  • Click the Start From Variation drop-down list to select an existing variation as a base. The system then copies the bounding boxes and field settings from the selected variation to the new one.

Shared fields

Shared fields are the ones included in all of a layout’s variations.

  • When you rename a shared field, the new name will appear in all variations where the field exists.

  • Deleting a shared field will remove it entirely from the layout and all variations. A system warning is displayed when you attempt to delete a shared field.

    Review the fields in your layout

    Before making changes, review how the field is used in all variations.

  • Updating the data type of a shared field (e.g., from Numeric to Generic Text) will affect all variations.

Active and Inactive Items

If you want to remove a field from one variation but keep it in others, you should deactivate the field instead of deleting it. Doing so ensures that the field is preserved in other variations while being hidden or ignored in the current variation.

To deactivate a field:

  1. Select the field you want to hide in your current variation:

    • In the Layout Editor, click on the field you want to deactivate.

  2. Deactivate the field:

    • Move the field to the Inactive Items list by clicking the Deactivate Fields button. This action removes it from the variation but keeps it available in the layout. You can find it in the Inactive Items list in the Layout Editor.

    • Restore an Inactive Field:

      • To reactivate the field in a variation, go to the Inactive Items list and click the button to move it back to the Active Items list.

Using Active and Inactive Items

This feature prevents accidental deletion of fields used in other variations and helps you to maintain consistency, ensuring the layout remains adaptable for future needs. Below are some example scenarios:

  • Scenario 1: Two variations with shared fields

    • Variation A and Variation B both have a shared field, "Customer Name."

    • If you change the data type of "Customer Name" in Variation A from Generic Text to Numeric, this change will also apply to Variation B.

  • Scenario 2: Removing a field from a single variation

    • You need to remove the "Invoice Number" field from Variation B but want to keep it in Variation A. Instead of deleting the field, deactivate it in Variation B. Doing so ensures it remains active in Variation A.
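The two scenarios above follow from one idea: a field’s definition belongs to the layout, while its active or inactive state belongs to each variation. The toy model below makes that explicit. The `Layout` class and its method names are hypothetical and exist only for illustration. They are not a Hyperscience API.

```python
from dataclasses import dataclass, field


@dataclass
class Layout:
    """Toy model: field definitions are shared, activation is per variation."""
    data_types: dict = field(default_factory=dict)  # field name -> data type (shared)
    inactive: dict = field(default_factory=dict)    # variation -> inactive field names

    def add_variation(self, variation):
        self.inactive.setdefault(variation, set())

    def set_data_type(self, name, data_type):
        # One shared definition, so every variation sees the change.
        self.data_types[name] = data_type

    def deactivate(self, variation, name):
        # Hides the field in a single variation only.
        self.inactive[variation].add(name)

    def delete(self, name):
        # Removes the field from the layout, and therefore from all variations.
        self.data_types.pop(name, None)
        for names in self.inactive.values():
            names.discard(name)

    def active_fields(self, variation):
        return sorted(n for n in self.data_types if n not in self.inactive[variation])


layout = Layout()
for variation in ("A", "B"):
    layout.add_variation(variation)
layout.set_data_type("Customer Name", "Generic Text")
layout.set_data_type("Invoice Number", "Numeric")

layout.set_data_type("Customer Name", "Numeric")  # Scenario 1: affects A and B
layout.deactivate("B", "Invoice Number")          # Scenario 2: affects B only
print(layout.active_fields("A"))  # ['Customer Name', 'Invoice Number']
print(layout.active_fields("B"))  # ['Customer Name']
```

Calling `delete` instead of `deactivate` in Scenario 2 would remove "Invoice Number" from Variation A as well, which is the outcome the Inactive Items list is meant to prevent.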

Step 5 - Commit changes and deploy

Auto-save

All layout drafts are automatically saved, so there is no need to manually save your changes.

  • To commit your changes to the draft version of the layout variation, click Commit Changes on the right side of the page.

  • To leave the Layout Editor without committing your changes, click the X in the upper-right corner of the page.

    • You will still be able to commit your changes later by clicking Commit Changes on the layout’s details page.

  • To match submitted pages to a layout, the layout must have at least one committed version, and the layout must be deployed in a live release. For more information about layout versions and creating releases, see our What is a Layout Version?, Editing and Finalizing a Layout Version, and What is a Release? articles.

  • Deploy your release by assigning it to a flow. Learn more in Assigning a Release to a Flow.

Step 6 - Evaluate Structured models

After you’ve committed changes and deployed your Structured layout, it is Live and ready to process documents. The next step is to upload your filled documents and monitor the model’s performance. In this section, you will learn more about evaluating Structured models.

Upload your completed documents

Upload your documents as submissions by following the steps below.

  1. Go to Submissions.

  2. Click Create Submission.

  3. Upload the filled documents. If you’re uploading multiple documents at once, select One Submission per file to evaluate the performance of each document.

  4. Click Next.

  5. Select the flow you’re using for the model from the Flow drop-down list.

  6. Select the layout used for the model from the Layout drop-down list.

  7. Click Upload.

Learn more about submissions in our How a File Becomes a Submission article.

Using QA to improve accuracy

Use the Documents tab to review the extracted data and identify potential issues. This step is key to ensuring the quality of your model's predictions. For Structured and Semi-structured documents, Hyperscience provides Field Transcription QA, which focuses on validating field-level data to refine accuracy. Learn more in Transcription Quality Assurance and Accuracy.

Field Transcription QA allows you to sample and review individual fields from submitted documents. For instance, if a document has 10 fields and the QA sampling rate is set to 10%, each field has a 10% chance of being selected for review. This means that over time, about 10% of all fields will be reviewed, but for any single document, the number of reviewed fields can vary—it may be none, one, or more than one. This targeted approach ensures efficiency while maintaining high-quality validation. Learn more about scoring in Transcription Accuracy and Automation.
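To see why the number of reviewed fields varies per document, you can simulate the sampling described above. The sketch assumes each field is selected independently with probability equal to the sampling rate. That is a simplification for illustration and may not match how Hyperscience selects fields internally. Under this assumption, the chance that a 10-field document has no field reviewed is 0.9 ** 10, or roughly 35%.

```python
import random


def sample_fields_for_qa(field_names, rate, rng):
    """Select each field independently with probability `rate`."""
    return [name for name in field_names if rng.random() < rate]


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
fields = [f"field_{i}" for i in range(1, 11)]  # one document with 10 fields
documents = 10_000

counts = [len(sample_fields_for_qa(fields, 0.10, rng)) for _ in range(documents)]

overall_share = sum(counts) / (documents * len(fields))
none_reviewed = counts.count(0) / documents

print(f"share of all fields reviewed: {overall_share:.3f}")
print(f"documents with no field reviewed: {none_reviewed:.3f}")
print(f"fewest and most fields reviewed in one document: {min(counts)}, {max(counts)}")
```

Across many documents, the overall share stays close to the sampling rate, while individual documents range from no reviewed fields to several.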

To enable Field Transcription QA:

  1. Go to Flows and click on the flow you want to enable QA for.

  2. Click Edit Flows to access the Flow Studio.

  3. Click Start Document Processing Subflow and choose General Transcription from the Settings Type drop-down menu.

  4. Select the Transcription Quality Assurance checkbox.

  5. Click Save to confirm the changes.

Once enabled, Field Transcription QA helps ensure that the data extracted from your documents aligns with the ground truth, offering a feedback loop that continuously improves model performance. Learn more about improving the performance of your Structured layouts in our Layout Performance Reports and Improving Layout Performance articles.