Creating Structured Layouts

If you plan to process completed forms through Hyperscience, you first need to create a Structured layout for the form. After you add the layout to Hyperscience, you can use the Layout Editor to define what information the system should extract from processed pages that match the layout.

Create a Structured layout

  1. Go to Library > Layouts.

  2. Click Add Layout.

  3. Click Structured Layout, and then click Next.

  4. Upload a PDF, TIFF, JPG, or PNG file in one of the following ways:

    • Drag and drop the file into the box provided.

    • Click Choose File to find the file on your machine and upload it.

    If you upload a multi-page TIFF or PDF file, the system creates a single layout variation from the file.

  5. If your form consists of multiple files, upload additional files by clicking Add More Files.

    • When uploading multiple JPG or PNG files, the order of the layout variation’s pages will match the order of the files shown during the upload process.

    • The system creates a separate layout for each TIFF or PDF file that you upload.

  6. Click Next.

  7. Enter a name for your layout in the Layout Name field.

  8. Under Language, click the drop-down list, and select the language you expect people to use when entering information in this layout’s documents.

    • If you expect specific fields to contain text in a language other than the document’s main language, you can set a language for each of those fields in the Layout Editor by selecting the Not in option. You can choose from any of the languages we support.

  9. Click Create.
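
The file-handling rules in steps 4 and 5 can be summarized as a small grouping function. This is a hypothetical Python model written for illustration only: `plan_layouts` and `PlannedLayout` are not part of any Hyperscience API, accepting the `.jpeg` and `.tif` extensions is an assumption, and so is the treatment of mixed uploads (images plus PDF or TIFF files in one batch), because the steps above describe each case only on its own.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import List, Optional

# Hypothetical sketch of the upload rules; not a Hyperscience API.
# Each PDF or TIFF file becomes its own layout (a multi-page file yields one variation).
MULTI_PAGE_TYPES = {".pdf", ".tif", ".tiff"}
# JPG/PNG files are combined into one variation, in upload order.
# ".jpeg" and ".tif" are included as assumptions.
IMAGE_TYPES = {".jpg", ".jpeg", ".png"}


@dataclass
class PlannedLayout:
    """A layout with a single variation, built from the listed files in order."""
    files: List[str] = field(default_factory=list)


def plan_layouts(uploads: List[str]) -> List[PlannedLayout]:
    """Group uploaded file names into the layouts described in the steps above."""
    layouts: List[PlannedLayout] = []
    image_layout: Optional[PlannedLayout] = None
    for name in uploads:
        suffix = PurePath(name).suffix.lower()
        if suffix in MULTI_PAGE_TYPES:
            # One layout per PDF/TIFF file.
            layouts.append(PlannedLayout(files=[name]))
        elif suffix in IMAGE_TYPES:
            # Assumption: all JPG/PNG files share one layout, even when
            # PDF or TIFF files are uploaded in the same batch.
            if image_layout is None:
                image_layout = PlannedLayout()
                layouts.append(image_layout)
            image_layout.files.append(name)
        else:
            raise ValueError(f"Unsupported file type: {name}")
    return layouts


# Usage: the two images share one layout; each PDF/TIFF gets its own.
for layout in plan_layouts(["page1.png", "form.pdf", "page2.jpg", "scan.tiff"]):
    print(layout.files)
```

Keeping the image files in a single list preserves the upload order, which is what determines the page order of the resulting variation.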

Following the steps above creates a layout with a single variation. You can add variations to the layout by following the steps in Adding a Variation to a Layout.

To make edits to your layout, you’ll need to go to the Layout Editor, as described in the next section.

Access the Layout Editor 

You can specify the information that should be extracted from submitted pages that match the layout by defining the field metadata in the Layout Editor. 

To access the Layout Editor:

  1. Go to Library > Layouts, and click on the name of the layout you want to edit.

  2. Click on Layout Variations, and click on the name of the variation. 

In the Layout Editor, you make changes to the working version of the layout. To commit your changes, or to exit without committing them, follow the steps in Saving and exiting below.

Using the Layout Editor

The layout’s title appears across the top of the page. The button to commit changes appears in the upper-right corner of the page. To learn more about committing changes, see the Saving and exiting section later in this article.

The Layout Editor for Structured layouts is split into three panels: page thumbnails, the main page view, and the field list. 

Page viewer

To change between pages in the layout or layout variation, click on the page thumbnails on the left-most panel, or use the paging controls above the table of extracted fields and layout identifiers. Changing pages will automatically save your progress.

  • Layouts consist of fields. Each field specifies a piece of information to be extracted from a submitted page matched to that layout.

Field list

The right panel displays both Layout Identifiers and Fields together, across all pages. To learn more about Layout Identifiers, see Layout Identifiers.

  • Fields are highlighted in blue. Layout IDs are highlighted in green.

  • Selecting a field or layout ID in the right panel focuses the page image in the middle panel on the corresponding bounding box.

  • Use the search bar to find a field defined in the layout by its field name or data type.

    • You can also click Filters to filter the list by data type.

When you create a Structured layout, the system attempts to identify its fields and adds them to the field list. You can edit the bounding boxes and any settings for these fields.

Any fields added by you or the system are added to the layout’s shared field list. If you create variations of the layout, the fields in the shared field list will be available in those variations. In the Layout Editor, the shared field list is split into two lists:

  • Active items, which have bounding boxes in the layout variation

  • Inactive items, which have not yet been defined in the layout variation 
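
One way to picture the shared field list is as a single list of field names, with each variation recording only the bounding boxes it defines. The Python sketch below is a hypothetical model for illustration; the `Variation` class, the `(x, y, width, height)` box format, and `split_fields` are assumptions, not Hyperscience data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical bounding box: (x, y, width, height) in page pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Variation:
    name: str
    # Field name -> bounding box drawn in this variation only.
    boxes: Dict[str, Box] = field(default_factory=dict)


def split_fields(shared_fields: List[str], variation: Variation) -> Tuple[List[str], List[str]]:
    """Split the layout's shared field list into active and inactive items.

    Active items have a bounding box in this variation; inactive items do not yet.
    Both lists keep the order of the shared field list.
    """
    active = [name for name in shared_fields if name in variation.boxes]
    inactive = [name for name in shared_fields if name not in variation.boxes]
    return active, inactive


shared = ["Full Name", "Date of Birth", "Signature"]
v2 = Variation("Variation 2", {"Full Name": (40, 120, 300, 30), "Signature": (40, 700, 250, 60)})
active, inactive = split_fields(shared, v2)
print("Active:", active)
print("Inactive:", inactive)
```

In this model, drawing a bounding box for "Date of Birth" in Variation 2 would simply add an entry to `v2.boxes`, moving the field from the inactive list to the active one without changing the shared list.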

To learn more about working with active and inactive fields, see Editing a Structured Layout Variation.

Defining fields

On a layout for a Structured document, fields have two elements:

  • A bounding box that defines the area on the page where the field’s value is expected to appear.

  • Metadata, which consists of settings used by the machine to read and process the field value, as well as names used to label the information extracted from the field.

To define a field, click and drag on the page image to create a bounding box. If the field already appears in the field list, click on its name before drawing its bounding box.

  • Ensure the bounding box covers the area of the layout image where information is expected to appear.

  • To resize, click on the bounding box, then click and drag on a corner or side to expand or shrink the box.

  • To move the bounding box, use the arrow keys for incremental adjustments, or click in the center of the box and drag it in the desired direction.

Defining field metadata

After drawing the bounding box, define the field metadata in the field list on the right. The following pieces of metadata can be defined for each field on a Structured layout:

Field Name - labels the field throughout Hyperscience and is intended to be human-readable. This name is also provided in the output for submitted pages matched to the layout.

Data type - designates the type of data that is expected for the given field.  

Output Name - allows you to provide a programmatic name for each field, in addition to the human-readable display name. This name is included in the output for submitted pages matched to the layout.

Transcription Supervision - allows you to specify the field’s transcription handling. The possible values and their meanings are as follows:

  • Autotranscribe - a field with this setting will not generate manual transcription tasks and will instead output the machine’s transcription value. This setting is appropriate for scenarios where the accuracy of a given field’s data is not sufficiently critical to warrant human review, but the machine’s best guess would be helpful to output.

    • If the machine’s confidence in its transcription is below the set threshold, an illegible field exception will be output instead of the machine’s transcription value.

  • Default - the field will undergo machine transcription. If the machine’s confidence in its transcription is above the set threshold, the machine’s transcription value will be output. If the confidence is below the threshold, the field will be sent to Supervision as a manual transcription task.

    • On a new field, the Transcription Supervision setting will initially be set to Default, and can subsequently be changed by the user.

  • Always - when the Transcription Supervision setting is set to Always, the system will always send the field to Supervision as a manual transcription task, regardless of the machine’s confidence in its transcription. This is used to guarantee a manual review of the field.

  • Consensus - when the Transcription Supervision setting is set to Consensus, the system does not record a value for the field until it receives the same post-normalization transcription value twice. This means that at least one manual transcription of the field will be required, regardless of the machine’s confidence in its transcription. This is used to indicate when accurate transcription of a given field is particularly important.
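
The four Transcription Supervision settings amount to a routing decision for each field. The Python sketch below models that decision under stated assumptions: the action strings, `route_field`, `consensus_value`, and the `normalize` function are hypothetical; a confidence exactly equal to the threshold is treated as passing (the text above describes only "above" and "below"); and for Consensus, the machine’s transcription is assumed to count as the first of the two matching values.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Supervision(Enum):
    AUTOTRANSCRIBE = "Autotranscribe"
    DEFAULT = "Default"
    ALWAYS = "Always"
    CONSENSUS = "Consensus"


def route_field(setting: Supervision, confidence: float, threshold: float) -> str:
    """Return a hypothetical action label for one field after machine transcription."""
    confident = confidence >= threshold  # assumption: ties count as confident
    if setting is Supervision.AUTOTRANSCRIBE:
        # Never creates a manual task; low confidence yields an illegible-field exception.
        return "output machine value" if confident else "output illegible field exception"
    if setting is Supervision.DEFAULT:
        # Low confidence sends the field to Supervision.
        return "output machine value" if confident else "create manual transcription task"
    # Always and Consensus both require at least one manual transcription,
    # regardless of confidence.
    return "create manual transcription task"


def consensus_value(
    transcriptions: Iterable[str], normalize: Callable[[str], str]
) -> Optional[str]:
    """Return the first normalized value seen twice, or None if none repeats yet.

    `transcriptions` is assumed to start with the machine's value, followed by
    manual transcriptions in the order they arrive.
    """
    seen = set()
    for raw in transcriptions:
        value = normalize(raw)
        if value in seen:
            return value
        seen.add(value)
    return None


def normalize(text: str) -> str:
    # Hypothetical normalization: collapse whitespace and ignore case.
    return " ".join(text.split()).upper()


print(route_field(Supervision.DEFAULT, confidence=0.62, threshold=0.80))
print(consensus_value(["12 Main St", "12  main st"], normalize))
```

Comparing values after normalization is what lets two transcriptions that differ only in spacing or case count as the same answer under Consensus.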

For each field, a field name and data type must be defined. If either is missing, an error message saying "Missing Field Information" will be shown, and the field will be highlighted in red in the Layout Editor.
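
The rule above, that every field needs both a field name and a data type, can be expressed as a simple check. This Python sketch is hypothetical: `FieldDef` and `fields_missing_info` are illustrative names, the data type strings are placeholders, and treating a whitespace-only value as missing is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldDef:
    field_name: Optional[str]
    data_type: Optional[str]


def _is_blank(value: Optional[str]) -> bool:
    # Assumption: None, "" and whitespace-only strings all count as missing.
    return value is None or not value.strip()


def fields_missing_info(fields: List[FieldDef]) -> List[int]:
    """Return positions of fields that would be flagged "Missing Field Information"."""
    return [
        i for i, f in enumerate(fields)
        if _is_blank(f.field_name) or _is_blank(f.data_type)
    ]


# Placeholder data type names, for illustration only.
fields = [
    FieldDef("Policy Number", "Text"),
    FieldDef("Claim Date", None),
    FieldDef("  ", "Date"),
]
print(fields_missing_info(fields))
```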

Additional Settings

There are additional settings and features you can enable to improve machine performance: 

  • Multiline - this setting should be enabled for any field where more than one line of text is expected and will improve the machine’s processing on these fields. For most new fields, this checkbox will initially be unchecked; however, it will be automatically checked for bounding boxes greater than a certain height.

  • Dropout - this setting tells the machine whether pre-printed background content in the field area should be excluded from the transcription of that field on matched pages.

    • For example, a field for a recurring contribution amount may include a pre-printed dollar sign at the front, as well as a pre-printed “.00” at the end if only whole-dollar contributions are allowed. Similarly, a field’s bounding box may include a printed header or label.

    • Selecting the Dropout checkbox tells the machine to compare the layout image with the submitted page image and “drop out” this pre-printed text, leaving only the text that differs between the layout image and the submitted page. Conversely, clearing the Dropout checkbox tells the machine that the full field value, including any pre-printed text, is eligible for transcription.

  • Required - when a field is marked required, the system will apply special logic to the processing of submitted pages matched to that layout. If the transcription of a required field is determined to be blank, or if the field is marked illegible, an exception will be generated stating that the value of the required field was missing.

  • Duplicate - if you expect a field to appear on multiple pages of a document composed of repeated pages, you can select this option to ensure that the system extracts only the first instance of the field and not subsequent instances. Doing so prevents the system from generating repetitive Transcription tasks, which reduces keyer effort.

    • The Duplicate option is available only in single-page layouts. When one of these layouts matches to multiple pages in a submission, the system will extract the field instance that appears on the first submission page that matches to the layout.

  • Not in - if you expect this field to contain text in a language that differs from the one selected during the layout-creation process, select the Not in option, and then select the language from the Language drop-down list that appears. For example, if you selected Korean as your layout’s language but expect a certain field to contain English text, you would select the Not in Korean option, and then you would select English from the Language drop-down list.

    • You can choose from any of our supported languages, regardless of their language family. To learn more about language families, see Supported Languages.

    • You can select only one language for each field.
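
The Required and Duplicate settings both change how a field’s per-page results become output. The Python sketch below is a hypothetical model: `PageResult`, `extract_field`, and the exception text are illustrative, and it assumes each entry represents one submission page matched to the same single-page layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageResult:
    page_number: int            # position of the matched page in the submission
    value: Optional[str]        # transcription of this field on that page
    illegible: bool = False     # True if the field was marked illegible


def extract_field(
    results: List[PageResult], duplicate: bool, required: bool
) -> Tuple[List[Optional[str]], List[str]]:
    """Return (extracted values, exception messages) for one field."""
    ordered = sorted(results, key=lambda r: r.page_number)
    if duplicate:
        # Duplicate: only the instance on the first matching page is kept.
        ordered = ordered[:1]
    values: List[Optional[str]] = []
    exceptions: List[str] = []
    for r in ordered:
        blank = r.value is None or not r.value.strip()
        if required and (blank or r.illegible):
            # Required: a blank or illegible value produces a missing-value exception.
            exceptions.append(f"Page {r.page_number}: required field value missing")
        values.append(r.value)
    return values, exceptions


pages = [PageResult(3, "Acme Inc."), PageResult(1, "Acme Inc."), PageResult(2, "")]
print(extract_field(pages, duplicate=True, required=True))
print(extract_field(pages, duplicate=False, required=True))
```

With Duplicate enabled, the blank value on page 2 is never considered, so no exception is raised; with Duplicate disabled, the same blank value triggers the Required exception.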

Editing multiple fields

You can select multiple fields from the field list and change the following parameters:

  • Data type

  • Supervision settings (Autotranscribe, Default, Always, Consensus)

  • Field configurations (Multiline, Dropout, Required/Not required, Duplicate)

  • Size and location of bounding boxes

To select multiple non-contiguous items, use Cmd+Click. To select contiguous items, use Shift+Click.

Saving and exiting

All working versions of layouts save automatically, so you don’t need to save your changes manually.

  • To commit your changes to the working version of the layout variation, click Commit Changes at the top of the page.

  • To leave the Layout Editor without committing your changes, click the X in the upper-right corner of the page. 

    • You will still be able to commit your changes later by clicking Commit Changes on the layout’s details page. 

  • To match submitted pages to a layout, the layout must have at least one committed version, and the layout must be deployed in a live release. For more information about layout versions and creating releases, see the articles What is a Layout Version?, Editing and Finalizing a Layout Version, and What is a Release?.