Field Identification

Overview

Field Identification (Field ID) tasks are specific to Semi-structured documents. These tasks are created when the system is unsure if it has correctly located a field, or if a field is specified for manual identification in the Layout Editor.

The task begins with an instruction screen that states best practices for completing Field ID tasks. You may choose to dismiss this screen after its first appearance.

Documents without tables

On the Field Identification page, a list of the layout's possible fields appears in the right-hand sidebar.

If you have a trained model for the layout, we may have predictions for the placement of some fields. As part of Field Identification, you will need to:

  • confirm or correct our predictions, and 

  • manually identify any fields we don't have predictions for.

Before you begin…

To help us transcribe each field accurately, review our Best Practices for Field ID Supervision and QA and keep them in mind as you identify each field's content. 

You can complete Field ID Supervision by following the steps below.

  1. Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts. The selected field will be highlighted.

    • If we have a prediction for the field, its icon in the right-hand sidebar appears in blue. The prediction appears on the image as a blue bounding box, and you can move to step 3.

  2. If we don't have a prediction for the field, you will need to manually identify the field's content on the page. 

    • Click on the field's content. A bounding box appears around the field's content, which is highlighted in blue. The size of the bounding box is automatically adjusted depending on the field’s type (e.g., text field, checkbox, or signature).

  3. Do one of the following:

    • If the bounding box includes all of the field's content, move on to the next step.

    • If the box is in the right place but doesn't include all of the field's content (e.g., parts of letters fall outside of the box), adjust the edges of the box until it contains all of the content that should be transcribed.

      FieldIDAdjustBounding.gif

    • If neighboring text segments should also be included in the field's transcription, use a click-and-drag motion to draw a bounding box that includes all of the field's content.

      FieldIDCombineSegments.gif

    • If the box doesn’t include any of the field’s content, or no bounding box appears around the field’s content when you hover over it, press the spacebar and, with a click-and-drag motion, draw a bounding box that includes all of the field's content.

      FieldIDBoxFromScratch.gif

  4. Repeat steps 1-3 until you’ve identified all of the fields that appear in the document, confirming or correcting our predictions or manually identifying fields we don’t have predictions for.

  5. When you’ve finished identifying fields in a document, click Confirm All and Continue (Enter), or press Return or Enter.

  6. If any bounding boxes overlap, you will be asked to tighten them so that each box fits closely around its own field's content.
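
The overlap check in the last step can be pictured as a rectangle-intersection test. The following is a minimal sketch, not the product's implementation: the Box type, the pixel coordinates, and the assumption that all boxes sit on the same page are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when two boxes share interior area; touching edges do not count."""
    return (
        a.left < b.right and b.left < a.right
        and a.top < b.bottom and b.top < a.bottom
    )


def overlapping_pairs(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """List every pair of field names whose boxes overlap (same page assumed)."""
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2)
        if boxes_overlap(box_a, box_b)
    ]
```

Under these assumptions, tightening a box means shrinking its edges until it no longer appears in any pair returned by overlapping_pairs.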

Documents with tables

If your document contains both fields and tables:

  1. You will first complete Field Identification, as outlined above.

  2. Then, you will be brought to the Table Identification workflow.

For more detailed information about table extraction, see Table Identification.

Fields with multiple occurrences

The Multiple Occurrences (MOs) feature helps you identify multiple instances of a field. 

An occurrence is defined as one of the distinct values from a sequence of values for a field. For example, if two people own a bank account, then the “Account Owner” field needs two values to be extracted, one for each name. 

Do NOT use MOs to select the same value multiple times. If the same value appears twice, you should only label the first occurrence in the natural reading order.
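
The rule above (one label per distinct value, taken at its first appearance) can be sketched as follows. This is an illustration only: the Candidate structure, the exact-text comparison, and the definition of reading order as page, then top-to-bottom, then left-to-right are all assumptions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A value found on the document (hypothetical structure)."""
    page: int    # 1-based page number
    top: float   # vertical position on the page
    left: float  # horizontal position on the page
    text: str


def occurrences_to_label(candidates: list[Candidate]) -> list[Candidate]:
    """Keep one candidate per distinct value: the first one in reading order."""
    # Assumed reading order: page, then top-to-bottom, then left-to-right.
    in_reading_order = sorted(candidates, key=lambda c: (c.page, c.top, c.left))
    seen: set[str] = set()
    kept: list[Candidate] = []
    for candidate in in_reading_order:
        if candidate.text not in seen:
            seen.add(candidate.text)
            kept.append(candidate)
    return kept
```

For an “Account Owner” field where one owner's name is printed on two pages, only the earlier instance would be kept, alongside the other owner's name.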

Identify multiple occurrences of a field 

  1. Create a bounding box for the first occurrence of the field.

  2. Click on Add another [field’s name]. You can also use the shortcut Command + Option + + (Mac) or Control + Alt + + (Windows).

  3. Create a bounding box for the second occurrence of the field.

Use the Add another text segment option when a given occurrence cannot be annotated with only one bounding box. For more information on multiple bounding boxes, see Multiple bounding boxes for fields.

  4. Repeat steps 2 and 3 for each additional occurrence of the field.

Multiple Occurrences model 

The default Field ID model can only predict one occurrence per field. If a field can contain a sequence of distinct values, you may need to use the Multiple Occurrences model instead. For more information on training a new Field ID model, see Training a New Field Identification Model.

If you want to process documents with multiple occurrences of fields for a specific layout, you need to select the Multiple Occurrence Field ID model for the layout before model training. To do so, follow the steps below: 

  1. Go to the admin page by adding “/admin/form_extraction/template/” to the end of the application URL (e.g., production.example.com/admin/form_extraction/template/).

  2. Click on the UUID of the layout you’d like to train a Multiple Occurrence model for.

  3. In the Flex engine type for training setting, select MULTIPLE_OCCURRENCES from the drop-down menu.

  4. Click Save.
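
As a small illustration of step 1, the admin address can be built by appending the path to the application URL. The helper name below is hypothetical; only the path itself comes from the steps above.

```python
ADMIN_TEMPLATE_PATH = "/admin/form_extraction/template/"  # path taken from step 1


def admin_template_url(application_url: str) -> str:
    """Append the admin template path to the application URL without doubling slashes."""
    return application_url.rstrip("/") + ADMIN_TEMPLATE_PATH
```

Calling admin_template_url("production.example.com") gives the example address shown in step 1.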

If you have a trained MULTIPLE_OCCURRENCES model for the layout, we may have predictions for the placement of some of the fields’ occurrences. As part of Field Identification, you will need to:

  • confirm or correct our predictions, and 

  • manually identify any occurrences we don't have predictions for.

Note that if your model has low confidence in identifying some of the field’s occurrences, the entire field will be sent to Field ID Supervision, not individual occurrences.
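
The field-level routing described above can be sketched as an "any occurrence below the bar" rule. This is a hedged illustration: the numeric confidence scores and the threshold parameter are assumptions, and the way the system actually measures confidence is not described here.

```python
def field_needs_supervision(occurrence_confidences: list[float], threshold: float) -> bool:
    """Route the whole field to supervision if any single occurrence is below the threshold."""
    return any(confidence < threshold for confidence in occurrence_confidences)
```

With hypothetical scores of 0.98, 0.42, and 0.95 and a threshold of 0.9, the function returns True, so all three occurrences would be reviewed together rather than only the weak one.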

If you previously used a layout to extract multiple occurrences in v32, v33, or v34 but still do not have a trained MULTIPLE_OCCURRENCES model created in v35, you can always send your documents to Field ID Supervision by doing one of the following:

  • Enable Identification Supervision in the Layout Editor for fields with possible multiple occurrences. To learn more, see the “Defining field metadata” section in the Creating Semi-structured Layouts article.

  • Using a custom code block, set the value of the manual_identification_processing_type property to FORCE for the Manual Identification block. With this setting, the machine tries to predict the first occurrence of a field on a page, and your keyers can identify the additional occurrences manually. To learn more about Custom Code Blocks, see the “Custom Code Blocks” section in Flow Blocks.
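
The second option amounts to setting one property to one value. The sketch below shows that change on a plain Python dictionary; it does not use or represent the product's Custom Code Block interface, and the block_settings argument and helper name are hypothetical. Only the property name and the FORCE value come from the text above.

```python
def force_manual_identification(block_settings: dict) -> dict:
    """Return a copy of the settings with manual identification forced."""
    updated = dict(block_settings)  # copy so the caller's settings are not mutated
    updated["manual_identification_processing_type"] = "FORCE"
    return updated
```

See the “Custom Code Blocks” section in Flow Blocks for how settings are actually read and written in a flow.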

Multiple bounding boxes for fields

To annotate values across line and page breaks that can’t be captured by a single bounding box, you can draw multiple bounding boxes for fields. For example, if you have a field whose value spans across two pages, you can draw two bounding boxes for the same field. 

To create multiple bounding boxes for a single field, follow the steps below:

  1. Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts.

  2. Draw a bounding box for the first text segment of the field.

  3. Click Add another text segment.

  4. Draw a bounding box for the second text segment of the field.

Note that you can repeat steps 3 and 4 until you’ve identified all text segments of a field. 

If a field has an occurrence with multiple bounding boxes, and you delete this occurrence, all of the occurrence’s bounding boxes will also be cleared.
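
One way to picture the relationship between fields, occurrences, and bounding boxes is as a nested structure, where boxes belong to an occurrence and disappear with it. The class names and the box tuple layout below are hypothetical and only illustrate the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    """One value of a field; it may span several boxes, e.g. across a page break."""
    # Each box is (page, left, top, right, bottom); this layout is an assumption.
    boxes: list[tuple[int, float, float, float, float]] = field(default_factory=list)


@dataclass
class FieldAnnotation:
    """A layout field with zero or more annotated occurrences."""
    name: str
    occurrences: list[Occurrence] = field(default_factory=list)

    def delete_occurrence(self, index: int) -> None:
        """Remove an occurrence; its boxes go with it because they live inside it."""
        del self.occurrences[index]

    def box_count(self) -> int:
        """Total number of bounding boxes across all occurrences."""
        return sum(len(occurrence.boxes) for occurrence in self.occurrences)
```

In this model, deleting an occurrence that spans two pages removes both of its boxes at once, while other occurrences of the same field are untouched.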

Searching for text segments

To search for text segments across all pages of a document, you can:

  • Click the search bar at the top of the Field ID page and type a keyword. Note that the search bar supports single-word searches. We recommend searching for the most relevant keyword.

  • Press Command + Option + F on Mac or Control + Alt + F on Windows and type a keyword.

Pressing Return on Mac or Enter on Windows returns all text segments that match your search. You can navigate through the search results using the Previous Segment and Next Segment buttons.

All search results are highlighted with orange rectangles. The currently selected search result is highlighted with a dark orange rectangle while all other search results are highlighted with light orange rectangles. Note that the search is not case-sensitive. 
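
The search behavior described above (one keyword, matched across all pages, ignoring case) can be sketched like this. The Segment structure is hypothetical, and treating a match as "the keyword appears anywhere in the segment text" is an assumption; the product may match differently.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A text segment on a page (hypothetical structure)."""
    page: int
    text: str


def search_segments(segments: list[Segment], keyword: str) -> list[Segment]:
    """Return segments containing the keyword, ignoring case, ordered by page."""
    needle = keyword.strip().casefold()
    if not needle:
        return []
    matches = [s for s in segments if needle in s.text.casefold()]
    # sorted() is stable, so segments on the same page keep their input order.
    return sorted(matches, key=lambda s: s.page)
```

The returned list corresponds to the results you would step through with the Previous Segment and Next Segment buttons.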

Addressing incorrect layouts

The Field ID task provides a Mark Layout Variation Incorrect button under the "Document Details" dropdown in the right panel. This action lets you skip further extraction on Semi-structured documents that have been matched to the wrong layout.

Once the Mark Layout Variation Incorrect action has been confirmed, all pages in the document will be marked as No Layout Found, and no further extraction work will be performed. 

Multi-page documents & page re-ordering

If the document has more than one page, you can navigate between pages by clicking on the preview images in the left-side column. If the pages are out of order, you can also re-arrange them while in the Field ID task view.

To do so, hover over the image you'd like to move, then click and drag the page to its desired position.

When locating fields, you should look across all pages before concluding that a field is not present. If the field is present on multiple pages, you only have to draw the box once. You should work with your team to determine whether there's a preferred page to draw the box on.

Keyboard shortcuts

Field identification

Task | Mac Shortcuts | Windows Shortcuts
Change label location | F4 | F4
Clear bounding box | Backspace | Backspace
Next field in list | E or | E or
Previous field in list | W or | W or
Free draw a bounding box | Spacebar + click and drag | Spacebar + click and drag
Add an additional occurrence of a field | Command + Option + + | Control + Alt + +
Remove an additional occurrence of a field | Command + Option + - | Control + Alt + -
Add another text segment | Option + click and drag | Alt + click and drag
Add another text segment and free draw a bounding box (Only supported for text fields) | Option + Spacebar + click and drag | Alt + Spacebar + click and drag
Focus segment search bar | Command + Option + F | Control + Alt + F
Complete Task | Command + Return | Control + Enter

All tasks

Task | Mac Shortcuts | Windows Shortcuts
Zoom in | Option + + | Alt + +
Zoom out | Option + - | Alt + -
Next page | Fn + ⬇ | Page down
Previous page | Fn + ⬆ | Page up
Keyboard shortcuts | F2 | F2
Close task | Option + Command + X | Alt + Control + X