Field Identification

Field Identification (Field ID) tasks are specific to Semi-structured documents. These tasks are created when the system is unsure if it has correctly located a field, or if a field is specified for manual identification in the Layout Editor.

The task begins with an instruction screen that states the best practices for completing Field ID tasks. You may choose to dismiss this screen after its first appearance.


Documents without tables

On the Field Identification page, a list of the layout's possible fields appears in the right-hand sidebar.

If you have a trained model for the layout, we may have predictions for the placement of some fields. As part of Field Identification, you will need to:

  • confirm or correct our predictions, and 

  • manually identify any fields we don't have predictions for.

Before you begin…

To help us transcribe each field accurately, review our Best Practices for Field ID Supervision and QA and keep them in mind as you identify each field's content. 

You can complete Field ID Supervision by following the steps below.

  1. Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts. The selected field will be highlighted.

    • If we have a prediction for the field, its icon in the right-hand sidebar appears in blue. The prediction appears on the image as a blue bounding box, and you can move to step 3.

  2. If we don't have a prediction for the field, you will need to manually identify the field's content on the page. 

    • Click on the field's content. A bounding box appears around the field's content, which is highlighted in blue. The size of the bounding box is automatically adjusted depending on the field’s type (e.g., text field, checkbox, or signature).

  3. Do one of the following:

    • If the bounding box includes all of the field's content, move on to the next step.

    • If the box is in the right place but doesn't include all of the field's content (e.g., parts of letters fall outside of the box), adjust the edges of the box until it contains all of the content that should be transcribed.

    • If neighboring text segments should also be included in the field's transcription, use a click-and-drag motion to draw a bounding box that includes all of the field's content.

    • If the box doesn’t include any of the field’s content, or no bounding box appears around the field’s content when you hover over it, press the spacebar and, with a click-and-drag motion, draw a bounding box that includes all of the field's content.

  4. Repeat steps 1-3 until you’ve identified all of the fields that appear in the document, confirming or correcting our predictions or manually identifying fields we don’t have predictions for.

  5. When you’ve finished identifying fields in a document, click Confirm All and Continue (Enter), or press Return or Enter.

  6. If any bounding boxes overlap, you will be asked to tighten them so that each box fits closely around its field.
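The overlap check in the last step above can be pictured as a rectangle intersection test. The sketch below is only an illustration, not the product's actual implementation: the `Box` type, its pixel coordinates, and the `overlapping_pairs` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box: (left, top) to (right, bottom), in pixels."""
    field: str
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Box, b: Box) -> bool:
    # Two rectangles overlap when their extents intersect on both axes.
    # Boxes that merely share an edge are not counted as overlapping.
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def overlapping_pairs(boxes):
    # Return the field names of every pair of boxes that overlap.
    return [(a.field, b.field) for a, b in combinations(boxes, 2) if overlaps(a, b)]
```

A box that is drawn tightly around its own field's content is less likely to intersect a neighboring field's box.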

Documents with tables

If your document contains both fields and tables:

  1. You will first complete Field Identification, as outlined above.

  2. Then, you will be brought to the Table Identification workflow.

For more detailed information about table extraction, see Table Identification.

Fields with multiple occurrences

The Multiple Occurrences (MOs) feature helps you identify multiple instances of a field. 

An occurrence is defined as one of the distinct values from a sequence of values for a field. For example, if two people own a bank account, then the “Account Owner” field needs two values to be extracted, one for each name. 

Do NOT use MOs to select the same value multiple times.  If the same value appears twice, you should only label the first occurrence in the natural reading order. 
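The rule above amounts to keeping each distinct value once, in natural reading order. Here is a minimal sketch, assuming the candidate values are already listed in reading order and that two values count as "the same" when their text matches exactly; `distinct_occurrences` is a hypothetical helper, not part of the product.

```python
def distinct_occurrences(values_in_reading_order):
    """Keep only the first appearance of each value, preserving reading order."""
    seen = set()
    occurrences = []
    for value in values_in_reading_order:
        if value not in seen:  # a repeated value is not a new occurrence
            seen.add(value)
            occurrences.append(value)
    return occurrences
```

For an "Account Owner" field where one name is printed twice, only two occurrences would be labeled: one per distinct owner.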

Identify multiple occurrences of a field 

  1. Create a bounding box for the first occurrence of the field.

  2. Click Add another [field’s name]. You can also use the shortcut Command + Option + + (Mac) or Control + Alt + + (Windows).

  3. Create a bounding box for the second occurrence of the field.

    Use the Add another text segment option when a given occurrence cannot be annotated with only one bounding box. For more information on multiple bounding boxes, see Multiple bounding boxes for fields.

  4. Repeat steps 2 and 3 for each additional occurrence of the field.

Multiple Occurrences model 

The default Field ID model can predict multiple occurrences of fields. To indicate that a field needs annotation of multiple instances, select the Multiple Occurrences checkbox in the Layout Editor. Learn more in Training a New Field Identification Model.

Multiple bounding boxes for fields

To annotate values across line and page breaks that can’t be captured by a single bounding box, you can draw multiple bounding boxes for a field. For example, if a field's value spans two pages, you can draw two bounding boxes for the same field.

To create multiple bounding boxes for a single field, follow the steps below:

  1. Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts.

  2. Draw a bounding box for the first text segment of the field.

  3. Click Add another text segment.

  4. Draw a bounding box for the second text segment of the field.

Note that you can repeat steps 3 and 4 until you’ve identified all text segments of a field. 


If a field has an occurrence with multiple bounding boxes, and you delete this occurrence, all of the occurrence’s bounding boxes will also be cleared.
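One way to picture how fields, occurrences, and bounding boxes relate is as nested lists: a field holds occurrences, and each occurrence holds one or more boxes. The sketch below is an assumed data model for illustration only. The class names, method names, and the `(page, left, top, right, bottom)` box layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box layout: (page, left, top, right, bottom).
BoundingBox = Tuple[int, float, float, float, float]


@dataclass
class Occurrence:
    # One occurrence may need several boxes (e.g., a value split across pages).
    boxes: List[BoundingBox] = field(default_factory=list)


@dataclass
class FieldAnnotation:
    name: str
    occurrences: List[Occurrence] = field(default_factory=list)

    def add_occurrence(self, first_box: BoundingBox) -> int:
        # Corresponds to "Add another [field's name]"; returns the new index.
        self.occurrences.append(Occurrence(boxes=[first_box]))
        return len(self.occurrences) - 1

    def add_text_segment(self, occurrence_index: int, box: BoundingBox) -> None:
        # Corresponds to "Add another text segment" within one occurrence.
        self.occurrences[occurrence_index].boxes.append(box)

    def delete_occurrence(self, occurrence_index: int) -> None:
        # Removing an occurrence discards every box that belongs to it.
        del self.occurrences[occurrence_index]
```

Because the boxes live inside their occurrence in this model, deleting the occurrence clears all of them at once, while other occurrences of the same field are unaffected.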

Long-form extraction

Long-form extraction allows you to extract data points from long documents with unstructured text. For example, you can identify fields of interest in documents such as title deeds and 10-Ks, and then use the extracted data downstream to generate actionable insights.

The automation capabilities of Long-form extraction help data keyers complete their tasks more efficiently. This automation is provided by an UNSTRUCTURED_EXTRACTION ID model. To learn how to enable and train UNSTRUCTURED_EXTRACTION ID models, see Training a New Field Identification Model.

You can use data-point extraction in conjunction with Named Entity Recognition Blocks. 

Limitations

The following limitations apply to long-form extraction:

Searching for text segments

To search for text segments across all pages of a document, you can:

  • Click the search bar at the top of the Field ID page and type a keyword. Note that the search bar supports single-word searches. We recommend searching for the most relevant keyword.

  • Press Command + Option + F for Mac or Control + Alt + F for Windows and type a keyword.

Pressing Return for Mac or Enter for Windows will return all text segments that match your search. You can navigate through the search results using the Previous Segment and Next Segment buttons.

All search results are highlighted with orange rectangles. The currently selected search result is highlighted with a dark orange rectangle while all other search results are highlighted with light orange rectangles. Note that the search is not case-sensitive.
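The search behavior described above (one keyword, all pages, case-insensitive) can be sketched as follows. This is an illustration under stated assumptions, not the product's search code: `search_segments` is a hypothetical function, pages are modeled as lists of segment strings, and matching a keyword anywhere inside a segment's text is an assumption, since the source does not say whether whole-word matching is used.

```python
def search_segments(pages, keyword):
    """Return (page_number, segment_index) for each segment containing keyword.

    pages: a list of pages, each a list of text-segment strings (assumed model).
    """
    words = keyword.split()
    if len(words) != 1:
        # Choice made for this sketch: accept exactly one keyword.
        raise ValueError("this sketch accepts exactly one keyword")
    needle = words[0].casefold()  # casefold() makes the comparison case-insensitive

    hits = []
    for page_number, segments in enumerate(pages, start=1):
        for segment_index, text in enumerate(segments):
            if needle in text.casefold():
                hits.append((page_number, segment_index))
    return hits
```

The returned list is ordered by page and then by segment, which is the order a Previous/Next style navigation would step through.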

Addressing incorrect layouts

The Field ID task provides a Mark Layout Variation Incorrect button under the "Document Details" dropdown in the right panel. This action allows the user to skip further extraction on Semi-structured documents that have been matched to the wrong layout.

Once the Mark Layout Variation Incorrect action has been confirmed, all pages in the document will be sent for reprocessing. Learn more in Reprocessing.

Multi-page documents & page re-ordering

If the document has more than one page, you can navigate between pages by clicking on the preview images in the left-side column. If the pages are out of order, you can also re-arrange them while in the Field ID task view.

To do so, hover over the image you'd like to move, then click and drag the page to its desired position.
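Conceptually, dragging a page to a new position removes it from its current slot and reinserts it at the target slot. A minimal sketch, assuming pages are held in a simple ordered list; `move_page` is a hypothetical helper used only for illustration.

```python
def move_page(pages, from_index, to_index):
    """Return a new page order with one page moved to a new position."""
    reordered = list(pages)            # copy, so the original order is untouched
    page = reordered.pop(from_index)   # lift the page out of its current slot
    reordered.insert(to_index, page)   # drop it at the desired position
    return reordered
```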

When locating fields, you should look across all pages before concluding that a field is not present. If the field is present on multiple pages, you only have to draw the box once. You should work with your team to determine if there's a preference on which page to draw the box on. 

Keyboard shortcuts

Field identification

| Task | Mac Shortcuts | Windows Shortcuts |
|---|---|---|
| Change label location | F4 | F4 |
| Clear bounding box | Backspace | Backspace |
| Next field in list | E or | E or |
| Previous field in list | W or | W or |
| Free draw a bounding box | Spacebar + click and drag | Spacebar + click and drag |
| Add an additional occurrence of a field | Command + Option + + | Control + Alt + + |
| Remove an additional occurrence of a field | Command + Option + - | Control + Alt + - |
| Add another text segment | Option + click and drag | Alt + click and drag |
| Add another text segment and free draw a bounding box (only supported for text fields) | Option + Spacebar + click and drag | Alt + Spacebar + click and drag |
| Focus segment search bar | Command + Option + F | Control + Alt + F |
| Complete Task | Command + Return | Control + Enter |

All tasks

| Task | Mac Shortcuts | Windows Shortcuts |
|---|---|---|
| Zoom in | Option + + | Alt + + |
| Zoom out | Option + - | Alt + - |
| Next page | Fn + ⬇ | Page down |
| Previous page | Fn + ⬆ | Page up |
| Keyboard shortcuts | F2 | F2 |
| Close task | Option + Command + X | Alt + Control + X |