Keyer Data Management

With our Keyer Data Management tools, you can:

  • View field and table locator models’ training data.

  • View and edit ground-truth data for locator models.

  • View a list of documents used to train a locator model.

  • Choose whether to include a document in future training.

  • Create a body of documents that are always used in model training.

  • View automation rates for individual fields and table columns in a layout.

  • Quickly review the ground-truth data for low-performing fields and table columns.

Accessing Keyer Data Management tools

If you have the View Training Data permission (given to System Admin and Business Admin permission groups by default), you can access the Keyer Data Management tools for a model:

  1. Go to Library > Models.

  2. Click on the name of the model whose training data you would like to view.

  3. Click the Field Identification or the Table Identification tab, depending on the type of training data you would like to view.

The Keyer Data Management tools are located at the bottom of the Model Details page, in the Training Documents card.

Navigating the Training Documents card

On a model's Model Details page, there is a Training Documents table, which lists all training documents for this model.

The Training Documents table shows the following information for each training document:

  • The training document's ID

  • The ID of the group of training documents. With training data analysis, the system automatically groups similar documents together. To learn more about grouping documents together, see Training Data Analysis and Guided Data Labeling.

  • The number of pages the document contains.

  • The date the document's submission was created.

  • The training status of the document:

    • Always: The document will always be used to train the model.

    • Auto: The document will be used to train the model until the document’s scheduled deletion, based on the PII Data Deletion settings for your instance.

    • Never: The document will never be used in future model training, even if it hasn't been deleted as part of PII Data Deletion.

    • Loading: The document is currently being uploaded.

    • Ready to Annotate: The document has been uploaded successfully but has not been annotated yet.

  • The date the document is scheduled for deletion.
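
The training statuses above can be read as a small decision rule. The sketch below is illustrative only: the TrainingStatus enum and the is_used_for_training function are hypothetical names, not part of the product, and the sketch assumes that a document with the Auto status stops being used once its scheduled deletion date is reached.

```python
from datetime import date
from enum import Enum
from typing import Optional


class TrainingStatus(Enum):
    """Hypothetical mirror of the statuses shown in the Training Documents table."""
    ALWAYS = "Always"
    AUTO = "Auto"
    NEVER = "Never"
    LOADING = "Loading"
    READY_TO_ANNOTATE = "Ready to Annotate"


def is_used_for_training(status: TrainingStatus,
                         scheduled_deletion: Optional[date],
                         today: date) -> bool:
    """Return True if a document with this status would be used in training."""
    if status is TrainingStatus.ALWAYS:
        # Always: used regardless of the scheduled deletion date.
        return True
    if status is TrainingStatus.AUTO:
        # Auto: used until the scheduled deletion date (assumed exclusive here).
        return scheduled_deletion is None or today < scheduled_deletion
    # Never, Loading, and Ready to Annotate documents are not used.
    return False
```

For example, under these assumptions a document marked Always with a past deletion date is still used, while an Auto document is dropped on its deletion date.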

Each document also has one of the following Actions links:

  • Annotate – only available for documents that do not have any annotations.

  • Edit Annotations – only available for documents that have already been annotated.

You can filter the contents of the table by training document ID, group ID, number of pages, submission date, training status, and scheduled deletion. You can also search for documents by their IDs.

Selecting at least one training document lets you use the Actions drop-down menu. This drop-down menu has the following buttons:

  • Remove training documents – removes the selected training documents and their associated annotations.

  • Edit training status – lets you change the training status of the selected training documents that have been annotated. All unannotated training documents that you’ve selected will keep their current status.
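
The behavior of Edit training status on a mixed selection can be pictured as follows. This is a minimal sketch, not the product's implementation: the TrainingDocument class and the edit_training_status function are hypothetical and exist only to show that annotated documents receive the new status while unannotated ones keep theirs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingDocument:
    """Hypothetical stand-in for a row in the Training Documents table."""
    doc_id: int
    annotated: bool
    status: str


def edit_training_status(selected: List[TrainingDocument], new_status: str) -> List[int]:
    """Apply new_status to annotated documents only; return the IDs that changed."""
    changed = []
    for doc in selected:
        if doc.annotated:
            doc.status = new_status
            changed.append(doc.doc_id)
        # Unannotated documents keep their current status.
    return changed
```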

Viewing and editing annotation data for a document

If you have the Edit Training Data permission (given to System Admin and Business Admin permission groups by default), you can access the Annotation page for each of a model's training documents.

On the Annotation page, you can review and edit the bounding boxes for the document, along with the document's training status. To access this page, click on the Actions link for a document in the Training Documents card.

Review field and table column annotations

Clicking the Actions link on the Field Identification tab redirects you to the Review Field Annotations section. In the Review Field Annotations section of the right-hand sidebar, you can view a list of the layout fields that are present in the document.

Clicking the Actions link on the Table Identification tab redirects you to the Review Table Column Annotations section. In the Review Table Column Annotations section of the right-hand sidebar, you can view a list of the layout table columns that are present in the document.

View and edit field annotations

To view and edit annotations for a document's fields:

  1. Click on a field's name.

    • If a field has multiple occurrences, clicking on a field’s name will expand a list of all of the field’s occurrences on the page. In that list, click on an occurrence’s name.

    • If a field has multiple bounding boxes, clicking on a field’s name highlights all of the field’s bounding boxes on a document.

    • If a field has been added to the layout and is part of a live release but has not been annotated yet, the field is marked in yellow in the fields list.

    • To display annotation suggestions for unannotated documents, toggle the Display Suggestions setting to Enabled.

      Note that you can enable the Display Suggestions setting if you’ve annotated at least 2-3 documents from a training documents group.

      To learn more, see the “Step 3: Annotate Training Documents with Guidance” section in Training Data Analysis and Guided Data Labeling.

  2. If needed, adjust the position or dimensions of the field's bounding box. To do so, you can:

    • Drag the box to a new position on the page.

    • Hover over another text entry on the page, and click on the predicted bounding box that appears.

    • Click and drag on the box boundaries to adjust the height or width of the box.

      To learn more about Field Identification best practices, see Best Practices for Field ID Supervision and QA.

  3. Draw bounding boxes for any new fields that haven’t been annotated yet. These fields are marked in yellow in the fields list.

  4. You can also add and remove field occurrences. To add a field occurrence, follow the steps below:

    1. Click Add another [field’s name].

    2. Create a bounding box for the field’s occurrence.

    To remove a field’s occurrence, click the Remove occurrence button next to the occurrence’s name in the right-hand sidebar.

  5. To capture additional text segments for a given field, you can add more bounding boxes. To add a bounding box, follow the steps below:

    a. Click Add another text segment.

    b. Draw a bounding box for the additional text segment.

    Note that you can repeat steps a and b until you identify all text segments of a field. To remove a field’s bounding box, select the bounding box and click its X button.

  6. Click Save Changes.

  7. To view annotations for specific fields only:

    1. Expand the Show Fields drop-down menu at the top of the page.

    2. In the drop-down list that appears, select the fields whose bounding boxes you would like to review.

View and edit table column annotations

To view and edit annotations for a document's table columns:

  1. Click on a table column’s name to highlight the respective table cells on the page.

    • Table column names that are marked in yellow in the table columns list have not been annotated yet.

    • To display annotation suggestions for unannotated documents, click the Display available suggestions button.

      Note that you can click the Display available suggestions button while manually identifying table cells if you’ve annotated at least 2-3 documents from a training documents group.

      To learn more, see the “Step 3: Annotate Training Documents with Guidance” section in Training Data Analysis and Guided Data Labeling.

  2. If needed, adjust the position or dimensions of the cells’ bounding boxes. To do so, you can:

    • Click and drag the corners of the box until it contains all of the cell’s content.

    • Adjust the row’s height by using the blue pill button.

    • Identify new cells by creating bounding boxes.

    • Identify multiple cells of the same column with a click-and-drag motion.

      To learn more about Table Identification best practices, see Best Practices for Table Identification and Table Transcription and the section “Step 2 of 2: Finish Table Identification” from Table ID Supervision.

  3. Click Save Changes.

For nested tables, to switch between a document’s child and parent tables, use the drop-down menu at the top of the right-hand sidebar.

To view annotations for specific columns only:

  1. Expand the Visible Columns drop-down menu at the top of the page.

  2. In the drop-down list that appears, select the columns whose bounding boxes you would like to review.

Training Status

In the Training Status section of the right-hand sidebar, you can change the training status of the document. This option lets you indicate whether the document should always be used in training or if it should never be used in the model's training again. By selecting one of these options, you can override the PII Data Deletion settings for individual documents.

Building a set of training documents

If you indicate that individual documents should always be used in the model's training, you, in effect, create a stable, predictable set of documents that will be used across all trainings.

Change a document’s training status

To change the training status for a document:

  1. Select one of the following options:

    • Auto: (Default) The document will be used for training until it is deleted, based on the PII Data Deletion settings for your instance.

    • Always: The document will always be used to train the model, even after it is scheduled for deletion under the PII Data Deletion rules for your instance.

    • Never: The document will never be used in future model training, even if it hasn't been deleted as part of PII Data Deletion.

  2. Click Save Changes.

Page through available documents

You can quickly page through the Annotation pages for a model's documents by using the previous and next buttons at the top of the page.

Viewing and editing annotations for a specific field or table column

You can view field-level and table-level automation details in the Field Identification and Table Identification tabs of the Model Details page.

Field-level Automation table

The Field-level Automation table has a row for each field in the model's layout. It shows the following information for each field:

  • The name of the field

  • The number of times the field was identified by the machine

  • The total number of times the field was identified (either by a keyer or the machine)

  • The automation rate for the field

    • You can sort this column to quickly find the fields with the lowest automation rates.

Each field in the Field-level Automation table also has a View Annotations link.
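
To make the relationship between these columns concrete, here is a minimal sketch. It assumes that the automation rate is the machine-identified count divided by the total count, which the table's columns suggest but this page does not state. The function names and the sample field names are hypothetical.

```python
from typing import Dict, List, Tuple


def automation_rate(machine_identified: int, total_identified: int) -> float:
    """Assumed formula: share of identifications made by the machine."""
    if total_identified == 0:
        # No identifications yet; report 0.0 rather than dividing by zero.
        return 0.0
    return machine_identified / total_identified


def lowest_automation_first(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Sort fields by ascending automation rate, like sorting the table column."""
    rates = [(name, automation_rate(machine, total))
             for name, (machine, total) in counts.items()]
    return sorted(rates, key=lambda item: item[1])


# Hypothetical counts: field name -> (machine-identified, total identified)
sample = {"Invoice Number": (90, 100), "Total": (30, 60)}
ranked = lowest_automation_first(sample)
```

Under this assumption, the fields at the start of the ranked list are the first candidates for reviewing annotations.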

Viewing a field’s training data

If you are interested in reviewing the ground-truth, or training, data for a particular field, you can do so by following the steps below.

Troubleshooting fields with low automation rates

If the Field-level Automation table shows that a particular field has a low rate of automation, you can troubleshoot it by following these steps.

To view a field’s annotations:

  1. In the Field-level Automation table, click View Annotations in the field's row.

  2. View the field's bounding box in the given document and make any necessary edits, as described in View and edit field annotations.

  3. Click the next button at the top of the page to see the field's bounding box on the next available document.

Table-level Automation table

The Table-level Automation table has a row for each table column in the model's layout. It shows the following information for each table column:

  • The name of the table column.

  • The number of times the table column’s cells were identified by the machine.

  • The total number of times the table column’s cells were identified (either by a keyer or the machine).

  • The automation rate for the table column.

    • You can sort this column to quickly find the table columns with the lowest automation rates.

Each table column in the Table-level Automation table also has a View Annotations link.

For nested tables, the Table-level Automation table separates the parent and child columns into two subtables.

Viewing a table column’s training data

If you are interested in reviewing the ground-truth, or training, data for a particular table column, you can do so by following the steps below.

Troubleshooting table columns with low automation rates

If the Table-level Automation table shows that a particular table column has a low rate of automation, you can troubleshoot it by following these steps.

To view a table column’s annotations:

  1. In the Table-level Automation table, click View Annotations in the table column’s row.

  2. View the table column’s bounding boxes in the given document and make any necessary edits, as described in View and edit table column annotations.

  3. Click the next button at the top of the page to see the table column’s bounding boxes on the next available document.