Training a New Field Identification Model

Overview

There are two ways to train a Field ID model.

  1. To automatically train and deploy Field ID models, you can enable the Continuous Field Locator model improvement setting.

  2. To manually train and deploy models, you must navigate to the Model Details page.

    • To complete this process, follow the instructions below.

Once you have chosen the Semi-structured layout for which you want to train a model, you can reach its Model Details screen in two ways:

  1. Library > Models

  2. Layouts > Models

To understand the requirements to train a model, see Requirements for Training a New Model.

From the models tab

You can reach a list of all Semi-structured models from the Library. Navigate to Library > Models, and click a model name in the table to open its Model Details page.

From the layout details page

On the Layout Details page for any Semi-structured layout, navigate to the Model Details screen by clicking on the Field ID link in the upper-right corner under the "Models" section.

Multiple Occurrences Field ID model

The Multiple Occurrences (MOs) feature helps you identify multiple instances of a field. Learn more about fields with multiple occurrences in Field Identification.

The default Field ID model cannot predict multiple occurrences of fields. If a layout processes documents that contain multiple occurrences of fields, you need to select the Multiple Occurrences Field ID model for that layout before training. To do so, follow these steps:

  1. Go to the admin page by adding "/admin/form_extraction/template/" to the end of the application URL (e.g., example.production.hyperscience.com/admin/form_extraction/template/).

  2. Click the UUID of the layout you’d like to edit.

  3. In the "Flex engine type for training" setting, select MULTIPLE_OCCURRENCES from the drop-down menu.

  4. Click Save.
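The admin URL in step 1 is simply the application URL with the admin path appended. As a minimal Python sketch (the helper name `admin_template_url` is hypothetical, and it assumes HTTPS when the URL has no scheme), the URL can be built like this:

```python
# Admin path for layout settings, as given in step 1.
ADMIN_TEMPLATE_PATH = "/admin/form_extraction/template/"


def admin_template_url(app_url: str) -> str:
    """Append the layout admin path to an application URL (hypothetical helper)."""
    # Assumption: the application is served over HTTPS when no scheme is given.
    if "://" not in app_url:
        app_url = "https://" + app_url
    # Strip any trailing slash so the result has exactly one "/" at the seam.
    return app_url.rstrip("/") + ADMIN_TEMPLATE_PATH


print(admin_template_url("example.production.hyperscience.com"))
# → https://example.production.hyperscience.com/admin/form_extraction/template/
```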

Then, to initiate model training, follow the steps in the next section.

Initiating Model Training

The Model Details screen shows whether you have completed enough QA or Field ID Supervision to initiate training. If you have not yet reached the minimum, it shows how many additional documents you need.

  • Once you have reached the minimum, train a model by clicking the Run Training button (if there is no previous model) or by selecting Actions > Run Training (if there is an existing model).

  • After you initiate training, the system shows the model as pending.
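The rules above for when and how to start training can be summarized as a small decision function. This is an illustrative Python sketch only: `ModelDetails`, its fields, and `next_training_step` are hypothetical names, and the real minimum is whatever the Model Details screen reports (see Requirements for Training a New Model).

```python
from dataclasses import dataclass


@dataclass
class ModelDetails:
    """Hypothetical summary of what the Model Details screen reports."""
    completed_documents: int  # documents through QA or Field ID Supervision
    minimum_documents: int    # minimum required to train (shown on the screen)
    has_existing_model: bool  # whether a previous model exists for the layout


def next_training_step(details: ModelDetails) -> str:
    """Return the UI action to take next, following the rules described above."""
    remaining = details.minimum_documents - details.completed_documents
    if remaining > 0:
        # Below the minimum: more documents must go through QA or Supervision.
        return f"Complete {remaining} more document(s)"
    if details.has_existing_model:
        return "Actions > Run Training"
    return "Run Training"


# Hypothetical values for illustration.
print(next_training_step(ModelDetails(40, 50, False)))  # → Complete 10 more document(s)
print(next_training_step(ModelDetails(60, 50, True)))   # → Actions > Run Training
```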

The training process takes approximately 8 minutes per document on an 8-core machine with 32 GB of memory. To keep track of model training jobs, monitor Notifications in the upper-right corner of the application.
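Taking the figure above at face value, total training time can be roughly estimated by multiplying the number of training documents by 8 minutes. The Python sketch below assumes that training time scales linearly with document count and that the 8-minute rate applies only to the 8-core, 32 GB reference machine; both function names are hypothetical.

```python
# Reference rate from the text: about 8 minutes per document on an
# 8-core machine with 32 GB of memory. Other hardware will differ.
MINUTES_PER_DOCUMENT = 8.0


def estimate_training_minutes(num_documents: int,
                              minutes_per_document: float = MINUTES_PER_DOCUMENT) -> float:
    """Rough training-time estimate, assuming time scales linearly with documents."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    return num_documents * minutes_per_document


def format_duration(minutes: float) -> str:
    """Format a duration given in minutes as 'H h M min'."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins} min"


print(format_duration(estimate_training_minutes(50)))  # → 6 h 40 min
```

On deployments without a dedicated training machine (see Additional Notes), an estimate like this can help you schedule training for a period with little document processing.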

To cancel a model training job, see Canceling or Retrying a Training Job.

Model Validation Tasks

Model Validation Tasks (MVTs) ask you to verify whether a field's location is accurate. Retraining a model with the results of MVTs improves automation.

MVTs are created after you train a model and process documents through it. Once you complete the MVTs, a new model is automatically trained at the next regularly scheduled interval. You can also trigger model training manually.

For more information, see Model Validation Tasks.

Additional Notes

  • If your deployment does not have a dedicated machine for training, document processing will be severely delayed while the model trains. Without a dedicated machine, avoid processing documents while training models.

  • Initiating training for subsequent models on a Semi-structured layout works the same way as for the first model. However, the Model Details screen also shows data for the current live model in the Live Model section.

  • If PII deletion is enabled on your system, or if you have imported a model from another instance, you may not have enough documents to run training even though you have a live model. In that case, wait until enough documents have gone through QA or Field ID Supervision (increasing the sampling rate can reduce the wait time).

    • Once you have enough documents, start training as before by selecting Actions > Run Training.
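To see why a higher sampling rate shortens the wait, consider a rough estimate: if a fixed percentage of processed documents is sent to QA, the number of days needed is the documents still required divided by the documents reaching QA each day. The Python sketch below is a hypothetical back-of-the-envelope calculation. It assumes a steady daily volume and a percentage-based sampling rate, and it ignores documents that reach Field ID Supervision, which would shorten the wait further.

```python
def days_until_trainable(documents_needed: int,
                         documents_per_day: int,
                         sampling_percent: int) -> int:
    """Estimate whole days until enough documents have gone through QA.

    Hypothetical helper. Assumes a steady daily volume and that
    sampling_percent of processed documents are sent to QA.
    """
    if documents_per_day <= 0 or not 0 < sampling_percent <= 100:
        raise ValueError("volume must be positive and sampling_percent in (0, 100]")
    # Ceiling of (needed * 100) / (per_day * percent), using integer
    # arithmetic to avoid floating-point rounding.
    numerator = documents_needed * 100
    denominator = documents_per_day * sampling_percent
    return -(-numerator // denominator)


# Hypothetical: 100 more documents needed, 200 documents processed per day.
print(days_until_trainable(100, 200, 10))  # → 5
print(days_until_trainable(100, 200, 25))  # → 2
```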