Training a New Table Identification Model

A trained Table ID model enables cell-level predictions and automatic table processing. A Table ID model can be trained to automatically identify both gridded and non-gridded tables.

  • A standard grid format refers to tables where data falls neatly within the boundaries of each cell, as defined by the rows and columns of the table. In these tables, rows and columns can be separated with straight lines that do not intersect the content.

  • A non-gridded format refers to tables where data does not fall neatly within the boundaries of each cell, as defined by the rows and columns of the table. 
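The distinction above can be illustrated with a minimal sketch. It assumes each cell's content has a bounding box and a known row and column index. The Cell type and the is_gridded function are hypothetical names used only for illustration and are not part of the product.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical cell: row/column index plus the content's bounding box."""
    row: int
    col: int
    x0: float  # left edge
    y0: float  # top edge (y grows downward, as in image coordinates)
    x1: float  # right edge
    y1: float  # bottom edge


def _separable(cells, index, lo, hi):
    """True if a straight line fits between every pair of adjacent groups."""
    extents = {}
    for cell in cells:
        key = getattr(cell, index)
        start, end = getattr(cell, lo), getattr(cell, hi)
        prev = extents.get(key, (start, end))
        # Track the full extent covered by all content in this row or column.
        extents[key] = (min(prev[0], start), max(prev[1], end))
    ordered = [extents[k] for k in sorted(extents)]
    # A separating line exists only if one group ends before the next begins.
    return all(a[1] < b[0] for a, b in zip(ordered, ordered[1:]))


def is_gridded(cells):
    """Content is gridded if straight lines separate all columns and all rows."""
    return (_separable(cells, "col", "x0", "x1")
            and _separable(cells, "row", "y0", "y1"))


gridded = [
    Cell(0, 0, 0, 0, 10, 10), Cell(0, 1, 12, 0, 20, 10),
    Cell(1, 0, 0, 12, 10, 20), Cell(1, 1, 12, 12, 20, 20),
]
# Same table, but the first cell's content spills into the second column.
non_gridded = [Cell(0, 0, 0, 0, 15, 10)] + gridded[1:]
```

Here is_gridded(gridded) is True, while is_gridded(non_gridded) is False because no vertical line can separate the two columns without crossing content.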

Regardless of the grid format, Table ID models support both regular and nested tables. To learn about the differences between regular and nested tables, see Table Identification.

To train and deploy models, go to the Model Details page. Once you have chosen the Semi-structured layout you want to train a model for, there are two ways to get to the Model Details page:

  1. Go to Library > Models, select Identification Models from the drop-down list at the top of the page, and then click on the name of the model.

  2. Go to Layouts, click on the name of the layout, and then click on the name of the Identification Model on the Layout Details page.

Note that Table ID model training must be triggered manually.

To understand the requirements to train a model, see Requirements for Training a New Model.

Table ID models look at transcribed text to improve table identification. This feature is called Table Detector and supports the following scenarios:

  • If there are multiple similar tables on the page, you can train the model to identify only a specific table that has a predefined header name.

  • Using the transcribed text from the page, the model can filter out unnecessary rows from a table.
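The effect of these two scenarios can be pictured with a small rule-based sketch. This is only an illustration: the model learns this behavior from your annotations, and the data shapes and function names below (select_tables, drop_rows) are assumptions made for the example, not the product's implementation or API.

```python
def select_tables(tables, header_name):
    """Keep only tables whose transcribed headers include header_name.

    Each table is assumed to be a dict with "headers" (list of str)
    and "rows" (list of lists of str).
    """
    wanted = header_name.strip().lower()
    return [
        table for table in tables
        if wanted in (h.strip().lower() for h in table["headers"])
    ]


def drop_rows(rows, unwanted_keywords):
    """Remove rows in which any cell's text contains an unwanted keyword."""
    keywords = [k.lower() for k in unwanted_keywords]
    return [
        row for row in rows
        if not any(k in cell.lower() for cell in row for k in keywords)
    ]


# Hypothetical transcribed content from one page with two similar tables.
page_tables = [
    {"headers": ["Item", "Amount"],
     "rows": [["Widget", "10.00"], ["Subtotal", "10.00"]]},
    {"headers": ["Item", "Refund"],
     "rows": [["Widget", "2.00"]]},
]

selected = select_tables(page_tables, "Amount")
kept_rows = drop_rows(selected[0]["rows"], ["subtotal"])
```

In this example, only the table with the "Amount" header is selected, and the "Subtotal" row is filtered out of it.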

Initiating Model Training

Once you are on the Model Details page, the system tells you whether you've completed enough Table ID Supervision or QA to initiate training. If you have not yet reached the minimum, you'll see the number of additional documents required.
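As a rough illustration, the number of additional documents required can be thought of as the shortfall against the minimum. The function below is a hypothetical sketch, and the minimum itself is not stated on this page (see Requirements for Training a New Model), so it is passed in as a parameter.

```python
def additional_documents_required(completed: int, minimum: int) -> int:
    """Return how many more qualifying documents are needed to reach the minimum.

    Both arguments are assumptions for illustration: `completed` is the number
    of documents that already qualify, `minimum` is the training requirement.
    """
    if completed < 0 or minimum < 0:
        raise ValueError("counts must be non-negative")
    # Never report a negative shortfall once the minimum has been reached.
    return max(0, minimum - completed)
```

For example, with a hypothetical minimum of 100 documents and 60 completed, the function returns 40. Once the completed count reaches or exceeds the minimum, it returns 0.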

When working toward the minimum, complete Table ID Supervision and Table ID QA to ensure that your data qualifies for model training. To learn more, see Requirements for Training a New Model.

  • Once you are above the minimum, train a model by clicking the Run Training button (if there is no previous model) or by clicking Actions and then Run Training (if there is an existing model).

  • After you initiate training, the system shows that the model is pending.

We recommend using a 16-core machine with 64 GB of memory. Monitor the Notifications at the top right of the application to keep track of model training jobs.
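If you want to check a machine against this recommendation before training, a sketch along the following lines may help. It is not part of the product. The function names are hypothetical, and the memory lookup relies on os.sysconf, which is available on Linux and most Unix-like systems but not on Windows.

```python
import os


def detect_resources():
    """Return (logical core count, total memory in decimal GB or None)."""
    cores = os.cpu_count() or 0
    try:
        total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        # Decimal gigabytes leave some slack for memory reserved by the system.
        memory_gb = total_bytes / 10**9
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these keys are unavailable on this platform.
        memory_gb = None
    return cores, memory_gb


def meets_recommendation(cores, memory_gb, min_cores=16, min_memory_gb=64):
    """Compare detected resources with the recommended 16 cores and 64 GB."""
    if memory_gb is None:
        return False  # Memory could not be determined, so do not assume it is enough.
    return cores >= min_cores and memory_gb >= min_memory_gb


if __name__ == "__main__":
    detected_cores, detected_memory = detect_resources()
    print("cores:", detected_cores, "memory (GB):", detected_memory)
    print("meets recommendation:",
          meets_recommendation(detected_cores, detected_memory))
```

Note that os.cpu_count() reports logical cores, which may be higher than the number of physical cores on machines with simultaneous multithreading.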

To cancel a model training job, see Canceling or Retrying a Training Job.

Anomaly Detection 

With the Anomaly Detection feature, the system analyzes your training data and flags potential anomalies in the annotations for you to review. When you review each flagged annotation, you can mark it as correct or edit the annotation. Retraining the model after you review the anomalies helps improve automation. You can manually initiate model training at any point, even if you haven’t reviewed all of the flagged anomalies.

For more information, see Detecting and Correcting Anomalies in Field Annotations and Detecting and Correcting Anomalies in Table Annotations.

Additional notes

  • If your deployment does not have a dedicated machine for training, document processing will be severely delayed while the model trains. Without a dedicated machine, it is best to avoid processing documents while training models.

  • Initiating training for subsequent models on a Semi-structured layout works the same way as for the first model. However, when you view the Model Details page, you'll see data associated with the live model in the Current Model section.