Training a New Table Identification Model

Overview

A trained Table ID model enables cell-level predictions and automatic table processing. It can be trained to identify both gridded and non-gridded tables. In a gridded table, the data falls neatly within the boundaries of each cell, as defined by the table's rows and columns, so the rows and columns can be separated with straight lines that do not intersect the content. In a non-gridded table, the data does not fall neatly within those cell boundaries.
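The distinction between gridded and non-gridded tables can be pictured as a geometric check: a table is gridded if a straight line fits between every pair of adjacent columns and every pair of adjacent rows without crossing any cell content. The following is a minimal sketch of that idea only, not the product's implementation. The CellBox type, its fields, and the is_gridded function are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CellBox:
    """Bounding box of one cell's content (hypothetical data model)."""
    row: int
    col: int
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def _separable(spans: dict[int, tuple[float, float]]) -> bool:
    """True if each span ends before the next one starts, so a straight line fits between them."""
    ordered = [spans[i] for i in sorted(spans)]
    return all(prev[1] <= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def is_gridded(cells: Iterable[CellBox]) -> bool:
    """Check whether rows and columns can be split by straight lines that avoid all content."""
    col_spans: dict[int, tuple[float, float]] = {}
    row_spans: dict[int, tuple[float, float]] = {}
    for c in cells:
        # Widen the horizontal extent recorded for this cell's column.
        lo, hi = col_spans.get(c.col, (c.x0, c.x1))
        col_spans[c.col] = (min(lo, c.x0), max(hi, c.x1))
        # Widen the vertical extent recorded for this cell's row.
        lo, hi = row_spans.get(c.row, (c.y0, c.y1))
        row_spans[c.row] = (min(lo, c.y0), max(hi, c.y1))
    return _separable(col_spans) and _separable(row_spans)


# A 2x2 table whose content stays inside its cells.
neat = [
    CellBox(0, 0, 0, 0, 10, 5), CellBox(0, 1, 12, 0, 20, 5),
    CellBox(1, 0, 0, 6, 10, 10), CellBox(1, 1, 12, 6, 20, 10),
]
# Same table, but the lower-left cell's text spills into the second column.
spilled = neat[:2] + [CellBox(1, 0, 0, 6, 15, 10), neat[3]]
```

Here, is_gridded(neat) holds, while is_gridded(spilled) does not, because no vertical line can separate the two columns without cutting through the spilled text.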

Regardless of the grid format, Table ID models support both regular and nested tables. To learn about the differences between regular and nested tables, see Table Identification.

To train and deploy models, go to the Model Details page. Once you have chosen the Semi-structured layout for which you want to train a model, you can reach the Model Details page in two ways:

From the Models tab: go to Library > Models.
From the Layout Details page: open the layout, then go to its Models section.

Note that Table ID model training must be triggered manually.

To understand the requirements to train a model, see Requirements for Training a New Model.

Note that you can still train new models with existing training data that was annotated with the Column Tool in v30, even though the Column Tool was deprecated in v31.

Table ID models use transcribed text to improve table identification. This feature is called Table Detector, and it supports the following scenarios:

If a page contains multiple similar tables, you can train the model to identify only the table with a specific, predefined header name.
Using the transcribed text from the page, the model can filter out unnecessary rows from a table.
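The model learns this behavior from annotated training data rather than from explicit rules, but the outcome of the two scenarios can be pictured with a small rule-based sketch. Everything below is illustrative and hypothetical: the dictionary shape, the function names select_table and drop_rows, and the "Subtotal" filter are assumptions, not part of the product.

```python
from __future__ import annotations

from typing import Optional


def select_table(tables: list[dict], header_name: str) -> Optional[dict]:
    """Return the first table whose header matches header_name (case-insensitive), else None."""
    wanted = header_name.strip().casefold()
    for table in tables:
        if table["header"].strip().casefold() == wanted:
            return table
    return None


def drop_rows(table: dict, unwanted_words: set[str]) -> dict:
    """Return a copy of the table without rows that contain any unwanted cell text."""
    unwanted = {w.casefold() for w in unwanted_words}
    kept = [
        row for row in table["rows"]
        if not any(cell.strip().casefold() in unwanted for cell in row)
    ]
    return {**table, "rows": kept}


# Two similar tables transcribed from the same page (made-up sample data).
page_tables = [
    {"header": "Deposits", "rows": [["01/02", "100.00"], ["Subtotal", "100.00"]]},
    {"header": "Withdrawals", "rows": [["01/03", "40.00"], ["Subtotal", "40.00"]]},
]

target = select_table(page_tables, "withdrawals")  # scenario 1: pick one table by header name
cleaned = drop_rows(target, {"Subtotal"})          # scenario 2: filter out unnecessary rows
```

In this sample, only the Withdrawals table is kept, and its Subtotal row is removed.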

From the Models tab

You can reach a list of all Semi-structured models from the Library. Navigate to Library > Models. Click a model name in the table to view the associated Model Details page.

From the Layout Details page

On the Layout Details page for any Semi-structured layout, navigate to the Model Details screen by clicking on the Table ID link in the upper-right corner under the "Models" section.

Initiating Model Training

On the Model Details screen, the system indicates whether you have completed enough Table ID Supervision or QA to initiate training. If you have not yet reached the minimum, you'll see the number of additional documents required.

When working toward the minimum, complete Table ID Supervision and Table ID QA to ensure that your data qualifies for model training. To learn more, see Requirements for Training a New Model.
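The number of additional documents shown on the screen is simply the gap between the minimum and the documents completed so far. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the minimum is passed in as a parameter because its actual value is defined in Requirements for Training a New Model, not here.

```python
def additional_documents_required(completed: int, minimum: int) -> int:
    """Return how many more qualifying documents are needed before training can start."""
    if completed < 0 or minimum < 0:
        raise ValueError("document counts cannot be negative")
    # Never report a negative gap once the minimum has been reached or exceeded.
    return max(0, minimum - completed)
```

A result of 0 means that the minimum has been met and training can be initiated.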

Once above the minimum, train a model by clicking the Run Training button (if there is no previous model) or by clicking Actions and then Run Training (if there is an existing model).
After initiating training, the system will show that the model is pending.

We recommend training on a 16-core machine with 64 GB of memory. To keep track of model training jobs, monitor the Notifications in the top-right corner of the application.

To cancel a model training job, see Canceling or Retrying a Training Job.

Model Validation Tasks

Model Validation Tasks (MVTs) ask you to verify whether the location of a table cell is accurate. Re-training a model with the results of completed MVTs improves automation.

MVTs are created after you train a model and process documents with it. Once you complete the MVTs, a new model will automatically re-train at the next regularly scheduled interval. You can also manually trigger model training.

For more information, see Model Validation Tasks.

Additional Notes

If your deployment does not have a dedicated machine for training, document processing will be severely delayed while a model trains. Without a dedicated machine, it is best to avoid processing documents while training models.
Training for subsequent models on a Semi-structured layout is initiated the same way as for the first model. However, when you view the Model Details screen, you'll see data associated with the live model in the Live Model section.