Overview
Table ID tasks are specific to Semi-structured documents with table columns, and they appear within the Identification Supervision task flow. To see these tasks, you must define table columns on a Semi-structured layout. You can define either a regular table or a nested table in a Semi-structured layout.
Regular tables allow you to extract data from tables with simple structures. Regular tables contain a single table with table columns.
Nested tables allow you to extract data from tables with nested, complicated structures where child row data points inherit data points from parent rows. Nested tables contain a parent and a child table with their respective table columns.
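To illustrate how child rows inherit data points from parent rows, here is a minimal sketch in Python. The class names, the function name, and the column names (`order_id`, `item`, `qty`) are hypothetical and chosen only for illustration; they are not part of the product or its API output.

```python
from dataclasses import dataclass, field


@dataclass
class ChildRow:
    # Values keyed by child table column name (hypothetical names).
    cells: dict[str, str]


@dataclass
class ParentRow:
    # Values keyed by parent table column name (hypothetical names).
    cells: dict[str, str]
    children: list[ChildRow] = field(default_factory=list)


def flatten_nested_table(parent_rows: list[ParentRow]) -> list[dict[str, str]]:
    """Return one flat record per child row, with the parent's values copied in."""
    records = []
    for parent in parent_rows:
        for child in parent.children:
            # Each child row inherits its parent's data points.
            # If a column name appears in both tables, the child value wins.
            records.append({**parent.cells, **child.cells})
    return records


# Illustrative data: one parent row with two child rows.
invoice = [
    ParentRow(
        cells={"order_id": "A-100"},
        children=[
            ChildRow(cells={"item": "Widget", "qty": "2"}),
            ChildRow(cells={"item": "Gadget", "qty": "1"}),
        ],
    )
]
flat = flatten_nested_table(invoice)
```

In this sketch, both flattened records carry the parent's `order_id`, which is the inheritance relationship a nested table is meant to capture.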
Table Identification task
Like the Field ID task, if the document has more than one page, you can navigate between pages by clicking on the preview images in the left-side panel.
If the pages are out of order, you can also re-arrange them in this task. To do so, hover over the image you'd like to move, then click and drag the page to its desired position.
Table ID Supervision
When completing Table ID Supervision tasks, you use our Template Tool. When using the Template Tool, you choose one row to be a "template row," which we'll use to make predictions for the location of table cells in other rows in the table. You can then adjust the size of the rows and the cells within each one. For nested tables, you first complete Table ID Supervision for the child table, and then complete Table ID Supervision for the parent table. For more information, see Table ID Supervision.
Table Identification automation
You can train a model to automatically identify tables that adhere to a standard grid format. A standard grid format refers to tables where data falls neatly within the boundaries of each cell, as defined by the rows and columns of the table, and the information contained has a 1:1 relationship with a corresponding column and row.
It is also possible to train a model to automatically identify non-gridded tables; however, the automation rates will be lower. A non-gridded format refers to tables where data does not fall neatly within the boundaries of each cell, as defined by the rows and columns of the table.
Regardless of the grid format, Table ID models support both regular and nested tables.
Some notes about table ID automation:
You can improve the automation rates for identifying non-gridded tables by training the model on a more diverse set of non-gridded samples.
To view the transcription output for automated tables, you must use version 5 of the API.
Model prediction scenarios
The following scenarios are possible when using Table ID automation:
Model accurately predicts some of the cells and either fails to identify or inaccurately predicts the rest.
Adjust the cell predictions.
Manually draw the cells that the model failed to predict.
Review cell splitting and submit for Transcription.
Model accurately predicts all of the cells with low confidence.
Review cell splitting and submit for Transcription.
Model accurately predicts all of the cells with high confidence.
No Supervision task is created, and the table is automatically submitted for Transcription.
Model inaccurately predicts all of the cells.
Remove all cells.
Manually draw all of the cells.
Review cell splitting and submit for Transcription.
Improve model performance by performing Table ID Supervision and Table ID QA on more documents and then re-training the model with the new data. See below for more details.
To see performance reporting on Table ID, see Automation and System Throughput.
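The four prediction scenarios above can be summarized as a small lookup. This is only a reader's sketch of the decision flow, not how the system decides internally; the `Accuracy` enum and `next_steps` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Accuracy(Enum):
    # How well the model's cell predictions match the table (hypothetical labels).
    ALL = "all cells predicted accurately"
    SOME = "some cells predicted accurately"
    NONE = "no cells predicted accurately"


REVIEW = "Review cell splitting and submit for Transcription."


def next_steps(accuracy: Accuracy, high_confidence: bool) -> list[str]:
    """Return the steps for a scenario, in order.

    Confidence is only consulted when every cell is predicted accurately,
    because that is the only case where the scenarios above distinguish it.
    """
    if accuracy is Accuracy.ALL:
        if high_confidence:
            return [
                "No Supervision task is created; "
                "the table is automatically submitted for Transcription."
            ]
        return [REVIEW]
    if accuracy is Accuracy.SOME:
        return [
            "Adjust the cell predictions.",
            "Manually draw the cells that the model failed to predict.",
            REVIEW,
        ]
    return ["Remove all cells.", "Manually draw all of the cells.", REVIEW]


# Print the steps for a partially accurate prediction.
for step in next_steps(Accuracy.SOME, high_confidence=False):
    print(step)
```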
Training a model
Table ID models are trained at the layout level, just like Field ID models.
To run a model training job, at least 100 documents must go through Supervision; we recommend aiming for around 500 documents for optimum performance.
Model Validation Tasks are used to correct high-confidence errors made by the machine; we recommend completing these tasks to boost automation and accuracy.
To learn more, go to Training a New Table Identification Model and Model Validation Tasks.
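As a quick illustration of the document thresholds mentioned above, the following sketch checks a supervised-document count against the minimum of 100 and the recommended 500. The constant and function names are hypothetical; the product does not expose this check in this form.

```python
# Thresholds taken from the guidance above; the names are hypothetical.
MIN_SUPERVISED_DOCS = 100
RECOMMENDED_SUPERVISED_DOCS = 500


def training_readiness(supervised_docs: int) -> str:
    """Describe whether a layout has enough supervised documents to train."""
    if supervised_docs < 0:
        raise ValueError("supervised_docs must be non-negative")
    if supervised_docs < MIN_SUPERVISED_DOCS:
        missing = MIN_SUPERVISED_DOCS - supervised_docs
        return f"not ready: {missing} more documents needed"
    if supervised_docs < RECOMMENDED_SUPERVISED_DOCS:
        return "ready, but below the recommended count"
    return "ready"


print(training_readiness(60))
print(training_readiness(250))
print(training_readiness(500))
```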