A high-quality model requires consistent annotations. That's why identifying potential discrepancies in the training sets before model training is crucial. To help with this effort, we've included a tool called Labeling Anomaly Detection in Training Data Management (TDM).
After completing the annotations, you can analyze the training data to find inconsistencies in your documents and any documents that are ineligible for training. To learn more about eligibility, see our Document Eligibility Filtering article. Labeling Anomaly Detection identifies and highlights potential anomalies in field and table annotations for review.
Before using Labeling Anomaly Detection:
Upload the required number of documents (120 minimum, 400 recommended).
Run training data analysis.
Annotate your training set.
Reanalyze your data.
Always re-analyze your data to get up-to-date information on the training set. The ineligibility details may need to be updated if documents have been added, removed, or modified during or after the last analysis. You can learn more about how to use Training Data Analysis in Step 4 of Training a Semi-structured Model.
Detecting anomalies
If anomalies were detected during the training data analysis:
A count of documents with potential anomalies appears on the Training Data Health card.
Each document containing potential anomalies is highlighted with a yellow bar on the left-hand side of its Doc ID in the Training Data table.
Labeling Anomaly Detection also includes anomalies generated from Model Validation Tasks (MVTs) run after training in previous versions. For more information, see Model Validation Tasks.
Above the Training Documents card, click Filters, and select Contains Anomalies from the Has Anomalies drop-down list.
Click Apply Filters.
Click the Edit Annotations link for a document highlighted as having anomalies.
Review one of the annotations highlighted as being a potential anomaly.
Check how the field was annotated in other documents in the same group. That way, you'll ensure consistency throughout the training set. If the annotation is not correct, adjust it accordingly and click Save Changes.
If the annotation is correct, click on it and then click Ignore Anomaly. A warning message will appear.
Click Confirm and then Save Changes.
Limitations of Labeling Anomaly Detection
Anomalies can only be detected for fields with a single occurrence and a single bounding box. Therefore, you could still see incorrect annotations across fields with multiple occurrences (MOs) and multiple bounding boxes (MBBs). Make sure to double-check all fields before submitting the document.
Table Anomaly Detection won't capture all errors in the annotations. A missing column is flagged as an anomaly only if other documents in the same group contain that column.
You can run Labeling Anomaly Detection for up to 5,000 pages at a time.
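The group-based rule for missing columns can be illustrated with a short sketch. This is a conceptual example only, assuming documents grouped by layout with each document's annotated column names collected into a set; the function and data names are illustrative and are not the product's actual API or algorithm.

```python
def missing_column_anomalies(group):
    """group: dict mapping doc_id -> set of annotated column names.

    A column missing from one document is flagged only when at least
    one other document in the same group contains that column.
    """
    all_columns = set().union(*group.values())
    anomalies = {}
    for doc_id, cols in group.items():
        missing = {
            col for col in all_columns - cols
            # flagged only if some *other* document has the column
            if any(col in other_cols
                   for other_id, other_cols in group.items()
                   if other_id != doc_id)
        }
        if missing:
            anomalies[doc_id] = missing
    return anomalies

# Hypothetical group of three documents with annotated columns:
group = {
    "doc_a": {"date", "amount", "vendor"},
    "doc_b": {"date", "amount"},
    "doc_c": {"date", "amount", "vendor"},
}
# "vendor" is absent from doc_b but present in the other documents,
# so only doc_b is flagged.
```

The corollary of this rule is the limitation described above: if every document in a group lacks a column, nothing is flagged, because there is no other document to compare against.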
Anomaly indicators
The indicators described in the table below appear in the document viewer if anomalies are detected in the document.
| Indicator | Description | Example |
| --- | --- | --- |
| Cell-anomaly indicator - found in the right-hand sidebar | Dotted line around a cell indicates a cell that needs to be reviewed. Gradient indicator shows where the model suggests a cell should be placed. | |
| Page indicator - found in the left-hand sidebar | Dotted line around a page indicates cell-level anomalies on that specific page. | |
| Missing columns label - found at the top of the document, next to the colored markers for each column | A dotted, transparent label located next to the bookmark indicators for each column. It indicates all possible missing columns at the document level. | |
| Missing column tag - found in the right-hand sidebar next to the specific column that is missing | This tag shows the specific missing column. | |
| Misplaced column - found around the colored markers for columns | Dotted line around the colored markers indicates misplaced columns. | |
| Misplaced column tag - found in the right-hand sidebar next to the specific column that is misplaced | This tag shows the specific misplaced column. | |
| Number of anomalies - found in the right-hand sidebar above the list of columns in the document | This yellow indicator displays the current number of potential anomalies in the document. It is dynamic and changes after each interaction with an annotation labeled as an anomaly. If you have a nested table, the number of potential anomalies also appears next to the name of the parent or child table. | |
| Ignore anomaly - action button, located in the colored label for a column | The bell button appears for single and multiple anomalies in a column. Click it to ignore the detected anomaly. Single anomaly: hover over it to see the specific anomaly. Multiple anomalies: hover over it to see the number of potential anomalies for this column. | |
Re-analyzing data
We recommend re-analyzing the data after reviewing all anomalies to ensure the training set is consistent and ready for model training. Click Reanalyze data and choose one of the two options listed below:
Reanalyze with ignored anomalies — If you decide to reanalyze the training set with the ignored anomalies, any anomalies that were previously ignored will not reappear.
Reanalyze from scratch — If you want to analyze the training set from scratch, the ignored anomalies will be included in the results.