The following best practices for layout identifiers help you achieve the highest levels of automation and accuracy when classifying Structured documents.
Use unique text at the top or the bottom of the page as a layout identifier
The perfect candidate for a layout identifier is unique text that is located at the top or the bottom of the page. Documents usually have layout-specific versions, dates, or other identifiers that are located at the top or the bottom of the pages. Using such an identifier is enough to reduce the margin of error when matching pages to Structured layouts.
If there is no unique text at the top or the bottom of the page, use any other unique variation-specific text
In the absence of a layout identifier at the top or the bottom of the page, find a unique variation-specific identifier somewhere else on the page. For example, if there is a clause that is added in just one variation, part of the text from the clause will be unique and could serve as a good distinguishing factor. You can use this unique text as a layout identifier for the specific layout variation.
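One way to shortlist candidate text is to compare the extracted text of each variation and keep only the lines that appear in exactly one of them. The following is a minimal sketch of that idea, not part of the product: the function name `unique_lines_per_variation` and the input format (a plain dictionary mapping a variation name to its text lines) are assumptions made for illustration.

```python
from collections import Counter


def unique_lines_per_variation(variations: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, for each variation, the text lines that appear in no other variation.

    `variations` is a hypothetical structure: variation name -> lines of page text.
    Lines are compared case-insensitively with surrounding whitespace removed.
    """
    # Count in how many variations each normalized line occurs.
    counts = Counter()
    for lines in variations.values():
        counts.update({line.strip().lower() for line in lines if line.strip()})

    result = {}
    for name, lines in variations.items():
        seen = set()
        unique = []
        for line in lines:
            key = line.strip().lower()
            # Keep a line only if this is the sole variation that contains it.
            if key and counts[key] == 1 and key not in seen:
                seen.add(key)
                unique.append(line.strip())
        result[name] = unique
    return result


candidates = unique_lines_per_variation({
    "Variation A": ["Form 100", "Name", "Clause 7: Arbitration"],
    "Variation B": ["Form 100", "Name"],
})
print(candidates)
```

In this example, only the clause text is reported for Variation A, which makes it a candidate layout identifier for that variation. Variation B has no unique line of its own, so it would need a different distinguishing text.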
Only add as many layout identifiers as needed to distinguish between similar layouts
In most cases, add only one layout identifier per layout page. In rare cases, multiple layouts share the same identifier, and you need to add a second identifier to distinguish between them. Do not add unnecessary layout identifiers, because they only harm the matching process.
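To decide whether a second identifier is needed, it can help to list the layouts whose identifiers are currently identical. The sketch below assumes you keep your own record of identifiers as a dictionary from layout name to identifier texts. Both that structure and the function name `find_identifier_collisions` are hypothetical and do not reflect any product API.

```python
from collections import defaultdict


def find_identifier_collisions(layout_identifiers: dict[str, list[str]]) -> list[list[str]]:
    """Group layouts that share exactly the same set of identifier texts.

    `layout_identifiers` is a hypothetical structure: layout name -> identifier texts.
    Each returned group contains layouts that cannot be told apart by their
    current identifiers and therefore need an additional one.
    """
    groups = defaultdict(list)
    for layout, identifiers in layout_identifiers.items():
        # Normalize so that case and surrounding whitespace do not hide a collision.
        key = frozenset(text.strip().lower() for text in identifiers)
        groups[key].append(layout)
    return [sorted(names) for names in groups.values() if len(names) > 1]


collisions = find_identifier_collisions({
    "Invoice v1": ["Rev. 2021"],
    "Invoice v2": ["Rev. 2021"],
    "Claim": ["Form C-3"],
})
print(collisions)
```

Only the layouts reported in a collision group are candidates for a second identifier. Layouts that are already distinguishable should keep a single identifier.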
Use the same number of layout identifiers across variations of the same layout
If you have a layout with multiple variations, make sure to use the same number of layout identifiers across all variations. Otherwise, variations with more layout identifiers will always receive a higher confidence boost, which can result in the wrong layout being matched.
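A quick consistency check over your own notes can catch variations with unequal identifier counts before they affect matching. This is a minimal sketch under the assumption that identifiers are tracked in a dictionary from variation name to identifier texts. The function name `mismatched_identifier_counts` and the data are illustrative only.

```python
def mismatched_identifier_counts(variation_identifiers: dict[str, list[str]]) -> dict[str, int]:
    """Return the identifier count per variation if the counts differ, else an empty dict.

    `variation_identifiers` is a hypothetical structure: variation name -> identifier texts
    for the variations of a single layout.
    """
    counts = {name: len(ids) for name, ids in variation_identifiers.items()}
    # All variations agree (or there are none): nothing to report.
    if len(set(counts.values())) <= 1:
        return {}
    return counts


report = mismatched_identifier_counts({
    "Variation A": ["Rev. 2021"],
    "Variation B": ["Rev. 2022", "Clause 7: Arbitration"],
})
print(report)
```

A non-empty result shows which variations carry more identifiers than the others, so you can add or remove identifiers until the counts match.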