Semi-structured Grouping Logic

A submission can consist of image files for various types of documents. For more information on assessing the type of document in a submission, see Understanding Document Types.

For submissions of Structured documents, the system can match the pages of the submission to the respective layout using the field locations and data types that you defined during layout creation. Semi-structured layouts, however, are composed of a list of key fields that you can expect to find on a page, without any information about where those fields appear (e.g., an invoice will have the vendor name, but the location of this field will vary depending on the invoice).

As a result, for submissions of Semi-structured documents, the machine decides whether the submission consists of one multi-page document or multiple single-page documents based on the Semi-structured grouping logic flow setting. This setting also determines how Additional layouts are grouped, so it affects all documents that are not Structured.

To learn more about the Semi-structured grouping logic setting and other flow settings, see Flow Settings.

Grouping Semi-structured or Additional pages together 

Semi-structured pages in a submission that are matched to the same layout (i.e., the same list of key fields of interest) are grouped into a document based on two flow settings: Manual Classification Supervision and Semi-structured grouping logic.

Note that if the Semi-structured Classification flow setting is enabled, it will also affect how Semi-structured documents are handled. To learn more about this feature, see Automatic Document Classification.

While Manual Classification Supervision can simply be toggled on or off, there are three configurations for Semi-structured grouping logic:

  1. Consecutive pages as separate documents — Every page matched to the same layout will be treated as an individual document.

  2. Consecutive pages as a document — Consecutive pages that are matched to the same layout will be treated as one document.

  3. Manual review of consecutive pages — Consecutive pages that are matched to the same layout will create a manual document organization task for the submission.

    • This setting is only available if Manual Classification Supervision is enabled.

Note that for PDF or TIFF files, only consecutive pages of the same file will be grouped as one document.
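The three configurations above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this article, not the product's implementation: the `Page` and `GroupingLogic` types, the `group_pages` function, and its return shape are all hypothetical names chosen for the example. In particular, the sketch assumes that option 3 proposes the same runs of consecutive pages as option 2 and simply flags the submission for manual review.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import groupby
from typing import List, Tuple


class GroupingLogic(Enum):
    """Hypothetical names for the three configurations listed above."""
    SEPARATE_DOCUMENTS = 1  # Consecutive pages as separate documents
    ONE_DOCUMENT = 2        # Consecutive pages as a document
    MANUAL_REVIEW = 3       # Manual review of consecutive pages


@dataclass(frozen=True)
class Page:
    file_name: str  # source file (e.g., a PDF or TIFF) the page came from
    layout: str     # layout the page was matched to


def group_pages(
    pages: List[Page],
    logic: GroupingLogic,
    supervision_enabled: bool = True,
) -> Tuple[List[List[Page]], bool]:
    """Return (documents, needs_manual_review) for a list of matched pages."""
    if logic is GroupingLogic.MANUAL_REVIEW and not supervision_enabled:
        # Option 3 is only available if Manual Classification Supervision is enabled.
        raise ValueError("Manual review requires Manual Classification Supervision")

    # A run is a stretch of consecutive pages from the same file that were
    # matched to the same layout; pages from different files never share a run.
    runs = [
        list(run)
        for _, run in groupby(pages, key=lambda p: (p.file_name, p.layout))
    ]

    if logic is GroupingLogic.SEPARATE_DOCUMENTS:
        return [[page] for page in pages], False
    if logic is GroupingLogic.ONE_DOCUMENT:
        return runs, False
    # Assumption: the runs are only a proposal; a reviewer decides the final grouping.
    return runs, any(len(run) > 1 for run in runs)
```

For example, calling `group_pages` with two consecutive invoice pages from one PDF and `GroupingLogic.ONE_DOCUMENT` yields a single two-page document, while `GroupingLogic.SEPARATE_DOCUMENTS` yields two one-page documents.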

Example scenarios and diagrams

| Test Case | Semi-structured Classification | Manual Classification Supervision | Semi-structured grouping logic (see numbers above) |
| --- | --- | --- | --- |
| Consecutive pages as a document | Enabled | Enabled | 2 |
| Consecutive pages to Supervision | Enabled | Enabled | 3 |
| Every page as a document | Enabled | Enabled | 1 |
| Disable Manual Classification Supervision | Enabled | Disabled | 1 |
| Disable Semi-structured Classification | Disabled | Enabled | 1 |

Consecutive pages as a document

  • Semi-structured Classification – Enabled

  • Manual Classification Supervision – Enabled

  • Semi-structured grouping logic – Consecutive pages that are matched to the same layout will be treated as one document

In the example below, each page had a high-confidence match to a Semi-structured layout using Semi-structured Classification, so no Document Classification task was generated. However, if the machine made a low-confidence match, then Step 1 of Document Classification would be generated. Note that since the PDF contained two consecutive invoices, the machine grouped them together as one document.

ConsecutivePagesAsDocumentv30.png

Consecutive pages to Supervision

  • Semi-structured Classification – Enabled

  • Manual Classification Supervision – Enabled

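  • Semi-structured grouping logic – Consecutive pages that are matched to the same layout will create a manual document organization task for the submission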

In the example below, the submission contained 3 different files: 2 single-page PDFs and 1 multi-page PDF. Recall that for PDF or TIFF files, only consecutive pages of the same file will be grouped as one document. Therefore, even though each single-page PDF is a paystub, we expect the machine to treat them as different documents. In the same vein, we expect the machine to group the consecutive invoices in the multi-page PDF together as one document. Finally, Semi-structured grouping logic was configured to create a Document Classification Supervision task for the entire submission if any consecutive pages were matched to the same layout. Use this setting if you want to be completely sure that the document classifications are correct.
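To make this scenario concrete, here is a small self-contained Python sketch. The file names and the two-page length of the multi-page PDF are assumptions made for the illustration, and the code only models the rule as described in this article; it is not the product's implementation.

```python
from itertools import groupby

# Each page is (source file, matched layout). File names are hypothetical,
# and the multi-page PDF is assumed to hold two consecutive invoice pages.
pages = [
    ("paystub_1.pdf", "Paystub"),
    ("paystub_2.pdf", "Paystub"),
    ("invoices.pdf", "Invoice"),
    ("invoices.pdf", "Invoice"),
]

# Consecutive pages form a run only if they share both the file and the layout,
# so the two paystubs stay separate even though they match the same layout.
runs = [list(run) for _, run in groupby(pages, key=lambda page: (page[0], page[1]))]

# With "Manual review of consecutive pages", any run longer than one page
# sends the whole submission to a Supervision task.
needs_supervision = any(len(run) > 1 for run in runs)

print(len(runs))          # 3
print(needs_supervision)  # True
```

The two paystubs form separate runs because they come from different files, while the two invoice pages form one run, which is what triggers the Supervision task for the submission.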

ConsecutivePagesToSupervisionv30.png

Every page as a document

  • Semi-structured Classification – Enabled

  • Manual Classification Supervision – Enabled

  • Semi-structured grouping logic – Every page matched to the same layout will be treated as an individual document

In the example below, each page in the submission is treated as an individual document, regardless of the sequence of pages. Note that in this case, each page had a high-confidence match to a Semi-structured layout so no Document Classification task was created. However, if the machine made a low-confidence match, then Step 1 of Document Classification would be generated.

EveryPageAsADocumentv30.png

Disable Manual Classification Supervision

  • Semi-structured Classification – Enabled

  • Manual Classification Supervision – Disabled

  • Semi-structured grouping logic – Every page matched to the same layout will be treated as an individual document 

In the example below, each page in the submission is treated as an individual document, regardless of the sequence of pages. Note that in this case, the invoice at the end of the PDF could not be matched to its respective layout. Furthermore, since Manual Classification Supervision is disabled, no Supervision task will be generated to match the invoice to its layout; the invoice will be marked as "No Layout Found".
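The handling of an unmatched page can be summarized with a short Python sketch. The function name and the "Document Classification task" return value are hypothetical labels for this illustration; only the "No Layout Found" status comes from the behavior described above.

```python
from typing import Optional


def resolve_page_layout(matched_layout: Optional[str], supervision_enabled: bool) -> str:
    """Describe what happens to a page given its automatic match result."""
    if matched_layout is not None:
        # The machine matched the page to a layout, so nothing else is needed.
        return matched_layout
    if supervision_enabled:
        # Hypothetical label: a person matches the page in a Supervision task.
        return "Document Classification task"
    # With Manual Classification Supervision disabled, nobody can fix the match.
    return "No Layout Found"
```

Under these assumptions, `resolve_page_layout(None, supervision_enabled=False)` returns "No Layout Found", which mirrors the invoice at the end of the PDF in this scenario.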

DisableManualDocumentClassificationv30.png

Disable Semi-structured Classification

  • Semi-structured Classification – Disabled

  • Manual Classification Supervision – Enabled

  • Semi-structured grouping logic – Every page matched to the same layout will be treated as an individual document

In the example below, each page in the submission is treated as an individual document, regardless of the sequence of pages. Note that in this case, none of the pages were automatically matched to their respective layout because Semi-structured Classification was disabled. Therefore, the Document Classification task will ask you to match each page to its respective layout and then categorize each of the pages into documents.

DisableSemistructuredClassificationv30.png