Document Drift Management (Layout Triage)

Document Drift Management (also known as Layout Triage) is a post-processing feature that helps you manage documents that don't match any layout during Classification. When submissions don't meet the Structured Layout Match Threshold or are manually flagged as having incorrect or missing layouts, their pages are marked as unmatched.

This feature replaces the previous "Find Potential Layouts" and "No Layout Found" processes, offering a more streamlined and effective way to handle unmatched pages.

Document Drift Management allows you to determine and create the necessary layouts for these pages, ensuring that similar documents can be accurately classified and processed. In this article, you'll learn how to use Document Drift Management to handle unmatched pages efficiently.

Misclassified vs. Unmatched pages

Misclassified pages are those that the system recognizes but assigns to the wrong layout. You can manually reassign these pages to the correct one. Unmatched pages don't fit any existing layout, so they remain unclassified.

Grouping pages

Document Drift Management works by grouping unmatched pages based on their similarities. It uses a pixel-level analysis to identify pages that share common visual templates. These pages are then automatically clustered into groups, making it easier to identify potential new layouts.
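The idea of grouping pages by pixel-level similarity can be illustrated with a small sketch. This is not the product's clustering algorithm, which the article does not describe. It is a toy greedy grouping under stated assumptions: pages are same-sized grayscale grids, similarity is the mean absolute pixel difference, and the names `pixel_distance`, `group_pages`, and `max_distance` are hypothetical.

```python
from typing import List

Page = List[List[int]]  # rows of grayscale pixel values, 0-255 (assumed format)


def pixel_distance(a: Page, b: Page) -> float:
    """Mean absolute difference between two same-sized grayscale pages."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def group_pages(pages: List[Page], max_distance: float = 10.0) -> List[List[int]]:
    """Greedy grouping: a page joins the first group whose first page is
    within max_distance; otherwise it starts a new group.
    Returns groups as lists of page positions."""
    groups: List[List[int]] = []
    for idx, page in enumerate(pages):
        for group in groups:
            if pixel_distance(pages[group[0]], page) <= max_distance:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Two pages sharing a template (one slightly different) and one unrelated page.
form_a = [[255, 0], [255, 0]]
form_b = [[0, 255], [0, 255]]
form_a_filled = [[255, 5], [250, 0]]

print(group_pages([form_a, form_b, form_a_filled]))  # [[0, 2], [1]]
```

In this toy example, the two pages that share a visual template land in the same group, while the unrelated page forms its own group.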

Hyperscience recommends using this feature for Structured use cases.

Document Drift Management effectively handles Structured documents, where consistent formatting is essential for accurate classification.

After the pages are grouped, you can review and adjust them as needed, then create new layouts from these groups.

The sections below explain how to use Document Drift Management to streamline the management of your layouts.

Using Document Drift Management

To access Document Drift Management, click on Submissions in the left-hand sidebar, and then click the Unmatched tab.

The Unmatched tab displays submissions with pages that did not match any existing layouts during the Classification process. This tab is the starting point for managing and organizing unmatched pages using Document Drift Management.

Follow the steps below to learn how to navigate each tab and use the grouping process effectively.

Step 1: Unmatched Submissions

The Unmatched Submissions tab displays a table of all submissions with unmatched pages.

Navigating the Unmatched Submissions tab

Filters

Filter the table by a range of submission IDs, defined by its beginning and end values, or by submission date:

  • Submission ID Start

  • Submission ID End

  • Submission Date — Choose a date range you want to see results for.
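The combined effect of these filters can be sketched as follows. This is only an illustration of the filtering logic, not the product's implementation. The `Submission` record, its field names, and the `filter_submissions` function are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Submission:
    # Hypothetical record; field names are illustrative, not a product schema.
    submission_id: int
    submitted_on: date


def filter_submissions(
    submissions: List[Submission],
    id_start: Optional[int] = None,
    id_end: Optional[int] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> List[Submission]:
    """Keep submissions that fall inside every bound that is set.
    Bounds are assumed to be inclusive; unset bounds are ignored."""
    kept = []
    for s in submissions:
        if id_start is not None and s.submission_id < id_start:
            continue
        if id_end is not None and s.submission_id > id_end:
            continue
        if date_from is not None and s.submitted_on < date_from:
            continue
        if date_to is not None and s.submitted_on > date_to:
            continue
        kept.append(s)
    return kept


sample = [
    Submission(101, date(2024, 1, 5)),
    Submission(150, date(2024, 2, 10)),
    Submission(220, date(2024, 3, 1)),
]
in_range = filter_submissions(
    sample, id_start=100, id_end=200, date_from=date(2024, 2, 1)
)
print([s.submission_id for s in in_range])  # [150]
```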

Managing Columns

Manage the table’s columns from its menu.

Columns

The Unmatched Submissions table displays the following columns:

  • Submission ID — The specific ID of the submission

  • Submission Date — The date the submission was uploaded

  • Unmatched/Total Pages — The number of unmatched pages and the total pages in the submission

  • Unmatched Page IDs — The IDs of the unmatched pages within the submission. You can also view each unmatched page with the preview button. Click it to lock the preview, and use the keyboard arrows to move through the pages.

  • Grouping Job — The consecutive ID number of the clustering job. Also indicates the job’s status:

    • Queued — The Grouping job is pending. Cancel it by clicking the cancel button.

    • Running — The Grouping job is currently processing.

    • Failed — The Grouping job failed. You can find the flow error stack trace under the info icon.

    • Canceled — The Grouping job was canceled by the user.

Grouping Pages

Grouping jobs in Document Drift Management run in the trainer. If the trainer is busy, your job will be queued.

Follow the steps below to run a grouping job:

  1. Select the submissions containing the pages you want to group. Check the pages by using the preview button.

  2. After you’ve made your selections, click Group Pages. The system initiates a visual clustering process, grouping pages based on pixel-level similarities.

  3. After the grouping job is complete, click on the Page Groups tab.

Step 2: Group Pages

The Page Groups tab displays a list of all clusters of similar pages that have been grouped together. It helps you manage these groups, review ungrouped pages, and refine them before creating new layouts. Learn how to navigate the tab and how to use the triage experience below.

Automatic grouping requires a minimum of 5 similar pages per group. If there are fewer than 5 similar pages, they go to the Ungrouped category. However, you can manually create groups with fewer pages.
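The minimum-size rule can be sketched as a simple partition. This is a hedged illustration rather than the product's logic: the `split_clusters` function and the cluster names are hypothetical, and only the threshold of 5 comes from this article.

```python
from typing import Dict, List, Tuple

MIN_GROUP_SIZE = 5  # minimum number of similar pages stated in this article


def split_clusters(
    clusters: Dict[str, List[int]], minimum: int = MIN_GROUP_SIZE
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Keep clusters with at least `minimum` pages as groups and
    collect the pages of smaller clusters into a single ungrouped list."""
    groups: Dict[str, List[int]] = {}
    ungrouped: List[int] = []
    for name, page_ids in clusters.items():
        if len(page_ids) >= minimum:
            groups[name] = list(page_ids)
        else:
            ungrouped.extend(page_ids)
    return groups, ungrouped


# Hypothetical clusters of page IDs.
clusters = {
    "cluster-a": [1, 2, 3, 4, 5, 6],
    "cluster-b": [7, 8],
    "cluster-c": [9],
}
groups, ungrouped = split_clusters(clusters)
print(groups)     # {'cluster-a': [1, 2, 3, 4, 5, 6]}
print(ungrouped)  # [7, 8, 9]
```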

Navigating the Page Groups tab

The Page Groups table contains the following columns:

  • Group Name — The name of the cluster.

  • Updated — The date the group was updated and the number of additional pages added after the last update.

  • Pages — The number of pages in the group.

  • Actions — A menu with options to download pages or to archive or delete the group.

    • To archive the group, select it and click Archive from the Actions menu.

Your archived groups can be found in the Archive tab.

Triage Groups

Organize unmatched pages into layouts based on visual similarity using the triage experience.

Best for Structured documents. Similar pages from Semi-structured documents may also be grouped together. Extra pages, like fax cover sheets, and those with fewer than 5 similar pages will go to the Ungrouped category.

To start the triage process, select the groups you want to manage, then click Triage Groups.

If you don’t select a group, clicking the Triage Groups button will open the Ungrouped category.

Follow the steps provided in the in-product guidance to begin the triage process. To proceed, you can either close the guidance or select "Don’t show this again" if you prefer not to see it in the future.

Find additional tips in the sections below.

Triage similar page groups

Review groups

While pages are automatically grouped based on visual similarities, it's important to manually review and adjust these groups to ensure accuracy. To review and adjust a group:

  1. Click on a group in the list on the left to view its contents and start the triage process.

  2. Click on a page to display it in the preview section.

  3. Select multiple pages and click New Group from Selection to create a more precise grouping of those pages. After you’ve created a new group from the selected pages, an icon appears, indicating that the group is manual.

  4. Send the selected pages to the Unmatched group by clicking the button under the page preview. Once in the Unmatched group, the page may be automatically regrouped.

If you manually create a group, similar pages will not be automatically added to it in the future. Ensure manual groups are accurately defined and adjusted as needed.

Define layout properties

You can define the following properties of the layout:

  • Layout Name — Typically the name of the form

  • Layout Code - Serves as a "note" that helps identify the layout during triage. It's an extra way to differentiate between layouts that may have the same name. Not to be confused with the layout identifiers in Structured layouts.

Select unique layout and page codes. Enter this information for each group to help cross-reference and suggest potential layout variations.

Groups are cumulative, so new pages matching existing groups will be added in future runs.

  • Layout Type — Choose the layout type based on your documents.

  • Index — The page index can help identify if any pages are missing from a group. If a group lacks a specific page index, it may indicate that a page was not included in the submission or was missed during grouping.

Each group in the triage experience represents a single-page index, not the entire layout. For example, if you have pages 1, 2, and 3, one group will represent page 1, another group will represent page 2, and so on. This framework ensures that each page type is grouped and processed individually.
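As a sketch of how page indexes can reveal a gap, the snippet below lists the indexes that no group covers. It assumes 1-based page indexes and that you know how many pages the form should have. The function name, the group names, and the mapping are hypothetical.

```python
from typing import Dict, List


def missing_page_indices(group_indices: Dict[str, int], expected_pages: int) -> List[int]:
    """Return the 1-based page indexes that no triaged group covers."""
    present = set(group_indices.values())
    return [i for i in range(1, expected_pages + 1) if i not in present]


# Hypothetical groups for a three-page form: page 2 has no group yet.
triaged = {"Claim form p1": 1, "Claim form p3": 3}
print(missing_page_indices(triaged, expected_pages=3))  # [2]
```

An empty result means every expected page index has a group.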

Click Save.

Ensure all properties are filled out, as they are required to proceed with the triage process and save the changes.

Define Blank form

When creating a new Structured layout, provide a blank version of each page. For best results, use an original form that hasn't been filled in. Clustering is most effective with Structured documents, where consistent formatting helps in accurate grouping.

When you dismiss pages, a group can end up with 0 pages. This behavior is expected if the group still has a blank page assigned or if it belongs to a Potential Layout.

If an original blank form isn’t available, you can set it manually by using our blanking tool. See the Creating blank forms section of this article to learn how to redact your page and create a layout from it.

After you’ve defined the blank form, your group is triaged, and a layout can be created based on the clustered pages.

The Blanking tool will be integrated into the triage experience in future versions of the product.

Triage Ungrouped

The number of ungrouped pages will increase over time. You can triage them when you have enough similar examples.

  1. Review pages - Review the pages in the Ungrouped folder and delete any unnecessary pages, like fax cover sheets or blank sheets.

  2. Manually create page groups

    Select sets of pages in the Ungrouped folder to manually create additional groups as needed.

  3. Follow the steps above for manually created page groups.

Step 3: Create Potential Layouts

Once you have triaged your groups, they will be ready for layout creation. Find them in the Potential Layouts tab.

  • Create a layout based on a group by clicking the Create Layout button.

  • From the Actions menu, you can edit the name of your layout, or you can archive or delete it.

  • Access the Layout Editor by clicking the Open Layout button.

Creating blank forms

To create a blank manually:

  1. Open the Layout Editor.

  2. Click Edit Layout variation image to access the blanking tool.

  3. Draw redaction boxes over areas of the image you would like to remove from the layout variation.

    • Use the keyboard shortcuts to copy-paste redaction boxes.

    • Adjust the location of the redaction box by using the keyboard’s arrows.

    • Press the Backspace key to delete a redaction box.

  4. Click Preview to see the form with the blank fields.

  5. Click Save Changes.

  6. Create your layout by following the steps in the “Using the Layout Editor” section of Creating Structured Layouts.

Limitations

This section outlines the known limitations of the Document Drift Management feature to help you better plan and manage your document-processing workflow.

  • Uploading blanks — When uploading a blank form, only the first page of a multi-page blank PDF will be uploaded. This limitation can pose challenges if your source document consists of multiple pages.

  • Layout variations — Layout variations are not directly supported within Document Drift Management. However, they can be manually created after the initial blank form is uploaded or by using the Layout Editor.

  • Clustering Job minimum — Clustering works with a minimum of 5 pages per group.

  • Clustering Job Limit: The clustering job can process a maximum of 10,000 pages at a time. Since groups are cumulative, you can rerun clustering on the same selection set to process additional pages beyond the initial 10,000.

  • Triage Group Limit: A maximum of 500 groups can be triaged at once. To manage this effectively, it's recommended to focus on smaller batches, typically around 10-50 groups at a time.
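The arithmetic behind the last two limits can be sketched as follows. This is a planning aid under stated assumptions, not product behavior: it assumes each clustering run covers up to 10,000 pages that earlier runs did not reach, and the function names are hypothetical. The limits of 10,000 pages and 500 groups, and the suggested batch size of 10-50 groups, come from the list above.

```python
import math
from typing import List, Sequence

CLUSTERING_PAGE_LIMIT = 10_000  # pages per clustering job, from this article
TRIAGE_GROUP_LIMIT = 500        # groups per triage session, from this article


def clustering_runs_needed(total_pages: int, limit: int = CLUSTERING_PAGE_LIMIT) -> int:
    """Number of clustering runs needed, assuming each run covers up to
    `limit` pages that earlier runs did not reach."""
    return math.ceil(total_pages / limit)


def triage_batches(group_names: Sequence[str], batch_size: int = 50) -> List[List[str]]:
    """Split groups into triage batches no larger than batch_size."""
    if not 1 <= batch_size <= TRIAGE_GROUP_LIMIT:
        raise ValueError("batch_size must be between 1 and the triage group limit")
    return [
        list(group_names[i:i + batch_size])
        for i in range(0, len(group_names), batch_size)
    ]


print(clustering_runs_needed(25_000))  # 3

names = [f"group-{n}" for n in range(120)]  # hypothetical group names
print([len(batch) for batch in triage_batches(names)])  # [50, 50, 20]
```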