S3 Listener

The S3 Listener allows you to ingest files from a specified AWS S3 URI.

Contents of submissions

The connector accepts submissions as either single files or prefixes containing files. Whether a file is processed as its own submission or as part of a larger one depends on its location relative to the source URI:

  • If it is directly under the source URI, the system processes it as an individual submission.

  • If it is under a prefix under the source URI, the system considers it part of a larger submission, which consists of all files directly under the prefix.

The connector recognizes only one level of nesting when creating submissions with multiple documents. If other prefixes are contained within the prefix you specify, files under those nested prefixes are ignored.
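For example, with a hypothetical source URI of s3://my-bucket/input/ (the bucket and prefix names here are placeholders), the connector would turn the following layout into two submissions:

s3://my-bucket/input/form_a.pdf                Submission 1 (individual file)
s3://my-bucket/input/case-42/div_lic_1.jpg     Submission 2 (file under a prefix)
s3://my-bucket/input/case-42/div_lic_2.jpg     Submission 2 (file under a prefix)
s3://my-bucket/input/case-42/extra/notes.jpg   Ignored (second level of nesting)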

Metadata

For each submission, the S3 Listener can accept a JSON file that contains metadata, case data, and an external_id. The name of this file depends on whether the submission consists of a single file or a set of files:

  • filename.ext.json for individual files, where filename is the name of the file and ext is the file's extension.

  • prefixname.json for files under an S3 prefix, where prefixname is the name of a prefix under the source URI.

Metadata files must be located directly under the source URI. If you place them under a prefix instead, they are ignored, and a submission is not created.
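Continuing the hypothetical layout above, metadata files would be placed as follows:

s3://my-bucket/input/form_a.pdf.json     Metadata for the individual file form_a.pdf
s3://my-bucket/input/case-42.json        Metadata for the files under the prefix case-42/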

An example metadata file for an S3 Listener submission appears below.

{
    "metadata": {
        "test": "Metadata for file in S3 bucket"
    },
    "cases": [{
        "external_case_id": "900",
        "filenames": ["div_lic_1.jpg", "div_lic_2.jpg"]
    }],
    "external_id": "123"
}
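As an illustration, the following minimal boto3 sketch uploads a two-file submission and the metadata file shown above. The bucket name my-bucket and the source prefix input/ are hypothetical placeholders, and uploading this way is just one option; any process that writes to the source URI will work.

import json

import boto3

s3 = boto3.client("s3")

# Upload the submission's documents under a single prefix (one level deep).
for name in ["div_lic_1.jpg", "div_lic_2.jpg"]:
    s3.upload_file(Filename=name, Bucket="my-bucket", Key=f"input/case-42/{name}")

# The metadata file is named after the prefix (case-42.json) and must be placed
# directly under the source URI, not inside the prefix itself.
metadata = {
    "metadata": {"test": "Metadata for file in S3 bucket"},
    "cases": [{
        "external_case_id": "900",
        "filenames": ["div_lic_1.jpg", "div_lic_2.jpg"],
    }],
    "external_id": "123",
}
s3.put_object(
    Bucket="my-bucket",
    Key="input/case-42.json",
    Body=json.dumps(metadata).encode("utf-8"),
)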

Archiving processed files

As files are ingested, they are copied to the archive URI and deleted from the source URI. Archived files keep their original names, including their prefixes, if any.

If deleting files from the source URI leaves empty prefixes behind, those prefixes are not deleted and remain in the bucket.
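For example, with the hypothetical source URI s3://my-bucket/input/ and an archive URI of s3://my-bucket/archive/, ingesting the submission above moves its files as follows:

s3://my-bucket/input/case-42/div_lic_1.jpg  →  s3://my-bucket/archive/case-42/div_lic_1.jpg
s3://my-bucket/input/case-42/div_lic_2.jpg  →  s3://my-bucket/archive/case-42/div_lic_2.jpg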

Sample use cases

  • Another system places documents in an S3 bucket on a regular basis. I want to ingest those files one by one.

  • I want to regularly scan a specific prefix in an S3 bucket and ingest certain types of files found there.

Block settings table

In addition to the settings outlined below, you can also configure the settings described in Universal Integration Block Settings.

S3 Source URI (Required)

The location the connector will scan for image files, in the format s3://<bucket>/<prefix>/ or <bucket>/<prefix>/.

S3 Archive URI (Required)

The location the connector will move files to after they have been ingested into Hyperscience, in the format s3://<bucket>/<prefix>/ or <bucket>/<prefix>/.

File Extensions (Required)

A list of the extensions that image files must have to be eligible for processing.

If there are file extensions that you want to support but do not see in the drop-down list, select other, and enter the extensions in Other File Extensions.

Other File Extensions (Optional)

A comma-separated list of file extensions that do not appear in File Extensions.

This field appears only if other is selected in File Extensions.

Include Submission Level Parameters (Optional)

Indicates whether the system will ingest JSON files along with document files and submission S3 prefixes. These JSON files can contain information such as metadata, case data, and external_id values. Their names should match the names of the related files or S3 prefixes (e.g., XYZ.jpg.json for XYZ.jpg).

Use AWS EC2 Instance IAM Role Credentials (Required)

If selected, credentials are obtained directly from the EC2 instance, and the AWS Access Key ID and Secret Access Key fields do not appear.

This option applies only to on-premises instances. It must be disabled in SaaS deployments.

Enabled by default.

AWS Access Key ID (Required if Use AWS EC2 Instance IAM Role Credentials is not selected)

The access key ID allows access to the source and archive buckets.

This setting is available only if Use AWS EC2 Instance IAM Role Credentials is not selected. An AWS Access Key ID must be provided in SaaS deployments.

Secret Access Key (Required if Use AWS EC2 Instance IAM Role Credentials is not selected)

The secret access key allows access to the source and archive buckets.

To edit the key, click Edit value, modify the key, and then click Done.

This setting is available only if Use AWS EC2 Instance IAM Role Credentials is not selected. A Secret Access Key must be provided in SaaS deployments.

Poll Interval (In Seconds) (Optional)

The frequency at which the connector monitors the source URI for new submissions.

Defaults to 10.

Warm-Up Interval (In Seconds) (Optional)

The length of time that a file must remain unmodified before it is eligible for processing.

When uploading to a prefix, make sure all the files within it belong to the same submission. Otherwise, a single prefix containing many files may be split into two or more submissions, depending on the length of the warm-up interval.

Defaults to 15.

Setting up the S3 Listener

To set up the S3 Listener, enter the settings as described in the Block settings table above.

Before deploying a flow with the S3 Listener enabled, ensure that the credentials you’ve specified in the block settings have the following permissions assigned:

  • ListBucket and PutObject for both the source and the archive URIs

  • GetObject and DeleteObject for the source URI
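As a starting point, the corresponding IAM policy might look like the sketch below, expressed as a Python dict for use with boto3. The bucket name my-bucket, the prefixes input/ and archive/, and the policy name are all hypothetical placeholders; substitute the buckets and prefixes from your own source and archive URIs.

import json

import boto3

# Hypothetical bucket and prefixes; adjust to match your source and archive URIs.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        # ListBucket applies at the bucket level.
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-bucket"],
        },
        # PutObject is needed for both the source and the archive locations.
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [
                "arn:aws:s3:::my-bucket/input/*",
                "arn:aws:s3:::my-bucket/archive/*",
            ],
        },
        # GetObject and DeleteObject are needed only for the source location.
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::my-bucket/input/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(PolicyName="s3-listener-access", PolicyDocument=json.dumps(policy))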

To verify that the permissions have been set correctly, click Test Connection at the bottom of the connector settings in Flow Studio. If the required permissions are present, no errors are reported.
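If you want to sanity-check the credentials outside of Hyperscience as well, a short boto3 sketch like the one below exercises each required permission. All bucket, prefix, and key names are hypothetical placeholders; note that the final test object remains in the archive location, since DeleteObject is not granted there.

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="...",      # the AWS Access Key ID from the block settings
    aws_secret_access_key="...",  # the Secret Access Key from the block settings
)

# ListBucket on both the source and the archive locations.
s3.list_objects_v2(Bucket="my-bucket", Prefix="input/", MaxKeys=1)
s3.list_objects_v2(Bucket="my-bucket", Prefix="archive/", MaxKeys=1)

# PutObject, GetObject, and DeleteObject on the source location.
s3.put_object(Bucket="my-bucket", Key="input/permission-check.txt", Body=b"test")
s3.get_object(Bucket="my-bucket", Key="input/permission-check.txt")
s3.delete_object(Bucket="my-bucket", Key="input/permission-check.txt")

# PutObject on the archive location.
s3.put_object(Bucket="my-bucket", Key="archive/permission-check.txt", Body=b"test")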