The Kafka Listener is available in v37.0.10 and later.
With the Kafka Listener, you can integrate your Apache Kafka deployment with Hyperscience, allowing you to take advantage of Kafka’s low-latency, high-throughput messaging capabilities in your submission-creation process.
When you set up a Kafka Listener connection in a flow’s Input Block, the Listener ingests messages from the Kafka topics you specify and creates submissions from those messages.
Sample use cases
I send submission information from external systems to a Kafka topic. I want to ingest that information into Hyperscience so I can create and process submissions there.
My customers post relevant documents to a set of Kafka topics. I want to then ingest these documents into Hyperscience, where submissions will be created from them for processing.
Using the Kafka Listener in SaaS deployments
When using the Kafka Listener in a SaaS deployment of Hyperscience, you need to ensure that Hyperscience has access to your Kafka brokers. How you do so depends on where your Kafka brokers are located.
If your Kafka brokers are located on… | …then: |
---|---|
A public interface | Ensure that Hyperscience has access to each Kafka broker through the configured port (e.g., by opening the firewall for inbound and outbound traffic on that port). |
A private network | Enable additional listeners on each broker instance, and create a way for Hyperscience to reach each of those listeners. |
Contact your Hyperscience representative for more information.
Message format
A JSON object must be passed to the Kafka topic in order for Hyperscience to read the appropriate input image files. It should have the same format as the request payload for the Submission Creation API endpoint, with one key difference: in the message payload, the files element is an array of file URLs and does not contain individual file_url elements.
To prevent the same message from being processed twice, include an external_id for each message.
An example message payload is shown below.
```json
{
  "flow_uuid": "a89d6440-a2c2-423b-8c95",
  "machine_only": "true",
  "external_id": "abcd1234",
  "files": [
    "s3://s3-bucket/input/demo-file.pdf",
    "https://example.com/files/demo-file.pdf",
    "ocs://2021"
  ]
}
```
For more information about the Submission Creation payload, see the Submission Creation section of our API documentation.
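The payload above can be assembled programmatically before it is published to a topic. The sketch below is illustrative only: the helper name `build_submission_message` is hypothetical, and the commented-out send uses the kafka-python client as one example of a producer (any Kafka producer works).

```python
import json

def build_submission_message(flow_uuid, files, external_id, machine_only=True):
    """Build a Kafka message payload in the shape Hyperscience expects.

    `files` is a flat array of file URLs -- unlike the Submission Creation
    API payload, there are no individual `file_url` elements.
    """
    if not files:
        raise ValueError("at least one file URL is required")
    return json.dumps({
        "flow_uuid": flow_uuid,
        "machine_only": str(machine_only).lower(),  # "true"/"false", as in the sample payload
        "external_id": external_id,                 # prevents the same message from being processed twice
        "files": list(files),
    }).encode("utf-8")

# Publishing the message with kafka-python (assumed client and broker address):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="broker1:9092")
# producer.send("submissions", build_submission_message(
#     "a89d6440-a2c2-423b-8c95", ["s3://s3-bucket/input/demo-file.pdf"], "abcd1234"))
# producer.flush()
```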
Block settings table
In addition to the settings outlined below, you can also configure the settings described in Universal Integration Block Settings.
Connection settings
Name | Required? | Description |
---|---|---|
Bootstrap Servers | Yes | Comma-separated list of bootstrap servers the Kafka Listener ingests messages from. Each bootstrap server is a host/port pair the Listener uses for establishing the initial connection to the Kafka cluster. |
Group ID | Yes | The group ID used for all consumers created for this input connection. |
Topics | Yes | Comma-separated list of topic names that the input connection ingests messages from. |
Consumer Config | No | The Consumer Config settings for the connection (e.g., authentication settings) in JSON format. The Kafka Listener supports all of the settings listed in Apache Kafka’s Consumer Configs documentation. |
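For example, a Consumer Config for a SASL-secured cluster might look like the following. The values are placeholders; the setting names come from Apache Kafka's Consumer Configs documentation.

```json
{
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "auto.offset.reset": "earliest",
  "session.timeout.ms": 45000
}
```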
Consumer polling settings
Name | Required? | Description |
---|---|---|
Consume Timeout | Yes | The maximum number of seconds that pass before the connection times out and a new set of messages is ingested. Default is -1 (infinite). The actual timeout may be impacted by the value of Consume Num Messages. For more details, see Consume Timeout and Consume Num Messages. |
Consume Num Messages | Yes | The maximum number of messages that will be ingested before a new submission is created. These messages are used to create the submission. Default is 1. The actual number of messages included in a submission may be impacted by the value of Consume Timeout. For more details, see Consume Timeout and Consume Num Messages. |
Specifying Consumer Config settings in the “.env” file
You have the option of specifying the Consumer Config settings in the “.env” file rather than in the block settings. When doing so, add “HS_KAFKA_CONSUMER_CONFIG_” before the setting name (e.g., ssl.key.password becomes HS_KAFKA_CONSUMER_CONFIG_ssl.key.password).
You do not need to specify Consumer Config settings in both the block settings and “.env” file. If the same settings are specified with different values in the block settings and the “.env” file, the values in the block settings are used.
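The renaming rule can be sketched as a small helper (hypothetical, for illustration only) that turns Consumer Config settings into “.env” lines:

```python
def to_env_lines(consumer_config):
    """Render Consumer Config settings as ".env" lines using the
    HS_KAFKA_CONSUMER_CONFIG_ prefix described above."""
    return ["HS_KAFKA_CONSUMER_CONFIG_{}={}".format(key, value)
            for key, value in consumer_config.items()]

# e.g. {"ssl.key.password": "secret"} -> ["HS_KAFKA_CONSUMER_CONFIG_ssl.key.password=secret"]
```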
Consume Timeout and Consume Num Messages
Whichever value of Consume Timeout or Consume Num Messages is reached first determines how many messages are included in a submission.
Examples
Assume Consume Timeout is 10 seconds and Consume Num Messages is 5. If only 3 new messages are ingested in 10 seconds, a new submission is created with the 3 new messages, and the connection is reset.
Assume Consume Timeout is 10 seconds, and Consume Num Messages is 5. If 5 new messages are ingested in 5 seconds, a new submission is created with the 5 new messages. The connection is then reset, even though the 5 seconds remaining in the timeout window haven’t passed yet.
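The whichever-comes-first rule in these examples can be simulated with a short sketch. This is a simplified model, not Hyperscience code: it assumes the timeout window opens when the first message of a batch arrives, and batches are closed when either limit is hit.

```python
def batch_messages(arrival_times, consume_timeout, consume_num_messages):
    """Simulate how Consume Timeout and Consume Num Messages split a stream
    of messages into submissions. `arrival_times` are arrival times in
    seconds; a timeout of -1 means the window never expires on its own."""
    batches, current, window_start = [], [], None
    for t in arrival_times:
        if current and consume_timeout != -1 and t - window_start >= consume_timeout:
            batches.append(current)      # timeout reached first: close the batch
            current = []
        if not current:
            window_start = t             # a new window opens with its first message
        current.append(t)
        if len(current) == consume_num_messages:
            batches.append(current)      # message count reached first: close the batch
            current = []
    if current:
        batches.append(current)          # flush any remaining messages
    return batches

# First example: 3 messages arrive within the 10-second window, then a 4th much later.
# batch_messages([0, 1, 2, 12], 10, 5) -> [[0, 1, 2], [12]]
# Second example: 5 messages arrive within 5 seconds, so the count limit wins.
# batch_messages([0, 1, 2, 3, 4], 10, 5) -> [[0, 1, 2, 3, 4]]
```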
Creating multiple Kafka consumers
The system automatically creates multiple threads for the Kafka Listener, with each thread acting as a Kafka consumer. Each consumer has its own Kafka partitions assigned, allowing parallel processing to occur.
By default, the number of consumers created is the same as the number of CPU cores present on the application machine where the Kafka Listener is running. To change the number of consumers, add the BLOCK_THREADS_KAFKA_TRIGGER variable to the “.env” file, and set its value to the number of consumers you want to create for the Kafka Listener.
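For example, to run four consumers for the Kafka Listener (an illustrative value), add the following line to the “.env” file:

```
BLOCK_THREADS_KAFKA_TRIGGER=4
```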