Hightouch integrates directly with Apache Kafka to support high-throughput, distributed, or asynchronous workloads, letting you build a custom connector to your internal systems.
This destination was designed to be as flexible as possible. Some of its capabilities include:
connecting to multiple brokers
authenticating with Simple Authentication and Security Layer (SASL)
using your own certificate authority
publishing to different topics for each message trigger
defining custom ordering and partition keys
Hightouch supports all managed Kafka services (Amazon MSK, Confluent Cloud, etc.) and can also connect to self-hosted instances.
The Client ID is a logical identifier of a client application, in this case the Hightouch Apache Kafka destination. It distinguishes each application connected to your Kafka server. You can name it anything that fits your use case.
For Hightouch to sync data to the right Kafka brokers, you need to provide the host and port number in the format {host}:{port}. To connect to multiple brokers, enter them as a comma-separated list. For example, you could enter {host1}:{port1},{host2}:{port2}.
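If you want to check a broker list before pasting it into Hightouch, the format above can be validated with a short script. This is a minimal sketch, not Hightouch's own parser; the function name parse_brokers and the hostnames are hypothetical.

```python
def parse_brokers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated broker list into (host, port) pairs."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final segment.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        brokers.append((host, port_number))
    return brokers


# Hypothetical hostnames, for illustration only.
print(parse_brokers("kafka-1.internal:9092,kafka-2.internal:9093"))
```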
Hightouch can connect to your Kafka server either with SASL or directly without authentication. For security purposes, it's best to configure your Kafka server to require SASL authentication when syncing production data, and to only omit authentication requirements for testing purposes.
When configuring your SASL mechanism in Hightouch, you have four options:
PLAIN
SCRAM SHA256
SCRAM SHA512
AWS IAM
All four provide the option to include your own self-signed certificate authority.
For Username and Password, enter your username and password configured in your Kafka server. If you are using a managed Kafka service, your details can usually be found in your environment's settings or as an API key and secret.
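To confirm that a username and password work before entering them in Hightouch, you can try the same values from a client of your own. The sketch below only builds a settings dictionary using librdkafka-style property names (as accepted by clients such as confluent-kafka). It assumes your brokers use TLS (SASL_SSL), which may not match your cluster, and it leaves out AWS IAM, which librdkafka does not support natively. The function name and all example values are hypothetical.

```python
# Map the mechanism labels shown in Hightouch to Kafka's SASL mechanism names.
MECHANISMS = {
    "PLAIN": "PLAIN",
    "SCRAM SHA256": "SCRAM-SHA-256",
    "SCRAM SHA512": "SCRAM-SHA-512",
}


def client_config(brokers, mechanism, username, password, ca_path=None):
    """Build librdkafka-style client settings for a SASL connection."""
    if mechanism not in MECHANISMS:
        raise ValueError(f"unsupported mechanism in this sketch: {mechanism!r}")
    config = {
        "bootstrap.servers": brokers,
        "security.protocol": "SASL_SSL",  # assumption: TLS is enabled
        "sasl.mechanism": MECHANISMS[mechanism],
        "sasl.username": username,
        "sasl.password": password,
    }
    if ca_path:
        # Path to your own certificate authority file, if you use one.
        config["ssl.ca.location"] = ca_path
    return config


# Placeholder values, for illustration only.
config = client_config(
    "kafka-1.internal:9092", "SCRAM SHA512", "my-user", "my-password"
)
print(config["sasl.mechanism"])
```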
If you want to authenticate via AWS IAM, Hightouch assumes your Kafka server is configured to use AWS IAM as an authentication method. That is, STACK's Kafka AWS IAM LoginModule or a compatible alternative must be installed on all target brokers.
Authorization Identity must be the aws:userid of the AWS IAM identity. Typically, you can retrieve this value using the aws iam get-user or aws iam get-role commands of the AWS CLI toolkit. The aws:userid is usually listed as the UserId or RoleId property of the response.
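Both commands return JSON, with the identity nested under a User or Role object. As a rough sketch, the value can be pulled out of saved CLI output like this. The helper name extract_userid is hypothetical, and the sample responses are trimmed placeholders rather than real identifiers.

```python
import json


def extract_userid(cli_output: str) -> str:
    """Return the UserId or RoleId from `aws iam get-user` / `get-role` JSON."""
    response = json.loads(cli_output)
    if "User" in response:
        return response["User"]["UserId"]
    if "Role" in response:
        return response["Role"]["RoleId"]
    raise KeyError("response contains neither a User nor a Role object")


# Trimmed sample responses with placeholder identifiers.
user_output = '{"User": {"UserName": "hightouch", "UserId": "AIDAEXAMPLEUSERID"}}'
role_output = '{"Role": {"RoleName": "hightouch", "RoleId": "AROAEXAMPLEROLEID"}}'

print(extract_userid(user_output))
print(extract_userid(role_output))
```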
You can find your Access Key ID, Secret Access Key, and Session Token in your AWS account. For more information on AWS IAM credentials and authentication, refer to the official AWS docs.
Once you've connected your Kafka server to Hightouch, you've completed setup for an Apache Kafka destination in your Hightouch workspace. The next step is to configure a sync that sends messages whenever rows are added, changed, or removed in your model.
Hightouch monitors your data model for added, changed, and removed rows. In this step, you specify which of these events should trigger message publishing.
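Conceptually, the three events come from comparing the model's current results with the results of the previous run. The sketch below illustrates that idea only. It is not Hightouch's implementation, and the function names and the assumption that rows are keyed by a primary key are hypothetical.

```python
def diff_rows(previous: dict, current: dict) -> list[tuple[str, object]]:
    """Compare two snapshots keyed by primary key and list the row events."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("added", key))
        elif previous[key] != row:
            events.append(("changed", key))
    for key in previous:
        if key not in current:
            events.append(("removed", key))
    return events


def select_events(events, triggers):
    """Keep only the events whose type was chosen as a trigger."""
    return [event for event in events if event[0] in triggers]


previous = {1: {"plan": "free"}, 2: {"plan": "pro"}}
current = {2: {"plan": "team"}, 3: {"plan": "free"}}

events = diff_rows(previous, current)
print(select_events(events, {"added", "changed"}))
```

Here row 2 changed, row 3 was added, and row 1 was removed. With only the added and changed triggers selected, the removal produces no message.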
In this step, you choose which topics to publish the messages to. Hightouch can sync to any topic that already exists in your Kafka cluster.
Suppose you want to sync to multiple existing topics but don't want to create a new sync for every topic. As long as your model has a column containing the names of topics in your Kafka cluster, Hightouch can sync to multiple Apache Kafka topics in just one sync.
To enable this feature, toggle USE COLUMN, and select the column in your model that contains the topic name for each row.
When syncing to multiple topics, if a topic name in the selected column of
your model doesn't exist in the Kafka cluster, then the entire batch of
messages will fail to sync.
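Because one unknown topic name fails the whole batch, it can be worth checking the topic column against your cluster's topic list before running the sync. The following is a minimal sketch of such a check, assuming rows are dictionaries and you already have the set of existing topic names. The function names and sample data are hypothetical.

```python
def missing_topics(rows, topic_column, existing_topics):
    """Return topic names referenced by the rows that the cluster lacks."""
    wanted = {row[topic_column] for row in rows}
    return sorted(wanted - set(existing_topics))


def group_by_topic(rows, topic_column, existing_topics):
    """Group rows per topic, rejecting the whole batch on any unknown topic."""
    missing = missing_topics(rows, topic_column, existing_topics)
    if missing:
        raise ValueError(f"topics not found in cluster: {missing}")
    grouped = {}
    for row in rows:
        grouped.setdefault(row[topic_column], []).append(row)
    return grouped


rows = [
    {"id": 1, "topic": "orders"},
    {"id": 2, "topic": "refunds"},
    {"id": 3, "topic": "orders"},
]
print(missing_topics(rows, "topic", {"orders"}))
```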
With the JSON editor, you can compose any JSON object using the Liquid template language. This is useful for complex message data bodies containing nested objects and arrays, which can sometimes be difficult to model entirely in SQL.
Inside the template, you can reference any column using the syntax {{row.column_name}}. You can also use advanced Liquid features to incorporate control flow and loops into your dynamic message data.
When injecting strings into your JSON object, be sure to surround the Liquid
tag in double quotes.
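To see why the quotes matter, consider what the rendered text looks like once the column value is substituted in. The sketch below uses a deliberately simplified stand-in for Liquid (plain {{row.name}} substitution only, no filters or tags) and then parses the result as JSON. The render helper and the sample row are hypothetical.

```python
import json
import re


def render(template: str, row: dict) -> str:
    """Replace each {{row.name}} placeholder with the column's text value."""
    return re.sub(
        r"\{\{\s*row\.(\w+)\s*\}\}",
        lambda match: str(row[match.group(1)]),
        template,
    )


row = {"email": "ada@example.com", "seats": 5}

quoted = '{"email": "{{row.email}}", "seats": {{row.seats}}}'
unquoted = '{"email": {{row.email}}, "seats": {{row.seats}}}'

# The quoted string renders to valid JSON.
print(json.loads(render(quoted, row)))

# Without quotes, the rendered text is not valid JSON.
try:
    json.loads(render(unquoted, row))
except json.JSONDecodeError as error:
    print("invalid JSON:", error)
```

Numeric columns such as seats can be injected without quotes, since a bare number is already valid JSON.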
If you're already storing JSON data in your source, or if you have the ability to construct a JSON object using SQL, you can select one column in your model that already contains the full message data.
This setting is commonly used when syncing web events that have already been collected and stored as JSON objects in your database.
Along with your row data in JSON format, you can optionally include ordering keys, which control the order in which your Kafka cluster receives messages, and metadata fields, which are sent as headers.
partition
A number field that determines which partition to send the message to. This field takes precedence over the key field: if you provide both partition and key, the message is sent to the partition given in the partition field, not to the one derived from the key. Hightouch automatically tries to cast the value to a number. If the value can't be cast to a number, it is sent as null.
key
If no partition column is selected but a key of string type is selected, then Kafka chooses a partition to send the message to based on a murmur2 hash of the key. For example, if you use an orderId as the key, you can ensure that all messages regarding that order will be processed in order.
If no partition or key is included, then the message will be sent to a partition in a round-robin fashion.
headers
This is an object containing key/value pairs of custom mapping fields.
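The precedence between partition and key can be sketched as follows. The murmur2 function is a port of the hash used by Kafka's Java client for keyed messages. The rest is a simplified model under stated assumptions: the choose_partition and to_number names are hypothetical, a partition value that fails the number cast is assumed to fall back to the key, and the client Hightouch uses may differ in detail.

```python
from itertools import count

MASK = 0xFFFFFFFF  # keep arithmetic within 32 bits, like Java's int


def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, ported from Kafka's Java client (unsigned)."""
    length = len(data)
    m = 0x5BD1E995
    h = (0x9747B28C ^ length) & MASK
    body_end = length - length % 4
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK
        k ^= k >> 24
        k = (k * m) & MASK
        h = (h * m) & MASK
        h ^= k
    tail = data[body_end:]
    if tail:
        # Remaining 1-3 bytes are mixed in as a little-endian integer.
        h ^= int.from_bytes(tail, "little")
        h = (h * m) & MASK
    h ^= h >> 13
    h = (h * m) & MASK
    h ^= h >> 15
    return h


def to_number(value):
    """Cast to an integer; values that cannot be cast become None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


_round_robin = count()


def choose_partition(num_partitions, partition=None, key=None):
    explicit = to_number(partition)
    if explicit is not None:
        return explicit  # the partition field wins over the key
    if key is not None:
        # Mask to a non-negative value, then map onto the partitions.
        return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
    return next(_round_robin) % num_partitions  # no partition, no key


# Both fields set: the explicit partition is used.
print(choose_partition(6, partition="2", key="order-1001"))
```

Because the hash is deterministic, every message keyed with the same orderId maps to the same partition, which is what preserves ordering for that order.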
In this step, you tell Hightouch how to handle rows present in your model results during the first sync run.
Certain workflows may require performing a backfill of all rows during the initial sync. For other use cases, you might only want to send messages in response to future data changes.
To date, our customers haven't experienced any errors while using this destination. If you run into any issues, please don't hesitate to reach out. We're here to help.
Hightouch provides complete visibility into the API calls made during each of your sync runs. We recommend reading our article on debugging tips and tricks to learn more.