Learn the Core Concepts

Hightouch is a Data Activation platform that connects and orchestrates data from sources to business tools. The platform manages the varying integrations and logic to activate data models from sources.

Source to Destination

Sources

A Source is wherever business data is stored, such as a data warehouse, database, CSV file, SFTP server, or even a BI tool. It's most commonly a source of truth for business data.

To add a Source to your Hightouch workspace, go to the Sources overview page and click the Add source button.

Destinations

A Destination is a tool or service receiving data from a Source. This is typically where end-users consume data (outside of analysis). Hightouch integrates with 100+ Destinations including CRM systems, ad platforms, marketing automation, and support tools.

To add a Destination to your Hightouch workspace, go to the Destinations overview page and click the Add destination button.

Models

Source to Model

In order for Hightouch to know what data to sync, a Model is defined. A Model organizes elements of data to be queried from a data source. For most Sources, a Model is defined with SQL; Hightouch sends the SQL directly to the Source to query data. Alternatively, a Model can be defined with dbt Models or Looker Looks to leverage existing data models.

Hightouch's visual audience builder can be used to segment a Model to build audiences or cohorts of data (with no code) before syncing the data to a destination tool. This process creates an Audience that generates SQL that acts as a segmented Model.

Regardless of how a Model is built, it's configured with a unique Primary Key that is used by Hightouch to search and keep track of records. This is important to ensure Hightouch is only syncing new and updated data to a destination tool. How Hightouch manages difference checking will be covered in Diffing.
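As a rough illustration of the idea that a Model is a SQL query plus a unique Primary Key, here is a minimal Python sketch. The `Model` class, the `run_model` helper, and the `users` table are hypothetical names invented for this example, and an in-memory SQLite database stands in for a real Source; none of this is Hightouch's actual API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a Model: a SQL query plus a primary key column."""
    sql: str
    primary_key: str


def run_model(conn: sqlite3.Connection, model: Model) -> list[dict]:
    """Run the Model's SQL against the source and check the primary key is unique."""
    conn.row_factory = sqlite3.Row
    rows = [dict(r) for r in conn.execute(model.sql)]
    keys = [row[model.primary_key] for row in rows]
    if len(keys) != len(set(keys)):
        raise ValueError(f"primary key {model.primary_key!r} is not unique")
    return rows


# In-memory SQLite database standing in for a warehouse (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", "free"), (2, "b@example.com", "pro")],
)

users_model = Model(sql="SELECT id, email, plan FROM users ORDER BY id", primary_key="id")
rows = run_model(conn, users_model)
```

The uniqueness check reflects why the Primary Key matters: without a unique key, records can't be tracked reliably from one run to the next.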

Syncs

Sync to Destination

Once Hightouch knows what data model to query from a Source, a Sync is configured to map the data from the Source to the Destination. The Sync manages how a Destination will receive data from the Source as well as the frequency of the pipeline. A Sync can be scheduled to trigger periodically, manually, or automatically via Airflow Operator, dbt Cloud, or the REST API.

The configuration of a Sync varies from Destination to Destination, but for the most part, the experience is the same; declaratively map data from Source fields to Destination fields and determine a sync mode (Upsert, Insert, Update, etc). Some Destinations will have different sync types for varying data types, such as Users vs Accounts vs Events.
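The declarative mapping described above can be pictured with a small Python sketch. `SyncConfig`, `apply_mappings`, and the destination field names (`Email`, `Plan__c`) are assumptions made up for illustration; real sync configuration is done in the Hightouch app and differs per Destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncConfig:
    """Hypothetical sync configuration; field names are illustrative only."""
    mode: str                 # e.g. "upsert", "insert", "update"
    mappings: dict[str, str]  # Source (Model) column -> Destination field


def apply_mappings(row: dict, config: SyncConfig) -> dict:
    """Translate one Model row into the Destination's field names."""
    return {dest: row[src] for src, dest in config.mappings.items()}


config = SyncConfig(
    mode="upsert",
    mappings={"email": "Email", "plan": "Plan__c"},
)
payload = apply_mappings({"id": 1, "email": "a@example.com", "plan": "pro"}, config)
```

Only the mapped columns appear in `payload`. The `id` column is queried by the Model but isn't sent, because it isn't part of the mapping in this sketch.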

A single Model can be configured with multiple Syncs to different Destinations. For example, a Model containing customer data is commonly configured to sync between sales, marketing, and support tools. Doing so enables all business tools to leverage the same source of truth.

Sync to Destinations

Change data capture & diffing

Hightouch employs diffing to ensure the platform doesn't send excessive requests for all rows in a Model every time a sync triggers; only deltas in your data model are synced to your destinations. A record of the data mapped and synced between a Model and a Destination (the diff file) is updated after each run. When a new Sync runs, the diff file is used to identify incremental changes to the Model. This is how Hightouch is able to only send requests for new and changed data in a Model.

The Primary Key specified in the Model is used to search and track records. When a new Sync triggers, Hightouch compares the Primary Keys in the new dataset with the previous dataset in the diff file. If the Primary Key for a record exists in both the new dataset and in the diff file, Hightouch scans the columns for any changes to the data. If the Primary Key is missing in the new dataset, Hightouch considers this a deleted record, whereas if the Primary Key is missing in the diff file, Hightouch considers this a new record.
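The comparison described above can be sketched in a few lines of Python. This is a simplified illustration of diffing by primary key, not Hightouch's implementation; the function name and the sample records are assumptions, and each snapshot is modeled as a dictionary keyed by Primary Key.

```python
def diff_by_primary_key(previous: dict, current: dict) -> dict:
    """Classify records by comparing two snapshots keyed by primary key."""
    # Key missing in the previous snapshot (the diff file): new record.
    added = {pk: row for pk, row in current.items() if pk not in previous}
    # Key missing in the new dataset: deleted record.
    removed = {pk: row for pk, row in previous.items() if pk not in current}
    # Key present in both: compare the columns for changes.
    changed = {
        pk: row
        for pk, row in current.items()
        if pk in previous and previous[pk] != row
    }
    return {"added": added, "changed": changed, "removed": removed}


previous = {
    1: {"email": "a@example.com"},
    2: {"email": "b@example.com"},
    3: {"email": "c@example.com"},
}
current = {
    2: {"email": "b@example.com"},      # unchanged, so not sent
    3: {"email": "c+new@example.com"},  # changed
    4: {"email": "d@example.com"},      # new
}
result = diff_by_primary_key(previous, current)
```

Record 2 appears in neither bucket, which is the point of diffing: unchanged rows generate no requests to the Destination.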

By default, the diffing compute is done by Hightouch's infrastructure (local diffing) and doesn't require WRITE permissions back to the Source. Alternatively, diffing can be done entirely in your warehouse with warehouse planning. This process has Change Data Capture computing done within the Source warehouse to achieve faster syncs at higher volumes. This gives you the flexibility of providing write or read-only access to your warehouse with no loss in functionality.

More on change data capture & diffing

What is mapped is tracked

Hightouch only tracks changes in columns based on the fields mapped in a Sync configuration. For example, if only 10 fields are being mapped in a Sync from a Model that queries 20 fields, Hightouch will only track these 10 fields.
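One way to picture this is to project each row down to the mapped fields before comparing it. The helper names and the sample columns below are hypothetical and only illustrate the rule.

```python
def project(row: dict, mapped_fields: list[str]) -> dict:
    """Keep only the columns that are mapped in the Sync."""
    return {field: row[field] for field in mapped_fields}


def has_tracked_change(old_row: dict, new_row: dict, mapped_fields: list[str]) -> bool:
    """A row counts as changed only if a mapped column differs."""
    return project(old_row, mapped_fields) != project(new_row, mapped_fields)


mapped = ["email", "plan"]  # the Sync maps two of the Model's three columns
old = {"email": "a@example.com", "plan": "free", "last_login": "2024-01-01"}
new_unmapped_change = {"email": "a@example.com", "plan": "free", "last_login": "2024-02-01"}
new_mapped_change = {"email": "a@example.com", "plan": "pro", "last_login": "2024-01-01"}

unmapped_result = has_tracked_change(old, new_unmapped_change, mapped)
mapped_result = has_tracked_change(old, new_mapped_change, mapped)
```

In this sketch, a change to `last_login` goes unnoticed because that column isn't mapped, while a change to `plan` marks the row as changed.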

New columns → new diff

If a new column is added to the Model and that column (field) is added to a Sync, Hightouch initiates an initial sync and creates a brand new diff file.

New data type → data change

If a column's data type is changed in the Model (for example, String → Number), Hightouch will detect a row change and sync the row as a changed record.
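A type-sensitive comparison is one way to model this behavior. The sketch below is an assumption about how such a check could look, not a description of Hightouch's internals; `values_differ`, `row_changed`, and the `zip` column are hypothetical.

```python
def values_differ(old, new) -> bool:
    """Treat a value as changed if either its type or its content differs."""
    return type(old) is not type(new) or old != new


def row_changed(old_row: dict, new_row: dict) -> bool:
    """A row is changed if any shared column differs by value or by type."""
    return any(values_differ(old_row[col], new_row[col]) for col in old_row)


before = {"id": 7, "zip": "94107"}  # zip stored as a String
after = {"id": 7, "zip": 94107}     # same digits, now a Number

type_change_detected = row_changed(before, after)
no_change_detected = row_changed(before, dict(before))
```

The explicit type check matters in Python because some values of different types compare equal (for example, `1 == 1.0`), so a plain `!=` would miss those cases.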

Historical diffs

Hightouch only compares the diff file from a current sync with the most recent diff file from that sync. Hightouch doesn't maintain a historical record of all rows and all columns (fields) that have ever been sent.

If a row drops out of a Sync and later reappears, it's considered a new row even though the row may have been sent in the past. Hightouch doesn't store all primary keys that have ever been sent. Consequently, Hightouch recommends the following best practice:

Your warehouse should be your single source of truth. It's not good practice to update data only in your end tool.
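The effect of comparing only against the most recent run can be shown with a short Python sketch. The three runs below are made-up sets of primary keys, and `added_keys` is a hypothetical helper.

```python
def added_keys(previous_keys: set, current_keys: set) -> set:
    """Keys present in the current run but absent from the previous run."""
    return current_keys - previous_keys


# Primary keys returned by the Model on three consecutive runs (illustrative).
runs = [{1, 2}, {1}, {1, 2}]

previous = set()
added_per_run = []
for current in runs:
    added_per_run.append(added_keys(previous, current))
    previous = current  # only the latest snapshot is kept, not the full history
```

Key 2 is sent in the first run, drops out in the second, and is classified as new again in the third, because no record of the first run remains once the second run completes.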

Where diffs happen

When Hightouch executes a Sync, the platform runs a query against the specified Source and syncs the generated diff file to a cloud bucket where the diff check occurs. This is considered local diffing, and it can be hosted either on Hightouch's infrastructure or your own infrastructure.

If warehouse planning is enabled, instead of moving the diff file to a cloud bucket, Hightouch will store and compute the diff checking directly in the Source. Warehouse planning is significantly faster when dealing with millions or hundreds of millions of records.
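To give a feel for what computing the diff inside the Source means, the sketch below expresses the same added/changed/removed classification as SQL joins. It's an assumption about how an in-warehouse diff could be written, not the queries Hightouch actually runs; the table names are hypothetical and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE previous_run (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE current_run  (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO previous_run VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO current_run  VALUES (2, 'b2@example.com'), (3, 'c@example.com');
""")

# New records: primary key exists only in the current run.
added = conn.execute("""
    SELECT c.id FROM current_run c
    LEFT JOIN previous_run p ON p.id = c.id
    WHERE p.id IS NULL
    ORDER BY c.id
""").fetchall()

# Changed records: primary key exists in both runs, but a column differs.
changed = conn.execute("""
    SELECT c.id FROM current_run c
    JOIN previous_run p ON p.id = c.id
    WHERE c.email IS NOT p.email
    ORDER BY c.id
""").fetchall()

# Deleted records: primary key exists only in the previous run.
removed = conn.execute("""
    SELECT p.id FROM previous_run p
    LEFT JOIN current_run c ON c.id = p.id
    WHERE c.id IS NULL
    ORDER BY p.id
""").fetchall()
```

Because both snapshots live in the same database, only the resulting changes need to leave it. Storing the previous snapshot this way is also why this approach needs write access to the Source, unlike local diffing.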

When diffs happen

When a Sync is triggered, the Sync will display a status of "Querying." This status means the Sync is in one of the following 3 diff states:

  1. Hightouch is waiting for the Model's query to complete in the Source
  2. Hightouch is transferring the diff file to an S3 bucket
  3. Hightouch is performing the diff checking
