Search documentation...

K

Overview

Between syncs, Hightouch will automatically identify the incremental changes in your data model, so that only updates (added, changed, or removed records) are sent to destinations. By default, Hightouch figures out what rows need to be synced by pulling all the rows within your data model, and then diffing on Hightouch's infrastructure. See more info on this here
This can be slow for large Models that take a long time to query.
With Warehouse Planning, however, Hightouch does this diff directly in your warehouse by storing previously synced data into a special schema managed by Hightouch. Our larger enterprise customers are typically using our in-warehouse diffing, for performance reasons.
This functionality currently supports the following warehouses:
  • Snowflake
  • BigQuery
  • Redshift
  • Postgres
  • Databricks

Granting Required Warehouse Permissions

Warehouse Planning stores metadata about syncs in your warehouse within a schema called hightouch_planner. The user that Hightouch uses to connect to your warehouse must have permissions to create this schema, and create tables within it.

Snowflake

Run the following SQL snippet
GRANT CREATE SCHEMA ON DATABASE <your_database> TO <hightouch_role>

BigQuery

Grant the bigquery.datasets.create permission to the Hightouch user.

Redshift / Postgres

Run the following SQL snippet
GRANT CREATE ON DATABASE <your_database> TO <hightouch_user>

Precreating the Warehouse Schema

If you don't want to grant Hightouch access to create schemas in your warehouse, you may also precreate the schema and just give Hightouch access to create and delete tables within it.
Note: When precreating schema, make sure you also pre-create hightouch_audit schema. This is used for Warehouse Sync Logs, and when checking the permission Hightouch will check whether hightouch_audit exists too. If warehouse sync history is not enabled, Hightouch will not write anything to hightouch_audit schema.

BigQuery

To enable this for BigQuery:
  1. Create the schema by running the following SQL
    CREATE SCHEMA hightouch_planner
    CREATE SCHEMA hightouch_audit
    
  2. Give your GCP service account access to run jobs and list the schema by granting it the following roles:
  • roles/bigquery.jobUser
  • roles/bigquery.metadataViewer
  1. Give your GCP service account access to manage tables within the hightouch_planner schema by running the following SQL
    GRANT `roles/bigquery.dataViewer`, `roles/bigquery.dataEditor`
    ON SCHEMA hightouch_planner
    TO "serviceAccount:$USER@$PROJECT.iam.gserviceaccount.com"
    
    GRANT `roles/bigquery.dataViewer`, `roles/bigquery.dataEditor`
    ON SCHEMA hightouch_audit
    TO "serviceAccount:$USER@$PROJECT.iam.gserviceaccount.com"
    

Snowflake

To enable this for Snowflake:
  1. Create the schema by running the following SQL
    CREATE SCHEMA hightouch_planner
    CREATE SCHEMA hightouch_audit
    
  2. Grant Hightouch user access to planner and audit schema
    GRANT OWNERSHIP ON SCHEMA hightouch_planner TO hightouch_user
    
    GRANT OWNERSHIP ON SCHEMA hightouch_audit TO hightouch_user
    

Enabling Warehouse Planning For Your Syncs

To enable Warehouse Planning in Hightouch, make sure that you've granted the proper permissions for one of our supported sources (see above), then navigate to your respective source and click the "Enable" button within the Warehouse Planning section:

Migrating from remote diffing

If you enable Warehouse Planning on an existing source, any syncs on the old diffing method will be gracefully migrated to Warehouse Planning on the next run. Therefore, any previously synced data will not be unnecessarily resynced. This works by writing the sync history back into the warehouse.
Here are some tips if you're performing this migration:
  • Be aware that Warehouse Planning requires your primary key to be unique. This is a requirement for syncs to work well in general, but it is only strictly enforced in Warehouse Planning. If you're unsure if your primary key is unique, you may want to check that it's unique before migrating.
  • If you have a large number of syncs that will be migrated, please reach out so that we can help you migrate incrementally.
  • If you hit any errors, they will likely be fixed with a full resync if your sync is idempotent and there aren't any removes that need to be executed.

Common Errors

Primary Key Is Not Unique

Warehouse Planning requires that every row in your Model have a unique primary key value. This is because the primary key is used to uniquely identify each row (e.g. whether it failed previously).
Models with duplicate primary keys can be deduped in SQL with the ROW_NUMBER() function. This generally looks like:
WITH your_data_model AS (
  // copy/paste your model here
)

WITH your_data_model_with_rank AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY your_primary_key) as rank FROM your_data_model
)

SELECT * FROM your_data_model_with_rank WHERE rank=1
The example above de-duplicate your model by partitioning on your primary key. It will choose an arbitrary record from your duplicates. You can use ORDER BY or WHERE to filter and select the data you want from duplicates.

    Need help?

    Our team is relentlessly focused on your success. We're ready to jump on a call to help unblock you.

    • Connection issues with your data warehouse?
    • Confusing API responses from destination systems?
    • Unsupported destination objects or modes?
    • Help with complex SQL queries?

    or

    Feature Requests?

    If you see something that's missing from our app, let us know and we'll work with you to build it!

    We want to hear your suggestions for new sources, destinations, and other features that would help you activate your data.

On this page

OverviewGranting Required Warehouse PermissionsSnowflakeBigQueryRedshift / PostgresPrecreating the Warehouse SchemaBigQuerySnowflakeEnabling Warehouse Planning For Your SyncsMigrating from remote diffingCommon ErrorsPrimary Key Is Not Unique

Was this page helpful?