Warehouse Sync Logs

Warehouse Sync Logs are only available on Business Tier plans.

The Warehouse Sync Logs feature writes sync metadata back into your data warehouse. It makes per-row information from the live debugger available in your warehouse so that you can perform complex analysis, rather than just inspect syncs row-by-row.

When you enable Warehouse Sync Logs, Hightouch logs a corresponding row for every row processed during the sync. This includes the row's status and any errors from processing the row. You can then explore these logs using SQL or BI tools you use on top of your warehouse.

For example, you can:

  • Categorize all the errors in your sync using regular expressions and find unexpected errors.
  • Filter out previously failed rows from your model using a JOIN.
  • Aggregate the sync history to see what rows are changing the most. Flapping rows can be a sign of data integrity issues.
  • Visualize row changes in your models over time. For example, you may be interested in seeing how targeted users in an ad campaign changed over the campaign duration.

Refer to the example queries section for concrete examples.

Schema

Hightouch writes sync logs into three tables within the hightouch_audit schema:

  • Changelog: This table contains a row for every operation performed by Hightouch. It includes the result of the operation and any error messages from syncing.
  • Snapshot: This table contains each row's latest status in your model. The information is similar to the Changelog table, but since it contains the latest status, it's easier to query for some use cases.
  • Sync runs: This table contains a log of all the sync runs. You can JOIN the changelog and snapshot tables to this table for more information on when the sync occurred and how it was configured.

Information across all syncs is written into these same three tables. You can differentiate which rows were part of which sync using the sync_id column.
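
As a quick illustration of separating syncs by sync_id, the following sketch counts rows per sync and status in the snapshot table. It assumes the sync_snapshot table and its sync_id and status columns as described in the detailed schema. You may need to add a database prefix to the schema name in your warehouse.

```sql
-- Row counts per sync and status, based on each sync's latest snapshot
select
  sync_id,
  status,
  count(*) as row_count
from hightouch_audit.sync_snapshot
group by sync_id, status
order by sync_id, status
```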

See the detailed schema section for a description of each column.

Setup

Enabling Warehouse Sync Logs requires you to enable the Lightning sync engine first. Hightouch supports Warehouse Sync Logs with the following sources:

  • Snowflake
  • BigQuery
  • Redshift
  • Databricks
  • Postgres

Required permissions

The user you used to connect your source to Hightouch must be able to write into the hightouch_audit schema. You shouldn't require any additional permissions once you've set up the Lightning sync engine.
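
If you want to confirm write access yourself, a check along these lines may help. This is a sketch for Snowflake only. The role name hightouch_role is a hypothetical placeholder for whatever role your Hightouch connection uses, and the exact privileges your setup needs may differ from the ones shown.

```sql
-- List the privileges currently granted on the audit schema
show grants on schema hightouch_audit;

-- Assumption: hightouch_role is the role used by the Hightouch connection.
-- Grant these only if the output above shows they are missing.
grant usage on schema hightouch_audit to role hightouch_role;
grant create table on schema hightouch_audit to role hightouch_role;
```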

Enable Warehouse Sync Logs for a sync

Warehouse Sync Logs are off for all syncs by default. To enable them on a particular sync:

  1. Ensure the Lightning sync engine is enabled for the source.
  2. Go to the Sync Logs tab in the sync's overview page.
  3. Enable your desired tables: Snapshot, Changelog, and/or Sync runs.

Enabling sync log tables

Example queries

The following example queries are written for Snowflake, but you could create similar queries for other sources. Check out Hightouch's dbt package for more use cases.

Get the most common sync error

This SQL groups and counts rows by failure_reason, enabling you to find the most common sync error.

select
  failure_reason,
  count(*) as c
from hightouch_audit.sync_snapshot
where failure_reason is not null
group by failure_reason
order by c desc

Result of most common errors query
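
When raw error strings vary too much to group usefully, you can bucket them with regular expressions. The sketch below is one way to do this in Snowflake. The patterns and category names are hypothetical examples, and the column name failure_reason follows the query above (the detailed schema lists it as failed_reason, so use whichever name exists in your tables).

```sql
-- Bucket row-level errors into coarse categories.
-- Patterns and category names are illustrative assumptions, not Hightouch-defined values.
-- Snowflake's regexp_like matches the whole string, hence the leading and trailing .*
-- The 'is' parameters make matching case-insensitive and let . match newlines.
select
  case
    when regexp_like(failure_reason, '.*rate limit.*', 'is') then 'rate_limit'
    when regexp_like(failure_reason, '.*(invalid|malformed).*email.*', 'is') then 'invalid_email'
    when regexp_like(failure_reason, '.*(timeout|timed out).*', 'is') then 'timeout'
    else 'uncategorized'
  end as error_category,
  count(*) as c
from hightouch_audit.sync_snapshot
where failure_reason is not null
group by error_category
order by c desc
```

Rows that land in the uncategorized bucket are a good place to look for unexpected errors.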

Track when users entered and exited a model

This SQL tracks when users enter and exit a model. It's particularly useful when used with Customer Studio audiences and visualized in a BI tool.

with details as (
  select
    model_name,
    row_id,
    op_type as type,
    started_at as timestamp,
    -- previous operation recorded for the same row in the same model
    lag(op_type) over(partition by model_name, row_id order by started_at) as lag_type
  from hightouch_audit.sync_changelog c
  -- match each changelog row to the run that produced it
  join hightouch_audit.sync_runs r
    on c.sync_id = r.sync_id
    and c.sync_run_id = r.sync_run_id
  where op_type != 'changed'
)

-- keep only rows where the operation differs from the previous one
select
  row_id as user_id,
  model_name as audience,
  type,
  timestamp
from details
where (lag_type != type or lag_type is null)
order by model_name, row_id, timestamp

Result of audience changes query

Get the current rows in all models

This SQL finds all current rows (the most recently synced rows that didn't fail) across all models. It's particularly useful for finding all members of audiences created in Customer Studio.

with model_names as (
  select distinct
    sync_id,
    model_name
  from hightouch_audit.sync_runs
)

select
  model_name,
  row_id as user_id
from hightouch_audit.sync_snapshot s
join model_names r on s.sync_id = r.sync_id
where s.status != 'failed'
qualify row_number() over (partition by user_id, model_name order by null) = 1
order by user_id

Result of current audiences query
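
The snapshot table can also be used to exclude previously failed rows from a model with a JOIN. The following is a minimal sketch, not a definitive pattern. The table analytics.users, its primary key user_id, and the sync ID 123 are hypothetical placeholders. Both sides of the join are cast to strings because the stored type of row_id may differ from your key's type.

```sql
-- Anti-join: keep only model rows with no failed entry in the snapshot.
-- analytics.users, user_id, and sync ID 123 are placeholders for illustration.
select u.*
from analytics.users u
left join hightouch_audit.sync_snapshot s
  on cast(s.row_id as varchar) = cast(u.user_id as varchar)
  and s.sync_id = 123
  and s.status = 'failed'
where s.row_id is null
```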

Detailed schema

Hightouch writes to the sync_changelog, sync_snapshot, and sync_runs tables after each sync.

Changelog table

The hightouch_audit.sync_changelog table is an append-only log of all changes across all sync runs. If the same row is synced in multiple sync runs, it has multiple entries in this table.
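
Because each row can appear many times, the changelog lends itself to finding rows that change unusually often. The sketch below counts operations per row. The threshold of 10 and the limit of 100 are arbitrary assumptions that you would tune for your own sync frequency.

```sql
-- Rows with the most recorded operations, a possible sign of flapping data
select
  sync_id,
  row_id,
  count(*) as total_operations,
  count(distinct sync_run_id) as runs_touched
from hightouch_audit.sync_changelog
group by sync_id, row_id
having count(*) > 10  -- arbitrary threshold, adjust as needed
order by total_operations desc
limit 100
```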

| COLUMN | DESCRIPTION |
| --- | --- |
| sync_id | The ID of the sync |
| sync_run_id | The ID of the sync run |
| op_type | Whether the row was added, changed, or removed relative to the last run. This is computed by Hightouch when planning the sync run |
| row_id | The value of the row's primary key as defined from the model |
| status | Whether the row was successfully synced into the destination. The value may be: succeeded - the row was successfully synced, failed - Hightouch attempted to sync the row, but it failed to sync, or aborted - Hightouch planned to sync the row, but didn't attempt to sync. This may happen if the sync was cancelled, or the sync encountered a fatal error that terminated the run early. |
| failed_reason | If the status is failed, this field contains a string describing why the row failed to sync. |
| fields | A JSON object of the raw data from the model that is being synced into the destination. Note that this is the raw data from the warehouse, not the payload that Hightouch sent to the destination. This column isn't supported on Redshift. |
| split_group | (Optional) The split group name from /audiences/splits. If no syncs are using Audience Splits, this column isn't created. |

Snapshot table

The hightouch_audit.sync_snapshot table stores the latest status of each row in the model as of the most recent sync run, including rows that weren't synced in that run.

After each run, the old statuses for the sync are dropped and replaced with updated statuses.

| COLUMN | DESCRIPTION |
| --- | --- |
| sync_id | The ID of the sync |
| op_type | Whether the row was added, changed, or unchanged relative to the last run |
| row_id | The value of the row's primary key as defined from the model |
| status | The status of the row. See the sync_changelog.status description for a list of possible statuses |
| failed_reason | If the status is failed, this contains a string describing why the row failed to sync |
| fields | The fields from the model for this row. See the sync_changelog.fields description for more information |
| split_group | (Optional) The split group name from /audiences/splits. If no syncs are using Audience Splits, this column isn't created. |

Sync runs table

The hightouch_audit.sync_runs table stores general metadata about each sync run. You can join the sync_changelog and sync_snapshot tables to it using the sync_id column.

| COLUMN | DESCRIPTION |
| --- | --- |
| sync_id | The ID of the sync |
| sync_run_id | The ID of the sync run |
| primary_key | The primary key column of your sync as defined on the model attached to the sync. |
| destination | The destination type, for example, Salesforce or Braze |
| model_name | The name of the model attached to the sync |
| model_id | The ID of the model attached to the sync |
| status | The status of the sync run. This will be either succeeded or failed. In general, the per-row results of the sync are a better indication of status. |
| error | The sync-level error if the sync terminated early |
| started_at | When the sync run started |
| finished_at | When the sync run finished |
| num_planned_add | The number of planned adds. |
| num_planned_change | The number of planned changes. |
| num_planned_remove | The number of planned removes. |
| num_attempted_add | The number of planned adds that were actually attempted. |
| num_attempted_change | The number of planned changes that were actually attempted. |
| num_attempted_remove | The number of planned removes that were actually attempted. |
| num_succeeded_add | The number of planned adds that were successfully synced to the destination |
| num_succeeded_change | The number of planned changes that were successfully synced to the destination |
| num_succeeded_remove | The number of planned removes that were successfully synced to the destination |
| num_failed_add | The number of planned adds that were attempted, but failed to get synced into destination |
| num_failed_change | The number of planned changes that were attempted, but failed to get synced into destination |
| num_failed_remove | The number of planned removes that were attempted, but failed to get synced into destination |
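
The per-run counters can be combined into simple health metrics. The following sketch, written for Snowflake, computes run duration and a failure rate for recent runs. The limit of 50 is an arbitrary assumption. In warehouses that truncate integer division, you may need to cast one operand to a decimal type.

```sql
-- Duration and failure rate for the most recent sync runs
select
  sync_id,
  sync_run_id,
  started_at,
  datediff('second', started_at, finished_at) as duration_seconds,
  num_attempted_add + num_attempted_change + num_attempted_remove as attempted,
  num_failed_add + num_failed_change + num_failed_remove as failed,
  -- nullif avoids division by zero when nothing was attempted
  (num_failed_add + num_failed_change + num_failed_remove)
    / nullif(num_attempted_add + num_attempted_change + num_attempted_remove, 0) as failure_rate
from hightouch_audit.sync_runs
order by started_at desc
limit 50
```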

FAQ

What's the performance impact of enabling Warehouse Sync Logs?

The performance impact of enabling Warehouse Sync Logs is low since it reuses data already present from the Lightning sync engine. Hightouch only writes the rows after Hightouch syncs to the destination, meaning there is no effect on destination throughput.

Pruning entries in the history tables is safe. Doing so doesn't affect future syncs, though the deleted rows won't be rewritten into the history tables.
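
If you decide to prune, a retention query along these lines is one option. This is a sketch for Snowflake under stated assumptions: the 90-day window is arbitrary, and the query assumes that sync_run_id values are unique across syncs. Run it before deleting anything from sync_runs, since it relies on that table to find old runs.

```sql
-- Remove changelog entries that belong to runs finished more than 90 days ago.
-- The 90-day retention window is an illustrative assumption.
delete from hightouch_audit.sync_changelog
where sync_run_id in (
  select sync_run_id
  from hightouch_audit.sync_runs
  where finished_at < dateadd('day', -90, current_timestamp())
)
```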

Last updated: Mar 18, 2023
