
Change data capture (CDC)

Overview

Hightouch optimizes syncs by sending only the rows that have changed since the last run.

This process is called change data capture (CDC), or diffing.

Without CDC, every sync would resend the entire dataset, causing unnecessary API calls, slower syncs, and wasted downstream processing.


How CDC works

Whenever a sync runs, Hightouch compares the current query results with the data sent in the previous sync run. To do this, Hightouch keeps a record of the data sent in the last sync, called the diff file.

CDC only considers mapped model columns when creating and updating the diff file. The only exception is custom destinations, such as the HTTP Request destination, which consider all columns.
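To show why only mapped columns matter, here is a minimal sketch of building a snapshot for the diff file. The helper name `build_snapshot` and the `custom_destination` flag are hypothetical illustrations, not Hightouch APIs. The sketch assumes rows arrive as dictionaries and are keyed by the model's primary key.

```python
from typing import Any, Dict, Hashable, List

Row = Dict[str, Any]


def build_snapshot(
    rows: List[Row],
    primary_key: str,
    mapped_columns: List[str],
    custom_destination: bool = False,
) -> Dict[Hashable, Row]:
    """Key each row by its primary key, keeping only the columns CDC compares.

    Hypothetical helper for illustration; not a Hightouch API.
    """
    snapshot: Dict[Hashable, Row] = {}
    for row in rows:
        if custom_destination:
            # Custom destinations (e.g. HTTP Request) compare every column.
            kept = dict(row)
        else:
            # Other destinations compare only the mapped model columns.
            kept = {col: row[col] for col in mapped_columns}
        snapshot[row[primary_key]] = kept
    return snapshot
```

With this approach, if a model has an unmapped column such as `last_login`, a change to that column alone produces identical snapshots, so nothing would be resent for a standard destination.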

Steps in the CDC process:

  1. Hightouch queries the source using the defined model.
  2. Hightouch compares the current results to the results from the previous sync.
  3. Hightouch identifies differences by primary key:
    • New rows
    • Updated rows
    • Removed rows
  4. Hightouch syncs only the changes to the destination, including any failed rows from the previous run.
  5. A new diff file is created for the next comparison.
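Steps 2 to 4 can be sketched as a comparison of two snapshots keyed by primary key. This is an illustrative sketch, not Hightouch's implementation. The function name `compute_changes` and the separate `failed_keys` input are assumptions; how Hightouch actually tracks failed rows is not described here.

```python
from typing import Any, Dict, Hashable, Iterable, List

Snapshot = Dict[Hashable, Dict[str, Any]]


def compute_changes(
    previous: Snapshot,
    current: Snapshot,
    failed_keys: Iterable[Hashable] = (),
) -> Dict[str, List[Hashable]]:
    """Compare two primary-key-indexed snapshots (illustrative sketch only)."""
    # New rows: primary keys that were absent from the previous run.
    added = [k for k in current if k not in previous]
    # Removed rows: primary keys that disappeared from the query results.
    removed = [k for k in previous if k not in current]
    # Updated rows: same primary key, different values.
    changed = [k for k in current if k in previous and current[k] != previous[k]]
    # Rows that failed last run are retried even if their values are unchanged.
    already_sent = set(added) | set(changed)
    retry = [k for k in failed_keys if k in current and k not in already_sent]
    return {"added": added, "changed": changed, "removed": removed, "retry": retry}
```

After the destination write, `current` would become the new diff file for the next comparison (step 5).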

[Figure: CDC simplified diagram]


Requirements

CDC relies on a unique primary key for every row in a model.
If primary keys are missing or duplicated, Hightouch can’t reliably track changes.

See Primary key requirements.
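To see why missing or duplicated keys break change tracking, consider that rows are indexed by primary key: two rows with the same key collapse into one entry, and a row with no key can't be matched against the previous run. The hypothetical check below, a sketch rather than Hightouch's own validation, reports both problems for a list of rows.

```python
from collections import Counter
from typing import Any, Dict, List


def check_primary_keys(rows: List[Dict[str, Any]], primary_key: str) -> Dict[str, Any]:
    """Report rows whose primary key is missing or duplicated (hypothetical check)."""
    keys = [row.get(primary_key) for row in rows]
    # A None or absent key cannot be matched against the previous run.
    missing = sum(1 for k in keys if k is None)
    # Duplicate keys would collapse to a single entry in a key-indexed snapshot.
    counts = Counter(k for k in keys if k is not None)
    duplicates = [k for k, n in counts.items() if n > 1]
    return {"missing": missing, "duplicates": duplicates}
```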


CDC by sync mode

CDC behavior depends on the sync mode:

  • Insert mode: only syncs rows whose primary key wasn’t present in the previous run.
  • Update / Upsert modes: syncs rows where values have changed.
  • All and Archive modes: do not perform CDC; every row is sent every run.
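The mode-dependent behavior can be sketched as a selection over the change set. The function `rows_to_send` and the shape of `changes` are hypothetical. One further assumption, flagged in the code: rows that are new to the model are treated as changed in Update and Upsert modes.

```python
from typing import Dict, Hashable, List


def rows_to_send(
    mode: str,
    changes: Dict[str, List[Hashable]],
    current_keys: List[Hashable],
) -> List[Hashable]:
    """Pick which primary keys a run sends, by sync mode (illustrative only)."""
    mode = mode.lower()
    if mode == "insert":
        # Only keys that were absent from the previous run.
        return list(changes["added"])
    if mode in ("update", "upsert"):
        # Assumption: rows new to the model are treated as changed here.
        return list(changes["added"]) + list(changes["changed"])
    if mode in ("all", "archive"):
        # No CDC: every row in the current results is sent.
        return list(current_keys)
    raise ValueError(f"unknown sync mode: {mode!r}")
```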

See Sync types and modes.


Difference-based CDC

Hightouch uses difference-based CDC: it compares the full query results from the previous run with the full results from the current run.

This approach is required when syncing from warehouses, since warehouses cannot produce CDC logs for arbitrary SQL queries or dbt models.

In contrast, OLTP databases (like Postgres or MySQL) natively log incremental changes, and ETL tools often read these logs when moving data into warehouses.

Because Hightouch does the reverse (sending data from warehouses), log-based CDC isn’t possible. This is a key distinction between ETL and reverse ETL.


When CDC occurs

CDC happens after Hightouch receives the query results from the source and before data is written to the destination.

If you see the sync status Querying in the UI, Hightouch is in one of these states:

  • Waiting for query results
  • Saving the diff file
  • Performing CDC computation

Where CDC is computed

By default, CDC is performed on Hightouch-managed infrastructure.

For supported sources, you can enable the Lightning Sync Engine to compute CDC directly in your warehouse.

  • Basic engine: CDC runs on Hightouch-managed infrastructure.
  • Lightning engine: CDC runs in your warehouse, enabling faster syncs at higher volumes.
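Computing CDC in the warehouse means expressing the diff as SQL, so only the changed keys need to leave the database. The sketch below uses an in-memory SQLite database as a stand-in warehouse. The table names `prev_snapshot` and `curr_snapshot` and the single `email` column are hypothetical, and this is not the SQL the Lightning engine actually runs.

```python
import sqlite3
from typing import Dict, List

# Hypothetical snapshot tables; a real warehouse would hold these in its own schema.
DIFF_QUERIES = {
    "added": """
        SELECT c.id FROM curr_snapshot AS c
        LEFT JOIN prev_snapshot AS p ON c.id = p.id
        WHERE p.id IS NULL ORDER BY c.id""",
    "changed": """
        SELECT c.id FROM curr_snapshot AS c
        JOIN prev_snapshot AS p ON c.id = p.id
        WHERE c.email IS NOT p.email ORDER BY c.id""",
    "removed": """
        SELECT p.id FROM prev_snapshot AS p
        LEFT JOIN curr_snapshot AS c ON p.id = c.id
        WHERE c.id IS NULL ORDER BY p.id""",
}


def diff_in_database(conn: sqlite3.Connection) -> Dict[str, List[int]]:
    """Run the diff inside the database and return only the changed keys."""
    return {name: [row[0] for row in conn.execute(sql)] for name, sql in DIFF_QUERIES.items()}


def make_demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a warehouse holding two snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prev_snapshot (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE curr_snapshot (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO prev_snapshot VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany("INSERT INTO curr_snapshot VALUES (?, ?)", [(1, "a"), (2, "b2"), (4, "d")])
    return conn
```

SQLite's `IS NOT` is a null-safe comparison; warehouses that follow standard SQL typically spell it `IS DISTINCT FROM`.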

[Figure: Basic vs. Lightning sync]


Where CDC data is stored

Hightouch stores previous query results (diff files) to compute changes.

  • By default, diff files are stored in an encrypted Hightouch-managed bucket.
  • Some plans allow you to bring your own bucket to store diff files in your own infrastructure.

[Figure: Bring your own bucket]

Diff files are retained for 30 days in Hightouch-managed storage.
If a sync doesn’t run for more than 30 days, you’ll need to reset CDC or run a full resync.
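The retention rule amounts to a date comparison before each run. This is a minimal sketch assuming timezone-aware timestamps; the name `diff_file_usable` is hypothetical, and treating exactly 30 days as still usable is an assumption based on the "more than 30 days" wording.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention period for diff files in Hightouch-managed storage.
DIFF_RETENTION = timedelta(days=30)


def diff_file_usable(last_run_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the previous diff file is still within the retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Past the window, the sync needs a CDC reset or a full resync instead.
    return now - last_run_at <= DIFF_RETENTION
```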


Resetting or resyncing CDC

Sometimes you may need to reset how CDC tracks changes. Common cases:

  • Primary key changes: Hightouch prompts you to reset CDC when you change a model’s primary key.
  • Resync full query: forces Hightouch to resend all rows, then resume CDC tracking.
  • Reset CDC: ignores existing state and starts fresh on the next run (no data is sent during the reset run).
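The difference between a full resync and a CDC reset is what the next run sends; in both cases the current results become the new baseline. The sketch below models only that distinction. The function name `first_run_after` and the action strings are hypothetical.

```python
from typing import Any, Dict, Hashable, List, Tuple

Snapshot = Dict[Hashable, Dict[str, Any]]


def first_run_after(action: str, current: Snapshot) -> Tuple[List[Hashable], Snapshot]:
    """Return (keys to send, new diff file) for the run after a reset or resync."""
    if action == "resync":
        # Resync full query: every current row is sent, then CDC resumes.
        to_send = list(current)
    elif action == "reset":
        # Reset CDC: nothing is sent; current results only become the baseline.
        to_send = []
    else:
        raise ValueError(f"unknown action: {action!r}")
    # Either way, the current results become the diff file for the next run.
    return to_send, dict(current)
```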

See the Resync and reset docs.


FAQ

What happens if I change my model configuration?

As explained in the primary key updates section, if you change a model's primary key to a different column, you will be prompted to reset CDC for all syncs that depend on that model. Be careful when modifying primary keys. If you keep the same column but change how its values are calculated, CDC treats the old key values as removed rows and the new key values as added rows, so records may be added to or deleted from your destination, depending on how your sync is configured.

Learn more about changes to your model configuration in the model column changes section.
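To illustrate the primary key pitfall, suppose a model keys rows on `email` and you change the model to lowercase that column. The rows describe the same records, but their keys differ, so a key-based diff reports one removal and one addition. This hypothetical helper, `key_change_effect`, is a sketch for illustration only.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Row = Dict[str, Any]


def key_change_effect(
    rows: List[Row],
    old_key: Callable[[Row], Hashable],
    new_key: Callable[[Row], Hashable],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (added, removed) keys when the same rows are re-keyed."""
    before = {old_key(r) for r in rows}
    after = {new_key(r) for r in rows}
    # Keys that only exist under the new calculation look like new rows.
    added = [new_key(r) for r in rows if new_key(r) not in before]
    # Keys that only existed under the old calculation look like deleted rows.
    removed = [old_key(r) for r in rows if old_key(r) not in after]
    return added, removed
```

For `Ann@Example.com`, the old key disappears and `ann@example.com` appears, while an already-lowercase address is unaffected.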

Does Hightouch keep historical CDC files?
No. Only the most recent diff file is stored and compared against.

What happens if my sync fails?
CDC re-attempts failed rows on the next run.

What happens if I stop syncing for more than 30 days?
Diff files in managed storage expire after 30 days. You’ll need to reset CDC or perform a full resync.

What happens if I change sync mappings?
Changing mappings may cause a full reprocess. See Field mapping updates.



Last updated: Oct 2, 2025
