Learn more about what data Hightouch stores to power your syncs, whether in Hightouch's infrastructure or your own.
Data needs to be stored at rest for two purposes: Change Data Capture & Diffing, and the in-app debugger.
Depending on your privacy and compliance needs, Hightouch can be configured to store all data-at-rest within your Virtual Private Cloud (VPC), or in a secure, encrypted bucket hosted by Hightouch.
- If you’re on the Free, Starter, or Pro tier, this bucket will be hosted by Hightouch. For more info, see Managed by Hightouch.
- If you’re on the Business Tier, you can self-host in your own infrastructure (Amazon S3 or Google Cloud Storage). For more info, see Managed by customer.
After each sync, Hightouch stores query results and execution plans. When the next sync runs, Hightouch will use these previous sync files to determine incremental changes that should be sent downstream. These changes fall within three operation categories: added, changed, or removed.
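The three operation categories can be illustrated with a minimal sketch. This is not Hightouch's implementation: it assumes each query result is held in memory as a dictionary keyed by the model's primary key, and the function name `diff_snapshots` and the sample rows are hypothetical.

```python
from typing import Any, Dict, Hashable, List, Tuple

Row = Dict[str, Any]


def diff_snapshots(
    previous: Dict[Hashable, Row],
    current: Dict[Hashable, Row],
) -> Tuple[List[Hashable], List[Hashable], List[Hashable]]:
    """Return (added, changed, removed) primary keys between two snapshots."""
    # Keys present now but not in the previous run were added.
    added = [key for key in current if key not in previous]
    # Keys present in the previous run but missing now were removed.
    removed = [key for key in previous if key not in current]
    # Keys present in both runs whose row contents differ were changed.
    changed = [
        key
        for key in current
        if key in previous and current[key] != previous[key]
    ]
    return added, changed, removed


# Hypothetical query results from two consecutive sync runs.
previous_run = {
    1: {"email": "a@example.com", "plan": "free"},
    2: {"email": "b@example.com", "plan": "pro"},
}
current_run = {
    2: {"email": "b@example.com", "plan": "business"},
    3: {"email": "c@example.com", "plan": "free"},
}

added, changed, removed = diff_snapshots(previous_run, current_run)
print(added, changed, removed)  # [3] [2] [1]
```

Only rows 3 (added), 2 (changed), and 1 (removed) would need to be sent downstream; unchanged rows are skipped.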
Note: Hightouch also offers warehouse planning (on all tiers), which calculates these diffs directly within your data warehouse. Whether to choose this option depends on your performance requirements (especially for larger syncs) and where you want the compute performed. Thanks to warehouse planning, Hightouch can have either write or read-only access to your warehouse with no loss in diffing functionality.
Learn more about how Change Data Capture & Diffing works in the Hightouch Core Concepts.
In addition to storing previous query results, Hightouch also stores row-level log metadata, including successes and failures, operations performed, and API request and response payloads. This data powers the in-app debugger and can be stored either in your VPC or in Hightouch’s encrypted bucket.
If you’re on the Free, Starter, or Pro tier, query data-at-rest used to power Hightouch is stored in a secure, encrypted bucket managed by Hightouch. Data at-rest used for Change Data Capture & Diffing can be configured in your warehouse via warehouse planning; data powering the in-app debugger will rest in Hightouch’s infrastructure. If you require data-at-rest to live entirely in your VPC, see Managed by customer.
Data is automatically expired from Hightouch-managed buckets after 30 days.
If Change Data Capture & Diffing is done in Hightouch-managed buckets, syncs that have not run in over 30 days will require a Full Resync since Hightouch depends on diffing files to detect changes in the data model.
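The 30-day rule above amounts to a simple date comparison. The sketch below is illustrative only: `needs_full_resync` is a hypothetical helper, not a Hightouch API, and treating a sync that has never run as needing a full sync is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Expiry window for Hightouch-managed buckets, as stated above.
RETENTION = timedelta(days=30)


def needs_full_resync(last_run: Optional[datetime], now: datetime) -> bool:
    """Return True when the previous diffing files would already have expired."""
    if last_run is None:
        # Assumption: with no previous run there is nothing to diff against.
        return True
    return now - last_run > RETENTION


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(needs_full_resync(datetime(2024, 2, 20, tzinfo=timezone.utc), now))  # False
print(needs_full_resync(datetime(2024, 1, 15, tzinfo=timezone.utc), now))  # True
```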
If you’re on the Business Tier, you can configure Hightouch to store all customer data-at-rest within your own external storage bucket, hosted in your Amazon S3 or Google Cloud Storage account. With this setup, Hightouch's own infrastructure only processes data in transit. Hightouch uses this bucket to power its core functionality.
When using a customer-managed storage bucket, Hightouch places full control over object lifecycle, security, and expiration into your hands. We will not expire objects automatically, or modify your object encryption settings. Ensure that you've configured object expiration, encryption, and access control settings according to your needs.
If you've already run a sync after setting up a custom storage bucket, you will be unable to make further changes to your storage config. This is because changing your external storage configuration is disruptive to Hightouch syncs. If you need to make such a change, please reach out to customer support.
Before getting started, connect Hightouch with your Amazon Web Services account.
In Amazon S3, create your bucket. We recommend the name
Make sure to:
- Block all public access to the bucket.
- Enable Amazon S3 key encryption (SSE-S3).
- Disable bucket versioning.
- Configure your bucket's object lifecycle to enhance security and cut down on costs.
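The checklist above can also be applied programmatically. The following is a sketch using boto3, not an official Hightouch script. The rule ID and the 30-day default are assumptions (30 days mirrors the Hightouch-managed expiry window; choose a value that suits your sync schedule), and the boto3 calls run only if you invoke `apply_bucket_settings` with valid AWS credentials.

```python
from typing import Any, Dict

# Block every form of public access to the bucket.
PUBLIC_ACCESS_BLOCK: Dict[str, bool] = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Default encryption with Amazon S3 managed keys (SSE-S3).
SSE_S3_ENCRYPTION: Dict[str, Any] = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}


def lifecycle_rules(expire_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that expires every object in the bucket."""
    if expire_after_days <= 0:
        raise ValueError("expire_after_days must be positive")
    return {
        "Rules": [
            {
                "ID": "expire-hightouch-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_bucket_settings(bucket: str, expire_after_days: int = 30) -> None:
    """Apply the settings with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without it

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK
    )
    s3.put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=SSE_S3_ENCRYPTION
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_rules(expire_after_days)
    )
```

Newly created S3 buckets are unversioned by default, so the sketch makes no versioning call.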
Hightouch supports authenticating with AWS using cross-account roles (via STS AssumeRole), or with an Access Key ID / Secret Access Key that you provide. We strongly encourage you to use cross-account roles, as they do not require Hightouch to hold any of your secrets.
To set up your Hightouch AWS credential, follow the documentation here.
Hightouch needs the following IAM actions to store and retrieve items from your bucket:
|Action|Description|
|`s3:GetObject`|Grants permission to retrieve objects from Amazon S3|
|`s3:PutObject`|Grants permission to add an object to a bucket|
|`s3:ListBucket`|Grants permission to list some or all of the objects in an Amazon S3 bucket (up to 1000)|
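As an illustration, the permissions above could be expressed as an IAM policy document. This sketch assumes the three descriptions correspond to the `s3:GetObject`, `s3:PutObject`, and `s3:ListBucket` actions (the wording matches AWS's own descriptions of them). The function name and the bucket name `acme-hightouch-bucket` are hypothetical.

```python
import json
from typing import Any, Dict


def bucket_policy(bucket: str) -> Dict[str, Any]:
    """Build a least-privilege policy document for a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


print(json.dumps(bucket_policy("acme-hightouch-bucket"), indent=2))
```

Splitting the statements matters: object actions granted only on the bucket ARN, or `s3:ListBucket` granted only on `bucket/*`, will not take effect.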
Access the external bucket settings under Settings > Storage.
Select your AWS region, enter your bucket name, and select the AWS credentials you set up in Step 2.
Once you save your settings, your new syncs will automatically start using your bucket.
Run a sync to test it out!
Before getting started, connect Hightouch with your Google Cloud account.
We recommend the name <company>-hightouch-bucket. Copy the bucket name and save it for later.
Configure your bucket's object lifecycle to enhance security and cut down on costs.
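A lifecycle rule for a Cloud Storage bucket is a small JSON document. The sketch below generates one that deletes objects after a given age. The 30-day value is an assumption mirroring the Hightouch-managed expiry window, and `gcs_lifecycle` is a hypothetical helper. Check the Cloud Storage documentation for how to apply the resulting file to your bucket.

```python
import json
from typing import Any, Dict


def gcs_lifecycle(expire_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that deletes objects older than N days."""
    if expire_after_days <= 0:
        raise ValueError("expire_after_days must be positive")
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "Delete"},
                    # "age" is measured in days since the object was created.
                    "condition": {"age": expire_after_days},
                }
            ]
        }
    }


print(json.dumps(gcs_lifecycle(30), indent=2))
```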
Hightouch supports authenticating with GCP using Hightouch-managed service accounts, or by using a service account that you control.
To set up your Hightouch GCP credential, follow the documentation here.
Hightouch needs the following IAM permissions to store and retrieve items from your bucket:
|Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.|
|Grants permission to create, replace, and delete objects; list objects in a bucket; read object metadata when listing (excluding IAM policies); and read bucket metadata, excluding IAM policies.|
|* Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.|
Back in Hightouch, under Settings > Storage, enter the project name and bucket name. Select the GCP credentials you set up in Step 2.
Don't forget to click 'save'.
After you've saved your Google Cloud bucket settings in the external storage area in Hightouch, run a few syncs and visit your Google Cloud bucket to see the files that are saved there. Please contact us if you have any trouble.