Search documentation...

K

Google BigQuery

Overview

Hightouch lets you pull data stored in your Google BigQuery warehouse and push it to downstream destinations.

Connecting Hightouch to BigQuery requires some setup in both platforms. It's recommended to set up a service account with the correct permissions in BigQuery before configuring the connection in Hightouch.

You need to allowlist Hightouch's IP addresses to let our systems contact your warehouse. Reference our networking docs to determine which IPs you need to allowlist.

BigQuery credential setup

Setup in BigQuery has three main steps:

  1. Enable BigQuery for your Google Cloud project
  2. Create a service account
  3. Grant the Hightouch service account access to your project

Create a project and enable BigQuery API

  1. Login to the Google Developers Console.
  2. Configure the Cloud Platform:
  3. Copy your Project ID for later use.
  4. Find the location of your BigQuery dataset or sets. You can find this by querying the INFORMATION_SCHEMA.SCHEMATA view or by visiting the Google Cloud web console and clicking on a BigQuery dataset in the Explorer panel. You need both Project ID and Data location when connecting Hightouch to BigQuery.

Dataset Location in the Google Cloud Console

Make sure billing is enabled on your project, otherwise Hightouch can't write into the cluster.

Create a service account

To create a service account, follow the setup instructions in our Google Cloud Provider (GCP) documentation.

Grant access

By default, your GCP service account doesn't have permission to read data from BigQuery. You can grant this access by assigning the bigquery.user and bigquery.dataViewer roles to your service account.

You can do this in the Google Cloud web console or by running these snippets in the Cloud Shell.

Grant permission to read metadata and list tables

gcloud projects add-iam-policy-binding <YOUR_PROJECT_NAME> \
--member serviceAccount:<YOUR_SERVICE_ACCOUNT> \
--role roles/bigquery.user

Grant permission to read data from tables and views

gcloud projects add-iam-policy-binding <YOUR_PROJECT_NAME> \
--member serviceAccount:<YOUR_SERVICE_ACCOUNT> \
--role roles/bigquery.dataViewer

Connection configuration

To get started, go to the Sources overview page and click the Add source button. Select BigQuery and follow the steps below.

Configure your service account

Select the GCP credentials you previously created or click Create new. To learn more about these credentials, see the Google Cloud Provider (GCP) documentation.

Configure your source

Enter the Project ID for the project you enabled the BigQuery API for and the Dataset location.

Choose your sync engine

For optimal performance, Hightouch tracks incremental changes in your data model—such as added, changed, or removed rows—and only syncs those records. You can choose between two different sync engines for this work.

The standard engine requires read-only access to BigQuery. Hightouch executes a query in your database, reads all query results, and then determines incremental changes using Hightouch's infrastructure. This engine is easier to set up since it requires read—not write—access to BigQuery.

The Lightning engine requires read and write access to BigQuery. The engine stores previously synced data in a separate schema in BigQuery managed by Hightouch. In other words, the engine uses BigQuery to track incremental changes to your data rather than performing these calculations in Hightouch. Therefore, these computations are completed more quickly.

Standard vs Lightning engine comparison
Hightouch recommends using the Lightning sync engine when syncing more than one million rows of data.

If you select the standard engine, you can switch to the Lightning engine later. Once you've configured the Lightning engine, you can't move back to the standard engine without recreating BigQuery as a source.

To learn more, including migration steps and tips, check out the Lightning sync engine docs.

Standard versus Lightning engine comparison

Standard sync engineLightning sync engine
Required permissions in BigQueryRead-onlyRead and write
Location of change data captureHightouch infrastructureBigQuery schemas managed by Hightouch
Sync speedSlowerQuicker
Ease of setupSimplerMore involved
Ability to switchYou can move to the Lightning engine at any timeYou can't move to the standard engine once you've configured Lightning

Lightning engine permissions

To enable the Lightning engine, you need to provide your service account additional permissions to create schemas read/write data.

Run the following snippet to provision the hightouch_planner and hightouch_audit schemas, which are used for storing logs of previously synced data.

CREATE SCHEMA IF NOT EXISTS `hightouch_audit`;
CREATE SCHEMA IF NOT EXISTS `hightouch_planner`;
GRANT `roles/bigquery.dataViewer`, `roles/bigquery.dataEditor` ON SCHEMA `hightouch_planner` TO "serviceAccount:<YOUR_SERVICE_ACCOUNT>";
GRANT `roles/bigquery.dataViewer`, `roles/bigquery.dataEditor` ON SCHEMA `hightouch_audit` TO "serviceAccount:<YOUR_SERVICE_ACCOUNT>";

Test your connection

When setting up a source for the first time, Hightouch validates the following:

  • Network connectivity
  • BigQuery credentials
  • Permission to list schemas and tables
  • Permission to write to hightouch_planner schema
  • Permission to write to hightouch_audit schema

All configurations must pass the first three, while those with the Lightning engine must pass all of them.

Some sources may initially fail connection tests due to timeouts. Once a connection is established, subsequent API requests should happen more quickly, so it's best to retry tests if they first fail. You can do this by clicking Test again.

If you've retried the tests and verified your credentials are correct but the tests are still failing, don't hesitate to .

Next steps

Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from BigQuery.

The BigQuery source supports these modeling methods:

When syncing large amounts of data, it can take a long time for your model to reflect changes made in BigQuery. To speed up BigQuery model updates in the Hightouch UI, you can preview a model and save it. New or updated columns should then be reflected in the Hightouch UI.

You may also want to consider storing sync logs in BigQuery. Like using the Lightning sync engine versus the standard one, this feature lets you use BigQuery instead of Hightouch infrastructure. Rather than performance gains, it makes your sync log data available for more complex analysis. Refer to the warehouse sync logs docs to learn more.

You must enable the Lightning sync engine to store sync logs in your warehouse.

Tips and troubleshooting

To date, our customers haven't experienced any errors while using this source. If you run into any issues, please don't hesitate to . We're here to help.

Ready to get started?

Jump right in or a book a demo. Your first destination is always free.

Book a demoSign upBook a demo

Need help?

Our team is relentlessly focused on your success. Don't hesitate to reach out!

Feature requests?

We'd love to hear your suggestions for integrations and other features.

Last updated: Jan 31, 2023

On this page

OverviewBigQuery credential setupCreate a project and enable BigQuery APICreate a service accountGrant accessConnection configurationConfigure your service accountConfigure your sourceChoose your sync engineTest your connectionNext stepsTips and troubleshooting

Was this page helpful?