BigQuery is a fully-managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
Overview
Hightouch lets you sync data from BigLake tables using the Apache Iceberg REST catalog to downstream destinations.
BigLake Iceberg tables use the Apache Iceberg open table format, with data stored as Parquet files in Google Cloud Storage (GCS). Table metadata is managed by the BigLake Metastore and accessible via the Iceberg REST Catalog, enabling interoperability between BigQuery, Apache Spark, and other Iceberg-compatible engines.
Hightouch discovers your tables and schemas via the BigLake Iceberg REST Catalog API, and uses BigQuery as the query engine to read your data.
If you already have a standard BigQuery source in Hightouch, you still need to create a separate BigQuery Iceberg source. The two source types use different table discovery mechanisms and cannot be combined.
GCP prerequisites
Before configuring the source in Hightouch, you need to set up several GCP resources. Each step below shows both Google Cloud Console and gcloud CLI methods—you only need to follow one.
Step 1: Enable required APIs
BigLake Iceberg requires the following APIs on your GCP project:
- BigQuery API (bigquery.googleapis.com)
- BigLake API (biglake.googleapis.com)
Console
- Go to the APIs & Services dashboard.
- Click Enable APIs and Services.
- Search for BigQuery API and click Enable (if not already enabled).
- Repeat for BigLake API.
gcloud CLI
gcloud services enable bigquery.googleapis.com \
--project=YOUR_PROJECT_ID
gcloud services enable biglake.googleapis.com \
--project=YOUR_PROJECT_ID
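To confirm both APIs are active, you can list the project's enabled services and filter for the two names (a quick sanity check; requires an authenticated gcloud session):

```shell
# Both service names should appear in the output if Step 1 succeeded
gcloud services list --enabled \
  --project=YOUR_PROJECT_ID \
  --format="value(config.name)" | grep -E "bigquery|biglake"
```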
Step 2: Create a Cloud Storage bucket
Iceberg table data (Parquet files and metadata) is stored in a GCS bucket. If your Iceberg tables already exist in a bucket, you can use that bucket. Otherwise, create a new one.
The bucket must not use any of the following features, which are incompatible with BigLake Iceberg:
- Hierarchical namespaces
- Object versioning
- Object lock or bucket lock
- Customer-supplied encryption keys (CSEK)
Console
- Go to Cloud Storage > Buckets.
- Click Create.
- Enter a globally unique bucket name (e.g., your-company-iceberg-data).
- Choose a Location type that matches where your BigQuery jobs run (e.g., us-central1 or the US multi-region).
- For Default storage class, select Standard.
- For Access control, select Uniform.
- Leave all other settings at their defaults—do not enable versioning, retention policies, or object lock.
- Click Create.
gcloud CLI
gcloud storage buckets create gs://YOUR_BUCKET_NAME \
--project=YOUR_PROJECT_ID \
--location=YOUR_REGION \
--uniform-bucket-level-access
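To double-check that the bucket avoids the incompatible features listed above, you can inspect its configuration (field names in the output may vary slightly by gcloud version):

```shell
# Look for uniform_bucket_level_access: true, and confirm that
# versioning, retention policy, and hierarchical namespace are absent
gcloud storage buckets describe gs://YOUR_BUCKET_NAME \
  --project=YOUR_PROJECT_ID
```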
Step 3: Create a service account for Hightouch
Hightouch needs a GCP service account with permissions to query BigQuery, read from the Iceberg REST catalog, and access GCS. If you already have a service account from a standard BigQuery source, you can reuse it—just add the additional roles below.
| Role | Purpose |
|---|---|
| BigQuery User (roles/bigquery.user) | Run queries |
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Read table data via BigQuery |
| BigLake Viewer (roles/biglake.viewer) | Read table metadata from the Iceberg REST catalog |
| Storage Object Viewer (roles/storage.objectViewer) | Read Parquet data files from GCS |
| Service Usage Consumer (roles/serviceusage.serviceUsageConsumer) | Required for REST catalog API billing attribution |
Console
- Go to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Enter a name (e.g.,
hightouch-bigquery-iceberg) and click Create and Continue. - Add the roles listed above and click Continue.
- Click Done.
- Click on the newly created service account, go to the Keys tab, and click Add Key > Create new key > JSON. Download and save the key file.
gcloud CLI
SA_EMAIL="hightouch-bigquery-iceberg@YOUR_PROJECT_ID.iam.gserviceaccount.com"
gcloud iam service-accounts create hightouch-bigquery-iceberg \
--display-name="Hightouch BigQuery Iceberg" \
--project=YOUR_PROJECT_ID
for role in roles/bigquery.user roles/bigquery.dataViewer roles/biglake.viewer roles/serviceusage.serviceUsageConsumer; do
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:$SA_EMAIL" \
--role="$role"
done
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
--member="serviceAccount:$SA_EMAIL" \
--role=roles/storage.objectViewer
gcloud iam service-accounts keys create hightouch-key.json \
--iam-account="$SA_EMAIL"
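To confirm the project-level roles were granted correctly, you can flatten the project's IAM policy and filter for the service account:

```shell
# Prints every project-level role bound to the service account;
# expect bigquery.user, bigquery.dataViewer, biglake.viewer,
# and serviceusage.serviceUsageConsumer
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:$SA_EMAIL" \
  --format="value(bindings.role)"
```

Note that roles/storage.objectViewer will not appear here because it was granted at the bucket level, not the project level.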
Step 4: Verify your setup
Verify that your bucket is accessible as an Iceberg REST catalog. The BigLake Metastore automatically maps GCS buckets to Iceberg catalogs—no explicit catalog creation is needed.
You can verify by querying the REST catalog config endpoint (requires gcloud authentication):
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: YOUR_PROJECT_ID" \
"https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/config?warehouse=gs://YOUR_BUCKET_NAME"
A successful response includes a prefix field and a list of supported endpoints:
{
"overrides": {
"prefix": "projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME"
},
"endpoints": ["GET /v1/{prefix}/namespaces", ...]
}
If you already have Iceberg tables in your bucket (created by Spark, Flink, or another engine), they will be automatically discoverable through the REST catalog. No migration is needed.
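As a further check, you can list the catalog's namespaces through the REST API. The URL below assumes the prefix value returned by the config endpoint (projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME):

```shell
# Lists Iceberg namespaces in the catalog; existing tables' namespaces
# should appear in the JSON response
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: YOUR_PROJECT_ID" \
  "https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME/namespaces"
```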
Connection configuration
To get started, go to the Sources overview page and click the Add source button. Select BigQuery Iceberg and follow the steps below.
Configure your service account
Select the GCP credentials you previously created or click Create new. To learn more about these credentials, see the Google Cloud Provider (GCP) documentation.
Configure your source
Enter the following required fields into Hightouch:
- Project ID: Your GCP project ID.
- Dataset location: The geographic location of your data (e.g., us-central1 or US).
- Iceberg catalog warehouse URI: The GCS bucket URI that backs your Iceberg REST catalog (e.g., gs://your-company-iceberg-data). Hightouch uses this to discover namespaces, tables, and schemas.
Choose your sync engine
BigQuery Iceberg currently supports the Basic sync engine only. The Lightning sync engine will be supported in a future release.
Test your connection
When setting up a source for the first time, Hightouch validates the following:
- Network connectivity
- BigQuery Iceberg credentials
- Permission to list schemas and tables
- Permission to write to the hightouch_planner schema
- Permission to write to the hightouch_audit schema
All configurations must pass the first three tests; sources using the Lightning sync engine must also pass the last two.
Some sources may initially fail connection tests due to timeouts. Once a connection is established, subsequent API requests should happen more quickly, so it's best to retry tests if they first fail. You can do this by clicking Test again.
If you've retried the tests and verified your credentials are correct but the tests are still failing, don't hesitate to contact support.
Next steps
Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from your Iceberg tables.
The BigQuery Iceberg source supports these modeling methods:
- writing a query in the SQL editor
- using the visual table selector
The table selector browses tables discovered from your Iceberg REST catalog. The SQL editor allows you to write arbitrary BigQuery SQL, including queries that join Iceberg tables with standard BigQuery tables.
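For example, a model query can reference an Iceberg table using BigQuery's four-part name (project.catalog.namespace.table, where the catalog name matches your GCS bucket). The query below is a sketch; all project, dataset, namespace, and table names are placeholders:

```shell
# Join a REST-catalog Iceberg table with a standard BigQuery table
# (every identifier here is hypothetical)
bq query --use_legacy_sql=false '
  SELECT u.user_id, u.email, o.total_orders
  FROM `YOUR_PROJECT_ID.YOUR_BUCKET_NAME.analytics.users` AS u
  JOIN `YOUR_PROJECT_ID.your_dataset.order_counts` AS o
    ON u.user_id = o.user_id'
```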
Known limitations
- Lightning sync engine: Not yet supported. BigQuery DML (INSERT, UPDATE, DELETE) is not available on REST-catalog-managed Iceberg tables. This is a Google Cloud limitation expected to be resolved in a future BigQuery release.
- Write operations: Hightouch reads from Iceberg tables but does not write to them. Data must be loaded into Iceberg tables using an Iceberg-compatible writer (Spark, Flink, PyIceberg, etc.).
Tips and troubleshooting
If you encounter an error or question not listed below and need assistance, don't hesitate to contact support. We're here to help.
"Not found" errors when querying tables
BigQuery queries against REST catalog tables use four-part naming: project.catalog.namespace.table. If you see "Dataset not found" errors, verify that:
- Your GCS bucket name is correct in the Iceberg catalog warehouse URI field
- The Hightouch service account has the biglake.viewer and serviceusage.serviceUsageConsumer roles
- The BigLake API is enabled on your project
Tables not appearing in the table selector
If your Iceberg tables don't appear in the table selector:
- Verify the tables exist in the REST catalog by querying the namespaces endpoint (see Step 4)
- Ensure the tables are in a namespace other than hightouch_planner or hightouch_audit (these are filtered from discovery)
- Try refreshing the schema in the Hightouch UI