BigQuery is a fully-managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
Overview
Hightouch lets you sync data from BigLake tables using the Apache Iceberg REST catalog to downstream destinations.
BigLake Iceberg tables use the Apache Iceberg open table format, with data stored as Parquet files in Google Cloud Storage (GCS). Table metadata is managed by the BigLake Metastore and accessible via the Iceberg REST Catalog, enabling interoperability between BigQuery, Apache Spark, and other Iceberg-compatible engines.
Hightouch discovers your tables and schemas via the BigLake Iceberg REST Catalog API, and uses BigQuery as the query engine to read your data.
If you already have a standard BigQuery source in Hightouch, you still need to create a separate BigQuery Iceberg source. The two source types use different table discovery mechanisms and cannot be combined.
GCP prerequisites
Before configuring the source in Hightouch, you need to set up several GCP resources. Each step below shows both Google Cloud Console and gcloud CLI methods—you only need to follow one.
Step 1: Enable required APIs
BigLake Iceberg requires the following APIs on your GCP project:
- BigQuery API (bigquery.googleapis.com)
- BigLake API (biglake.googleapis.com)
Console
- Go to the APIs & Services dashboard.
- Click Enable APIs and Services.
- Search for BigQuery API and click Enable (if not already enabled).
- Repeat for BigLake API.
gcloud CLI
gcloud services enable bigquery.googleapis.com \
--project=YOUR_PROJECT_ID
gcloud services enable biglake.googleapis.com \
--project=YOUR_PROJECT_ID
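To confirm both APIs are active, you can list the project's enabled services and filter for the two names (a quick sanity check; requires an authenticated gcloud session):

```shell
# Both service names should appear in the output if Step 1 succeeded
gcloud services list --enabled \
  --project=YOUR_PROJECT_ID \
  --format="value(config.name)" | grep -E "bigquery|biglake"
```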
Step 2: Create a Cloud Storage bucket
Iceberg table data (Parquet files and metadata) is stored in a GCS bucket. If your Iceberg tables already exist in a bucket, you can use that bucket. Otherwise, create a new one.
The bucket must not use any of the following features, which are incompatible with BigLake Iceberg:
- Hierarchical namespaces
- Object versioning
- Object lock or bucket lock
- Customer-supplied encryption keys (CSEK)
Console
- Go to Cloud Storage > Buckets.
- Click Create.
- Enter a globally unique bucket name (e.g., your-company-iceberg-data).
- Choose a Location type that matches where your BigQuery jobs run (e.g., us-central1 or the US multi-region).
- For Default storage class, select Standard.
- For Access control, select Uniform.
- Leave all other settings at their defaults—do not enable versioning, retention policies, or object lock.
- Click Create.
gcloud CLI
gcloud storage buckets create gs://YOUR_BUCKET_NAME \
--project=YOUR_PROJECT_ID \
--location=YOUR_REGION \
--uniform-bucket-level-access
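To double-check that the bucket avoids the incompatible features listed above, you can inspect its configuration (field names in the output may vary slightly by gcloud version):

```shell
# Look for uniform_bucket_level_access: true, and confirm that
# versioning, retention policy, and hierarchical namespace are absent
gcloud storage buckets describe gs://YOUR_BUCKET_NAME \
  --project=YOUR_PROJECT_ID
```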
Step 3: Create a service account for Hightouch
Hightouch needs a GCP service account with permissions to query BigQuery, read from the Iceberg REST catalog, and access GCS. If you already have a service account from a standard BigQuery source, you can reuse it—just add the additional roles below.
| Role | Purpose |
|---|---|
| BigQuery User (roles/bigquery.user) | Run queries |
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Read table data via BigQuery |
| BigLake Viewer (roles/biglake.viewer) | Read table metadata from the Iceberg REST catalog |
| Storage Object Viewer (roles/storage.objectViewer) | Read Parquet data files from GCS |
| Service Usage Consumer (roles/serviceusage.serviceUsageConsumer) | Required for REST catalog API billing attribution |
Console
- Go to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Enter a name (e.g.,
hightouch-bigquery-iceberg) and click Create and Continue. - Add the roles listed above and click Continue.
- Click Done.
- Click on the newly created service account, go to the Keys tab, and click Add Key > Create new key > JSON. Download and save the key file.
gcloud CLI
SA_EMAIL="hightouch-bigquery-iceberg@YOUR_PROJECT_ID.iam.gserviceaccount.com"
gcloud iam service-accounts create hightouch-bigquery-iceberg \
--display-name="Hightouch BigQuery Iceberg" \
--project=YOUR_PROJECT_ID
for role in roles/bigquery.user roles/bigquery.dataViewer roles/biglake.viewer roles/serviceusage.serviceUsageConsumer; do
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:$SA_EMAIL" \
--role="$role"
done
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
--member="serviceAccount:$SA_EMAIL" \
--role=roles/storage.objectViewer
gcloud iam service-accounts keys create hightouch-key.json \
--iam-account="$SA_EMAIL"
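To confirm the project-level roles were granted correctly, you can flatten the project's IAM policy and filter for the service account:

```shell
# Prints every project-level role bound to the service account;
# expect bigquery.user, bigquery.dataViewer, biglake.viewer,
# and serviceusage.serviceUsageConsumer
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:$SA_EMAIL" \
  --format="value(bindings.role)"
```

Note that roles/storage.objectViewer will not appear here because it was granted at the bucket level, not the project level.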
Step 4: Verify your setup
Verify that your bucket is accessible as an Iceberg REST catalog. The BigLake Metastore automatically maps GCS buckets to Iceberg catalogs—no explicit catalog creation is needed.
You can verify by querying the REST catalog config endpoint (requires gcloud authentication):
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: YOUR_PROJECT_ID" \
"https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/config?warehouse=gs://YOUR_BUCKET_NAME"
A successful response includes a prefix field and a list of supported endpoints:
{
"overrides": {
"prefix": "projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME"
},
"endpoints": ["GET /v1/{prefix}/namespaces", ...]
}
If you already have Iceberg tables in your bucket (created by Spark, Flink, or another engine), they will be automatically discoverable through the REST catalog. No migration is needed.
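As a further check, you can list the catalog's namespaces through the REST API. The URL below assumes the prefix value returned by the config endpoint (projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME):

```shell
# Lists Iceberg namespaces in the catalog; existing tables' namespaces
# should appear in the JSON response
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: YOUR_PROJECT_ID" \
  "https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/projects/YOUR_PROJECT_NUMBER/catalogs/YOUR_BUCKET_NAME/namespaces"
```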
Connection configuration
To get started, go to the Sources overview page and click the Add source button. Select BigQuery Iceberg and follow the steps below.
Configure your service account
Select the GCP credentials you previously created or click Create new. To learn more about these credentials, see the Google Cloud Provider (GCP) documentation.
Configure your source
Enter the following required fields into Hightouch:
- Project ID: Your GCP project ID.
- Dataset location: The geographic location of your data (e.g., us-central1 or US).
- Iceberg catalog warehouse URI: The GCS bucket URI that backs your Iceberg REST catalog (e.g., gs://your-company-iceberg-data). Hightouch uses this to discover namespaces, tables, and schemas.
Choose your sync engine
BigQuery Iceberg currently supports the Basic sync engine only. The Lightning sync engine will be supported in a future release.
Test your connection
When setting up a source for the first time, Hightouch validates the following:
- Network connectivity
- BigQuery Iceberg credentials
- Permission to list schemas and tables
- Permission to write to the hightouch_planner schema
- Permission to write to the hightouch_audit schema
All configurations must pass the first three tests; sources using the Lightning sync engine must also pass the last two.
Some sources may initially fail connection tests due to timeouts. Once a connection is established, subsequent API requests should happen more quickly, so it's best to retry tests if they first fail. You can do this by clicking Test again.
If you've retried the tests and verified your credentials are correct but the tests are still failing, don't hesitate to contact support.
Next steps
Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from your Iceberg tables.
The BigQuery Iceberg source supports these modeling methods:
- writing a query in the SQL editor
- using the visual table selector
The table selector browses tables discovered from your Iceberg REST catalog. The SQL editor allows you to write arbitrary BigQuery SQL, including queries that join Iceberg tables with standard BigQuery tables.
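For example, a model query can reference an Iceberg table using BigQuery's four-part name (project.catalog.namespace.table, where the catalog name matches your GCS bucket). The query below is a sketch; all project, dataset, namespace, and table names are placeholders:

```shell
# Join a REST-catalog Iceberg table with a standard BigQuery table
# (every identifier here is hypothetical)
bq query --use_legacy_sql=false '
  SELECT u.user_id, u.email, o.total_orders
  FROM `YOUR_PROJECT_ID.YOUR_BUCKET_NAME.analytics.users` AS u
  JOIN `YOUR_PROJECT_ID.your_dataset.order_counts` AS o
    ON u.user_id = o.user_id'
```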
Known limitations
- Lightning sync engine: Not yet supported. BigQuery DML (INSERT, UPDATE, DELETE) is not available on REST-catalog-managed Iceberg tables. This is a Google Cloud limitation expected to be resolved in a future BigQuery release.
- Write operations: Hightouch reads from Iceberg tables but does not write to them. Data must be loaded into Iceberg tables using an Iceberg-compatible writer (Spark, Flink, PyIceberg, etc.).
Tips and troubleshooting
If you encounter an error or question not listed below and need assistance, don't hesitate to contact support. We're here to help.
"Not found" errors when querying tables
BigQuery queries against REST catalog tables use four-part naming: project.catalog.namespace.table. If you see "Dataset not found" errors, verify that:
- Your GCS bucket name is correct in the Iceberg catalog warehouse URI field
- The Hightouch service account has the biglake.viewer and serviceusage.serviceUsageConsumer roles
- The BigLake API is enabled on your project
Tables not appearing in the table selector
If your Iceberg tables don't appear in the table selector:
- Verify the tables exist in the REST catalog by querying the namespaces endpoint (see Step 4)
- Ensure the tables are in a namespace other than hightouch_planner or hightouch_audit (these are filtered from discovery)
- Try refreshing the schema in the Hightouch UI