Greenplum reduces data silos by providing you with a single, scale-out environment for converging analytic and operational workloads, like streaming ingestion.
Hightouch lets you pull data stored in your Greenplum database and push it to downstream destinations. Most of the setup occurs in the Hightouch UI, but you need access to your Greenplum instance for information like your host, port, database name, and credentials.
You need to allowlist Hightouch's IP addresses to let our systems contact your warehouse. Reference our networking docs to determine which IPs you need to allowlist.
To get started, go to the Sources overview page and click the Add source button. Select Greenplum Database and follow the steps below.
Hightouch can connect directly to Greenplum over the public internet or via an SSH tunnel. Since data is encrypted in transit via TLS, a direct connection is suitable for most use cases. You may need to set up a tunnel if your Greenplum instance is on a private network or virtual private cloud (VPC).
Hightouch supports both standard and reverse SSH tunnels. To learn more about SSH tunneling, refer to Hightouch's tunneling documentation.
Enter the following required fields into Hightouch:
- Host: The hostname or IP address of your Greenplum server.
- Port: The port number of your Greenplum server. The default port number is 5432, but yours may be different.
- Database: This specifies the database to use when Hightouch executes queries in Greenplum.
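If you're unsure which values to enter, a query like the one below can help. It is a minimal sketch that assumes you can already connect to the database with a client such as psql. It uses the standard PostgreSQL functions `current_database()`, `inet_server_addr()`, and `inet_server_port()`, which Greenplum inherits from PostgreSQL. Treat the results as a starting point: the address the server reports may be a private IP rather than the public hostname (or tunnel endpoint) that Hightouch should use.

```sql
-- Run this while connected to the database you plan to use with Hightouch.
SELECT
  current_database() AS database_name,  -- candidate value for the Database field
  inet_server_addr() AS server_address, -- IP the server sees; NULL over a Unix-socket connection
  inet_server_port() AS server_port;    -- candidate value for the Port field; NULL over a Unix socket
```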
For optimal performance, Hightouch tracks incremental changes in your data model—such as added, changed, or removed rows—and only syncs those records. You can choose between two different sync engines for this work.
The standard engine requires read-only access to Greenplum. Hightouch executes a query in your database, reads all query results, and then determines incremental changes using Hightouch's infrastructure. This engine is easier to set up since it requires read—not write—access to Greenplum.
The Lightning engine requires read and write access to Greenplum. The engine stores previously synced data in a separate schema in Greenplum managed by Hightouch. In other words, the engine uses Greenplum to track incremental changes to your data rather than performing these calculations in Hightouch. Therefore, these computations are completed more quickly.
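To make change tracking concrete, here is a minimal sketch of how added, changed, and removed rows can be classified in SQL by comparing the current query results with a snapshot of what was synced last time. It illustrates the general technique (a full outer join on a primary key), not Hightouch's actual implementation. The tables `model_current` and `model_previous` and the columns `id`, `email`, and `plan` are hypothetical.

```sql
-- Hypothetical tables, for illustration only:
--   model_previous: rows as they were at the last sync
--   model_current:  rows returned by the model query now
-- Both have a non-null primary key "id" plus synced columns "email" and "plan".
SELECT
  COALESCE(c.id, p.id) AS id,
  CASE
    WHEN p.id IS NULL THEN 'added'    -- present only in the current results
    WHEN c.id IS NULL THEN 'removed'  -- present only in the previous snapshot
    ELSE 'changed'                    -- present in both, with at least one differing value
  END AS change_type
FROM model_current AS c
FULL OUTER JOIN model_previous AS p
  ON c.id = p.id
-- Keep only rows that differ; unchanged rows are filtered out.
WHERE p.id IS NULL
   OR c.id IS NULL
   OR c.email IS DISTINCT FROM p.email
   OR c.plan IS DISTINCT FROM p.plan;
```

`IS DISTINCT FROM` is used instead of `<>` so that a value changing to or from `NULL` still counts as a change.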
If you select the standard engine, you can switch to the Lightning engine later. Once you've configured the Lightning engine, you can't move back to the standard engine without recreating Greenplum as a source.
To learn more, including migration steps and tips, check out the Lightning sync engine docs.
The Lightning sync engine requires granting write access to your data warehouse, which makes its setup more involved than the standard sync engine. However, it is more performant and reliable than the standard engine. This makes it the ideal choice to guarantee faster syncs, especially with large data models. It also supports more features, such as Warehouse Sync Logs, Match Booster, and Identity Resolution.
| Criteria | Standard sync engine | Lightning sync engine |
|---|---|---|
| Ideal for large data models (over 100 thousand rows) | No | Yes |
| Resilience to sync interruptions | Normal | High |
| Extra features | None | Warehouse Sync Logs, Match Booster, Identity Resolution |
| Ease of setup | Simpler | More involved |
| Location of change data capture | Hightouch infrastructure | Greenplum schemas managed by Hightouch |
| Required permissions in Greenplum | Read-only | Read and write |
| Ability to switch | You can move to the Lightning engine at any time | You can't move to the standard engine once Lightning is configured |
To set up the Lightning engine, you need to grant Hightouch write access to Greenplum. You can do so by running the following SQL snippet.
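```sql
-- Create a dedicated user for Hightouch (replace the placeholder password)
CREATE USER hightouch_user WITH PASSWORD '********';
-- Create the schemas the Lightning engine manages
CREATE SCHEMA IF NOT EXISTS hightouch_audit;
CREATE SCHEMA IF NOT EXISTS hightouch_planner;
-- Let the Hightouch user access and create objects in both schemas
GRANT CREATE, USAGE ON SCHEMA hightouch_audit TO hightouch_user;
GRANT CREATE, USAGE ON SCHEMA hightouch_planner TO hightouch_user;
```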
The snippet creates a dedicated Greenplum user for Hightouch. It also provisions two schemas, `hightouch_audit` and `hightouch_planner`, for storing logs of previously synced data.
Enter the following fields into Hightouch:
- User: This can be your personal Greenplum login or a dedicated user for Hightouch. At minimum, this user must have read access to the data you wish to sync. If using the Lightning sync engine, you must also grant this user additional permissions as described above.
- Password: The password for the user specified above.
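The Lightning snippet only covers the Hightouch-managed schemas; every configuration also needs read access to the data being synced. A minimal sketch follows, assuming a dedicated user named `hightouch_user` (as in the Lightning snippet) and data in a schema called `analytics`, which is a hypothetical name to replace with your own. `ALL TABLES IN SCHEMA` and `ALTER DEFAULT PRIVILEGES` may not be available on older Greenplum releases, so check the `GRANT` reference for your version.

```sql
-- "analytics" is a hypothetical schema name; substitute the schema holding your data.
-- Allow the user to look up objects in the schema.
GRANT USAGE ON SCHEMA analytics TO hightouch_user;
-- Allow reads on every table that exists in the schema right now.
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO hightouch_user;
-- Also allow reads on tables that the role running this command creates later.
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics GRANT SELECT ON TABLES TO hightouch_user;
```

The middle statement affects only tables present when it runs, which is why the default-privileges statement is included for tables added afterwards.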
When setting up a source for the first time, Hightouch validates the following:
- Network connectivity
- Greenplum credentials
- Permission to list schemas and tables
- Permission to write to the `hightouch_planner` schema
- Permission to write to the `hightouch_audit` schema
All configurations must pass the first three, while those with the Lightning engine must pass all of them.
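If a permission test fails, you can check the same privileges directly in Greenplum before retrying. This sketch uses the standard `has_schema_privilege(user, schema, privilege)` function and assumes the user is named `hightouch_user`; `analytics` is a hypothetical schema name, and the function raises an error if the named schema doesn't exist. These queries check privileges only, not network connectivity or credentials.

```sql
-- Can the user access the schema that holds your data? ("analytics" is hypothetical)
SELECT has_schema_privilege('hightouch_user', 'analytics', 'USAGE') AS can_use_data_schema;

-- Lightning engine only: can the user create objects in the Hightouch-managed schemas?
SELECT
  has_schema_privilege('hightouch_user', 'hightouch_audit', 'CREATE')   AS can_write_audit,
  has_schema_privilege('hightouch_user', 'hightouch_planner', 'CREATE') AS can_write_planner;
```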
Some sources may initially fail connection tests due to timeouts. Once a connection is established, subsequent API requests should happen more quickly, so it's best to retry tests if they first fail. You can do this by clicking Test again.
If you've retried the tests and verified your credentials are correct but the tests are still failing, don't hesitate to reach out.
Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from Greenplum.
The Greenplum source supports these modeling methods:
- writing a query in the SQL editor
- using the visual table selector
- leveraging existing dbt models
- leveraging existing Looker Looks
- leveraging existing Sigma workbooks
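As an illustration of the SQL editor option, here is a minimal model query. The table `analytics.customers` and its columns are hypothetical. The sketch returns one row per customer with a column whose values are unique per row, which gives Hightouch a stable key for tracking added, changed, and removed rows between syncs.

```sql
-- Hypothetical model: one row per customer.
SELECT
  customer_id,     -- unique per row, suitable as the model's primary key
  email,
  first_name,
  lifetime_value
FROM analytics.customers
WHERE email IS NOT NULL;  -- skip rows a downstream tool could not match on
```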
To date, our customers haven't experienced any errors while using this source. If you run into any issues, please don't hesitate to reach out. We're here to help.