With the SQL editor modeling method, you can write a SQL query to define your model. The SQL editor can execute whatever SQL is native to your warehouse or database.
Your query depends on the data in your model. For example, if you are using the Sample B2B SaaS Source, you can use the following query to select the entire
SELECT * FROM public.users
Before you can continue, you must click Preview to see a preview of the data. By default, Hightouch limits the preview to the first 100 rows. Once you've confirmed that the preview contains the data you expect, click Continue.
You can't save a model if your query doesn't return any results. If you need to save a model with such a query because you expect there to be results in the future, see the save a model without results section.
Click Finish to complete your model setup.
All models require a unique primary key. If your dataset doesn't inherently include any truly unique columns, you can use SQL to either:
- filter out duplicate rows, if you're not concerned about losing data from one or more of the duplicated rows
- create a composite column to use for your primary key
As explained in the primary key updates section, if you update a model's primary key by selecting a different column, you need to trigger a full resync for all syncs that use that model. Otherwise, change data capture can't process your model data correctly, which can make your syncs fail.
You don't need to manually trigger a full resync if you change the primary key column's data type. If you change the primary key's data type in the model configuration, your sync will process normally. If you make this change in your source or in the SQL editor, the entire model query result set is automatically resynced as if you triggered a full resync. As outlined in the full resync prerequisites section, this can create duplicates in your destination data.
When you write a
SELECT statement to define your model, you can use a
GROUP BY clause to retrieve only the unique rows based on the column you want to use for your primary key:
SELECT * FROM users GROUP BY primary_key_column
This query returns only one row for each unique
primary_key_column value, while ignoring the duplicates.
This query returns one row for each
primary_key_column value, so if there are multiple rows with the same
primary_key_column value, only one of them will be returned.
If you want your model's query results to return every row in your dataset, it's better to create a composite primary key.
If your dataset doesn't inherently include a truly unique primary key, you can create one. For example, in a CRM, you may have users that are identified by email, but belong to multiple organizations:
firstname.lastname@example.org belongs to both
org_id:3. Their email address is unique within each organization but non-unique across organizations. Therefore, if you were to select
To ensure a sync with this type of data includes all records, you can use a hash function for the model's primary key. The hash should combine enough columns in the data to create a unique value, for example,
SELECT org_id, email, created_at, HASH(CONCAT(org_id, '-', email)) AS composite_key FROM users
This query selects the
created_at columns, and creates a
The composite_key column contains a hash of the concatenation of
You would then select the
composite_key column as the primary key for the model.
Because the composite keys for the
email@example.com rows are unique, Hightouch would sync both rows to the destination.
You can't save a model if your query doesn't return any results. If you need to save a model with such a query because you expect there to be results in the future, add the following SQL to the end of your query:
UNION ALL select 'ignore', 'ignore', 'ignore'
This SQL adds a row of 'ignore' values that don't match against anything in your destination but are always present so you can save the query.
If your column datatype isn't a
string, you can replace 'ignore' with a
The number of 'ignore' or
null values needs to be the same as the number of columns in your model.
If you want to exclude rows with
null values at a model level, you can include the following SQL in your model defintiion:
SELECT * FROM your_table WHERE column1 IS NOT NULL AND column2 IS NOT NULL AND column3 IS NOT NULL;
Be sure to change
column2, etc. to your column names and to include any columns where you believe there could be