After implementing Hightouch Events alongside your existing system, it's crucial to validate that your new setup captures data correctly and consistently and can power your existing use cases.
There are two main steps in validation:
Verify that event data is flowing from Hightouch Events with the correct properties and data types.
Check that the data from Hightouch Events is comparable to the data from your existing provider in volume and in values, such as user IDs and event properties.
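As a rough illustration of the first step, the sketch below checks one row of event data against an expected set of properties and types. The schema, the `find_schema_problems` helper, and the sample row are all hypothetical and not part of Hightouch Events; adapt the property names and types to your own tracking plan.

```python
from typing import Any

# Hypothetical expected schema for an identify event: property name -> Python type.
EXPECTED_IDENTIFY_SCHEMA: dict[str, type] = {
    "user_id": str,
    "anonymous_id": str,
    "email": str,
    "name": str,
}


def find_schema_problems(row: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return one message per property that is missing or has the wrong type."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row or row[column] is None:
            problems.append(f"{column}: missing")
        elif not isinstance(row[column], expected_type):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    return problems


# Made-up sample row: user_id arrived as a number and name is absent.
sample_row = {"user_id": 42, "anonymous_id": "a-1", "email": "ada@example.com"}
problems = find_schema_problems(sample_row, EXPECTED_IDENTIFY_SCHEMA)
# problems == ["user_id: expected str, got int", "name: missing"]
```

Running a check like this over a small sample of recent rows is usually enough to surface systematic type mistakes before comparing against your existing provider.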
Compare the data between your current platform and Hightouch Events to make sure that Hightouch Events is instrumented correctly and sending the data you expect. We recommend checking both event volume and values.
If you're maintaining your data model during your migration or only making minor changes, validating data values will be more straightforward. Significant changes to the data model during migration will make validation more complex and time-consuming.
Let's assume that you're migrating from Segment to Hightouch Events, with data flowing from Segment into tables that follow Segment's schema and from Hightouch Events into tables that follow Hightouch's schema. We'll also assume that events sent during the migration have a migrationId assigned through an analytics wrapper function, described in Step 2.
We could use the following query, or a version adjusted to your warehouse and setup, to count identify events per day over the last 7 days. We expect roughly the same count from both sources, but some variation is normal; we'll cover why differences can occur later in this step of the guide.
WITH segment_identifies AS (
  SELECT DATE(timestamp) AS event_date, migrationId, COUNT(*) AS segment_count
  FROM SEGMENT.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
  GROUP BY DATE(timestamp), migrationId
),
hightouch_identifies AS (
  SELECT DATE(timestamp) AS event_date, migrationId, COUNT(*) AS hightouch_count
  FROM HIGHTOUCH.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
  GROUP BY DATE(timestamp), migrationId
)
SELECT COALESCE(s.event_date, h.event_date) AS event_date,
  -- COALESCE keeps the ID for rows that exist only on the Hightouch side
  COALESCE(s.migrationId, h.migrationId) AS migrationId,
  s.segment_count,
  h.hightouch_count,
  s.segment_count - h.hightouch_count AS count_difference,
  CASE WHEN s.segment_count = h.hightouch_count THEN 'Match' ELSE 'Mismatch' END AS comparison_result
FROM segment_identifies s
FULL OUTER JOIN hightouch_identifies h ON s.event_date = h.event_date AND s.migrationId = h.migrationId
WHERE s.migrationId IS NOT NULL OR h.migrationId IS NOT NULL
ORDER BY event_date, migrationId;
We also need to validate that Hightouch Events is collecting the same values as your prior provider.
The query below selects several properties relevant to identify calls and compares them between the two platforms over a 7-day period.
You can modify this query to examine different properties, use a different time window, or look at data in a narrower set of dates.
WITH segment_data AS (
  SELECT
    id AS segment_id,
    migrationId,
    anonymous_id,
    user_id,
    timestamp,
    email,
    name
  FROM SEGMENT.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
),
hightouch_data AS (
  SELECT
    id AS hightouch_id,
    migrationId,
    anonymous_id,
    user_id,
    timestamp,
    email,
    name
  FROM HIGHTOUCH.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
)
SELECT
  -- COALESCE keeps the ID for rows that exist only on the Hightouch side
  COALESCE(s.migrationId, h.migrationId) AS migrationId,
  COALESCE(s.user_id, h.user_id) AS user_id,
  s.segment_id,
  h.hightouch_id,
  s.anonymous_id AS segment_anonymous_id,
  h.anonymous_id AS hightouch_anonymous_id,
  s.timestamp AS segment_timestamp,
  h.timestamp AS hightouch_timestamp,
  CASE WHEN s.email = h.email THEN 'Match' ELSE 'Mismatch' END AS email_comparison,
  CASE WHEN s.name = h.name THEN 'Match' ELSE 'Mismatch' END AS name_comparison,
  s.email AS segment_email,
  h.email AS hightouch_email,
  s.name AS segment_name,
  h.name AS hightouch_name
FROM segment_data s
FULL OUTER JOIN hightouch_data h ON s.migrationId = h.migrationId
WHERE
  -- Note: != evaluates to NULL when either side is NULL, so a value that is
  -- NULL in only one system is not returned by these three conditions
  s.email != h.email
  OR s.name != h.name
  OR s.anonymous_id != h.anonymous_id
  -- Events present in only one system
  OR s.migrationId IS NULL OR h.migrationId IS NULL
ORDER BY COALESCE(s.timestamp, h.timestamp) DESC
LIMIT 100; -- Limit to 100 rows for a manageable sample
Event volume should be approximately the same, but some discrepancies between systems are normal. Deployment rollout, ad blockers, and network errors could all cause events to appear in one tool but not another, leading to differences in volume.
Timing differences: Events may be processed in slightly different orders or with small time variations.
Dropped events: Network issues might cause events to be lost in one system but not the other.
Duplicate events: Some events might be sent twice in edge cases (for example, page reloads).
Blocking behavior: Other providers might not block events as strictly as Hightouch does when running type checks.
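Because small differences are expected for the reasons above, it can help to flag only the days where the two counts diverge by more than some tolerance. The sketch below is one possible approach, not a Hightouch feature: the 2% default tolerance, the function names, and the sample counts are assumptions chosen for illustration.

```python
def relative_difference(segment_count: int, hightouch_count: int) -> float:
    """Absolute count difference as a fraction of the larger count (0.0 if both are 0)."""
    larger = max(segment_count, hightouch_count)
    if larger == 0:
        return 0.0
    return abs(segment_count - hightouch_count) / larger


def days_outside_tolerance(daily_counts, tolerance=0.02):
    """Return (event_date, relative_difference) for days that exceed the tolerance.

    daily_counts is an iterable of (event_date, segment_count, hightouch_count)
    tuples, for example rows exported from a daily count comparison.
    The 2% default is an arbitrary starting point, not a recommended value.
    """
    return [
        (event_date, round(relative_difference(s, h), 4))
        for event_date, s, h in daily_counts
        if relative_difference(s, h) > tolerance
    ]


# Made-up sample data: the first day differs by 1%, the second by 10%.
sample_counts = [("2024-05-01", 1000, 990), ("2024-05-02", 1000, 900)]
flagged = days_outside_tolerance(sample_counts)
# flagged == [("2024-05-02", 0.1)]
```

Dividing by the larger count keeps the result between 0 and 1 regardless of which system reports more events.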
Hightouch Events should collect the same values as your previous event collection provider, though there may be minor variations in automatically collected fields.
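When comparing values for matched events, one option is to skip automatically collected fields so that expected variations do not drown out real mismatches. The following sketch assumes each event has been loaded as a dictionary; the `IGNORED_FIELDS` set and the `differing_fields` helper are hypothetical names, and which fields to ignore depends on your own schema.

```python
# Hypothetical set of automatically collected fields to leave out of the comparison.
IGNORED_FIELDS = {"id", "timestamp", "received_at"}


def differing_fields(segment_row: dict, hightouch_row: dict, ignored=IGNORED_FIELDS) -> dict:
    """Map each differing field to its (segment_value, hightouch_value) pair.

    A field that is missing or None in only one row counts as a difference.
    """
    fields = (set(segment_row) | set(hightouch_row)) - set(ignored)
    return {
        field: (segment_row.get(field), hightouch_row.get(field))
        for field in sorted(fields)
        if segment_row.get(field) != hightouch_row.get(field)
    }


# Made-up sample rows for the same migrationId.
segment_row = {"id": "s1", "timestamp": "t1", "email": "ada@example.com", "name": "Ada"}
hightouch_row = {"id": "h1", "timestamp": "t2", "email": "ada@example.com", "name": None}
diff = differing_fields(segment_row, hightouch_row)
# diff == {"name": ("Ada", None)}
```

Unlike a SQL `!=` comparison, Python's `!=` treats `None` versus a value as a difference, so a field that is empty in only one system is reported.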