After implementing Hightouch Events alongside your existing system, it's crucial to validate that your new setup captures data correctly and consistently and can power your existing use cases.
There are two main steps in validation:
- Verify that event data is flowing from Hightouch Events with the correct properties and data types.
- Check that the data from Hightouch Events is comparable to the data from your existing provider in volume and in values, such as user IDs and event properties.
Verifying your setup
- Check event reception:
  - Confirm that events are being received by both your current system and Hightouch Events.
  - Use the Hightouch debugger to view incoming events in real time.
- Verify event structure:
  - Ensure all expected properties are present in the Hightouch events.
  - Check that data types are correct (for example, numbers aren't being sent as strings).
- Test all event types:
  - Manually trigger each type of event (page views, user identifications, custom events) in your application.
  - Verify that they appear correctly in both systems.
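As a rough illustration of the structure checks above, the sketch below inspects one event's properties for missing fields and wrong data types. The property names (`order_id`, `total`, `item_count`) and the `find_property_issues` helper are hypothetical; substitute the properties from your own tracking plan.

```python
from numbers import Number

# Hypothetical expected schema: property name -> accepted Python type.
EXPECTED_PROPERTIES = {
    "order_id": str,
    "total": Number,
    "item_count": int,
}


def find_property_issues(properties, expected=EXPECTED_PROPERTIES):
    """Return a list of human-readable problems found in one event's properties."""
    issues = []
    for name, expected_type in expected.items():
        if name not in properties:
            issues.append(f"missing property: {name}")
            continue
        value = properties[name]
        type_ok = isinstance(value, expected_type)
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and expected_type is not bool:
            type_ok = False
        if not type_ok:
            issues.append(f"wrong type for {name}: got {type(value).__name__}")
    return issues


# A number sent as a string and a missing property are both reported.
print(find_property_issues({"order_id": "A-1", "total": "19.99"}))
```

You could run a check like this against a handful of events copied from the debugger or sampled from your warehouse before moving on to the volume and value comparisons.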
Checking data quality
Compare the data between your current platform and Hightouch Events to make sure that Hightouch Events is instrumented correctly and sending the data you expect. We recommend checking both event volume and values.
If you're maintaining your data model during your migration or only making minor changes, validating data values will be more straightforward. Significant changes to the data model during migration will make validation more complex and time-consuming.
Compare event volume
Let's assume that you're migrating from Segment to Hightouch Events, and have data flowing from Segment into tables per Segment's schema and from Hightouch Events into tables per Hightouch's schema. We'll also assume that events sent during the migration have a migrationId assigned through an analytics wrapper function, described in Step 2.
We could use the following query, or something adjusted to your warehouse and setup, to look at the count of identifies events within the last 7 days. While we expect to see roughly the same count of events, there can be some variation; we'll cover why differences can occur later in this step of the guide.
-- Daily identifies counts per migrationId from each provider (last 7 days).
WITH segment_identifies AS (
  SELECT DATE(timestamp) AS event_date, migrationId, COUNT(*) AS segment_count
  FROM SEGMENT.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
  GROUP BY DATE(timestamp), migrationId
),
hightouch_identifies AS (
  SELECT DATE(timestamp) AS event_date, migrationId, COUNT(*) AS hightouch_count
  FROM HIGHTOUCH.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
  GROUP BY DATE(timestamp), migrationId
)
SELECT
  COALESCE(s.event_date, h.event_date) AS event_date,
  -- Take the ID from whichever side has it, so Hightouch-only rows are still labeled.
  COALESCE(s.migrationId, h.migrationId) AS migrationId,
  s.segment_count,
  h.hightouch_count,
  -- Treat a missing side as 0 so the difference is never NULL.
  COALESCE(s.segment_count, 0) - COALESCE(h.hightouch_count, 0) AS count_difference,
  CASE
    WHEN s.segment_count = h.hightouch_count THEN 'Match'
    ELSE 'Mismatch'
  END AS comparison_result
FROM segment_identifies s
FULL OUTER JOIN hightouch_identifies h
  ON s.event_date = h.event_date AND s.migrationId = h.migrationId
WHERE s.migrationId IS NOT NULL OR h.migrationId IS NOT NULL
ORDER BY event_date, migrationId;
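The query above labels any non-identical pair of counts as a mismatch, but small differences are expected. One option is to post-process the results with a relative tolerance. The sketch below is a minimal example of that idea; the `classify_count_difference` helper and the 2% default threshold are assumptions for illustration, not a Hightouch recommendation.

```python
def classify_count_difference(segment_count, hightouch_count, tolerance=0.02):
    """Label a pair of counts using a relative tolerance instead of strict equality.

    A count of None (for example, a NULL from the outer join) is treated as 0.
    The 2% default tolerance is an arbitrary, illustrative threshold.
    """
    segment_count = segment_count or 0
    hightouch_count = hightouch_count or 0
    if segment_count == hightouch_count:
        return "Match"
    # The counts differ, so the larger one is non-zero and safe to divide by.
    relative_diff = abs(segment_count - hightouch_count) / max(segment_count, hightouch_count)
    return "Within tolerance" if relative_diff <= tolerance else "Investigate"


print(classify_count_difference(1000, 990))
print(classify_count_difference(1000, 900))
```

Pick a tolerance that reflects how much drift you're willing to accept for your use cases, and tighten it as the migration stabilizes.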
Compare property values
We also need to validate that Hightouch Events is collecting the same values as your prior provider.
The query below looks at a selection of properties relevant to identifies calls and compares them between the two platforms over a 7-day period.
You can modify this query to examine different properties, use a different time window, or look at data in a narrower set of dates.
-- Pull the identifies fields to compare from each provider (last 7 days).
WITH segment_data AS (
  SELECT
    id AS segment_id,
    migrationId,
    anonymous_id,
    user_id,
    timestamp,
    email,
    name
  FROM SEGMENT.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
),
hightouch_data AS (
  SELECT
    id AS hightouch_id,
    migrationId,
    anonymous_id,
    user_id,
    timestamp,
    email,
    name
  FROM HIGHTOUCH.identifies
  WHERE timestamp >= DATEADD(day, -7, CURRENT_DATE())
)
SELECT
  COALESCE(s.migrationId, h.migrationId) AS migrationId,
  COALESCE(s.user_id, h.user_id) AS user_id,
  s.segment_id,
  h.hightouch_id,
  s.anonymous_id AS segment_anonymous_id,
  h.anonymous_id AS hightouch_anonymous_id,
  s.timestamp AS segment_timestamp,
  h.timestamp AS hightouch_timestamp,
  -- IS [NOT] DISTINCT FROM is a null-safe comparison; if your warehouse lacks it, add explicit IS NULL checks.
  CASE WHEN s.email IS NOT DISTINCT FROM h.email THEN 'Match' ELSE 'Mismatch' END AS email_comparison,
  CASE WHEN s.name IS NOT DISTINCT FROM h.name THEN 'Match' ELSE 'Mismatch' END AS name_comparison,
  s.email AS segment_email,
  h.email AS hightouch_email,
  s.name AS segment_name,
  h.name AS hightouch_name
FROM segment_data s
FULL OUTER JOIN hightouch_data h ON s.migrationId = h.migrationId
-- Keep only rows where a value differs or the event is missing from one side.
WHERE
  s.email IS DISTINCT FROM h.email
  OR s.name IS DISTINCT FROM h.name
  OR s.anonymous_id IS DISTINCT FROM h.anonymous_id
  OR s.migrationId IS NULL -- no matching Segment row
  OR h.migrationId IS NULL -- no matching Hightouch row
ORDER BY COALESCE(s.timestamp, h.timestamp) DESC
LIMIT 100; -- Limit to 100 rows for a manageable sample
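Once you have a sample of differing rows, it can help to see which properties account for most of the differences. The sketch below tallies mismatches per field. It assumes the rows have been fetched as dictionaries keyed by the lowercase column aliases from the query above (your warehouse client may return different casing), and the `summarize_mismatches` helper is hypothetical.

```python
from collections import Counter

# Fields that appear as segment_<field> / hightouch_<field> columns in the query output.
COMPARED_FIELDS = ("anonymous_id", "email", "name")


def summarize_mismatches(rows, fields=COMPARED_FIELDS):
    """Count, per field, how many rows differ between the two providers."""
    summary = Counter()
    for row in rows:
        for field in fields:
            # A missing or None value on one side counts as a difference.
            if row.get(f"segment_{field}") != row.get(f"hightouch_{field}"):
                summary[field] += 1
    return dict(summary)
```

A field that dominates the summary usually points to an instrumentation difference for that property rather than random event loss.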
What differences to expect between your old provider and Hightouch Events
Event volume should be approximately the same, but some discrepancies between systems are normal. Deployment rollout, ad blockers, and network errors could all cause events to appear in one tool but not another, leading to differences in volume.
- Timing differences: Events may be processed in slightly different orders or with small time variations.
- Dropped events: Network issues might cause events to be lost in one system but not the other.
- Duplicate events: Some events might be sent twice in edge cases (for example, page reloads).
- Blocking behavior: Other providers might not block events as strictly as Hightouch does when running type checks.
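To see which of these causes is behind a volume gap, you can compare the sets of event IDs from each system. The following sketch assumes that each logical event carries one migrationId and that a resent event reuses the same ID; neither assumption is confirmed by this guide, and the `diff_event_ids` helper is hypothetical.

```python
from collections import Counter


def diff_event_ids(segment_ids, hightouch_ids):
    """Group migrationIds into dropped-event and duplicate-event buckets."""
    segment_counts = Counter(segment_ids)
    hightouch_counts = Counter(hightouch_ids)
    return {
        # Present in one system only: likely dropped or blocked on the other side.
        "only_in_segment": sorted(set(segment_counts) - set(hightouch_counts)),
        "only_in_hightouch": sorted(set(hightouch_counts) - set(segment_counts)),
        # Seen more than once in the same system: likely duplicate sends.
        "duplicated_in_segment": sorted(i for i, n in segment_counts.items() if n > 1),
        "duplicated_in_hightouch": sorted(i for i, n in hightouch_counts.items() if n > 1),
    }
```

Passing in the migrationId columns from both identifies tables for the same time window gives a quick breakdown of dropped versus duplicated events.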
Hightouch Events should collect the same values as your previous event collection provider, though there may be minor variations in automatically collected fields.
In the next section, we'll explore how to unify your historical data with new Hightouch data.