Identity resolution is only available on Business tier plans. You can use it with or without Customer Studio.
To set up identity resolution rules, you must first configure your input models and select the appropriate identifiers. Please review the information in the Model Configuration page before proceeding.
This page goes through the merge and limit rule configuration:
Merge rules tell Hightouch how to find connections between records. For example, two users may have the same email, or two events may have the same anonymous_id.
You can build complex merge rules using the merge rule builder to nest and/or conditions.
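To make the idea of nested and/or conditions concrete, here is a minimal Python sketch of how such a rule could be evaluated against a pair of records. The tuple-based rule representation, the `matches` function, and the sample records are hypothetical illustrations, not Hightouch's internal format or API.

```python
from typing import Any, Dict, Union

Record = Dict[str, Any]
# Hypothetical rule shape: either an identifier name (exact match on that
# field) or a nested ("and" | "or", [sub-rules]) tuple.
Rule = Union[str, tuple]


def matches(rule: Rule, a: Record, b: Record) -> bool:
    """Return True if records a and b satisfy the (possibly nested) rule."""
    if isinstance(rule, str):
        # Leaf: exact comparison; a missing value never matches.
        va, vb = a.get(rule), b.get(rule)
        return va is not None and va == vb
    op, children = rule
    results = (matches(child, a, b) for child in children)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Match on email, OR on first name AND last name together.
rule = ("or", ["email", ("and", ["first_name", "last_name"])])

r1 = {"email": "john.doe@acme.com", "first_name": "John", "last_name": "Doe"}
r2 = {"email": None, "first_name": "John", "last_name": "Doe"}
r3 = {"email": "jane@acme.com", "first_name": "John", "last_name": "Roe"}

same_name = matches(rule, r1, r2)   # names match, so the "or" is satisfied
different = matches(rule, r1, r3)   # neither branch matches
```

Treating a missing identifier as a non-match is a design choice in this sketch: it keeps rows with empty fields from being linked to each other.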
Hightouch fully supports exact comparison with up to roughly 1 billion input rows (total number of rows from input models).
🧪 Hightouch also has an experimental feature for fuzzy comparison that works with small input row sizes in Snowflake and Databricks. Please reach out to your account rep if you want to test fuzzy comparison.
You can also use Hightouch's out-of-the-box data cleaning transformations to improve the likelihood of accurate matches:
Case insensitive
Normalize (convert multiple consecutive spaces to a single space and remove spaces from the beginning and end of strings)
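The two transformations above can be sketched in Python as follows. This only illustrates the described behavior; the function names are hypothetical, and treating "case insensitive" as lowercasing is an assumption of the sketch.

```python
import re


def normalize(value: str) -> str:
    """Collapse consecutive spaces to one space and trim spaces at both ends."""
    return re.sub(r" {2,}", " ", value).strip(" ")


def match_key(value: str, *, ignore_case: bool = True,
              normalize_spaces: bool = True) -> str:
    """Build the value that would be compared exactly (hypothetical helper)."""
    if normalize_spaces:
        value = normalize(value)
    if ignore_case:
        # Assumption: case-insensitive comparison is modeled by lowercasing.
        value = value.lower()
    return value


cleaned = match_key("  John.Doe@Acme.com ")
```

With both options enabled, `"  John   Doe "` and `"john doe"` produce the same key, so an exact comparison on the cleaned values treats them as a match.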
Rule sets allow you to group merge rules and evaluate the groups in sequence. Hightouch locks in the results of each rule set (assuming no limit rules are violated) before moving on to the next set to try to merge in additional records.
Low confidence rules (e.g. match on first and last name) can merge different actual identities together and introduce limit rule violations.
Rule sets allow you to use these rules more confidently by running them after higher confidence rules (e.g. match on user ID) and undoing them if they merge records together that violate limit rules.
If you combine low and high confidence rules into a single rule set like this:
The profiles (rows in the same profile share an HT_ID) would look like this. Note that none of the rows related to the John Does got merged together because of the limit rule violation:
The issue here is that you want the rows with john.doe@acme.com and john.doe@dundermifflin.com to be merged into two separate profiles, one per email, because email is a high confidence match. The merge rule on first and last name, however, merges the two profiles and all of their associated rows together, causing the limit rule to be hit. This happens because a row from each profile matches a row from the other on first and last name.
With rule sets, you can separate the low and high confidence rules into different rule sets:
The result looks like this. Notice that we now have three HT_IDs representing three unique profiles. We first merge records on email. Then we try merging records on first name and last name, and only for those merges that would introduce limit rule violations, we fall back to the profiles from the previous rule set:
After each rule set is evaluated, we check for profiles that exceed any limit rule. If a profile does, we don't merge in any of the new records from that rule set's evaluation for that profile.
Once the limit rule check completes, the profiles formed by that rule set are locked in. If a subsequent rule set introduces a limit rule violation, only the records merged during that rule set's evaluation are unmerged, not those merged by previous rule sets.
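The evaluate, check, and lock-in cycle described above can be sketched in Python with a small union-find. This is an illustrative model only, not Hightouch's implementation: the `resolve` function, the rule and limit shapes, the sample rows, and the limit of one distinct email per profile are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]
MergeRule = Tuple[str, ...]    # fields that must all match exactly
RuleSet = Sequence[MergeRule]  # rules within one set are OR-ed together


def find(parent: Dict[int, int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def resolve(rows: List[Row], rule_sets: Sequence[RuleSet],
            limits: Dict[str, int]) -> List[int]:
    """Return a profile id per row (rows sharing an id share a profile)."""
    profile = list(range(len(rows)))  # every row starts as its own profile
    for rule_set in rule_sets:
        # Candidate merges are built on top of the locked-in profiles.
        parent = {p: p for p in profile}
        for fields in rule_set:
            first_seen: Dict[tuple, int] = {}
            for i, row in enumerate(rows):
                key = tuple(row.get(f) for f in fields)
                if None in key:
                    continue  # missing identifiers never match
                if key in first_seen:
                    a = find(parent, first_seen[key])
                    b = find(parent, profile[i])
                    parent[a] = b
                else:
                    first_seen[key] = profile[i]
        groups: Dict[int, List[int]] = defaultdict(list)
        for i in range(len(rows)):
            groups[find(parent, profile[i])].append(i)
        for root, members in groups.items():
            violates = any(
                len({rows[i][f] for i in members
                     if rows[i].get(f) is not None}) > cap
                for f, cap in limits.items()
            )
            if not violates:
                for i in members:  # lock in this rule set's merge
                    profile[i] = root
            # On a violation the rows keep their previously locked-in profiles.
    return profile


# Hypothetical input rows.
rows = [
    {"email": "john.doe@acme.com", "first_name": "John", "last_name": "Doe"},
    {"email": "john.doe@acme.com", "first_name": "John", "last_name": "Doe"},
    {"email": "john.doe@dundermifflin.com", "first_name": "John", "last_name": "Doe"},
    {"email": "john.doe@dundermifflin.com", "first_name": "John", "last_name": "Doe"},
    {"email": "jane.roe@acme.com", "first_name": "Jane", "last_name": "Roe"},
    {"email": None, "first_name": "Jane", "last_name": "Roe"},
]
limits = {"email": 1}  # assumed limit rule: at most one email per profile
by_email: MergeRule = ("email",)
by_name: MergeRule = ("first_name", "last_name")

single_set = resolve(rows, [[by_email, by_name]], limits)
two_sets = resolve(rows, [[by_email], [by_name]], limits)
```

In this sketch, the single rule set leaves every John Doe row unmerged, because the only candidate profile contains two emails. With two rule sets, the email merges are locked in first, so the name rule's violation only undoes the name-based merge and the two John Doe profiles survive alongside the merged Jane Roe profile.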