Match rules

Identity resolution is only available on Business tier plans. You can use it with or without Customer Studio.

To set up identity resolution rules, you must first configure your input models and select the appropriate identifiers. Please review the information in the Model Configuration page before proceeding.

This page covers merge rules, limit rules, and rule sets.

Merge rules

Merge rules tell Hightouch how it should try to find connections between records. For example, two users may have the same email, or two events may have the same anonymous_id.

You can build complex merge rules by nesting and/or conditions in the merge rule builder.
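As a rough illustration of nested and/or conditions, the sketch below models a merge rule as a small tree and evaluates it against a pair of records. The class names (Match, And, Or), the evaluate function, and the field names are hypothetical and chosen for this example only. They are not Hightouch's rule format or API.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Leaf condition: both records share the same non-empty value for `field`."""
    field: str


@dataclass
class And:
    conditions: list


@dataclass
class Or:
    conditions: list


def evaluate(rule, a: dict, b: dict) -> bool:
    """Return True if records `a` and `b` satisfy the (possibly nested) rule."""
    if isinstance(rule, Match):
        value = a.get(rule.field)
        # Assumption: empty or missing values never count as a match.
        return bool(value) and value == b.get(rule.field)
    if isinstance(rule, And):
        return all(evaluate(c, a, b) for c in rule.conditions)
    if isinstance(rule, Or):
        return any(evaluate(c, a, b) for c in rule.conditions)
    raise TypeError(f"Unknown rule type: {type(rule).__name__}")


# Match on email, OR on first name AND last name together.
rule = Or([Match("email"), And([Match("first_name"), Match("last_name")])])

profile = {"email": "john.doe@acme.com", "first_name": "John", "last_name": "Doe"}
event = {"email": None, "first_name": "John", "last_name": "Doe"}
other = {"email": "jane.doe@acme.com", "first_name": "Jane", "last_name": "Doe"}

print(evaluate(rule, profile, event))  # True: first and last name both match
print(evaluate(rule, profile, other))  # False: neither branch matches
```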

Hightouch supports the following comparison operators:

  1. Exact
  2. Fuzzy
  3. Phonetic (Soundex)
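To give a feel for how the three operators differ, here is a minimal sketch. The Soundex function follows the standard American Soundex rules. The fuzzy comparison uses Python's difflib as a stand-in with an arbitrary 0.85 threshold, because this page does not say which similarity measure or threshold Hightouch uses. All function names are hypothetical.

```python
from difflib import SequenceMatcher

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or "" if `name` has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


def exact_match(a: str, b: str) -> bool:
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    # Assumption: similarity ratio and threshold are illustrative only.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def phonetic_match(a: str, b: str) -> bool:
    code = soundex(a)
    return code != "" and code == soundex(b)


print(exact_match("Johnathan", "Jonathan"))   # False
print(fuzzy_match("Johnathan", "Jonathan"))   # True
print(phonetic_match("Robert", "Rupert"))     # True (both encode to R163)
```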

You can also use Hightouch's built-in data cleaning transformations to improve the likelihood of accurate matches:

  • Case insensitive
  • Normalize (convert multiple consecutive spaces to a single space and remove spaces from the beginning and end of strings)
  • Number (ignore non-numeric characters)
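The transformations above can be sketched as small string functions applied to both values before they are compared. The function names below are hypothetical, and the behavior is inferred from the descriptions in the list rather than taken from Hightouch's implementation.

```python
import re


def case_insensitive(value: str) -> str:
    """Fold case so that "ACME" and "acme" compare as equal."""
    return value.casefold()


def normalize(value: str) -> str:
    """Collapse runs of spaces to one space and trim leading/trailing spaces."""
    return re.sub(r" +", " ", value).strip(" ")


def number(value: str) -> str:
    """Keep digits only, ignoring every non-numeric character."""
    return re.sub(r"[^0-9]", "", value)


print(case_insensitive("John.Doe@ACME.com"))  # john.doe@acme.com
print(normalize("  John   Doe "))             # John Doe
print(number("(555) 010-1234"))               # 5550101234
```

Applying the same transformation to both sides means that, for example, "(555) 010-1234" and "555.010.1234" compare as equal under an exact match.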

Limit rules

Limit rules prevent records from merging when the merge would violate a business rule that matters for your data.

For example, if you never want to merge two records that have different user_id values, you can specify a limit rule of 1 user_id per record.
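A limit rule check can be pictured as counting the distinct values of an identifier across the records that would end up merged together. The sketch below is illustrative only. The exceeds_limit helper and the sample user_id values are hypothetical, and it assumes that empty values do not count toward the limit.

```python
def exceeds_limit(merged_records: list, field: str, limit: int) -> bool:
    """Return True if the merged records hold more than `limit` distinct values."""
    # Assumption: missing or None values are ignored when counting.
    distinct = {rec[field] for rec in merged_records if rec.get(field) is not None}
    return len(distinct) > limit


same_user = [
    {"user_id": "u_1", "email": "john.doe@acme.com"},
    {"user_id": "u_1", "email": "jdoe@acme.com"},
]
different_users = [
    {"user_id": "u_1", "email": "john.doe@acme.com"},
    {"user_id": "u_2", "email": "john.doe@acme.com"},
]

print(exceeds_limit(same_user, "user_id", 1))        # False: merge is allowed
print(exceeds_limit(different_users, "user_id", 1))  # True: merge is blocked
```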

Rule sets

Rule sets let you group merge rules and evaluate the groups in sequence. The results of each rule set are locked in (assuming no limit rules are violated) before the next rule set tries to merge in additional records.

Example

Low confidence rules (e.g. match on first and last name) can merge different actual identities together and introduce limit rule violations.

Rule sets allow you to use these rules more confidently by running them after higher confidence rules (e.g. match on user ID) and undoing them if they merge records together that violate limit rules.

For example, imagine you have the following data:

Email                      | First Name | Last Name | Source  | HT_ID
john.doe@acme.com          | John       | Doe       | Profile |
john.doe@acme.com          | J          | Doe       | Event   |
john.doe@dundermifflin.com | John       | Doe       | Profile |
john.doe@dundermifflin.com | Johnathan  | Doe       | Event   |
barack@barackobama.com     | Barack     | Obama     | Profile |
                           | Barack     | Obama     | Event   |

If you combine low and high confidence rules into a single rule set like this:

[Screenshot: rule sets counter example]

The resulting profiles (rows with the same HT_ID belong to the same profile) would look like this. Note that none of the John Doe rows were merged together, because of the limit rule violation:

Email                      | First Name | Last Name | Source  | HT_ID
john.doe@acme.com          | John       | Doe       | Profile | 73bd...
john.doe@acme.com          | J          | Doe       | Event   | 17a3...
john.doe@dundermifflin.com | John       | Doe       | Profile | 5b6e...
john.doe@dundermifflin.com | Johnathan  | Doe       | Event   | 372b...
barack@barackobama.com     | Barack     | Obama     | Profile | 9p8h...
                           | Barack     | Obama     | Event   | 9p8h...

The issue here is that you want the rows with john.doe@acme.com and john.doe@dundermifflin.com to be merged into 2 separate profiles, one per email, because email is a high confidence match. The merge rule on first and last name, however, links those 2 profiles and all their associated rows together, because 1 row from each profile matches the other on first and last name. The combined profile hits the limit rule, so none of these merges are kept.

With rule sets, you can separate the low and high confidence rules into different rule sets:

[Screenshot: rule sets example]

The result looks like this. Notice that we now have three HT_IDs representing three unique profiles. We first merge records on email. Then we try merging records on first name and last name, and only for those merges that would introduce limit rule violations do we fall back to the profiles from the previous rule set:

Email                      | First Name | Last Name | Source  | HT_ID
john.doe@acme.com          | John       | Doe       | Profile | 73bd...
john.doe@acme.com          | J          | Doe       | Event   | 73bd...
john.doe@dundermifflin.com | John       | Doe       | Profile | 5b6e...
john.doe@dundermifflin.com | Johnathan  | Doe       | Event   | 5b6e...
barack@barackobama.com     | Barack     | Obama     | Profile | 9p8h...
                           | Barack     | Obama     | Event   | 9p8h...

How it works

After each rule set is evaluated, we check for profiles that exceed any limit rule. For each such profile, we don't merge in any of the new records from that rule set's evaluation.

Once the limit rule check completes, the profiles formed by that rule set are locked in. If a subsequent rule set introduces limit rule violations, only the records merged during that rule set's evaluation are unmerged, not the records merged by previous rule sets.
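The behavior described above can be simulated on the six example rows. This is a simplified sketch, not Hightouch's implementation. The resolve function and its helpers are hypothetical, and because the screenshots of the rule configuration are not reproduced here, the sketch assumes exact matching and a limit rule of 1 email per profile.

```python
from itertools import combinations

RECORDS = [
    {"email": "john.doe@acme.com", "first": "John", "last": "Doe"},
    {"email": "john.doe@acme.com", "first": "J", "last": "Doe"},
    {"email": "john.doe@dundermifflin.com", "first": "John", "last": "Doe"},
    {"email": "john.doe@dundermifflin.com", "first": "Johnathan", "last": "Doe"},
    {"email": "barack@barackobama.com", "first": "Barack", "last": "Obama"},
    {"email": None, "first": "Barack", "last": "Obama"},
]


def same(*fields):
    """Merge rule: every listed field is non-empty and equal on both records."""
    return lambda a, b: all(a[f] is not None and a[f] == b[f] for f in fields)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def exceeds_limit(group, records, field, limit):
    values = {records[i][field] for i in group if records[i][field] is not None}
    return len(values) > limit


def resolve(records, rule_sets, limits):
    """Evaluate rule sets in order, locking in profiles after each one."""
    n = len(records)
    locked = [frozenset([i]) for i in range(n)]
    for rules in rule_sets:
        parent = list(range(n))
        # Start from the profiles locked in by earlier rule sets.
        for group in locked:
            first, *rest = sorted(group)
            for i in rest:
                parent[find(parent, i)] = find(parent, first)
        # Link every pair of records matched by any rule in this rule set.
        for i, j in combinations(range(n), 2):
            if any(rule(records[i], records[j]) for rule in rules):
                parent[find(parent, i)] = find(parent, j)
        candidates = {}
        for i in range(n):
            candidates.setdefault(find(parent, i), set()).add(i)
        new_locked = []
        for candidate in candidates.values():
            if any(exceeds_limit(candidate, records, f, k) for f, k in limits.items()):
                # Violation: undo this rule set's merges for this profile only.
                new_locked.extend(g for g in locked if g <= candidate)
            else:
                new_locked.append(frozenset(candidate))
        locked = new_locked
    return sorted(sorted(group) for group in locked)


LIMITS = {"email": 1}  # assumed limit rule: 1 email per profile
single = resolve(RECORDS, [[same("email"), same("first", "last")]], LIMITS)
split = resolve(RECORDS, [[same("email")], [same("first", "last")]], LIMITS)
print(single)  # [[0], [1], [2], [3], [4, 5]]
print(split)   # [[0, 1], [2, 3], [4, 5]]
```

Each inner list is one profile, identified by row position in the example table. With a single rule set, the John Doe rows stay unmerged. With two rule sets, the email merges survive and only the name-based merge that violates the limit is undone.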


Last updated: Jun 18, 2024
