
Probabilistic matching

Audience: Platform admin, data or analytics engineer
Prerequisites: IDR overview, Setup steps, Prepare your data, Golden Record

Use probabilistic matching to link records that likely belong to the same person—even when values don’t match exactly.


What is probabilistic matching?

Probabilistic matching uses data normalization, fuzzy comparison, and AI to connect similar records. It compares multiple fields—like email, phone, name, and address—and assigns a confidence score to each match.

For example:

  • john.doe@hightouch.com and johndoe@gmail.com might be linked if other traits (like phone or zip) overlap.

You define how strict or loose the logic is by setting match thresholds.

When to use it

| Scenario | Example | Why it helps |
|---|---|---|
| Typos and misspellings | John Doe vs. Jhon Doe | Fuzzy scoring tolerates inexact values |
| Multiple accounts for the same person | johndoe@gmail.com vs. john.doe@company.com | Scores improve when combining multiple identifiers |
| Format variations | (415) 555-1234 vs. 4155551234 | Probabilistic matching applies normalization logic |
| Sparse or user-entered records | Lead forms, event RSVPs, loyalty sign-ups | Matches based on partial or inconsistent information |
| Cross-channel identity stitching | CDP → CRM → POS | Links identities even when shared IDs aren’t available |
| Multiple identifiers per person | Name + phone + ZIP code | Higher match confidence from overlapping traits |

How it works

Probabilistic matching links records that likely refer to the same person, even when the data is inconsistent or incomplete. Behind the scenes, matching happens in three steps:

  1. Normalize the data
    We clean and standardize field values to make them easier to compare, for example by handling nicknames, email casing, and formatting differences.
  2. Compare fields
    We compare values across key fields (like name, email, or address) and use AI to generate similarity scores for each field pair.
  3. Score the record pair
    The individual field scores are evaluated by our proprietary AI model to form a single record-level similarity score that reflects how likely the records belong to the same person.
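The three steps above can be illustrated with a minimal Python sketch. It is not the product's implementation: the proprietary AI scoring is replaced here by the standard library's difflib.SequenceMatcher and a hand-picked weighted average, and the field names, nickname table, and weights are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would use a much larger one.
NICKNAMES = {"jon": "john", "johnny": "john", "bill": "william"}

# Hypothetical field weights for the record-level score (they sum to 1).
WEIGHTS = {"name": 0.3, "email": 0.4, "phone": 0.3}


def normalize(record):
    """Step 1: standardize casing, nicknames, and phone formatting."""
    name_parts = record.get("name", "").lower().split()
    name = " ".join(NICKNAMES.get(p, p) for p in name_parts)
    email = record.get("email", "").strip().lower()
    phone = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
    return {"name": name, "email": email, "phone": phone}


def field_similarity(a, b):
    """Step 2: similarity in [0, 1]; a stand-in for the AI field scorer."""
    if not a or not b:
        return 0.0  # a missing value contributes no evidence
    return SequenceMatcher(None, a, b).ratio()


def record_score(rec_a, rec_b):
    """Step 3: combine field scores into one record-level score."""
    a, b = normalize(rec_a), normalize(rec_b)
    return sum(w * field_similarity(a[f], b[f]) for f, w in WEIGHTS.items())


a = {"name": "John Doe", "email": "John.Doe@company.com", "phone": "(415) 555-1234"}
b = {"name": "Jhon Doe", "email": "john.doe@company.com", "phone": "4155551234"}
print(round(record_score(a, b), 3))
```

In this sketch the email and phone values become identical after normalization, so the misspelled name lowers the record-level score only slightly.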

You decide what counts as a match

In the final step, you choose the confidence level that fits your use case. Records that meet or exceed your threshold are grouped into the same identity. Even if two records don’t match directly, they can be linked through shared matches: if record A matches record B and record B matches record C, all three are grouped into one identity.
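Linking through shared matches can be sketched as a connected-components problem. The example below is an illustrative assumption, not the product's algorithm: it uses a small union-find structure, and the record IDs, scores, and threshold value are hypothetical.

```python
def group_identities(record_ids, matched_pairs):
    """Group records into identities using union-find over matched pairs."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Walk to the root, compressing the path along the way.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical scored pairs: (record_a, record_b, similarity score).
scored_pairs = [("A", "B", 0.93), ("B", "C", 0.91), ("A", "C", 0.40), ("D", "A", 0.20)]
THRESHOLD = 0.90  # hypothetical value, not a product default

matches = [(a, b) for a, b, s in scored_pairs if s >= THRESHOLD]
identities = group_identities(["A", "B", "C", "D"], matches)
print(identities)  # [['A', 'B', 'C'], ['D']]
```

Records A and C score only 0.40 against each other, yet they land in the same identity because both match B. This is also why a threshold that is too low can chain unrelated records together.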

Confidence tiers

Confidence tiers let you control how strict or flexible your matching is:

| Tier | Description | Use case |
|---|---|---|
| Exact | Near-identical records | Operations and transactional emails |
| Strict | Strong match with minor variation | Lifecycle messaging, retargeting |
| Loose | Possible match, broader reach | Ads, retargeting, analytics |

Lower tiers capture more matches but increase the chance of false positives. Higher tiers keep matching more conservative. You can adjust tiers to fit your data quality and business goals.
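As a rough illustration of how a record-level score could map to a tier, here is a minimal sketch. The numeric cutoffs are hypothetical placeholders, not product defaults; the actual thresholds are whatever you configure in your identity model.

```python
# Hypothetical minimum scores per tier, ordered from strictest to loosest.
TIER_THRESHOLDS = [("Exact", 0.98), ("Strict", 0.90), ("Loose", 0.75)]


def assign_tier(score, thresholds=TIER_THRESHOLDS):
    """Return the strictest tier whose minimum the score meets, else None."""
    for tier, minimum in thresholds:
        if score >= minimum:
            return tier
    return None  # below every tier: not treated as a match


scores = [0.99, 0.95, 0.80, 0.60]
print([assign_tier(s) for s in scores])  # ['Exact', 'Strict', 'Loose', None]
```

Lowering a cutoff moves more pairs into that tier, which is the trade-off between reach and false positives described above.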

How to enable it

Probabilistic matching is optional and can be added to any IDR model.

  1. When configuring your identity model, toggle on Probabilistic Matching
  2. Choose your match thresholds for Exact, Strict, and Loose
  3. Use as many probabilistic identifiers as possible (e.g. name, email, phone, address)

You can adjust thresholds anytime based on QA results.

Learn more → Prepare your data to build a customer identity graph

How to QA your results

After enabling probabilistic matching:

  • Open the Summary tab to view match rates by confidence tier
  • Use the Profiles tab to inspect what contributed to each match
  • Adjust thresholds if you're over- or under-merging records

Look for improved match rates compared to deterministic-only baselines.
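If you export record-to-identity assignments for your own analysis, the comparison against a deterministic-only baseline could be sketched as follows. This is an assumption-heavy example: the export format is hypothetical, "match rate" is defined here as the share of records whose identity contains at least one other record (the Summary tab may define it differently), and the size limit used to flag possible over-merging is arbitrary.

```python
from collections import Counter


def match_rate(assignments):
    """Share of records whose identity contains at least one other record."""
    sizes = Counter(assignments.values())
    matched = sum(1 for ident in assignments.values() if sizes[ident] > 1)
    return matched / len(assignments)


def oversized_identities(assignments, max_size):
    """Identities with more records than max_size; candidates for review."""
    sizes = Counter(assignments.values())
    return sorted(i for i, n in sizes.items() if n > max_size)


# Hypothetical record -> identity assignments from two model runs.
deterministic = {"r1": "i1", "r2": "i1", "r3": "i2", "r4": "i3", "r5": "i4"}
probabilistic = {"r1": "i1", "r2": "i1", "r3": "i1", "r4": "i3", "r5": "i3"}

print(match_rate(deterministic))  # 0.4
print(match_rate(probabilistic))  # 1.0
print(oversized_identities(probabilistic, max_size=2))  # ['i1']
```

A higher match rate is only an improvement if the new merges are correct, so unusually large identities are worth inspecting in the Profiles tab before loosening thresholds further.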

Learn more → Match summary & profile review

When to activate probabilistic matches

Use probabilistic matching when:

  • ✅ Your data has inconsistencies, nicknames, or formatting issues
  • ✅ You need broader reach for campaigns
  • ✅ You want to unify across systems without shared IDs

Avoid using Loose matches without QA:

  • ❌ Don’t sync all Loose records without validation
  • ❌ Avoid Loose for operational and transactional use cases

What’s next?

Now that you understand deterministic and probabilistic matching, you’re ready to:
