
Prepare your data to build a customer identity graph

Audience: Platform admin, data or analytics engineer
Prerequisites: IDR overview, Setup steps

Before you can resolve identities in Hightouch, you'll need to prepare your source data. This article walks you through how to structure and configure your data for use in an Identity Resolution (IDR) project, which powers your customer identity graph.


What is a customer identity graph?

A customer identity graph connects identifiers (like emails, device IDs, and phone numbers) across your datasets to form unified customer profiles. Each graph is built from an IDR project that defines:

  • Which source tables to include

  • Which columns represent identifiers

  • How records are matched across models

  • Whether to use deterministic, probabilistic, or hybrid matching strategies

The result is a set of deduplicated identities, each with a unique HT_ID, that you can use across Hightouch for targeting, analytics, and personalization.
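To make the idea of a graph of shared identifiers concrete, here is a minimal sketch of exact-match grouping using a union-find structure. It is an illustration only, not Hightouch's implementation: the record layout, the `resolve_identities` function, and the use of the smallest record id as a stand-in for an HT_ID are all assumptions.

```python
def resolve_identities(records):
    """Group records that share any identifier value (exact match only).

    records: list of dicts with an "id" key plus identifier columns.
    Returns a dict mapping record id -> group label (a stand-in for HT_ID).
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the group root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the root so labels are stable.
            parent[max(ra, rb)] = min(ra, rb)

    # Remember the first record seen for each (identifier type, value) pair
    # and link every later record with the same pair to it.
    first_seen = {}
    for r in records:
        for key, value in r.items():
            if key == "id" or value is None:
                continue
            first = first_seen.setdefault((key, value), r["id"])
            union(first, r["id"])

    return {r["id"]: find(r["id"]) for r in records}


records = [
    {"id": "r1", "email": "ana@example.com", "phone": None},
    {"id": "r2", "email": "ana@example.com", "phone": "+15550100"},
    {"id": "r3", "email": None, "phone": "+15550100"},
    {"id": "r4", "email": "bo@example.com", "phone": None},
]
groups = resolve_identities(records)
```

In this example r1 and r3 share no identifier directly, but both link to r2, so all three end up in one group while r4 stays separate.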

What you'll prepare

Element | Description
Input model | A primary table where each row represents an individual (e.g. users)
Identifier mappings | Map model columns (e.g. email, phone_number) to identifier types used for matching
Input models | Supporting datasets (e.g. orders, devices, web events) joined via shared identifiers
Match strategy | Select deterministic, probabilistic, or both based on your data
Confidence thresholds | (Optional) Define match strength tiers (Exact / Strict / Loose) for probabilistic matching
Golden Record | (Optional) Rules for selecting the most trusted value per trait

Choose a match strategy

Your data quality and structure will determine which match strategy to use:

Use case | Recommended strategy
Stable IDs, clean login events | Deterministic
Messy, user-entered data (e.g. lead forms) | Probabilistic
Mixed-quality data across systems | Hybrid

You can use deterministic matching alone, or enable probabilistic matching alongside it to improve coverage.

Probabilistic matching uses similarity across multiple identifiers (e.g. name, email, phone) and assigns confidence scores to each match.
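As a rough illustration of similarity-based scoring, the sketch below combines per-identifier string similarity into a single confidence value. The weights, the `similarity` and `match_confidence` helpers, and the choice of `difflib.SequenceMatcher` are assumptions made for this example; they are not the model Hightouch uses.

```python
from difflib import SequenceMatcher

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"name": 0.3, "email": 0.4, "phone": 0.3}


def similarity(a, b):
    """String similarity in [0, 1]; 0.0 when either value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_confidence(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-identifier similarities."""
    total = sum(weights.values())
    score = sum(
        w * similarity(rec_a.get(field), rec_b.get(field))
        for field, w in weights.items()
    )
    return score / total


a = {"name": "Ana Diaz", "email": "ana@example.com", "phone": "+15550100"}
b = {"name": "Anna Diaz", "email": "ana@example.com", "phone": None}
confidence = match_confidence(a, b)
```

Here the records agree on email and nearly agree on name, but the missing phone number contributes nothing, so the score lands below 1.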

Step-by-step: prepare your data for Identity Resolution

  1. Select a data source

  • Go to Identity Resolution and click Add identity graph

  • Choose a Lightning-supported data warehouse (Snowflake, Databricks, or BigQuery) that contains the data you want to use.

Info: Identity graphs are warehouse-specific. To build graphs across multiple sources, create one per warehouse.

  2. Choose your models

  • Choose your input model (e.g. users, customers, contacts).
    • Each model must include a timestamp column for incremental processing:
      • Use an event timestamp for event models
      • Use a last_updated_at or similar field for static records
      • If no timestamp exists, define one in your model SQL (e.g. CURRENT_TIMESTAMP)
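The sketch below shows one common way a timestamp column supports incremental processing: only rows changed since the last run are picked up, and the high-water mark advances. The `rows_since` helper and the watermark scheme are assumptions for illustration, not a description of how Hightouch processes models internally.

```python
from datetime import datetime, timezone


def rows_since(rows, watermark, ts_field="last_updated_at"):
    """Return rows changed after the watermark, plus the new watermark."""
    fresh = [r for r in rows if r[ts_field] > watermark]
    new_watermark = max((r[ts_field] for r in fresh), default=watermark)
    return fresh, new_watermark


rows = [
    {"user_id": "u1", "last_updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"user_id": "u2", "last_updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
last_run = datetime(2024, 2, 1, tzinfo=timezone.utc)
fresh, next_watermark = rows_since(rows, last_run)
```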

  3. Map identifier columns

Within each model, map relevant columns to standard identifier types. These mappings determine which identifiers Hightouch uses when evaluating record matches.

What are identifiers?

Identifiers are fields that help link records across systems. Common examples include:

  • Email address

  • Phone number

  • Full name

  • User ID or customer ID

  • Anonymous ID (e.g. session ID)

  • Mailing address or postal code

Model mappings

Be sure to review the consistency and formatting of identifiers across models.
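Inconsistent formatting is a common reason exact matches fail, so it can help to normalize identifiers in your models before mapping them. The helpers below are a minimal sketch of that idea; the function names and the default country code are assumptions, and in practice you would likely apply equivalent logic in your model SQL.

```python
import re


def normalize_email(value):
    """Trim whitespace and lowercase; return None for blank input."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def normalize_phone(value, default_country_code="1"):
    """Keep digits only; prefix a country code onto 10-digit numbers.

    default_country_code is an assumption made for this sketch.
    """
    if value is None:
        return None
    digits = re.sub(r"\D", "", value)
    if not digits:
        return None
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


email = normalize_email("  Ana@Example.COM ")
phone_a = normalize_phone("(555) 010-0199")
phone_b = normalize_phone("+1 555 010 0199")
```

After normalization, the two phone strings above compare equal, which is what lets them match exactly across models.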

  4. Configure identifier rules

Once you've mapped identifier columns, configure identifier rules to control how each field contributes to matching.

What are identifier rules?

Identifier rules determine how Hightouch uses your mapped identifiers in deterministic and probabilistic matching.

For deterministic matching, you'll define:

  • Priority order: Which identifiers should be used first when evaluating exact matches

  • Limit rules: Optional boundaries to prevent identifiers from over-linking across unrelated people (e.g. shared devices or generic emails)

For probabilistic matching, identifiers are automatically combined into a weighted model that calculates match confidence.
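The two deterministic rule types can be pictured with a short sketch. It assumes one possible reading of a limit rule (ignore an identifier value once it appears on more than a set number of records) and a hypothetical priority order; the function names and semantics are illustrative and may differ from how the product evaluates rules.

```python
from collections import Counter

# Hypothetical priority order for this example.
PRIORITY = ["user_id", "email", "phone"]


def first_exact_match(rec_a, rec_b, priority=PRIORITY):
    """Return the highest-priority identifier on which both records agree."""
    for identifier in priority:
        a, b = rec_a.get(identifier), rec_b.get(identifier)
        if a and b and a == b:
            return identifier
    return None


def usable_identifier_values(records, identifier, max_records):
    """Return values of `identifier` shared by at most `max_records` records."""
    counts = Counter(r[identifier] for r in records if r.get(identifier))
    return {value for value, n in counts.items() if n <= max_records}


records = [
    {"id": "r1", "email": "info@example.com"},
    {"id": "r2", "email": "info@example.com"},
    {"id": "r3", "email": "info@example.com"},
    {"id": "r4", "email": "ana@example.com"},
]
usable = usable_identifier_values(records, "email", max_records=2)
matched_on = first_exact_match(
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u1", "email": "b@example.com"},
)
```

With a limit of two records per value, the generic address info@example.com is excluded from linking, while ana@example.com remains usable.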

Supported identifier types

Identifier type | Example fields | Matching supported
Email | email, user_email | Deterministic + Probabilistic
Phone | phone_number | Deterministic + Probabilistic
Name | first_name, last_name | Probabilistic only
Address | street_address, state, city, postal_codes | Probabilistic only
User ID | user_id, customer_id | Deterministic only
Anonymous ID | anonymous_id | Deterministic only

Tip: Probabilistic matching works best when each record has at least a few identifiers.
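If you want to check a planned mapping against the supported identifier types listed in this section, a small lookup like the one below can help. The dictionary restates that list; the `types_for` helper is a hypothetical convenience, not part of any Hightouch API.

```python
# Matching modes per identifier type, as listed in this article.
SUPPORTED_MATCHING = {
    "Email": {"deterministic", "probabilistic"},
    "Phone": {"deterministic", "probabilistic"},
    "Name": {"probabilistic"},
    "Address": {"probabilistic"},
    "User ID": {"deterministic"},
    "Anonymous ID": {"deterministic"},
}


def types_for(mode):
    """Return identifier types that support the given matching mode."""
    return sorted(t for t, modes in SUPPORTED_MATCHING.items() if mode in modes)


probabilistic_types = types_for("probabilistic")
deterministic_types = types_for("deterministic")
```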

  5. Define match strategy and thresholds

  • Deterministic only: exact matches, enabled by default
  • Probabilistic matching: similarity scoring, must be toggled on

Graph settings

If using probabilistic matching, set your confidence thresholds:

  • Exact: High precision, e.g. for transactional use cases

  • Strict: Balanced precision and recall, e.g. for personalization

  • Loose: High recall, e.g. for analytics

These can be adjusted over time as you monitor match quality.
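One way to think about the tiers is as cutoffs on a match confidence score. The sketch below maps a score to the strictest tier it satisfies; the numeric cutoffs and the `confidence_tier` helper are invented for illustration and are not Hightouch defaults.

```python
# Hypothetical cutoffs, ordered from strictest to loosest.
THRESHOLDS = [("Exact", 0.95), ("Strict", 0.85), ("Loose", 0.70)]


def confidence_tier(score, thresholds=THRESHOLDS):
    """Return the strictest tier whose cutoff the score meets, else None."""
    for name, cutoff in thresholds:
        if score >= cutoff:
            return name
    return None


tiers = [confidence_tier(s) for s in (0.97, 0.90, 0.75, 0.50)]
```

A match that only reaches the Loose tier might be acceptable for analytics but excluded from transactional use cases, which is the trade-off the tiers are meant to expose.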

Confidence levels


  6. Save and build your graph

Click Save to create your identity graph. Hightouch will generate a unique HT_ID for each resolved profile.

What's next?

Once your data is prepared and your graph is built, you can:
