Supercluster detection

Identity resolution is only available on Business tier plans. You can use it with or without Customer Studio.


Audience	Admins responsible for managing identity graphs
Prerequisites	An existing identity graph; at least one completed graph run

Supercluster detection helps you identify over-merged identity clusters, review high-impact identifiers, and decide whether to allow or block them before reprocessing your graph.

For non-supercluster issues (for example, run failures, unresolved rows, or SQL errors on IDR output tables), see Troubleshoot Identity Resolution.

What you’ll learn

After reading this article, you’ll know how to:

Identify when IDR flags potential superclusters
Understand how gating affects graph updates
Locate supercluster warnings in the UI
Understand which identifiers are causing over-merging
Decide whether to keep or block flagged identifiers
Reprocess affected clusters safely and intentionally

Overview

Supercluster detection helps you detect and resolve over-merged identity clusters caused by high-cardinality or low-quality identifiers, such as shared emails or placeholder IDs.

When IDR detects potential over-merging, it flags the affected clusters and prompts you to review the identifiers involved before reprocessing those clusters.

This feature lets you:

Identify identifiers causing excessive merging
Allow valid high-volume identifiers
Block problematic identifiers and reprocess affected clusters

What is a supercluster?

A supercluster is an identity cluster that grows unusually large, often because it merges records that belong to different real-world identities.

This can be caused by:

Shared or default identifier values
Corrupted or reused IDs
Unexpected upstream data issues

How IDR detects potential superclusters

During each graph run, IDR monitors how identifier values contribute to cluster growth.

If a single identifier value links together an unusually large number of source rows within the same cluster, IDR flags that cluster as a potential supercluster. This helps surface cases where shared, default, or corrupted identifiers may be causing accidental over-merging.

When a potential supercluster is detected, IDR's behavior depends on whether gating is enabled.

Example

Suppose the threshold for detecting a supercluster is 3 source rows per identifier value.

1. Ingest source rows

The graph ingests rows containing the identifiers email and user_id:

event_id	email	user_id
1	real1@email.com	user1
2	real1.alt@email.com	user1
3	real2@email.com	user2
4	fake@email.com	user3
5	fake@email.com	user4
6	fake@email.com	user5

2. Build identity clusters

During resolution, rows are grouped into clusters based on shared identifiers.

All rows containing fake@email.com are merged into the same cluster.
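The grouping step can be sketched in a few lines of Python. This is a minimal illustration using a union-find structure over the example rows above, not IDR's actual resolution logic. The function name build_clusters and the use of event_id as a row key are assumptions made for the sketch.

```python
# Illustrative sketch only: groups rows that share any identifier value.
# build_clusters is a hypothetical helper, not part of IDR.
rows = [
    {"event_id": 1, "email": "real1@email.com", "user_id": "user1"},
    {"event_id": 2, "email": "real1.alt@email.com", "user_id": "user1"},
    {"event_id": 3, "email": "real2@email.com", "user_id": "user2"},
    {"event_id": 4, "email": "fake@email.com", "user_id": "user3"},
    {"event_id": 5, "email": "fake@email.com", "user_id": "user4"},
    {"event_id": 6, "email": "fake@email.com", "user_id": "user5"},
]


def build_clusters(rows, identifier_columns=("email", "user_id")):
    # Each row starts as its own cluster, keyed by event_id.
    parent = {row["event_id"]: row["event_id"] for row in rows}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller event_id as the root for stable output.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Remember the first row seen for each (identifier, value) pair and
    # merge every later row that carries the same pair into its cluster.
    first_seen = {}
    for row in rows:
        for column in identifier_columns:
            value = row.get(column)
            if value is None:
                continue
            key = (column, value)
            if key in first_seen:
                union(first_seen[key], row["event_id"])
            else:
                first_seen[key] = row["event_id"]

    clusters = {}
    for row in rows:
        clusters.setdefault(find(row["event_id"]), []).append(row["event_id"])
    return sorted(clusters.values())


clusters = build_clusters(rows)
print(clusters)  # [[1, 2], [3], [4, 5, 6]]
```

Rows 4, 5, and 6 end up together only because they share fake@email.com; their user_id values are all different.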

IDR then counts how many source rows each identifier value appears in within that cluster:

Identifier	Value	Cluster ID	Source rows
email	fake@email.com	ht3	3
user_id	user3	ht3	1
user_id	user4	ht3	1
user_id	user5	ht3	1

3. Flag the cluster

Because the identifier value fake@email.com appears in 3 source rows, it reaches the example threshold of 3 source rows per identifier value.

As a result:

The cluster containing that identifier is flagged as a potential supercluster
IDR surfaces the problematic identifier for review
You’re prompted to choose whether to Keep (allow) or Block it before reprocessing

IDR does not automatically change or break up clusters without an explicit Keep or Block decision.
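The counting and flagging steps in this example can be reproduced with a short Python sketch. It assumes a count equal to the threshold is enough to flag an identifier value, as in the example above. The threshold of 3 is the example value, not a product default, and the function names are hypothetical.

```python
from collections import Counter

# Example threshold from this article, not a product default.
THRESHOLD = 3

# Source rows of the example cluster "ht3".
cluster_rows = [
    {"cluster_id": "ht3", "email": "fake@email.com", "user_id": "user3"},
    {"cluster_id": "ht3", "email": "fake@email.com", "user_id": "user4"},
    {"cluster_id": "ht3", "email": "fake@email.com", "user_id": "user5"},
]


def count_identifier_values(cluster_rows, identifier_columns=("email", "user_id")):
    # Count the source rows each (cluster, identifier, value) appears in.
    counts = Counter()
    for row in cluster_rows:
        for column in identifier_columns:
            counts[(row["cluster_id"], column, row[column])] += 1
    return counts


def flag_identifiers(counts, threshold):
    # Assumption: a count equal to the threshold is flagged, as in the example.
    return [key for key, n in counts.items() if n >= threshold]


counts = count_identifier_values(cluster_rows)
flagged = flag_identifiers(counts, THRESHOLD)
print(flagged)  # [('ht3', 'email', 'fake@email.com')]
```

Only the email value is surfaced for review. Each user_id value appears in a single source row, so none of them is flagged.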

Gating

Gating is a safety mechanism that prevents updates to your identity graph when a potential supercluster is detected.

Gating Enabled (Default): If a potential supercluster is detected, IDR stops the update. The graph remains in its previous state until you review the flagged clusters and reprocess. This protects your graph from accidental over-merging.
Gating Disabled: IDR updates the graph even if potential superclusters are found. The system still flags them, but over-merged identities will be live in your graph.
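The effect of gating on the live graph can be summarized as a small decision function. This is a conceptual sketch of the two behaviors described above, not Hightouch code. The function name and the string placeholders standing in for graph states are hypothetical.

```python
def resolve_live_graph(previous_graph, candidate_graph,
                       superclusters_detected, gating_enabled=True):
    """Return the graph state that is live after a run (conceptual sketch)."""
    if superclusters_detected and gating_enabled:
        # Gating stops the update: the graph stays in its previous state
        # until the flagged clusters are reviewed and reprocessed.
        return previous_graph
    # No superclusters, or gating disabled: the new result goes live.
    # With gating disabled, any flagged clusters are still reported.
    return candidate_graph


# Placeholders standing in for real graph states.
print(resolve_live_graph("graph_v1", "graph_v2", superclusters_detected=True))
# graph_v1
print(resolve_live_graph("graph_v1", "graph_v2", superclusters_detected=True,
                         gating_enabled=False))
# graph_v2
```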

Gating is enabled by default to ensure data quality. To opt out, please contact Hightouch Support.

Resolve superclusters

The following steps walk through how to review and resolve flagged superclusters.

Step 1: Run your identity graph

Supercluster detection runs automatically as part of each identity graph run.

If potential superclusters are detected:

With gating enabled, the run fails with a Superclusters detected indicator
With gating disabled, the run completes with a Superclusters detected indicator
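The two outcomes above can be written as a small lookup. This is a sketch for reasoning about what to expect after a run, with hypothetical names. It ignores every other reason a run might fail and does not reflect an actual Hightouch API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunOutcome:
    status: str               # "failed" or "completed"
    indicator: Optional[str]  # label shown on the run, if any


def run_outcome(superclusters_detected, gating_enabled=True):
    # Assumption: a run with no superclusters and no other errors completes.
    if not superclusters_detected:
        return RunOutcome("completed", None)
    # The indicator is shown either way; gating decides whether the run fails.
    status = "failed" if gating_enabled else "completed"
    return RunOutcome(status, "Superclusters detected")


print(run_outcome(True, gating_enabled=True).status)   # failed
print(run_outcome(True, gating_enabled=False).status)  # completed
```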

Step 2: Open the supercluster review flow

You can access flagged superclusters from several places:

Graphs list view — Shows the most recent run with a Superclusters detected status.
Graph summary tab — Displays a warning banner.
Graph runs — Labels affected runs as Superclusters detected.