What is Entity Resolution?

Learn why and how you should deduplicate and consolidate data for the key concepts that drive your business.

Nate Wardwell

May 30, 2023

11 minutes

Entity resolution unifies disparate records into one 360 degree view.

Every business can be defined by just a few nouns, such as who you want to sell to and what you sell. In your underlying data, these key nouns, or “entities,” may be referenced thousands of times in slightly different ways based on when and where customers interacted with you and your products. Left alone, your data can grow into a tangled mess of unusable facts without rhyme or reason.

If you want to make better decisions about your customers and products, you need to have a holistic view of each of these entities. You need to see at a glance what a customer has done with your company and pass that data to your downstream tools. You need to know how each of your products performs and which products each customer has bought. This is the mission of entity resolution: to deduplicate your data and ultimately present you with a complete 360-degree view of each “noun” you care about.

What is Entity Resolution?

Entity resolution (also known as entity matching) is the process of stitching together data related to the same real-world thing, such as a person, business, or household, so you can better understand how these things relate to one another, reconcile data inconsistencies, and merge your data. It's a crucial component of the data onboarding process when it comes to linking offline and online data.

A real-world entity can be anything your business wants to measure as a discrete unit. If you work for a cable company, you may care about individual people as entities. Still, you’ll likely also count larger “households” as entities to which you can ultimately sell and deliver services. The end output of this process is a unified record for each entity, containing all the relevant information about that entity consolidated into one place, without duplicate data or conflicting records.

Let’s elaborate with the example of a household entity measured by a cable company. On Monday, one individual who lives in a household visits the cable company’s website and uses a customer-service chat feature. On Tuesday, a different individual from the same household calls a customer service phone number. The customer service agent who takes the call from the second user will want to know what the first user discussed on the web chat and, similarly, any past billing or interaction history for members of that household. The data teams at this cable company should set up entity resolution rules and processes to unify all household-related records to better enable customer service and sales efforts.
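As a rough sketch of what such a rule might look like, the Python below groups interactions by a normalized street address, so the Monday web chat and the Tuesday phone call land in the same household. The field names, sample records, and the choice of address as the household key are all assumptions made for illustration; a real cable company would likely combine several signals.

```python
from collections import defaultdict

# Hypothetical interaction records; field names and values are illustrative only.
interactions = [
    {"channel": "web_chat", "person": "Alex", "address": "12 Oak St.", "day": "Monday"},
    {"channel": "phone", "person": "Sam", "address": "12 oak st", "day": "Tuesday"},
    {"channel": "phone", "person": "Jo", "address": "98 Elm Ave", "day": "Tuesday"},
]


def household_key(address: str) -> str:
    """Normalize an address so trivially different spellings compare equal."""
    return " ".join(address.lower().replace(".", "").split())


def group_by_household(records):
    """Collect every interaction under the household it belongs to."""
    households = defaultdict(list)
    for record in records:
        households[household_key(record["address"])].append(record)
    return dict(households)


households = group_by_household(interactions)
for key, history in households.items():
    print(key, [(r["day"], r["channel"], r["person"]) for r in history])
```

With the history grouped this way, the agent who answers Tuesday's call can look up the household and see Monday's chat alongside it.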

household entity resolution

What’s the Difference Between Entity Resolution and Identity Resolution?

Identity resolution is a type of entity resolution. Entity resolution refers to the overall practice of joining related records; identity resolution is simply entity resolution where the targeted entity is an individual user. The goal of identity resolution is a “Customer 360”, a full view of what a customer has done on the site. The end goal of entity resolution is a full view of whatever entity you are looking at, which may or may not be an individual user.

The concept of identity resolution is more mainstream in business discourse than entity resolution. Marketers, advertisers, and other business users have wanted to build an effective single view of the customer for years, and competing Customer Data Platforms (CDPs) attempt to solve this problem for them. Entity resolution has, to date, primarily been a concept discussed by data teams. The broader practice of entity resolution can encompass the benefits of identity resolution and the benefits of clear views of other essential entities for your company, such as user accounts or households.

The fundamental approaches for identity resolution and entity resolution are similar. If your organization can resolve entities in your data warehouse, you’re already fully equipped to resolve identities there as well.

Common Types of Entities

Each company will care about a custom set of entities unique to its business model. Generally speaking, the most important entity for a B2C company is an individual customer, which the company can sell to. Similarly, in B2B companies, the most critical entity is likely a business account or target company to sell to, an aggregated group containing many individual users.

Relationships between entities can grow more complex and niche for each business use case. A digital pet food company likely measures individual pets as entities, which they then link to their owners to target owners with custom messages for food tailored to each of their pets.

Common B2C entities include:

  • Customers
  • Products
  • Subscriptions

Common B2B entities include:

  • Users
  • Teams
  • Companies

Why Does Entity Resolution Matter?

Companies need accurate, comprehensive data about their essential entities (such as user accounts) to drive revenue and save costs. A well-known concept that summarizes these benefits is the “1:10:100 rule”. It might cost you $1 to correct a data record as it is ingested. However, waiting to clean that data up later takes more effort and might cost $10, and if you do nothing and never resolve that duplicate data, you could miss out on $100 simply from bad decision-making and hard-to-use data. Not performing entity resolution is the equivalent of leaving dollars on the table.

1:10:100 rule in data quality

Source: https://www.grepsr.com/blog/1-10-100-rule-data-quality/

In addition to driving simple analytics for clear decision-making, having clean data for each entity is also a prerequisite for machine learning (ML) and artificial intelligence (AI) applications. If you want to predict behaviors and outcomes for a specific entity, you’ll need to model your data so that your ML algorithms can digest all the essential information about that entity. Entity resolution supports this because it produces the consolidated, per-entity records that the feature engineering behind ML and AI applications depends on.

Entity Resolution Use Cases

These examples provide details on a few of the many ways that entity resolution can empower companies to make better decisions and earn more revenue:

  • Unify customer records: Whether a “customer” is a person, a household, or an entire company, entity resolution helps you understand the actions your customers are taking. This enables you to personalize experiences with your customers based on their activities to maximize future revenue and avoid churn.
  • Unify product records: By holistically understanding each product’s performance, you can better match the products you sell to your customers, personalize product offerings, and improve future products based on the impact of your existing ones.
  • Unify account records: You can understand multiple accounts that each customer may have (e.g., checking and savings accounts if you’re a bank) and analyze the performance of each account separately. You can link these account records to your unified customer records to better serve your customers, and you can treat the accounts themselves as discrete product entities to offer a more cohesive product experience and optimize your overall account offerings.

Create 360° User Profiles in your Warehouse

Learn how to stitch your existing customer data into rich, actionable profiles directly in your data warehouse without writing a line of code.

Download this document to learn more about Hightouch's Identity Resolution feature and how warehouse-native identity and entity resolution empowers companies with the best-possible uses for their data.

B2B Entity Resolution Sample Use Case

Entity resolution is essential to understanding how customers interact with your product. It requires resolving every entity and aggregating those entities to both parent and related entities.

For example, the team at Hightouch measures the following entities:

  • Product Entities
    • Sources
    • Models
    • Destinations
    • Syncs
  • User Entities
    • Users
    • Accounts
    • Workspaces
    • Organizations

To fully understand how an organization is using Hightouch, the company needs to perform entity resolution on all of the more granular entities for both products and users. Hightouch rolls that product and user information up to the organization entity, which is the level at which the company ultimately books deals. Entity resolution ensures that everyone at the company uses the same terminology and metrics for each level of the entity pyramid, and that there’s a full 360° view of each user, workspace, and organization.
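To make the roll-up idea concrete, here is a minimal Python sketch that counts users and syncs per organization by following parent links from user to workspace to organization. The IDs, the mapping tables, and the assumption that every sync belongs to exactly one workspace are hypothetical and are not taken from Hightouch's actual data model.

```python
from collections import Counter

# Hypothetical parent links; every ID here is made up for illustration.
workspace_to_org = {"ws_1": "org_a", "ws_2": "org_a", "ws_3": "org_b"}
user_to_workspace = {"u1": "ws_1", "u2": "ws_1", "u3": "ws_2", "u4": "ws_3"}
syncs = [  # product entities, each assumed to be owned by one workspace
    {"sync_id": "s1", "workspace": "ws_1"},
    {"sync_id": "s2", "workspace": "ws_2"},
    {"sync_id": "s3", "workspace": "ws_3"},
    {"sync_id": "s4", "workspace": "ws_3"},
]


def rollup_to_org():
    """Aggregate user and sync counts up to the organization level."""
    users_per_org = Counter(workspace_to_org[ws] for ws in user_to_workspace.values())
    syncs_per_org = Counter(workspace_to_org[s["workspace"]] for s in syncs)
    return {
        org: {"users": users_per_org[org], "syncs": syncs_per_org[org]}
        for org in sorted(set(workspace_to_org.values()))
    }


org_summary = rollup_to_org()
print(org_summary)
```

The same pattern extends to any other level of the pyramid: resolve the granular entity first, then follow its parent link when aggregating.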

What is an Entity Relationship Diagram?

An Entity Relationship Diagram (ERD) is a model that maps the relationships between the different entities you care about. You can use an ERD to make a conceptual plan for the entities your company cares about and ultimately figure out how you want to use the data from these interrelated entities to inform your record linking.

For example, let’s say you run a business selling plants online. In this case, the primary entity you care about is a customer who can buy your plants. You also will want to measure a separate entity for each plant a customer has purchased. This will allow you to personalize future offers to that user for related plants or products that will help them care for their existing plants.

Finally, you’ll want to tie the individual actions that the user has taken back to the user entity. An ERD like the one below shows the relationships between a user entity, a plant entity, and discrete user events like products viewed on the website.

Entity-Relationship-Diagram
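The relationships in the plant-shop example could be expressed in code along the following lines. This is only a sketch: the class names, fields, and the `viewed_but_not_purchased` helper are hypothetical, chosen to show how a customer entity can link to plant entities and to discrete events.

```python
from dataclasses import dataclass, field


@dataclass
class Plant:
    plant_id: str
    name: str


@dataclass
class Event:
    event_type: str  # e.g. "product_viewed"
    plant_id: str    # the plant the event refers to


@dataclass
class Customer:
    customer_id: str
    email: str
    purchased: list[Plant] = field(default_factory=list)  # customer -> plants
    events: list[Event] = field(default_factory=list)     # customer -> events

    def viewed_but_not_purchased(self) -> set[str]:
        """Plant IDs the customer has viewed but does not own yet."""
        owned = {plant.plant_id for plant in self.purchased}
        viewed = {e.plant_id for e in self.events if e.event_type == "product_viewed"}
        return viewed - owned


customer = Customer(
    customer_id="c1",
    email="fern.fan@example.com",
    purchased=[Plant("p1", "Monstera")],
    events=[Event("product_viewed", "p1"), Event("product_viewed", "p2")],
)
print(customer.viewed_but_not_purchased())
```

Once events and purchases hang off the same customer entity, a query like this one is enough to pick candidates for a personalized offer.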

What’s the Difference Between Deterministic and Probabilistic Entity Resolution?

Deterministic entity resolution, also known as “rules-based matching,” relies on defining precise rules on specific table columns that can be used to unify and deduplicate existing records. Deterministic entity resolution is relatively straightforward and quick to implement, and it works best in simple use cases where your data follows a similar structure. For example, matching records and unifying zip codes on household entities is a good use case for rules-based entity resolution.
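A minimal sketch of a rules-based match, assuming household records that carry a last name, street, and zip code (the field names and sample data are hypothetical): two records are treated as the same household only when all three normalized values are exactly equal.

```python
def normalize_zip(zip_code: str) -> str:
    """Keep the first five digits so '94107-1234' and '94107' compare equal."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    return digits[:5]


def match_key(record):
    """The deterministic rule: exact equality on these normalized columns."""
    return (
        record["last_name"].strip().lower(),
        record["street"].strip().lower(),
        normalize_zip(record["zip"]),
    )


def deterministic_match(records):
    """Group record IDs that share the same match key."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record["id"])
    return list(groups.values())


# Hypothetical household records.
records = [
    {"id": 1, "last_name": "Rivera", "street": "5 Pine Rd", "zip": "94107"},
    {"id": 2, "last_name": " rivera", "street": "5 pine rd", "zip": "94107-1234"},
    {"id": 3, "last_name": "Rivera", "street": "5 Pine Rd", "zip": "10001"},
]
matched = deterministic_match(records)
print(matched)
```

Records 1 and 2 fall into one group because they differ only in casing, whitespace, and zip format, while record 3 stays separate because its zip code differs.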

Probabilistic entity resolution, or “fuzzy matching,” relies on machine learning, AI, or predictive models to identify and unify entities via record deduplication. For many entity resolution use cases, data can be stored in many different formats and locations, and it would be impossible to define the precise rules to unify records proactively. Most entity resolution at enterprise-scale companies relies on fuzzy matching logic.
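To illustrate the idea of scoring a match instead of requiring exact equality, the sketch below uses the string-similarity ratio from Python's standard `difflib` module. This is a simple stand-in, not a machine-learning model, and the 0.85 threshold is an arbitrary assumption; production systems typically learn weights and thresholds from labeled data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0 for how alike two strings are."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two values as the same entity when the score clears the threshold."""
    return similarity(a, b) >= threshold


# Hypothetical name pairs to compare.
pairs = [
    ("Jonathan Smith", "Jonathon Smith"),
    ("Jonathan Smith", "Maria Garcia"),
]
for left, right in pairs:
    print(left, "|", right, "|", round(similarity(left, right), 2),
          is_probable_match(left, right))
```

The misspelled name clears the threshold while the unrelated name does not, which is the kind of decision an exact-match rule cannot make.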

How Does Entity Resolution Work?

At a high level, entity resolution consists of four steps:

  1. Ingestion: Ensuring data is accessible to your entity resolution programs or machine-learning models in the same place. Often, unifying data into a data warehouse is the starting place of entity resolution.
  2. Deduplication: Consolidating any records that are true copies of each other to reduce the complexity and redundancy of each entity.
  3. Record Linkage: Using rules-based or fuzzy-matching logic from within the remaining data to identify which records relate to the same entity but contain distinct data, such as different interactions on different days.
  4. Canonicalization: Unifying and consolidating your data from the previously linked records to store all related data points within that entity.
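The four steps above can be sketched as a small Python pipeline. Everything here is an assumption made for illustration: the two sample sources, the use of a normalized email address as the linkage rule, and the "first non-empty value wins" merge policy.

```python
def ingest(*sources):
    """Step 1: bring records from every source into one list."""
    return [dict(rec) for source in sources for rec in source]


def deduplicate(records):
    """Step 2: drop records that are exact copies of one another."""
    seen, unique = set(), []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


def link(records):
    """Step 3: rules-based linkage on a normalized email address."""
    linked = {}
    for rec in records:
        linked.setdefault(rec["email"].strip().lower(), []).append(rec)
    return linked


def canonicalize(linked):
    """Step 4: merge each group, keeping the first non-empty value per field."""
    entities = {}
    for key, group in linked.items():
        merged = {}
        for rec in group:
            for field_name, value in rec.items():
                if value and not merged.get(field_name):
                    merged[field_name] = value
        merged["email"] = key  # store the normalized form on the unified record
        entities[key] = merged
    return entities


# Hypothetical source tables.
crm = [
    {"email": "ana@example.com", "name": "Ana Lopez", "phone": ""},
    {"email": "ana@example.com", "name": "Ana Lopez", "phone": ""},  # exact copy
]
support = [{"email": "Ana@Example.com ", "name": "", "phone": "555-0100"}]

entities = canonicalize(link(deduplicate(ingest(crm, support))))
print(entities)
```

The exact copy disappears in step 2, the differently formatted email is linked in step 3, and step 4 produces one record that carries both the name from the CRM and the phone number from support.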

Entity Resolution Steps

Source: https://medium.com/d-one/entity-resolution-the-secret-sauce-to-data-quality-people-centred-ai-5bf8a1cb613c

What Companies and Tools Can Help Implement Entity Resolution?

If your company has a robust data team, you can resolve entities directly in your data warehouse. Numerous solutions groups, such as Big Time Data, can also assist with implementation. Hightouch has built a robust rules-based identity resolution feature that can resolve any entity you define, allowing users to resolve profiles in a code-free interface within their data warehouse.

Several machine learning algorithms are publicly available to assist with entity resolution, including:

Depending on your use cases or implementation needs, several companies also offer software to assist with entity resolution, including:

Finally, regardless of the state of your underlying data structures, data activation platforms like Hightouch enable you to extract the best value from your data and sync that data to downstream business tools. You can define models from multiple tables with a SQL-based interface or join related models and entities in a no-code schema builder to curate datasets based on linked entities for marketing teams to build audiences.

Schema Builder in Hightouch

Final Thoughts

Entity resolution is the foundation that companies will rely on to understand the essential things that they care about, such as customers, households, and products. Whether companies build their own custom solutions for entity resolution or leverage third-party algorithms or platforms, they need a 360° view of the entities that drive their business.

Finally, companies need to act on their entity data. Hightouch activates data directly from company data stores to tools that support business use cases. Hightouch can help data teams link entity data from disparate sources and create syncs to 200+ tools that business users rely on. To learn more about how Hightouch can help, talk to a Hightouch Solutions Engineer to build a plan to model and activate your data.
