Identity Resolution - Why CDPs Fall Short
Perform identity resolution directly in your data warehouse, not in a Customer Data Platform.
August 15, 2023
In the world of customer data, “identity resolution” can be a daunting topic to grasp. Published content across the internet is mired in marketing jargon and CDP-sponsored language that simply stokes fear, uncertainty, and doubt on the subject.
In this post, we’ll break down what identity resolution is, how we’ve observed CDPs vastly oversell and underdeliver on this promise, and, most importantly, how identity resolution in the data warehouse can resolve these shortcomings.
What is Identity Resolution?
Identity resolution is the process of unifying different data sets to build a single profile of each customer. Businesses have customer data originating from many places - in application databases, CRMs and marketing tools, offline interactions, and behavioral events collected directly from digital products. Deduplicating and merging these datasets to define unique customers is essential to achieve the holy grail every SaaS vendor talks about - a 360° view of your customer! More tangibly, the output of identity resolution is an “identity graph” - a single unified table that merges all user identifiers.
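To make the idea of an identity graph concrete, here is a minimal sketch in Python. It assumes purely deterministic matching (two identifiers belong to the same profile if they ever appear on the same record) and uses a union-find structure to merge them transitively. The record fields (`email`, `device_id`, `crm_id`) and the `profile_N` labels are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical source records; field names are illustrative only.
RECORDS = [
    {"email": "ana@example.com", "device_id": "d-1", "crm_id": None},
    {"email": "ana@example.com", "device_id": None, "crm_id": "c-9"},
    {"email": "bo@example.com", "device_id": "d-2", "crm_id": None},
]


def build_identity_graph(records):
    """Return rows of (profile_id, id_type, id_value), one per identifier."""
    parent = {}

    def find(node):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for record in records:
        identifiers = [(k, v) for k, v in record.items() if v is not None]
        for identifier in identifiers:
            # Identifiers that appear on the same record belong to one profile.
            parent[find(identifier)] = find(identifiers[0])

    clusters = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)

    # Sort for a stable, readable output table.
    ordered = sorted(sorted(members) for members in clusters.values())
    rows = []
    for n, members in enumerate(ordered, start=1):
        for id_type, id_value in members:
            rows.append((f"profile_{n}", id_type, id_value))
    return rows


for row in build_identity_graph(RECORDS):
    print(row)
```

Here the first two records share an email, so the device ID and the CRM ID end up on the same profile even though they never appear together on a single record. That transitive merge is what separates an identity graph from a simple join.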
The Customer 360 looks very different based on the industry. A B2C organization's Customer 360 will focus on individual users and their interactions with the brand. Key identifiers for individual customers may include email, device ID, and basic PII like name, address, phone, and more. For B2B organizations, entity resolution for business accounts and/or workspaces can be as important as resolving individual user identities. These organizations must build custom logic to unify records, including data from sources inside their app, their CRM, lead forms, and more.
Building a Customer 360 allows organizations to discover valuable insights, align teams with unified data, and improve and personalize customer experiences. The critical question remains: how (and where) should organizations perform identity resolution?
Identity Resolution in The CDP vs. The Warehouse
There are two main strategies to solve identity resolution:
- Buy a third-party tool that stores its own “golden record” of customers. These tools fall into the Customer Data Platform (CDP) space.
- Manage identity directly in the data warehouse. Organizations can solve this with SQL or an identity resolution product (Hightouch has one).
CDP solutions like mParticle, Segment, Tealium, and Simon Data offer identity resolution within their black box systems. They require organizations to collect data within their rigid schemas and do not allow them to easily configure custom resolution logic tailored to their particular industry or use case. They only have access to certain data, missing out on the complete source of truth that companies build in the warehouse. Note: the traditional CDP market contains many players, each with slightly different approaches to identity resolution. Most of these traditional CDPs share the same strengths and shortcomings, but there may be details that vary from vendor to vendor.
Solving identity resolution in the data warehouse allows companies to leverage all of their data. Warehouse-native strategies may require more upfront design effort within your company but are highly configurable and flexible, empowering data teams to tailor each Customer 360 to their business use cases. Customer definitions differ in every business; these models become the engine that powers all the analytics, decision-making, and personalization a business performs.
CDP Limitations for Identity Resolution
Many of us here at Hightouch came from the CDP industry, where we saw hundreds of companies trying to solve identity resolution using an off-the-shelf CDP; it’s always a headache. To be more specific, we’ve observed the following significant limitations when companies rely on CDPs for identity resolution.
- Incomplete - CDPs often only have access to the behavioral event data collected within your digital products and fail to consider all the other datasets across your business. Data within your internal databases, other SaaS tools, offline sources, and more are not incorporated into CDPs. How can your Customer 360 be complete if it’s only based on a fraction of your data?
- Rigid - Traditional CDPs enforce a proprietary data model representing the “user profile.” While this may help simple businesses get set up quickly, these tools immediately struggle when you need to represent anything other than individual users. CDPs cannot manage other entities like teams, workspaces, or accounts for a B2B use case and especially struggle with managing associations within a tiered hierarchy. Even in B2C business models, you’ll often want to associate users with other entities (such as subscriptions, memberships, or courses), and managing these many relationships is nearly impossible within a tool that enforces its proprietary data model. Furthermore, most CDPs only perform deterministic identity resolution, meaning they can’t “fuzzy” match text-based fields that aren’t precise.
- Fragile - Once live data has flowed through a traditional CDP, you cannot change its identity configuration. There is no undo or unmerge button. In most cases, the only way to fix a historical mistake is to nuke the whole instance, reconfigure settings, and then reload all historical events into a new instance of the CDP. This is time-consuming and requires you to recreate all the traits and audiences you already had in the previous instance. Further, since your data team cannot directly access the CDP identity resolution models, you depend entirely on expensive CDP support teams. We saw cases at Segment and other CDPs where literally 100+ hours of professional services had to go into fixing a customer’s identity graph due to web tracking bugs.
- Locked up - There’s often no easy way to export the merged identities within a CDP directly to the data warehouse. Purchasing a CDP only grants you access to their version of a Customer 360; you cannot use that identity graph for any of your other needs. Without direct access to your customer identities, it becomes impossible for data teams to do any deeper analysis themselves.
- Locked in - Since CDPs own your identity graph and the models that built it, it’s often hard for companies to leave them. CDPs are forcing vendor lock-in by keeping identity resolution firmly within their black-box systems. If you ever want to leave a CDP, you must start from scratch to solve identity resolution.
- Insecure - Traditional CDPs require organizations to store data within black box systems for identity resolution. Although these tools have their own mature security measures, this duplicative data store inherently adds risk. This is why most off-the-shelf CDP vendors refuse to sign Business Associate Agreements (BAAs) for HIPAA compliance (to manage healthcare data). By the nature of their architecture, CDPs have too many threat vectors to assume the liability of processing and storing clinical information.
- Unenriched Events - Events that are collected and then flow over to destinations within a CDP do not natively consult the identity graph for enrichment on the way over. It is a common misconception that they do; in practice, enriching events with info (or identifiers) from the CDP-managed identity graph requires a lot of hacked-together loops with lambda functions.
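The "Rigid" point above distinguishes deterministic matching from "fuzzy" matching. The following sketch shows the difference using only Python's standard library. The `0.85` threshold and the normalization rules are assumptions chosen for illustration; `difflib.SequenceMatcher` is a simple stand-in for whatever similarity measure a production system would use, not a recommendation.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def deterministic_match(a, b):
    """Exact match after normalization: the only kind of rule many tools support."""
    return normalize(a) == normalize(b)


def fuzzy_match(a, b, threshold=0.85):
    """Approximate match: similarity ratio at or above an assumed threshold."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


crm_name = "Acme Corporation"
form_name = "Acme Corporatoin"  # typo entered on a lead form

print(deterministic_match(crm_name, form_name))  # the typo defeats an exact rule
print(fuzzy_match(crm_name, form_name))          # a similarity rule tolerates it
```

A fuzzy rule trades precision for recall, so thresholds like this one usually need tuning against labeled examples from your own data before they are trusted to merge profiles.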
Warehouse-First Benefits for Identity Resolution ✅
Managing identity resolution directly in the warehouse addresses the critical failures of the traditional CDP approach.
- Comprehensive - The data warehouse contains all of a company’s data, not just clickstream events. Datapoints in the warehouse that a CDP lacks come from internal databases, SaaS tools, offline interactions, and more. Often, the most critical and valuable insights come from these customer touchpoints.
- Configurable - Warehouse-native identity resolution is fully configurable, whether powered by Hightouch or SQL. You can build robust definitions of users and any other entity your business cares about and define relationships between these different entities. Whatever data you have, whatever profiles you want to merge: warehouse-native identity resolution can meet those needs.
- Flexible - Warehouse-native identity resolution can adapt as your business evolves. This flexibility allows you to constantly improve your models as you introduce new data sources. You can manipulate and merge anything you want, train and test probabilistic methodologies, and test out any open-source projects on your data sets without ever being limited by the tooling of a SaaS vendor.
- Owned - Storing an identity graph directly in the data warehouse allows you to use it for any business use case. You aren’t limited to resolving identities simply for the destinations CDPs send data to. You can use your resolved identities directly from the warehouse for internal applications, analytics, and more.
- Composable - Building identity graphs in the warehouse avoids vendor lock-in. You have full access to your identity graphs and have full visibility and control over the methods used to create them. If you choose to change vendors, you can do so with confidence in your historical data and your ability to build comparable identity resolution models.
- Secure - Managing customer identities in your private cloud significantly reduces your footprint for security risk. You won’t have a third party storing your data. Instead, you’ll have the enterprise-grade controls provided by modern cloud platforms to control and govern access. This will become more critical as data privacy regulations mature and demand more regionalization and stewardship.
- Enriched Events - Businesses that load raw events directly into the warehouse can enrich them with additional identifiers or related metadata before forwarding them to a final destination. Warehouse-native identity resolution powers this event enrichment and provides a marked improvement over CDP event streams, which pass directly to end destinations without the benefit of referencing identity graphs. These enriched events are especially valuable when sent to advertising conversion endpoints like the Facebook CAPI or Google’s Enhanced Conversions; the identifiers added to each event will improve match rates.
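As a rough illustration of the "Enriched Events" point, the sketch below looks up a raw event's anonymous ID in an identity graph and attaches the other identifiers known for that profile before the event is forwarded. The graph rows, the field names, and the `enrich_event` helper are all hypothetical. Real conversion endpoints have their own formatting and hashing requirements, which are not shown here.

```python
# Hypothetical identity graph rows: (profile_id, id_type, id_value).
GRAPH_ROWS = [
    ("p1", "anonymous_id", "anon-123"),
    ("p1", "email", "ana@example.com"),
    ("p1", "phone", "+15550100"),
    ("p2", "anonymous_id", "anon-456"),
]


def build_lookups(rows):
    """Index the graph both ways: identifier -> profile, profile -> identifiers."""
    profile_by_identifier = {}
    identifiers_by_profile = {}
    for profile_id, id_type, id_value in rows:
        profile_by_identifier[(id_type, id_value)] = profile_id
        identifiers_by_profile.setdefault(profile_id, {})[id_type] = id_value
    return profile_by_identifier, identifiers_by_profile


def enrich_event(event, profile_by_identifier, identifiers_by_profile):
    """Return a copy of the event with every identifier known for its profile."""
    enriched = dict(event)
    key = ("anonymous_id", event.get("anonymous_id"))
    profile_id = profile_by_identifier.get(key)
    if profile_id is None:
        return enriched  # unknown visitor: forward the event unchanged
    enriched["profile_id"] = profile_id
    for id_type, id_value in identifiers_by_profile[profile_id].items():
        enriched.setdefault(id_type, id_value)  # never overwrite event fields
    return enriched


by_identifier, by_profile = build_lookups(GRAPH_ROWS)
raw_event = {"event": "purchase", "anonymous_id": "anon-123", "value": 42.0}
print(enrich_event(raw_event, by_identifier, by_profile))
```

In a warehouse this lookup would more likely be a join between an events table and the identity graph table; the dictionaries here just stand in for that join.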
CDPs force their customers to adhere to strict entity relationships, limited customization and data manipulation, and black-box identity resolution. Companies are not built on simple data and processes. We live in a complex world where business models, customer journeys, and go-to-market motions cannot be accurately represented by these inflexible schemas. By centralizing and transforming your data with a warehouse-first approach, you allow the people who know the inner workings of your business to build your data activation framework.
Solving Identity Resolution in the Warehouse
Organizations can configure identity resolution in the warehouse in two ways: with in-house SQL or dedicated tools.
If your organization opts to develop an in-house method using SQL, we recommend starting simple and focusing first on your highest-value data sets. We walked through a basic example of this SQL logic in our earlier blog post titled Identity Resolution in SQL. Data teams must model datasets into single “source of truth” dimension tables representing each “entity” within the business and then manage the process to keep these up to date. To help address this transformation process at scale with SQL, many companies also leverage dbt. This is a viable solution for warehouse-native identity resolution, but it requires high levels of technical skill and consistent resource dedication.
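For a sense of what a "start simple" SQL model might look like, here is a minimal sketch that builds one customer dimension table from two sources by matching on a normalized email. It is not the logic from the earlier blog post. Python's built-in `sqlite3` module stands in for the warehouse so the example runs anywhere, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source tables standing in for an app database and a CRM export.
SETUP_SQL = """
CREATE TABLE app_users (user_id TEXT, email TEXT, created_at TEXT);
CREATE TABLE crm_contacts (contact_id TEXT, email TEXT, created_at TEXT);
INSERT INTO app_users VALUES
    ('u1', 'Ana@Example.com', '2023-01-05'),
    ('u2', 'bo@example.com', '2023-02-01');
INSERT INTO crm_contacts VALUES
    ('c9', 'ana@example.com ', '2022-12-20'),
    ('c10', 'cy@example.com', '2023-03-03');
"""

# One row per customer, keyed on the normalized email address.
MODEL_SQL = """
CREATE TABLE dim_customers AS
SELECT
    email AS customer_key,
    MIN(created_at) AS first_seen_at,
    COUNT(*) AS source_record_count
FROM (
    SELECT LOWER(TRIM(email)) AS email, created_at FROM app_users
    UNION ALL
    SELECT LOWER(TRIM(email)) AS email, created_at FROM crm_contacts
) AS all_records
GROUP BY email;
"""


def build_dim_customers():
    """Create the sources, run the model, and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    conn.executescript(MODEL_SQL)
    rows = conn.execute(
        "SELECT customer_key, first_seen_at, source_record_count "
        "FROM dim_customers ORDER BY customer_key"
    ).fetchall()
    conn.close()
    return rows


for row in build_dim_customers():
    print(row)
```

A single-key model like this only handles exact email matches. Adding more identifiers, survivorship rules, and incremental updates is where the maintenance burden described above comes from.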
If your organization wants to develop identity resolution solutions faster and doesn’t want to maintain custom SQL for each model, Hightouch offers a fully configurable, warehouse-native identity resolution feature. It provides a code-free, rules-based editor, allowing users to resolve multiple identity and entity graphs directly within any data warehouse. This interface can help teams quickly establish identity resolution models and iterate on them with full visibility. Data teams can build identity graphs directly in their warehouse without writing and maintaining complex SQL or code.
Future-Proofing against Vendor Lock-In
The technologies powering identity resolution are rapidly evolving alongside advancements in the data warehouse and machine learning. There are more ways than ever to solve identity resolution directly in the data warehouse, and options here will only continue to expand. For example, Amazon recently announced AWS Entity Resolution, one of many new solutions that use machine learning to help match records. As new technologies become available, your company will want to remain flexible. You should be able to iterate and try out new solutions without being locked into one way of operating.
Unlike a traditional CDP, warehouse-native identity resolution allows for this flexibility. If all inputs and outputs for identity resolution processes live in your own data warehouse, you can quickly experiment with different models and methods from one single source of truth. In other words: warehouse-native identity resolution is future-proofed and flexible, allowing your organization to iterate on identity resolution approaches as technology evolves.
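One way to picture this kind of experimentation: because the raw records stay in the warehouse, the identity graph is a derived output that can be rebuilt under different rules and compared. The sketch below reruns the same hypothetical records through two assumed rule sets (match on email only, or on email or phone) and counts the resulting profiles. The `resolve` function and the sample data are illustrative, not any vendor's implementation.

```python
# Hypothetical raw records kept in the warehouse as the single source of truth.
RAW_RECORDS = [
    {"email": "ana@example.com", "phone": "+15550100"},
    {"email": "ana.work@example.com", "phone": "+15550100"},
    {"email": "bo@example.com", "phone": None},
]


def resolve(records, match_keys):
    """Count profiles when records sharing a value on any match key are merged."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, record in enumerate(records):
        for key in match_keys:
            value = record.get(key)
            if value is None:
                continue  # missing identifiers never cause a merge
            j = first_seen.setdefault((key, value), i)
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(records))})


# Same inputs, two candidate rule sets, compared side by side.
for rules in (["email"], ["email", "phone"]):
    print(rules, "->", resolve(RAW_RECORDS, rules), "profiles")
```

Nothing is overwritten when the rules change. If the phone rule turns out to over-merge (shared household numbers, for example), dropping it and rebuilding restores the previous graph.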
Identity resolution aggregates all your data and answers the question: “Who is my customer?”
Historically, off-the-shelf CDPs were the only accessible option for identity resolution, but with the boom of the cloud data warehouse, there’s a new sheriff in town. You can build more complete, accurate, and secure customer models by cultivating your own definitions in your data warehouse.
Finally, now that you’ve generated a Customer 360, you need to put that data to work. Hightouch can sync your customer data from the warehouse to 200+ downstream tools, allowing you to power all of your operations with your unified data. Speak to our solutions engineers to learn how we can help with identity resolution and Data Activation.