What Is Reverse ETL? The Definitive Guide (Updated Aug 2022)

Learn everything there is to know about Reverse ETL, how it fits into the modern data stack, and why it's different from ETL.

By Tejas Manohar and Luke Kline on August 9th, 2022

The data ecosystem has changed drastically over the last six years, and we've witnessed the rise and fall of several different technologies. However, one thing has remained constant: the cloud data warehouse.

Thanks to modern data platforms like Snowflake and Google BigQuery, consolidating your data into a centralized platform and addressing your analytical workloads is easier than ever. The problem is that your data warehouse is only accessible to your technical users who know how to write SQL, so the platform you purchased to eliminate data silos has inevitably become a data silo. This is precisely why Reverse ETL is so important.

This post will cover the following:

  • What Is Reverse ETL?
  • What’s the Difference Between ETL & Reverse ETL?
  • Who Is Reverse ETL For?
  • The Modern Data Stack & Reverse ETL
  • Reverse ETL vs. Point-to-Point Integrations
  • Reverse ETL vs. CDPs
  • Why You Need Reverse ETL
  • Reverse ETL Use Cases
  • Reverse ETL: Build vs. Buy
  • Choosing a Reverse ETL Tool

The Definition of Reverse ETL

Reverse ETL is the process of copying data from your central data warehouse to your operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.

However, why would you want to move data out of your warehouse after spending so many resources to get it in there in the first place? The answer is relatively straightforward – your data warehouse houses all of your unique customer data. Usually, this includes:

  • Behavioral data: product usage data collected via your app or website (e.g., pages viewed, last login date, messages sent, products selected, items in cart, etc.)
  • Demographic data: information about your customers and users (e.g., revenue, industry, country, job title, email, first name, last name, etc.)
  • Historical data: essential information about your past interactions (e.g., orders, first meeting date, number of demos, phone calls, emails, etc.)

In addition, you probably have several core metrics unique to your business that your data team has defined via a data model in your warehouse. For example, if you’re a B2B business you’ll likely have definitions or models in your warehouse for workspaces, churn rate, lead score, ARR (annual recurring revenue), MRR (monthly recurring revenue), PQL (product qualified lead), SQL (sales qualified lead) and MQL (marketing qualified lead).

If you’re a B2C business, you could have user metrics like DAU (daily active users), CAC (customer acquisition costs), LTV (lifetime value), etc. Reverse ETL is all about syncing the data in your warehouse to your downstream business tools.

Data Being Synced to Salesforce

Instead of reacting to your data as it's persisted into a dashboard, Reverse ETL enables you to take a proactive approach by syncing data to your downstream systems so your business teams can react and take action on your data in real-time as it's transformed for analysis in your warehouse.

What’s the Difference Between ETL & Reverse ETL?

The traditional ETL process has been around since the 1970s, and data pipelines have essentially remained unchanged. For those unfamiliar, ETL stands for extract, transform, and load. It's the process of automatically extracting, transforming, and loading data into your desired destination (e.g., a data warehouse or data lake).

Fully managed SaaS platforms have made this process even easier by offering pre-built connectors to extract and load your data. Dedicated transformation tools like dbt have given rise to a new technique known as ELT, where data is transformed after it is loaded into the warehouse. The question is: Why can't you use conventional ETL or ELT to move data out of your warehouse?

The ETL Process

At face value, Reverse ETL simply queries against your data warehouse. However, most people don't know that Reverse ETL requires you to write Reverse SQL, so moving data out of your warehouse and back into your operational systems and SaaS tools can be very challenging.

The Reverse ETL Process

An ELT tool like Fivetran or Stitch is primarily used for powering dashboards. In contrast, Reverse ETL powers workflows, marketing campaigns, and general business processes where time sensitivity is critical.

Since ELT is mainly focused on merging data based on "updated_at" fields, time is the only parameter to consider. However, since Reverse ETL queries directly against your warehouse, there are no "updated_at" fields, meaning you have to diff the changes between sync runs to ensure you are only syncing fresh data. With Reverse ETL, you're syncing data from your warehouse to specific fields that you define in your end destination.
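To make the diffing step concrete, here is a minimal sketch of one way to do it: fingerprint every row returned by the query and compare against the fingerprints saved after the previous run. The function names, the "email" key, and the sample rows are all hypothetical; this is an illustration of the idea, not how any particular Reverse ETL tool implements it.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Stable hash of a row's contents, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_sync_runs(previous: dict, current_rows: list, key: str):
    """Compare the current query result with fingerprints from the last run.

    previous: {primary_key: fingerprint} saved after the last sync.
    Returns (added_rows, changed_rows, removed_keys, new_snapshot).
    """
    added, changed = [], []
    snapshot = {}
    for row in current_rows:
        pk = row[key]
        fingerprint = row_fingerprint(row)
        snapshot[pk] = fingerprint
        if pk not in previous:
            added.append(row)
        elif previous[pk] != fingerprint:
            changed.append(row)
    removed = [pk for pk in previous if pk not in snapshot]
    return added, changed, removed, snapshot


# First run: nothing was synced before, so every row counts as new.
run_1 = [
    {"email": "a@example.com", "plan": "free"},
    {"email": "b@example.com", "plan": "pro"},
]
added, changed, removed, snapshot = diff_sync_runs({}, run_1, key="email")

# Second run: one row changed, one disappeared, one is new.
run_2 = [
    {"email": "a@example.com", "plan": "pro"},
    {"email": "c@example.com", "plan": "free"},
]
added_2, changed_2, removed_2, snapshot_2 = diff_sync_runs(snapshot, run_2, key="email")
```

Only the added and changed rows need to be sent to the destination, which keeps API usage proportional to what actually changed rather than to the size of the table.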

You also have to be worried about multiple writers because your business users are constantly updating fields in your end destination. Additionally, you also have to do a variety of behind-the-scenes transformations on your data because every destination has its nuances.

If you make a mistake with conventional ELT, you can delete your table and re-ingest your data. This is impossible with Reverse ETL because you're syncing data directly to your operational tools, and platforms like Hubspot do not have a time machine feature that lets you roll back to a previous state.

Managing and maintaining your syncs once you set them up is another challenge. Ultimately, there are many technical differences between ELT and Reverse ETL, and if you're interested in getting into the weeds, you can read our post here.

Who Is Reverse ETL For?

There are two core audiences for Reverse ETL: data and business teams. Unless your data team is exceptionally unique, there's a high probability that they don't have the engineering resources to build and manage custom in-house Reverse ETL pipelines. On the other hand, your business teams need and want access to the unique customer data that lives in your warehouse, and that's why leading Reverse ETL tools have interfaces for both technical and non-technical users.

Reverse ETL removes the friction between your data and business teams so both can work towards the same goals. With Reverse ETL, your data team can build the parameters to enable your business teams to self-serve in their favorite downstream business tools.

To be specific, leading Reverse ETL tools offer interfaces for both your technical users and non-technical users. Your data team can easily define data using standard SQL, existing data models (e.g., dbt models), a table selector, or even their favorite Business Intelligence (BI) tool.

Hightouch Model Selector

Once this data model is defined, your business teams can filter those models (e.g., filter for all users who made a purchase in the last 30 days) using a visual audience builder (no SQL skills required). They can also create the sync themselves, deciding which fields to update in their favorite tool.

Hightouch Audiences
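As a rough illustration of what a filter like "users who made a purchase in the last 30 days" boils down to, here is a small sketch. The function name, the "last_purchase" field, and the sample users are hypothetical, and a real audience builder would typically compile the filter to SQL against the warehouse rather than filter rows in application code.

```python
from datetime import date, timedelta


def purchased_within(users: list, days: int, today: date) -> list:
    """Keep users whose last purchase falls within the past `days` days."""
    cutoff = today - timedelta(days=days)
    return [
        user for user in users
        if user["last_purchase"] is not None and user["last_purchase"] >= cutoff
    ]


users = [
    {"email": "ana@example.com", "last_purchase": date(2022, 8, 1)},
    {"email": "bo@example.com", "last_purchase": date(2022, 5, 20)},
    {"email": "cy@example.com", "last_purchase": None},
]

# With "today" fixed at August 9th, 2022, only the first user qualifies.
audience = purchased_within(users, days=30, today=date(2022, 8, 9))
```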

Reverse ETL establishes a clear handoff, ensuring that your data team owns and manages your data, while your business teams can use it for activation to build better customer experiences and drive more value for your business.

The Modern Data Stack & Reverse ETL

For the most part, every modern data stack has several core components that are the same across analytics teams, and usually, it looks something like this.

The Modern Data Stack 2.0

  • Data Acquisition: the initial collection point of data (e.g., source systems, internal databases, operational systems, business tools, SaaS applications)
  • Event Tracking: behavioral data on your website or app (e.g., signed in, workspace created, subscription type, product viewed)
  • Data Integration: extracting and ingesting your data into a central analytics repository
  • Storage/Analytics: the analytics layer where all of your disparate sources are consolidated to establish a single source of truth
  • Data Transformation: transforming, standardizing, and formatting your data into a model that fits your business
  • Business Intelligence (BI): the visualization layer where you can consume the unique data models your team has built to power better business decisions
  • Data Orchestration: the process of managing the dependencies between all of your various data flows (e.g., scheduling, automation, monitoring)
  • Data Governance: the process of monitoring and managing all of your unique data assets

While the technologies for each of these layers might differ from company to company, the overall components largely remain the same across industries. However, the modern data stack has left a gaping hole that other technologies have tried, and failed, to fill over the years. Reverse ETL has arisen as the tool of choice for the "last-mile problem" of helping you activate your data for operational analytics.

Reverse ETL is not a new concept by any means. Companies have been trying to activate their data for years. In the past, moving data out of the warehouse required you to either manually download/upload CSV files or build custom integrations and pipelines for every single one of your SaaS applications and end systems. Neither option was scalable.

Reverse ETL vs. Point-to-Point Integrations

Point-to-point tools and integration technologies like Zapier, Tray, and Workato can be an attractive option for tackling Reverse ETL use cases because they let you send data from one platform to another without code. However, these platforms don't scale or integrate well with your current data stack. If you have just four applications and each one needs to exchange data with every other, you'll quickly find yourself with 12 different pipelines (4 x 3 = 12, one per direction for each pair).

All of the platforms in this space work similarly; they perform actions based on a trigger you define (e.g., sending a marketing email in Hubspot when a lead is created in Salesforce). You have to build custom workflows for every integration in your data stack, which can become an absolute nightmare as you weave in various dependencies, triggers, if/then clauses, and fail-safes (look at this example of a workflow in Tray.)

Reverse ETL creates a hub and spoke approach, where the warehouse is your central source of truth, completely eliminating the complex web of pipelines and workflows that come with conventional point-to-point solutions.

Point-to-Point vs. Hub & Spoke
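A rough way to compare the two topologies is to count connections. The sketch below assumes that in a point-to-point setup every pair of tools needs one pipeline per direction and that no tool syncs to itself, and that in a hub-and-spoke setup each tool needs one pipeline into the warehouse and one out of it. Both function names are hypothetical, and real stacks rarely connect every tool to every other, so treat the numbers as an upper bound.

```python
def point_to_point_pipelines(n_tools: int) -> int:
    """Directed connections between every pair of distinct tools."""
    return n_tools * (n_tools - 1)


def hub_and_spoke_pipelines(n_tools: int) -> int:
    """One pipeline into the warehouse and one out of it for each tool."""
    return 2 * n_tools


# Point-to-point grows quadratically; hub and spoke grows linearly.
comparison = {
    n: (point_to_point_pipelines(n), hub_and_spoke_pipelines(n))
    for n in (4, 10, 25)
}
```

Under these assumptions, four tools need 12 point-to-point pipelines versus 8 through the warehouse, and the gap widens quickly: ten tools need 90 versus 20.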

Reverse ETL vs. CDPs

In the world of customer data, you're probably familiar with customer data platforms (CDPs). Platforms like Segment made a name for themselves in the marketing world by creating a single platform where you can house your customer data and activate it across your various operational teams and business tools.

The main advantage of these platforms is that they provide built-in data ingestion, identity resolution, audience management, and data sharing. CDPs have several flaws, though.

Firstly, you don't own the data. CDPs force you to store data outside of your cloud infrastructure, which can have significant implications around GDPR, CCPA, or HIPAA. A CDP doesn't replace your data warehouse; it just creates a second source of truth based on your data warehouse.

Secondly, CDPs are extremely expensive. In most cases, pricing is based on your total number of customer records, meaning you pay based on volume. You're also required to pay for an additional storage layer even though all your customer data already lives in your warehouse. With Reverse ETL, there's no storage because you're leveraging the data in your warehouse.

In addition to this, CDPs are extremely rigid because they don't play well with other technologies. You'll often find yourself deleting your whole instance so that you can reconfigure your settings or reload your data. On top of this, most CDPs force you to use proprietary data models representing only users and accounts.

This is not helpful if you have unique objects like workspaces, subscriptions, and playlists. They also have limited transformation capabilities, so you're often forced to file a support ticket if you need to clean your data set beyond their capabilities. With Reverse ETL, you can leverage all of your existing transformation capabilities and existing data models.

Implementing a CDP can take over six months, not even mentioning the time it takes to train your different teams on how to use one. At their core, CDPs are rigid black boxes that are not easily configurable in the context of a modern data stack. You need to leverage Reverse ETL to truly own your data from end to end.

Why You Need Reverse ETL

While on the surface, it can seem like Reverse ETL is just focused on syncing data, there are three primary use cases for Reverse ETL: data activation, data automation, and data infrastructure.

Reverse ETL Powers Data Activation

Data Activation is the method of unlocking the knowledge stored within your data warehouse and making it actionable by your business users in the end tools that they use every day. In doing so, Data Activation helps bring data people toward the center of the business, directly tying their work to business outcomes.

It's no longer enough to simply understand past behavior. You need to predict and identify common patterns and attributes in your customers to take action immediately.

Since Reverse ETL is focused on syncing real-time data into your operational tools, you can rest assured that your teams have a holistic view of your customer data and the correct data to make the right decision.

Every company wants to be more data-driven. Yet the most daunting question for every organization is "how"? Deriving insights from data is part one, but the last mile of "analytics enablement" (e.g., translating those insights into action) is a different ball game.

Analytics enablement is typically seen as a people problem, which is valid to some extent – but how you present data can play an equally significant role.

Imagine you're a B2B company trying to figure out which accounts your sales reps should focus their efforts on. In most scenarios, your data analyst would use SQL to derive characteristics of high-value leads and present them to you in a BI report. The problem is that this data isn't actionable and to your analyst's dismay, the report is rarely even opened.

A traditional analytics enablement approach to this problem would be to train sales reps on how to leverage BI reports as part of their day-to-day workflow. In practice, this is tough: getting teams to adopt data in their daily work is where most data projects fail.

Instead of training your sales reps to use BI reports, what if you could empower your analysts to feed lead scores from your data warehouse into a custom field in Salesforce? This same thought process can be applied to basically any operational analytics use case.
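Here is a minimal sketch of the field-mapping step behind that idea: renaming warehouse columns to the fields you chose in the destination and leaving everything else behind. The column names, the mapping, and the "Lead_Score__c" field are hypothetical (the "__c" suffix just follows the Salesforce naming convention for custom fields), and no Salesforce API call is made here.

```python
# Hypothetical mapping from warehouse columns to destination field names.
FIELD_MAPPING = {
    "email": "Email",
    "lead_score": "Lead_Score__c",
}


def to_destination_record(row: dict, mapping: dict) -> dict:
    """Rename warehouse columns to destination fields, dropping unmapped columns."""
    return {dest: row[src] for src, dest in mapping.items() if src in row}


warehouse_rows = [
    {"email": "ana@example.com", "lead_score": 87, "internal_note": "do not sync"},
    {"email": "bo@example.com", "lead_score": 42, "internal_note": "do not sync"},
]

# Only the mapped fields end up in the payload sent to the destination.
records = [to_destination_record(row, FIELD_MAPPING) for row in warehouse_rows]
```

Keeping the mapping explicit means columns that were never mapped, like the internal note above, cannot leak into the operational tool by accident.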

Reverse ETL Enables Data Automation

Data Activation is flashy, but companies are filled with far less glamorous problems when it comes to data. In any sizable organization, tons of manual requests for data are floating around, and with any manual process, there's always the question of how to automate it. Here are a few common examples of simple data requests from various teams:

  • Sales wants the list of webinar attendees to import as leads into Salesforce.
  • Marketing wants to sync a list of new users to Google Ads for retargeting.
  • Support wants to search Zendesk for accounts with premium support.
  • Product wants a Slack feed of customers who have enabled a feature.
  • Accounting wants customer attributes to be synced to NetSuite.
  • Finance wants a CSV of rolled-up transaction data to use in Excel or Google Sheets.

How Reverse ETL Powers Business Teams

There's a high probability that you've had to deal with at least one of these requests. The data is likely already available in your data warehouse. With Reverse ETL, SQL is all you need to extract and sync that data to your external tools – thus making it the simplest solution.
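To show how small such a request can be once the data is in the warehouse, here is a sketch of the webinar example. It uses an in-memory SQLite database as a stand-in for the warehouse, and the table name, columns, and sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE webinar_attendees (
        email TEXT, full_name TEXT, company TEXT, attended INTEGER
    );
    INSERT INTO webinar_attendees VALUES
        ('ana@example.com', 'Ana Ruiz', 'Acme', 1),
        ('bo@example.com', 'Bo Chen', 'Globex', 0),
        ('cy@example.com', 'Cy Patel', 'Initech', 1);
    """
)

# The "model": people who actually attended, ready to import as leads.
LEADS_QUERY = """
    SELECT email, full_name, company
    FROM webinar_attendees
    WHERE attended = 1
    ORDER BY email
"""

leads = [dict(row) for row in conn.execute(LEADS_QUERY)]
```

The query result is the whole definition of what Sales asked for; the remaining work is delivering those rows to the destination.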

Reverse ETL Is a Core Piece of Data Infrastructure

Reverse ETL has also emerged as a general-purpose solution in data infrastructure and software engineering, and there are two primary use cases powering this:

  1. Personalizing customer experiences
  2. Accessing disparate data sources

Personalizing Customer Experiences with Reverse ETL

The most obvious use case for Reverse ETL in software engineering is activating your analytics and data science models to build personalized customer experiences.

E-commerce is a great example. Suppose your data science team calculates a propensity score on top of your data warehouse or data lake to estimate a user's likelihood of buying a product, and your growth team wants to drive more purchases by offering discounts to users who are deemed unlikely to make a purchase. Since your warehouse is too slow to serve user-facing experiences, your engineers could use a Reverse ETL tool to sync that propensity score from your warehouse to your production database, giving you the ability to serve personalized experiences to customers in your app.

Accessing Disparate Data Sources

Today, customer data is spread across dozens, if not hundreds, of disparate systems. Sometimes, your cloud applications need to access information from disparate data sources.

Pretend you're a B2B SaaS company with enterprise customers on a contract. When a new enterprise is onboarded, your sales deal desk records each customer's credit allotment in Salesforce. Your customers keep asking to see their credit allotment inside your web app, but your developers don't want to integrate with Salesforce. However, there's a high probability that Salesforce data is already available in your data warehouse (via an ELT tool like Fivetran).

With Reverse ETL, you can sync relevant Salesforce information from your warehouse to the production database that powers your app, giving your customers direct access to their credit allotment.
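On the receiving side, a sync like this usually comes down to an upsert keyed on the account. The sketch below uses an in-memory SQLite database as a stand-in for the production database; the table, columns, and function name are hypothetical, and the ON CONFLICT clause assumes SQLite 3.24 or newer.

```python
import sqlite3


def upsert_credit_allotments(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new accounts and overwrite the allotment of existing ones."""
    conn.executemany(
        """
        INSERT INTO account_credits (account_id, credit_allotment)
        VALUES (:account_id, :credit_allotment)
        ON CONFLICT(account_id)
        DO UPDATE SET credit_allotment = excluded.credit_allotment
        """,
        rows,
    )
    conn.commit()


# In-memory SQLite stands in for the production database.
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE account_credits (account_id TEXT PRIMARY KEY, credit_allotment INTEGER)"
)

# First sync creates the rows; the second updates one and adds another.
upsert_credit_allotments(prod, [
    {"account_id": "acme", "credit_allotment": 1000},
    {"account_id": "globex", "credit_allotment": 500},
])
upsert_credit_allotments(prod, [
    {"account_id": "acme", "credit_allotment": 1500},
    {"account_id": "initech", "credit_allotment": 250},
])

credits = dict(prod.execute("SELECT account_id, credit_allotment FROM account_credits"))
```

Because the write is an upsert, running the same sync twice leaves the table in the same state, which makes retries after a failed run safe.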

Reverse ETL is becoming a core part of the software engineering toolkit and isn't limited to "data projects."

Reverse ETL Use Cases

Although it's relatively easy to see why you need Reverse ETL, it's not always as straightforward to see what that entails. There are a near-limitless number of use cases for Reverse ETL, but in many scenarios, the use cases tend to be centered around your data and business teams.

Marketing Teams

Advertising is arguably the backbone of any marketing team, and figuring out how to increase match rates, improve return on ad spend (ROAS), and decrease customer acquisition costs (CAC) is at the forefront of every decision. Enriching your advertising platforms with rich behavioral data about your customers for retargeting campaigns and lookalike audiences can be challenging. The typical process involves manually downloading and uploading individual data sets.

Lucid, the visual collaboration suite and visual diagramming platform, faced this exact problem. Since adopting Reverse ETL, Lucid has seen a 56% increase in ROAS in Google and a 37% increase in new users.

Imperfect Foods, the leading online grocer at the forefront of eliminating food waste, saw similar benefits, reducing CAC by 15% and increasing customer reactivations by 53%.

Sales Teams

The dream of every sales rep is to have a centralized platform with a holistic view of the customer. Customer relationship management platforms (CRMs) were the first to tackle this challenge, but these platforms only capture sales and marketing interactions.

In reality, your sales team needs and wants access to the unique behavioral and product usage data in your warehouse so they can track every step in the customer journey and accelerate deal cycles.

Gorgias, the e-commerce helpdesk platform, uses Reverse ETL for this exact purpose and syncs important behavioral data about prospects and accounts directly to Hubspot. Individual sales reps can then prioritize high-value accounts, enroll contacts in customized email sequences in real time, and monitor product usage spikes. Since implementing Reverse ETL, Gorgias has grown its outbound pipeline by 60-70% and seen a 2x increase in quarterly revenue.

Product Teams

The key to improving your product and driving adoption is experimentation and iteration. To do this, your product teams need access to the vital behavioral data that lives in your warehouse so they can answer questions like:

  • Who are our most active users?
  • What is our most popular feature/product?
  • How can we increase conversions for qualified users?
  • What is our least popular feature/product?
  • Where do customers drop off in the onboarding process?

This is the same challenge that CircleCI faced. As one of the largest CI/CD platforms with over 30,000 organizations using the product, CircleCI generates tons of new users. Understanding user behavior and KPIs and analyzing experiments were nearly impossible because all of CircleCI's core data models lived in the data warehouse.

With Reverse ETL, CircleCI is able to sync existing data models and key events to Amplitude and empower the product team to iterate and experiment at a moment's notice and answer questions that can have a huge impact on the underlying business.

Support Teams

Lowering customer churn is always top of mind for customer success teams, so you need to be able to proactively identify red flags and act on them before customers churn. Doing this means giving your success teams a clearer picture of your product usage data and account-level activity.

Blend, a publicly traded fintech company that handles more than $5 billion in daily transactions, uses Reverse ETL to solve this same problem. Leveraging Reverse ETL to sync customer data from the data warehouse to Asana and Salesforce, Blend can ensure visibility across teams, assess ROI, and better understand when and where accounts are under-serviced.

Data Teams

Your business teams are not the only ones to benefit from Reverse ETL. It can also have a massive impact on your data teams. No data engineer wants to build or maintain custom integrations; instead, they'd rather be building custom data models and optimizing your current technology stack.

Nando's (the world-famous peri-peri chicken restaurant) experienced a similar problem, but since implementing Reverse ETL, engineering time has dropped from 80% to 20%.

Seesaw, the online learning platform, lowered the time it took to integrate with Salesforce from days to minutes.

Reverse ETL Is Not Just Syncing Data

While at face value, it seems like most of the use cases for Reverse ETL are primarily focused on syncing data to downstream applications, this is not the case. Reverse ETL can also power notifications in messaging tools like Slack and workflows in various SaaS applications like Hubspot.

Vendr, a SaaS platform for buying SaaS products, uses Reverse ETL for this exact use case. Vendr leverages Reverse ETL to escalate messages directly to customers in Slack and notify employees whenever a sales rep saves a customer over 25%. Every time a transformation job finishes running in the warehouse (via dbt), Vendr syncs data directly to Hubspot to trigger workflows for specific emails to buyers and sellers.

Reverse ETL: Build vs. Buy

If you've ever bought enterprise software, you'll know there are always pros and cons to purchasing a purpose-built solution and building one in-house. If you're leaning toward the DIY camp, you'll likely need spare data engineering resources (if you have extra resources, you are one of the few).

Building one-off integrations in-house can get complicated very quickly because every third-party system is equipped with a unique API that is constantly updating and changing. That means you'll either have to manually download and upload CSV files or be forced to build a unique integration for every tool in your data stack.

Third-party APIs & CSVs

You'll also have to monitor and manage each integration because a single change in an API can easily break your data flows, and this isn't even mentioning all of the factors you have to consider when integrating with third-party APIs, including:

  • Authentication
  • Reading
  • Writing
  • Deployment
  • Mapping fields
  • Querying source data
  • Rate limits
  • Batching
  • Parallelizing
  • Error handling
  • Monitoring

There are a lot more complexities involved with Reverse ETL. Depending on the resources in your engineering team, it might make sense to go this route. However, there's a lot you need to consider when integrating with third-party APIs.
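To give a feel for just three items on that list (batching, rate limits, and error handling), here is a minimal sketch of a sync loop. The RateLimitError class, the send_batch callback, and every parameter name are hypothetical stand-ins for whatever a real destination client exposes; the retry policy is a simple exponential backoff chosen for illustration.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error a destination client raises when it is throttled."""


def batched(records: list, size: int):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def sync_in_batches(records, send_batch, batch_size=100, max_retries=3,
                    base_delay=1.0, sleep=time.sleep):
    """Send records in batches, backing off and retrying when throttled.

    Returns (number_of_records_sent, records_that_never_went_through).
    """
    sent, failed = 0, []
    for batch in batched(records, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                sent += len(batch)
                break
            except RateLimitError:
                if attempt == max_retries:
                    failed.extend(batch)  # give up on this batch, keep going
                else:
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return sent, failed


# Fake destination that throttles the very first request only.
calls = []


def flaky_send(batch):
    calls.append(len(batch))
    if len(calls) == 1:
        raise RateLimitError("429 Too Many Requests")


# sleep is replaced with a no-op so the example runs instantly.
sent, failed = sync_in_batches(list(range(250)), flaky_send,
                               batch_size=100, sleep=lambda _: None)
```

Even this toy version has to decide what happens to a batch that never succeeds, and it still ignores authentication, partial failures within a batch, parallelism, and monitoring.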

Choosing a Reverse ETL Tool

If you're evaluating Reverse ETL platforms, there are a lot of factors to consider:

  • Ease of Use: Your Reverse ETL tool should be SQL-based and accessible to non-technical users who want to self-serve.
  • Integrations: Your Reverse ETL tool should offer integrations that support your specific use cases.
  • Sync Flexibility: Data syncs run the risk of running up against API limits, so it's imperative that your Reverse ETL provider is fast, efficient, and reliable, only syncing the data that's been changed or updated.
  • Version Control: Your data syncs are just as valuable as your production code, and that means your Reverse ETL platform should integrate natively with Git so you can track incremental changes, roll back errors, and support bi-directional updates.
  • Live Debugging: Your syncs will inevitably fail, so a live debugger is extremely important for identifying what, where, and when something went wrong.
  • Configurable Alerting: You need to have direct control over your alerts. Alerting should not be limited to any specific tool, and you should be able to choose how you receive alerts.
  • Visual Audience Builder: Your Reverse ETL platform should give your marketers an easy way to define complex cohorts for activation using related data models and behavioral data.
  • Compliance: If your Reverse ETL provider doesn't meet industry-specific standards like SOC 2 Type 2, HIPAA, GDPR, or Privacy Shield, you shouldn't even consider them.
  • Multi-Region: Data residency laws are constantly changing, so your Reverse ETL tool should not be limited to a single cloud region.
  • Community/Vendor Support: Things inevitably go wrong, and when they do, you need access to rich documentation, real-time customer support, and 99.9% SLAs to reduce downtime.
  • Transparent & Scalable Pricing: Pricing should not be prohibitive based on the volume of your data, and it should scale as your organization grows.

Implementing Reverse ETL enables your data teams to focus on tasks that impact your business rather than doing the tedious job of building and maintaining Reverse ETL pipelines. However, when it comes to choosing a Reverse ETL tool, there are several factors you need to consider like data sync performance, version control, live debugging, number of integrations, configurable alerting, multi-region, etc. The good news is we put together a Complete Reverse ETL Buyer's Guide to help you get started.

Final Thoughts

Reverse ETL is a brand new category in the data space, and like any hot category, many companies will try to ride this wave. If you prefer investing in best-in-class tools and want to have a fully managed Reverse ETL solution up and running in a matter of minutes, sign up for a free Hightouch workspace today!

Copyright © 2022 Carry Technologies, Inc. dba Hightouch.
All rights reserved.
