Skip to main content
Log in

Integrations

What is Reverse ETL? The Definitive Guide

Learn everything there is to know about Reverse ETL, how it fits into the modern data stack, and why it's different than ETL.

Tejas Manohar.

Tejas Manohar

Luke Kline.

Luke Kline

September 6, 2022

15 minutes

What is Reverse ETL? The Definitive Guide.

Don't want to read? Watch our video overview of Reverse ETL instead.

The data ecosystem has changed drastically over the last six years, and we've witnessed the rise and fall of several different technologies. However, one constant remains the same: the cloud data warehouse.

Thanks to platforms like Snowflake and Google BigQuery, it's never been easier to consolidate all of your data into a centralized platform to tackle your analytics use cases. The problem is your data warehouse is only accessible to your technical users who know how to write SQL, so the platform you purchased to eliminate data silos has inevitably become a data silo. This is precisely why Reverse ETL is so important.

What Is Reverse ETL?

When data teams say ETL, they're usually referring to replicating data from a data source to a data warehouse – either through a custom pipeline they've built in-house or through a tool like Fivetran.

Reverse ETL is the opposite.

Reverse ETL is the process of copying data from your central data warehouse to your operational tools, including but not limited to SaaS tools used for growth, marketing, sales, and support.

At this point, you're probably thinking:

"I've spent all of this money and time getting my data into a warehouse to serve as my single source of truth, and now you're telling me I need to take my data back out of the warehouse?"

You probably have several core metrics unique to your business that your data team has defined via a data model in your warehouse. If you're a B2B business this might include metrics like workspaces, churn rate, lead score, annual recurring revenue, etc. If you're a B2C business, this might include metrics like daily active users, customer acquisition costs, lifetime value, etc. Reverse ETL is all about syncing the data in your warehouse to your downstream business tools.

Image of data being synced to Salesforce

Data Being Synced to Salesforce

What’s the Difference Between ETL and Reverse ETL?

The traditional ETL process has been around since the 1970s, and data pipelines have essentially remained unchanged. For those unfamiliar, ETL stands for extract, transform. It's the process of automatically extracting, transforming, and loading data into your desired destination (e.g., a data warehouse or data lake).

Image of ETL Process

The ETL Process

Fully managed SaaS platforms like Fivetran have simplified this process by offering pre-built connectors to extract and load your data. And dedicated transformation tools like dbt have shifted this process to ELT, where your data is transformed after it's loaded into your warehouse. The question is: Why can't you just use conventional ETL to move data out of your warehouse?

Conventional ETL is a one-way door. You can't use ETL pipelines to read and write data out of your warehouse. Reverse ETL, as the name implies, is, in fact, "reverse, " meaning you have to write reverse SQL.

Tools like Fivetran can't move data out of your warehouse because these platforms focus on merging data and loading it into tables based on "updated_at" fields. If you make a mistake, you can just delete your table and re-ingest your data.

With Reverse ETL, you're syncing rows of data, which means you have to diff your data and compare the values of your current warehouse query to what you've previously synced; otherwise, you end up spending a ton of money and wasting a tremendous amount of time. Since you're sending data to downstream tools and integrating with third-party APIs, you also have to be concerned with rate limits and sync failures.

Image of Reverse ETL Process

The Reverse ETL Process

An ETL tool like Fivetran is primarily used for powering dashboards. In contrast, Reverse ETL powers workflows, marketing campaigns, and business processes where time sensitivity is critical.

Ultimately, there are many technical differences between ETL and Reverse ETL, and if you're interested in getting into the weeds, you can read our post here.

At face value, Reverse ETL simply queries against your data warehouse. However, most people don't know that Reverse ETL requires you to write Reverse SQL, so moving data out of your warehouse and back into your operational systems and SaaS tools can be very challenging.

Where Does Reverse ETL Fit in the Modern Data Stack?

For the most part, every modern data stack has several core components that are the same across analytics teams, and usually, it looks something like this.

Image of the modern data stack

The Modern Data Stack 2.0

You're probably collecting data from several different sources and ingesting that information directly into your data warehouse so your analysts can transform it. From here, you persist this data into a dashboard where it sits indefinitely. As such, Reverse ETL has arisen to tackle the "last-mile problem" of helping you activate your data.

Reverse ETL is not a new concept by any means. Companies have been trying to activate their data for years. In the past, moving data out of the warehouse required you to either manually download/upload CSV files or build custom integrations and pipelines for every single one of your SaaS applications and end systems. Neither option was scalable.

Instead of reacting to your data as it's persisted into a dashboard, Reverse ETL allows you to take a proactive approach and put it in the hands of your business users to take action.

Reverse ETL vs. Point-to-Point-Solutions

Point-to-point tools or iPaaS (integration platform as a service) technologies like Zapier, Tray, and Workato can be an attractive option for tackling Reverse ETL use cases because they let you send data from one platform to another without code, but they create a web of complex pipelines that aren't scalable. If you have just four applications, you'll quickly find yourself with 16 potential pipelines pipelines (e.g., 4x4 = 16).

All iPaaS tools work similarly; they perform actions based on a trigger you define. You have to build custom workflows for every integration in your data stack, which becomes a nightmare as you weave in various dependencies and if/then clauses (just look at this example of a Tray workflow.)

Reverse ETL creates a hub-and-spoke approach, where the warehouse is your central source of truth, completely eliminating the complex web of pipelines and workflows that come with conventional point-to-point solutions.

Image of Point-to-Point Approach Compared to Reverse ETL

Point-to-Point vs. Hub & Spoke

Reverse ETL vs. CDPs

You're probably familiar with customer data platforms (CDPs.) Platforms like Segment have made a name for themselves in the marketing world by creating a single platform where you can store all your customer data and activate it across your various business teams and operational systems.

The main advantage of these platforms is that they provide built-in data ingestion, identity resolution, audience management, and data sharing. CDPs have several flaws, though.

Firstly, you don't own the data. CDPs force you to store data outside your cloud infrastructure, which can have significant implications around GDPR, CCPA, or HIPAA. A CDP doesn't replace your data warehouse; it just creates a second source of truth based on your data warehouse.

Secondly, CDPs are extremely expensive. In most cases, pricing is based on your total number of customer records, meaning you pay based on volume. You inevitably end up paying for an additional storage layer even though all your customer data already lives in your warehouse.

CDPs are incredibly rigid. You'll often find yourself deleting your whole instance just so you can reconfigure your settings or reload your data. On top of this, most CDPs force you to use proprietary data models representing only users and accounts. Most companies have unique objects like workspaces, subscriptions, playlists, daily active users, etc.

CDPs also have limited transformation capabilities, so you're often forced to file a support ticket if you need to clean your data set beyond their capabilities. With Reverse ETL, you can leverage all of your existing transformation capabilities and existing data models.

Implementing a CDP can take over six months, not even mentioning the time it takes to train your different teams on how to use one. At their core, CDPs are rigid black boxes that are not easily configurable.

Why You Need Reverse ETL

Every company wants to be more data-driven. Yet the most daunting question for every organization is "how"? Deriving insights from data is part one, but the last mile of "analytics enablement" (e.g., translating those insights into action) is a different ball game. While on the surface, it can seem like Reverse ETL is just focused on syncing data, there are three primary use cases for Reverse ETL: data activation, data automation, and data infrastructure.

Data Activation

Data Activation is the method of unlocking the knowledge sorted within your data warehouse and making it actionable by your business users in the end tools they use every day. In doing so, Data Activation helps bring data people toward the center of your business, directly tying their work to business outcomes.

Analytics enablement is typically seen as a people problem, which is valid to some extent – but how you present data can play an equally significant role.

Imagine you're a B2B company trying to figure out which accounts your sales reps should focus their efforts on. In most scenarios, your data analyst would use SQL to derive characteristics of high-value leads and present them to you in a BI report. The problem is that this data isn't actionable and to your analyst's dismay, the report is rarely even opened.

A traditional analytics enablement outlook to this problem would be to train sales reps on how to leverage BI reports as part of their day-to-day workflow. In practice, this is tough because data enablement is why most data projects fail.

Instead of training your sales reps to use BI reports, what if you could empower your analysts to feed lead scores from your data warehouse into a custom field in Salesforce? You can apply this same thought process to any operational analytics use case.

Data Automation

Data Activation is flashy, but companies are filled with far less glamorous problems when it comes to data. In any sizable organization, tons of manual requests for data are floating around, and with any manual process, there's always the question of how to automate it. Here are a few common examples of simple data requests from various teams:

  • Sales wants the list of webinar attendees to import as leads into Salesforce.
  • Marketing wants to sync a list of new users to Google Ads for retargeting.
  • Support wants search Zendesk for accounts with premium support.
  • Product wants a Slack feed of customers who have enabled a feature.
  • Accounting wants customer attributes to be synced to NetSuite.
  • Finance wants a CSV of rolled-up transaction data to use in Excel or Google Sheets.

Image of Reverse ETL powering business teams

How Reverse ETL Powers Business Teams

There's a high probability that you've had to deal with at least one of these requests. The data is likely already available in your data warehouse. With Reverse ETL, SQL is all you need to extract and sync that data to your external tools – thus making it the simplest solution.

Internal ETL

Reverse ETL has also emerged as a general-purpose solution in data infrastructure and software engineering and there are two primary use cases powering this:

Reverse ETL has also emerged as a general-purpose solution in data infrastructure and software engineering, and there are two primary use cases powering this:

  1. Personalizing customer experiences
  2. Accessing disparate data sources

Reverse ETL has also emerged as a data infrastructure solution for building personalized in-app customer experiences by syncing data from your warehouse to Production databases like Postgres and MongoDB.

E-commerce is a great example. Pretend your data science team calculates a lead score on top of your data warehouse or data lake to define a user's likelihood of buying a product. And your growth team wants to drive more purchases by offering discounts to users who are deemed unlikely to make a purchase. Since your warehouse is too slow to serve user-facing experiences, your engineers could use Reverse ETL to sync a propensity score in your warehouse to your production database – thus giving you the ability to serve in-app customers with personalized experiences.

B2B SaaS is another example. Pretend you're a company with enterprise customers on a contract. After you onboard a new enterprise, your sales deal desk records each customer's credit allotment in Salesforce. Your customers keep asking to see their credit allotment inside your web app, but your developers don't want to integrate with Salesforce. However, there's a high probability that Salesforce data is already available in your data warehouse.

With Reverse ETL, you can sync that data to your production database to give your customers direct access to their billing information.

Reverse ETL Use Cases

Although it's relatively easy to see why you need Reverse ETL, it's not always as straightforward to see what that entails because there are a near-limitless number of use cases for Reverse ETL.

Marketing Teams

Figuring out how to increase match rates with ad platforms, optimize return on ad spend (ROAS), and decrease customer acquisition costs (CAC) is extremely important. With Reverse ETL, you can sync custom audiences built in your warehouse to any of your marketing channels to improve personalization (e.g., target customers who abandoned their shopping cart in the last seven days.)

Sales Teams

Your sales team wants and needs access to the unique behavioral and product usage data in your warehouse (e.g., workspaces, subscriptions, items in cart, page views, last login date, etc.) With Reverse ETL, you can send this data directly to your CRM so your sales teams can take action in real-time to grow your bottom line.

Product Teams

The key to improving your product and driving adoption is experimentation and optimization. However, to build personalized experiences for your customers, your product team needs to be able to answer questions like:

  • Who are our most active users?
  • What is our most popular feature/product?
  • How can we increase conversions?
  • When do customers abandon their cart?
  • Where do users drop off in the onboarding process?

With Reverse ETL, you can sync key attributes about your customers directly to your production database to power onsite personalization.

Support Teams

Prioritizing the right tickets and reducing churn should be the end goal of every customer success team. Suppose you've already defined key metrics like lifetime value, annual recurring revenue, or churn rate in your data warehouse. In that case, you can use Reverse ETL to sync this data directly to your customer support tool so your success teams can prioritize tickets with the highest impact.

Data Teams

Your engineering teams don't want to build and maintain custom pipelines all day. They'd rather be building custom data models and optimizing your current technology stack. With Reverse ETL, your data team can enable your business teams to self-serve and focus on business outcomes.

Reverse ETL Drives Action

While at face value, it might seem like Reverse ETL is primarily focused on syncing data to your downstream tools, this is not the case. You can leverage reverse ETL to power notifications and alerts in messaging tools like Slack or complex workflows in tools like Hubspot.

Build vs. Buy

If you've ever bought enterprise software, you'll know there are always pros and cons to purchasing a purpose-built solution and building one in-house. If you're leaning toward the DIY camp, you'll most likely need spare data engineering resources, which are non-existent.

Building custom Reverse ETL pipelines can become complicated very quickly. Every third-party API is constantly updating and changing, so you'll either have to download/upload manual CSV files or build a unique integration for every tool in your data stack.

Image of third-party APIs and CSVs

Third-party APIs & CSVs

You'll also have to monitor and manage each integration because a single API change can break your entire data flow, and this isn't even mentioning all of the other factors you have to consider like:

  • Authentication
  • Reading
  • Writing
  • Deployment
  • Mapping fields
  • Querying Source Data
  • Rate limits
  • Batching
  • Parallelizing
  • Error handling
  • Monitoring

Reverse ETL is complex, and there's a lot you need to consider when integrating with third-party APIs.

Final Thoughts

Reverse ETL is a brand new category in the data space, and like any hot category, many companies will try and ride this wave. If you prefer investing in best-in-class tools and want to have a fully managed Reverse ETL solution up and running in a matter of minutes, sign up for a free Hightouch workspace today!

Reverse ETL Whitepaper

Download our whitepaper to learn:

  • How to leverage Reverse ETL to eliminate data silos and activate your data warehouse to improve business outcomes
  • Specific use cases for Reverse ETL and how it fits into the modern data stack
  • New ways all teams can meet goals with better access to data in their operational tools
  • The core must-haves in a Reverse ETL solution

More on the blog

  • ETL vs. Reverse ETL: The Technical Differences.

    ETL vs. Reverse ETL: The Technical Differences

    Discover the technical differences between Reverse ETL and ETL/ELT and learn how they work behind the scenes.

  • The future of data integration: what iPaaS & Workflow Automation tools got wrong.

    The future of data integration: what iPaaS & Workflow Automation tools got wrong

    Learn about the shortcomings of iPaaS/Workflow Automation solutions and why the future of data integration is declarative

  • What is Data Activation?.

    What is Data Activation?

    Learn everything to know about Data Activation, what it is, why it matters, and how you can get started activating your data today.

Share

Sign up for our newsletter

Ready to activate your data?

Get startedBook a demoBook a demo

Recognized as an industry leader
by industry leaders

We are proud to be recognized as a leader in Reverse ETL and Marketing & Analytics by customers, technology partners, and industry analysts.

Gartner 'Cool Vendor', 2022..
Snowflake 'Marketplace Partner of the Year', 2022..
G2 'Leader', Fall 2022.
Snowflake 'One to Watch for Activation and Measurement', 2022.
Fivetran 'Ecosystem Partner of the Year', 2022.