
What is Data Taxonomy?

Find out what data taxonomy is and how it can benefit your data and your business.


Craig Dennis

January 10, 2023



We’re entering a time when collecting data has become easy. Regardless of how many data sources you have or where you want the data to end up, there are tools that can make the whole process straightforward.

One practice this new era enables is the “composable CDP.” A composable CDP involves collecting customer data from sources, cleaning it and storing it in a central location, and then taking action on it, all with best-of-breed tools such as Fivetran, Snowflake, and Hightouch.

When you collect all of your company’s data, a key part of the job is ensuring that people know what each data point represents. Failing to do this can lead to poor decisions and a negative impact on the business.

That’s why building a data taxonomy is so important. It clearly defines a business’s data so that everyone shares the same understanding, and discussions can happen without confusing semantics getting in the way.

What is Data Taxonomy?

Data taxonomy is a way of organizing and classifying data. It involves creating a hierarchy of categories and subcategories that can be used to classify data consistently and logically, so that datasets can be understood quickly and in the same way by whoever is looking at them.
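To make the idea of categories and subcategories concrete, here is a minimal Python sketch that models a taxonomy as nested dictionaries: keys are categories, nested dictionaries are subcategories, and lists hold the datasets classified under a leaf. The category and dataset names are hypothetical, chosen only for illustration.

```python
from typing import Iterator

# Hypothetical taxonomy: keys are categories, nested dicts are
# subcategories, and lists hold the datasets classified under a leaf.
TAXONOMY: dict = {
    "customer": {
        "profile": ["customer_name", "customer_email"],
        "behavior": ["page_views", "purchases"],
    },
    "product": {
        "catalog": ["product_name", "product_price"],
    },
}


def classification_paths(
    node: dict, prefix: tuple[str, ...] = ()
) -> Iterator[tuple[str, ...]]:
    """Yield the full category path for every dataset in the taxonomy."""
    for category, child in node.items():
        path = prefix + (category,)
        if isinstance(child, dict):
            # A subcategory: keep descending into the hierarchy.
            yield from classification_paths(child, path)
        else:
            # A leaf: every dataset listed here shares this path.
            for dataset in child:
                yield path + (dataset,)


for path in classification_paths(TAXONOMY):
    print(" > ".join(path))
```

Because every dataset resolves to exactly one path, anyone looking for customer_email finds it under the same customer > profile branch, no matter who built the report.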

The Benefits of Implementing Data Taxonomy

Better Decision-Making

We’ve all been in a situation where hindsight would have helped us make a decision with a better outcome. You can access a similar superpower simply by having the correct information at hand when you need it.

Getting the correct information to people within your company can provide answers or insights without them having to switch context into other systems. However, sometimes they might not even know that certain information is available. And even if they know it exists, it could be hidden in a maze of folders or on a dashboard they have forgotten about.

Data taxonomy organizes these critical pieces of data so it is clear what they are and how to find them. Better yet, you can use data activation to put the data directly in the hands of the people who need it by sending it to downstream tools with Reverse ETL.

Better Clarity and Communication

When the definitions of specific terms are clear, so are the data’s sources and meanings. With these definitions in place and your data organized, you can achieve clearer communication across your business.

A data taxonomy can make essential data more accessible because more people within your business know where it lives. Take data about product features: it can be embarrassing if a salesperson learns about a new feature of your product from one of your own customers. And there is nothing worse than spending energy arguing over something that wouldn’t be an issue at all if you both agreed on the definition of a single term.

Data taxonomy can create harmony in your business and ensure everyone sings from the same hymn sheet.

Achieve Better Data Quality

Building a data taxonomy within your business can give you greater confidence in the quality of your data. A clear understanding of your data’s structure, combined with a consistent naming convention and concise definitions, helps you avoid errors and makes it easier to spot inconsistencies in the data.
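As a minimal Python sketch of how agreed definitions can surface inconsistencies, the snippet below compares a dataset's column names against the terms defined in a taxonomy. DEFINED_TERMS and the incoming column names are hypothetical, and exact name matching is a simplifying assumption; a real check might also compare data types or values.

```python
# Hypothetical agreed terms from the taxonomy, each with a short definition.
DEFINED_TERMS = {
    "customer_id": "Unique identifier assigned to a customer at signup",
    "customer_name": "Full name given by the customer at signup",
    "signup_date": "Date the customer created an account",
}


def find_inconsistencies(columns: list[str]) -> list[str]:
    """Return the columns that do not match any defined taxonomy term."""
    return [column for column in columns if column not in DEFINED_TERMS]


# A dataset whose column names have drifted from the agreed terms.
incoming = ["customer_id", "CustomerName", "signup_date", "signupDate"]
print(find_inconsistencies(incoming))  # ['CustomerName', 'signupDate']
```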

Avoid Duplication

Another benefit of data taxonomy is avoiding duplication. A clear understanding of your data shows you which datasets have already been created, so you can easily see whether what you need already exists. For example, if someone in marketing is looking for the lifetime value of a set of customers, they can check whether it already exists before requesting it.
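The lifetime value example can be sketched as a simple lookup that runs before a new request is filed. In this hypothetical Python sketch, EXISTING_DATASETS stands in for wherever your taxonomy records each dataset and its definition, and a plain keyword match is an assumption made for brevity.

```python
# Hypothetical record of datasets that already exist, keyed by name,
# with the taxonomy definition stored for each one.
EXISTING_DATASETS = {
    "customer_lifetime_value": "Total revenue attributed to a customer to date",
    "customer_churn_flag": "Whether a customer cancelled in the last 90 days",
}


def find_existing(keywords: list[str]) -> list[str]:
    """Return dataset names whose name or definition mentions every keyword."""
    matches = []
    for name, definition in EXISTING_DATASETS.items():
        text = f"{name} {definition}".lower()
        if all(keyword.lower() in text for keyword in keywords):
            matches.append(name)
    return matches


# Marketing checks for lifetime value before requesting a new dataset.
print(find_existing(["lifetime", "value"]))  # ['customer_lifetime_value']
```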

How to Build a Data Taxonomy

By now you should understand why building a data taxonomy is a worthwhile investment. Let’s look at the steps to build one.

Identify the Goals and Objectives of the Taxonomy

The first step is identifying the goals and objectives of your data taxonomy. This gives you a clear understanding of what you are trying to achieve and lets you track your progress.

Create a Clear and Consistent Structure for the Data Taxonomy

Next, plan a structure for the data taxonomy. Start by identifying the data you might need to achieve your goals and objectives. Aim for a data hierarchy in which tables group related information and are arranged logically, so it is easy for anyone to understand.

Define and Classify the Terms and Categories Within the Taxonomy

This may be one of the hardest parts of building a data taxonomy. Once you have agreed on a structure, you need to agree on the terms and categories. You may think this is the easy bit, but in reality naming things is often a significant source of debate, and it can be challenging to choose names that are clear, concise, and consistent.

Try to keep terms and categories as short as possible so they don’t become complicated. Something like “customer_height_collected_via_app_signup_in_CM_to_two_decimal_places” is very precise but contains too much information. A better way to express this would be simply “customer_height_CM.”

Another important naming convention is consistency. If you settle on snake_case (where you use “_” to separate words, for example, customer_name), make sure it’s used for everything. Without consistency, it can be difficult to find specific tables when you search for them.

One thing to remember during this exercise is that every name should be clear to everyone, even someone who has never seen the structure before. Names such as table1, table2, etc., don't help anyone because they reveal nothing about what the table contains.
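Naming rules like these are easy to check automatically. The following is a minimal Python sketch of a name linter, assuming three hypothetical rules: snake_case where later words may be all-caps units or acronyms (so customer_height_CM passes), a maximum length of 30 characters, and a short list of placeholder names to reject. Adjust the patterns and limit to whatever your team agrees on.

```python
import re

# Lowercase first word, then words joined by "_"; later words may be
# all lowercase or all caps (units or acronyms such as CM).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_(?:[a-z0-9]+|[A-Z0-9]+))*")
# Hypothetical placeholder names that say nothing about the contents.
GENERIC_NAME = re.compile(r"(table|data|temp|test)\d*")
MAX_LENGTH = 30  # hypothetical limit; pick one that suits your team


def naming_problems(name: str) -> list[str]:
    """Return human-readable reasons why a name breaks the conventions."""
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not snake_case")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if GENERIC_NAME.fullmatch(name):
        problems.append("generic name that hides the contents")
    return problems


for name in ["customer_name", "customerName", "table1", "customer_height_CM"]:
    print(name, naming_problems(name))
```

Running a check like this in code review or CI keeps the convention from drifting after the initial naming exercise.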

Implement the Taxonomy in the Appropriate Systems and Platforms

With the hard work done, it's time to implement your data taxonomy. This can take a lot of work if you're updating existing systems and platforms, and it may require detailed planning to make sure you don't run into issues.

Ideally, you would do this before you start collecting any data, but it can be difficult to get buy-in for something that, at first glance, doesn't add value and could cause delays. Another challenge is that it can be hard to know what data you will collect until operations start.

Tips and Best Practices for Creating an Effective Data Taxonomy

Involve Multiple Stakeholders in the Creation Process

It's always a great idea to involve multiple stakeholders in building a data taxonomy, especially the people who will update and amend it. The more engaged these people are with the process, the more likely they are to maintain the standard once it is up and running.

Regularly Review and Update the Taxonomy to Reflect Changes in Data and Business Needs

Once you have a data taxonomy in place, it's not a case of set it and forget it. Things change in your business, from the data you collect to your business’s needs. It makes sense to regularly set aside time to review and update your data taxonomy, and to plan ahead for any upcoming changes that may affect it.
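A review is easier when you can see exactly what changed between two versions of the taxonomy. This hypothetical Python sketch assumes each version is stored as a mapping from term to definition and reports which terms were added, removed, or redefined; the example terms and definitions are invented for illustration.

```python
def diff_terms(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two versions of a term -> definition mapping."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same term, different definition: the riskiest kind of change.
        "redefined": sorted(term for term in shared if old[term] != new[term]),
    }


# Hypothetical versions of the taxonomy before and after a review.
v1 = {
    "customer_id": "Unique customer identifier",
    "active_customer": "Purchased in the last 90 days",
}
v2 = {
    "customer_id": "Unique customer identifier",
    "active_customer": "Purchased in the last 30 days",
    "customer_segment": "Marketing segment assigned to the customer",
}
print(diff_terms(v1, v2))
```

Redefined terms deserve the most attention, since reports built on the old definition will silently change meaning.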

Data Taxonomy Example

To show what a data taxonomy might look like, here’s a diagram of a film rental company’s data and how it is organized into a hierarchy.

[Diagram: Data Taxonomy Example]
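The same kind of hierarchy can also be kept as plain data and rendered as a tree. The Python sketch below uses a hypothetical film rental taxonomy; it is not a reproduction of the diagram, and the category, table, and column names are assumptions made for illustration.

```python
# Hypothetical film rental taxonomy: category -> table -> columns.
# This is an illustration, NOT a copy of the diagram above.
FILM_RENTAL = {
    "inventory": {
        "film": ["film_id", "film_title", "release_year", "rental_rate"],
        "store": ["store_id", "store_address"],
    },
    "customers": {
        "customer": ["customer_id", "customer_name", "customer_email"],
    },
    "transactions": {
        "rental": ["rental_id", "rental_date", "return_date"],
        "payment": ["payment_id", "payment_amount", "payment_date"],
    },
}


def render(node: dict, depth: int = 0) -> list[str]:
    """Render the taxonomy as indented lines, two spaces per level."""
    lines = []
    for name, child in node.items():
        lines.append("  " * depth + name)
        if isinstance(child, dict):
            # Nested dict: a deeper level of the hierarchy.
            lines.extend(render(child, depth + 1))
        else:
            # List: the columns of a table, one level further in.
            lines.extend("  " * (depth + 1) + column for column in child)
    return lines


print("\n".join(render(FILM_RENTAL)))
```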

Data Taxonomy vs. Other Data Models and Definitions

Data Taxonomy vs. Data Ontologies

Data ontology is a high-level concept used in fields such as computer science, information technology, database management, and data analysis. It involves the representation, formal naming, and definition of data classes. Depending on the complexity of the database, a data ontology may be represented as a data taxonomy chart or as a data model. Data taxonomy, on the other hand, focuses only on hierarchy and does not necessarily require observation IDs or attributes.

Data Taxonomy vs. Data Hierarchies

Data hierarchy is an inherent concept in data taxonomy, as it involves classifying data into a hierarchical structure. This hierarchy can be visualized in a data taxonomy chart but does not have its own separate representation.

Data Taxonomy vs. Metadata

Metadata is an inherent concept in data taxonomies, as taxonomies summarize data within their classifications. However, metadata typically refers to a comprehensive summary of a dataset, such as a data dictionary, while the metadata in a data taxonomy is usually limited to the observation ID. In this sense, metadata provides more detailed information about the data than a data taxonomy does.

Data Taxonomy vs. Data Classifications

Data classification is a broad term that encompasses all activities related to the organization and structure of data within a dataset. Data taxonomy is a specific sub-discipline of data classification that focuses on giving hierarchy to data by classifying it into a structured system. In this way, data taxonomy is a part of the larger field of data classification.

Data Taxonomy vs. Data Dictionaries

While data taxonomy charts may resemble data models, the concept of data taxonomy is closer to a data dictionary. A data dictionary is a table that describes the columns of another table using shared attributes such as name, definition, and data type. This lets users understand complex databases without investigating each column individually. Data dictionaries summarize information about the data, while a data taxonomy organizes the data into a hierarchical structure.
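For comparison, here is a minimal Python sketch of a data dictionary: each entry describes one column of another table by name, definition, and data type, so a user can look a column up without inspecting the table. The customers table, its columns, and the type names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of a data dictionary, describing a column of another table."""

    column: str
    definition: str
    data_type: str


# Hypothetical data dictionary for a customers table.
CUSTOMERS_DICTIONARY = [
    DictionaryEntry("customer_id", "Unique identifier for the customer", "INTEGER"),
    DictionaryEntry("customer_name", "Full name given at signup", "VARCHAR"),
    DictionaryEntry("signup_date", "Date the account was created", "DATE"),
]


def describe(column: str) -> str:
    """Look up a column's description without querying the table itself."""
    for entry in CUSTOMERS_DICTIONARY:
        if entry.column == column:
            return f"{entry.column} ({entry.data_type}): {entry.definition}"
    raise KeyError(f"{column!r} is not in the data dictionary")


print(describe("signup_date"))  # signup_date (DATE): Date the account was created
```

Note that the dictionary is flat: it describes columns but says nothing about where the table sits in a hierarchy, which is the part a taxonomy supplies.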

Data Taxonomy vs. Data Catalog

A data catalog differs from a data taxonomy in that it is a central repository or database that stores metadata about an organization's data assets, such as the name, description, location, and format of each asset. One of its main roles is to help users discover and understand the available data and to encourage the use and reuse of data across the organization. Data taxonomy, by contrast, is a way of organizing data.
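The two can work together. In this hypothetical Python sketch, each catalog entry stores the metadata the article lists (name, description, location, and format), plus an assumed taxonomy_path field recording where the asset sits in the taxonomy. The asset names and warehouse locations are invented; real catalog products define their own schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata a catalog might store about one data asset."""

    name: str
    description: str
    location: str  # hypothetical warehouse path
    format: str
    taxonomy_path: tuple[str, ...]  # where the asset sits in the taxonomy


CATALOG = [
    CatalogEntry("customer_lifetime_value", "Revenue per customer to date",
                 "analytics.marts.customer_lifetime_value", "table",
                 ("customer", "value")),
    CatalogEntry("page_views", "One row per page view on the website",
                 "analytics.events.page_views", "table",
                 ("marketing", "web")),
]


def browse(category: str) -> list[str]:
    """Use the taxonomy to browse, and the catalog to say where assets live."""
    return [e.location for e in CATALOG if e.taxonomy_path[0] == category]


print(browse("customer"))
```

Here the taxonomy answers "which category does this belong to?", while the catalog answers "where is it and what format is it in?".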

Data Taxonomy Tools

Data taxonomy tools can be simple. The main requirement is that the tool can display the data taxonomy structure you have planned. If your structure is simple enough, pen and paper will do.

Excel is another option that can help you plan your data taxonomy. However, mapping relationships can be difficult.

A great choice for planning and displaying relationships is a tool like drawSQL. DrawSQL lets you create the tables you need, organize your data visually, and show the relationships between them.

DrawSQL has other benefits, such as leaving notes and easily sharing diagrams with an online link.

Conclusion

Data taxonomy is one of those tasks that often doesn’t get done because its value isn’t obvious. However, having access to clean data, organized by a good data taxonomy, can make a real difference in helping your business understand its data and use it more effectively, for example, to obtain a complete 360-degree view of a customer.
