The Data Catalog: Shopping for the Most Valuable Data in Your Organization

Karla Ortiz Flores
7 min read · Dec 16, 2021

A good data catalog is like a mall for all of your company’s valuable data assets. It lets people shop around and find what they need in one central location, whether that is metadata about how something was collected or where the original source material came from. Many best practices go into building an effective data catalog, and this article covers them in the steps below.

But for now, the most important thing to remember is that data cataloging should be seen as an ongoing process and not a one-time event. Like data sources, data catalogs evolve and change over time.

So whether you’re just starting to build a data strategy for your organization or taking stock of the data assets you already have, this article highlights some helpful steps along the way to improve your chances of success.

A data catalog can be an extremely valuable tool for improving data governance as well. By providing a single source of truth for all data within the organization, you can help ensure that everyone is using data consistently and data governance is being enforced.

It’s also a great way to build trust within your organization by showing everyone that you’re committed to data quality, data security, and data accessibility. In other words: it shows you care about the company as much as they do!

Why does a company need a data catalog?

A data catalog is a valuable asset for any organization, but it’s especially important for companies that want to implement a data-driven strategy. By creating a data catalog, you make your data more accessible and surface data that you didn’t even know existed. This can help improve data quality and speed up data requests from business managers, both of which matter when decisions depend on data.

A typical data stack of a company could include data from the following sources:

- Data warehouses. This data is usually the most organized and structured data in an organization. It’s typically used for decision support and historical analysis.

- Operational data stores (ODSs). This data is the result of day-to-day transactions and is used to make real-time decisions.

- Unstructured data, such as text files, data gathered from sensors, and data obtained through social media.

- Data lakes. This is often data that didn’t fit into a data warehouse or ODS. It’s typically used for advanced analytics but is not as organized and structured as data from data warehouses.

- Big data stores such as Hadoop, NoSQL databases and Teradata. This data is used for advanced data analytics and data science projects.

Where to start?

When putting together a data catalog, you should capture the following details for each data set: data type, data source, data contributor (the employee responsible for entering data into the system), and data quality. You should also document data governance policies, data quality standards, metadata requirements for data consumers, and a process for approving both data requests from data scientists and data updates made by employees who have access to your company’s systems.

When creating a data catalog, there are a few key things to keep in mind:

- The metadata for each data set should be thorough and accurate. This will make it easier for data consumers to understand the data and determine if it’s appropriate for their needs.

- Data teams should create a metadata template that data catalog managers can use for each data set. This will make it easier to keep the metadata consistent and thorough.

- The data within the catalog should be organized in a way that makes sense for your business. You may want to organize data by department, data type, or other criteria that are relevant to your organization.

- The data catalog should be easy to use and navigate. This will make it easier for data consumers to find the data they need and reduce the time it takes them to get up-to-speed on your data.

- The data catalog should be updated regularly. This will ensure that the data is up-to-date and accurate.

Now let’s break this down into steps on how you can create your own data catalog.

Step 1: Define governance around data catalog

The first step is to establish governance around the data catalog. This includes setting policies for how data will be collected, how data will be updated, and what data may or may not be included in the catalog. Other governance areas that you need to consider include data quality standards, metadata requirements for data consumers, and a process for approving both data requests from data analysts as well as data updates made by any employees who have access to your company’s systems.

Step 2: Create metadata template

Once you’ve established data governance, data teams should start creating a metadata template. This will make it easier to keep the metadata consistent and thorough while ensuring data quality standards are met at all times. For each data set in your catalog, the template should capture what type of data it is (structured or unstructured), how it was collected, its data sources, its data contributors (employees responsible for entering data into the system), and its data quality. The template should also point to the governance items from Step 1 that apply: data governance policies, data quality standards, metadata requirements for data consumers, and the process for approving data requests from data scientists as well as data updates made by employees who have access to your company’s systems.

Now that you have a metadata template, data teams can start building the data catalog by filling in this information for each data set, adding more detail where necessary.
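To make the idea of a metadata template more concrete, here is a minimal sketch in Python. The class name `CatalogEntry`, its field names, and the sample values are all assumptions made for illustration; your own template should use whatever fields your governance policies call for.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class CatalogEntry:
    """One data set's metadata record (hypothetical field names)."""
    name: str
    data_type: str            # "structured" or "unstructured"
    source: str               # system the data comes from
    collection_method: str    # how the data was collected
    contributor: str          # employee responsible for entering the data
    quality_notes: str        # short summary of known quality issues
    last_updated: date
    tags: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that were left blank."""
        required = ["name", "data_type", "source",
                    "collection_method", "contributor", "quality_notes"]
        return [f for f in required if not getattr(self, f).strip()]


# Sample entry with the contributor left blank on purpose.
entry = CatalogEntry(
    name="monthly_sales",
    data_type="structured",
    source="data warehouse",
    collection_method="nightly export",
    contributor="",
    quality_notes="no known issues",
    last_updated=date(2021, 12, 1),
)
print(entry.missing_fields())  # ['contributor']
```

A check like `missing_fields` is one simple way to keep entries thorough: a catalog manager could refuse to publish an entry until the list comes back empty.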

Step 3: Organize data by relevant criteria

The data within your data catalog should be organized in a way that makes sense for your business. This could mean organizing data by department, data type, or other criteria that are relevant to your organization. Doing this will make it easier for data consumers to find the data they need and reduce the time it takes them to get up-to-speed on your data. There are other ways you can organize your data catalog as well, including data quality (for example, how up-to-date it is), data format (spreadsheets or databases), data type (financial data or marketing data, for example) and data source. However you decide to organize your catalog, make sure that the criteria make sense for your business.
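As a rough illustration, the same set of catalog entries can be grouped by more than one criterion without duplicating anything. The sketch below assumes each entry is a plain dictionary; the keys `department` and `format`, the helper `group_by`, and the sample data sets are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog entries; real ones would carry the full metadata template.
entries = [
    {"name": "invoices", "department": "finance", "format": "database"},
    {"name": "campaign_results", "department": "marketing", "format": "spreadsheet"},
    {"name": "budget_2021", "department": "finance", "format": "spreadsheet"},
]


def group_by(entries, criterion):
    """Group data set names under each value of the chosen criterion."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[criterion]].append(entry["name"])
    return dict(groups)


by_department = group_by(entries, "department")
by_format = group_by(entries, "format")

print(by_department)  # {'finance': ['invoices', 'budget_2021'], 'marketing': ['campaign_results']}
print(by_format)      # {'database': ['invoices'], 'spreadsheet': ['campaign_results', 'budget_2021']}
```

Because the grouping is computed from the metadata rather than baked into a folder structure, you can offer several views of the catalog and change the criteria later.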

Step 4: Make data catalog easy to use and navigate

The data catalog should be easy to use and navigate. This will make it easier for data consumers to find the data they need and reduce the time it takes them to get up-to-speed on your data. The data catalog should also include search functionality, so data consumers can quickly find the data they’re looking for. A best practice for making your data catalog easy to use and navigate is to use the same terminology for data sets and fields across all data sets. Additionally, a consistent naming convention will help data consumers quickly identify the data they need.
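Here is a small sketch of both ideas: a keyword search over entry names and descriptions, and a check that flags names that break a naming convention. The snake_case rule, the function names, and the sample entries are assumptions for illustration only; a dedicated catalog tool would normally provide search for you.

```python
import re

# Assumed convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def follows_convention(name):
    """Return True if the data set name matches the assumed snake_case rule."""
    return bool(SNAKE_CASE.match(name))


def search(entries, term):
    """Case-insensitive keyword search over names and descriptions."""
    term = term.lower()
    return [
        e["name"] for e in entries
        if term in e["name"].lower() or term in e["description"].lower()
    ]


entries = [
    {"name": "customer_orders", "description": "Orders placed by customers"},
    {"name": "CustomerRefunds", "description": "Refunds issued to customers"},
    {"name": "sensor_readings", "description": "Raw temperature sensor data"},
]

print(search(entries, "customer"))
# ['customer_orders', 'CustomerRefunds']

print([e["name"] for e in entries if not follows_convention(e["name"])])
# ['CustomerRefunds']
```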

Step 5: Update data catalog regularly

The data catalog should be updated regularly. This will ensure that the data is up-to-date and accurate. Data updates can come from data scientists who have requested data, employees who are responsible for entering data into the system, or any other source of data within your organization. Having an up-to-date data catalog is critical for data accuracy and will help data consumers make better decisions with data-driven insights. Too often companies fail to update their data catalogs, which can lead to data inaccuracies and missed opportunities.
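One simple way to keep updates from slipping is to flag entries that have not been reviewed recently. The sketch below assumes each entry records a `last_reviewed` date and uses a 90-day threshold; both the field name and the threshold are hypothetical and should follow your own governance policy.

```python
from datetime import date, timedelta


def stale_entries(entries, today, max_age_days=90):
    """Return names of entries last reviewed more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [e["name"] for e in entries if e["last_reviewed"] < cutoff]


entries = [
    {"name": "invoices", "last_reviewed": date(2021, 11, 30)},
    {"name": "sensor_readings", "last_reviewed": date(2021, 6, 1)},
]

stale = stale_entries(entries, today=date(2021, 12, 16))
print(stale)  # ['sensor_readings']
```

Running a check like this on a schedule gives catalog managers a short list of entries to revisit instead of relying on someone remembering to update them.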

Step 6: Publish data catalog

Once your data catalog is populated and organized, it’s ready to be published. This will make it available to all employees in your organization who need access to it. You may also want to consider publishing your data catalog on an intranet or internet site for easy access.

Now that you have a better understanding of the steps involved in creating a data catalog, let’s take a look at some of the benefits that come with having one.

3 Benefits of Data Catalogs

  1. First, data catalogs improve data governance. With a data catalog in place, data governance policies can be enforced throughout the organization. Data consumers can quickly find the data they need, while data contributors have clear guidelines about what type of data should be included in the system and how it should be labeled for consistency.
  2. Second, data catalogs make data requests easier. Data consumers can make data requests faster and more efficiently because they have a data catalog available for reference when looking for data sets or specific fields within data sets. This also reduces the chance of duplicate data requests, which saves time and resources that could be used elsewhere in your organization.
  3. Finally, data catalogs improve quality standards because data is organized and easy to find. Data consumers can quickly identify data sets that do not meet quality standards and take appropriate steps to remedy the situation. Poor data quality can lead to inaccurate insights, which in turn can impact business decisions. A data catalog helps to mitigate this risk by providing a place for data contributors and data consumers to collaborate on improving data quality.

Creating a data catalog can be a time-consuming process, but it’s worth it in the end. Not only will it make data more accessible and easier to use, but it can also help improve data quality and speed up data requests from business managers. So if you’re looking to create a data-driven organization or data-driven team, it’s important to consider building a data catalog. In conclusion, a data catalog is a powerful tool that helps data consumers find the data they need and reduces the time it takes them to get up-to-speed on your data.

Karla Ortiz Flores

Director of Technology and Data at a New York Multifamily Office | AI Tinkerer | Former Fortune 500 Management Consultant