
Digital Maturity Level 2 Guide

Eduardo de la Garza
digital maturity · data centralization · normalization · reporting · automation

In our Data Maturity Roadmap, we made a high-level plan to reach Data Maturity Level 2. In this article, we'll break that plan down into actionable steps you can follow to achieve Level 2 Data Maturity.

We now have a design and a plan. It's time to get into the nitty-gritty details of each step.

We'll use the example of PanPan, a fictional bakery with three physical stores and a delivery service. PanPan currently has sales data in spreadsheets downloaded by hand from their point of sale system, customer data in a CRM, and inventory data in a third-party system. They have no alerting or automation in place. Reports are created manually once a week by exporting data from each system into Excel. PanPan wants to see all their data in a single place, trust and understand where the data comes from, and easily create the tools they need to take their business to the next level.

1. Analyze & Design

Map existing data flows

First we want to have an understanding of the business. Make sure you have an organizational diagram, a list of key business processes, and a map of existing data flows. This will help you understand how data is currently being used, where it is coming from, and where it needs to go.

Identifying key data sources

Think about all the sources of data that are relevant to your business. These can include internal tools like Excel spreadsheets, databases, CRMs, ERPs, marketing platforms, and customer support systems, as well as external data sources like social media, market data, and public datasets.

Select a few key data sources to start with. These should be the ones that have the most impact on your business and are feasible to integrate at this stage. Use the data flow map to help you identify these sources.

For PanPan, these will be their point of sale system, CRM, and inventory management system.

Define business goals

Define 2-3 key high-level business metrics that will help you measure success. These should be aligned with your business goals and challenges. Make sure these metrics are specific, measurable, achievable, relevant, and time-bound (SMART). At the end, we'll make sure we can track these goals with our data system.

In his book "Measure What Matters", John Doerr popularized the concept of Objectives and Key Results (OKRs): a simple framework that helps organizations define and track objectives and their outcomes. It has become a widely adopted method for setting goals and measuring progress.

For PanPan, these could be:

  • Increase monthly revenue by 15% in the next quarter.
  • Reduce inventory costs by 10% in the next 6 months.
  • Improve customer retention rate by 20% in the next 6 months.

Define Reporting needs

Identify the information that you want to track and report on. This can include sales performance, customer behavior, inventory levels, and operational data. Define the key performance indicators (KPIs) that will help you measure progress towards your business goals.

For PanPan, these could be:

  • Daily and monthly sales and customer counts per store.
  • Customer acquisition and retention rates.
  • Inventory turnover rates and stock levels.
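
As a sketch of how two of these KPIs might be computed, here are the standard formulas in Python. The numbers are illustrative, not PanPan's real data:

```python
def retention_rate(customers_start: int, customers_end: int, new_customers: int) -> float:
    """Customer retention rate: customers who stayed, excluding new sign-ups."""
    return (customers_end - new_customers) / customers_start * 100

def inventory_turnover(cost_of_goods_sold: float, avg_inventory_value: float) -> float:
    """How many times inventory is sold and replaced over a period."""
    return cost_of_goods_sold / avg_inventory_value

# Illustrative numbers for a single quarter
print(retention_rate(customers_start=200, customers_end=230, new_customers=50))       # 90.0
print(inventory_turnover(cost_of_goods_sold=120_000, avg_inventory_value=15_000))     # 8.0
```

Once the data lives in a central database, these inputs become simple aggregate queries rather than manual Excel exports.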

Design data model

This is probably the most important step of the process. A good data model will make your life much easier in the future. A bad data model will make things harder than necessary. Start by identifying the main entities in your business. These can include customers, products, sales, inventory, and employees.

To fully understand this step, you need to know data modeling basics like structured vs. unstructured data, normalization, and relationship types (one to one, one to many, many to many). There are many resources online that can help you understand these concepts. Here is one from K21 Academy to learn about types of data. And here is one from Google to learn about data normalization. If you want to learn how to do Entity Relationship Diagrams (ERDs), this is a good resource from Lucidchart.

We'll create a simple relational data model. Define the attributes for each entity and the relationships between them. Create an entity-relationship diagram (ERD) to visualize the data model. This will allow you to define a set of tables with columns and relationships that will make it easy to store and query your data in a structured way.

For PanPan, the main entities will be Customers, Products, Sales, and Inventory. The relationships will be:

  • A Customer can have many Sales (one to many).
  • A Product can be part of many Sales (many to many).
  • A Product has one Inventory record (one to one).
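
This model can be translated into table definitions. Here is a minimal sketch using Python's built-in sqlite3 so it runs anywhere (in practice PanPan would run equivalent DDL against PostgreSQL); the column names are illustrative, and a junction table implements the many-to-many between Products and Sales:

```python
import sqlite3

# DDL mirroring the PanPan data model; sale_items is the junction table
# for the many-to-many relationship between products and sales.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE sales     (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        store TEXT, sold_at TEXT);
CREATE TABLE sale_items (sale_id INTEGER REFERENCES sales(id),
                         product_id INTEGER REFERENCES products(id),
                         quantity INTEGER,
                         PRIMARY KEY (sale_id, product_id));
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY REFERENCES products(id),
                        stock_level INTEGER);  -- one-to-one with products
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['customers', 'inventory', 'products', 'sale_items', 'sales']
```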

Plan for automation

With our data model defined, we can now plan how we'll automate the data flow. Identify the data sources that need to be ingested, transformed, and loaded into the central repository. Define the frequency of data updates (real-time, hourly, daily, weekly). Select the tools and technologies that will be used for data ingestion, transformation, and loading (ETL/ELT).

For PanPan, we'll extract data from our sources using n8n workflows that run daily and save the data into a central PostgreSQL database. If you don't know what n8n or PostgreSQL are, you might want to! Check our n8n Guide and our PostgreSQL Guide to learn more about these amazing open source tools.
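
For readers who prefer code over a visual workflow tool, the same daily extract-transform-load step can be sketched in plain Python. The CSV layout and field names are hypothetical, and sqlite3 stands in for the PostgreSQL warehouse:

```python
import csv, io, sqlite3

# Hypothetical raw export from the point-of-sale system
RAW_CSV = """Sale ID,Store,Total
1,Downtown,25.50
2,Airport,12.00
"""

def transform(row: dict) -> tuple:
    # Normalize names and types to match the central data model
    return (int(row["Sale ID"]), row["Store"].lower(), float(row["Total"]))

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL warehouse
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, total REAL)")
rows = [transform(r) for r in csv.DictReader(io.StringIO(RAW_CSV))]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
print(conn.execute("SELECT COUNT(*), SUM(total) FROM sales").fetchone())  # (2, 37.5)
```

An n8n workflow does the same three things — extract, transform, load — just with visual nodes instead of functions.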

Define basic governance policies

For a first version, we want to make sure the data is locked down and only available through the dashboards and reports we create, except for those with special admin access.

For PanPan, only we, the owners, will have admin access to the database. Employees will only see data through dashboards and reports.

Define disaster recovery plan

What happens if your data is lost or corrupted? You need to make sure you have a plan to recover your data in case of a disaster. Define the backup frequency, retention period, and recovery procedures.

For PanPan, we'll use automated daily backups of our PostgreSQL database stored in a secure cloud location.

2. Centralize & Format Data

Feed all key data sources into a central repository

Using our data model and automation plan, we can now start ingesting data from our key sources into a central repository. There are many tools available to help with this process, from managed warehouses like AWS Redshift and GCP BigQuery to open source options like PostgreSQL, Airbyte, n8n, and plain Python.

As mentioned before, PanPan will use n8n to create workflows that extract data from their point of sale system, CRM, and inventory management system. The data will be loaded into a PostgreSQL database.

Standardize formats, naming conventions, and data types

Data from different sources can come in different shapes and forms. To make sure the data is consistent and easy to work with, we need to standardize formats, naming conventions, and data types. This will make the data easier to query and analyze. A common approach is to stage the data in layers:

  • raw (CSV, JSON, XML, etc.)
  • formatted (Parquet, typed columns, etc.)
  • enriched (column selection, calculated fields, external data)

After that we can enrich the data by adding calculated fields, selecting relevant columns, and integrating external data sources.

Finally, we can store the selected data in a relational format according to our data model.

For PanPan, we'll have a first step that will save the raw data. And a second step that will format, enrich, and store the data in the PostgreSQL database according to our data model.
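
A minimal sketch of that raw → formatted → enriched progression for a single sales record (the field names and date format are illustrative):

```python
from datetime import datetime

# Layer 1: raw record, exactly as exported from the source system
raw = {"Sale ID": "42", "Store": "DOWNTOWN", "Total": "25.50", "Date": "03/15/2025"}

# Layer 2: formatted — consistent snake_case names and proper types
formatted = {
    "sale_id": int(raw["Sale ID"]),
    "store": raw["Store"].lower(),
    "total": float(raw["Total"]),
    "sold_at": datetime.strptime(raw["Date"], "%m/%d/%Y").date(),
}

# Layer 3: enriched — add calculated fields useful for reporting
enriched = {**formatted, "weekday": formatted["sold_at"].strftime("%A")}
print(enriched["store"], enriched["weekday"])  # downtown Saturday
```

Keeping the raw layer untouched means any formatting bug can be fixed and replayed without re-exporting from the source systems.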

Create tables according to the designed data model

Using the standardized and formatted data, we can now create tables in our central repository according to the data model we designed earlier. This will allow us to store the data in a structured way that is easy to query and analyze.

For PanPan, we'll create tables for Customers, Products, Sales, and Inventory in our PostgreSQL database.

3. Implement Dashboards, Reports, Alerts & Automation

To reach this point you had to do a lot of work. Congratulations! We finally have our materials ready to create the tools that will help us leverage our data to achieve our business goals.

Build standard reports and dashboards for key business metrics

Now we can use our favorite Business Intelligence tool to create reports and dashboards. Some popular open source options include Metabase, Superset, Redash, and Grafana.

For PanPan, we'll use Metabase to create dashboards that show daily and monthly sales per store, customer acquisition and retention rates, and inventory turnover rates.

Set up automated alerts for critical thresholds

To make sure we are always on top of our business, we can set up automated alerts that notify us when critical thresholds are met. This can include low inventory levels, high sales volumes, or changes in customer behavior.

For PanPan, we'll set up alerts in Metabase to notify us when inventory levels for any product fall below a certain threshold.
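
The threshold check behind such an alert is simple. This sketch assumes the stock levels were already queried from the database; the product names and the threshold of 10 units are illustrative:

```python
LOW_STOCK_THRESHOLD = 10  # assumed threshold, in units

def low_stock_alerts(stock_levels: dict[str, int]) -> list[str]:
    """Return an alert message for each product below the threshold."""
    return [
        f"LOW STOCK: {product} has {units} units left"
        for product, units in sorted(stock_levels.items())
        if units < LOW_STOCK_THRESHOLD
    ]

alerts = low_stock_alerts({"baguette": 4, "croissant": 25, "sourdough": 9})
print(alerts)
# ['LOW STOCK: baguette has 4 units left', 'LOW STOCK: sourdough has 9 units left']
```

In Metabase the same logic is configured as a question plus a scheduled alert, so no code is needed; this just shows what the tool is doing under the hood.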

Wrap Up

By following this three-step plan, you can successfully reach Level 2 Data Maturity. This will allow you to centralize and standardize your data, create reports and dashboards, and set up automated alerts. With these tools in place, you'll be able to leverage your data to achieve your business goals and take your organization to the next level.

After this, you can start thinking about cutting-edge tools like AI and Machine Learning to create innovative solutions that leverage your data in new and exciting ways.