Redefined Metrics
Your single source of truth might not be as true as you thought.

Everyone's been in this position - you're a data analyst, data leader, or executive sitting in a meeting, and two people show up with different values for the same number. The number in question is an important metric for "knowing how the business is doing". Chaos ensues.
Some examples might include calculations for:
- Sales forecast amount
- The company's monthly recurring revenue (MRR)
- A churn rate
- A product conversion stat
- The effectiveness of a marketing campaign
But it could be anything your business wants to measure to drive operational efficiency.
"Redefined Metrics" is not a novel problem in data. For years, data teams have worked hard to solve the problem of "multiple sources of truth" with data modeling tools. While the problem surely goes back decades, this issue gained visibility among data leaders as early as 2018, as highlighted in the Forbes article Single Version Of Truth: Why Your Company Must Speak The Same Data Language. It's still a hot topic, discussed as recently as October 2023 in dbt's presentation of the new MetricFlow.
In this post, we'll run through the impact of redefined metrics, why they happen, how to detect them, and how to prevent and resolve the problem. You may be surprised to discover that your single source of truth isn't as "true" as you thought.
The Impact of Redefined Metrics
- Confusion: When two people come up with different numbers for the same metric, it's confusing. It's hard to know who to trust, and it's hard to know what the "real" number is. This is highly unproductive, and we've all been in those meetings where people spend way more time arguing about the "real" number than figuring out how to respond to it.
- Lost Trust in the Data Team: This confusion can lead to a lack of trust in the data team. Despite all the effort to build a great semantic layer and a high-quality data pipeline, the impression leadership walks away with is that the company still lacks the ability to make good decisions with data.
Why Metrics Get Redefined
So how is it that, despite all the work that's been done to eliminate the "redefined metrics" problem, it still exists? Why is it that, even in teams with a well-defined, Kimball-modeled dataset and a semantic layer, people still come up with different numbers for the same metric?
- Evolving Needs: The business changes. And when the business changes, the metrics change. New dimensions emerge in the product and customer base that take time and effort to implement in modeling pipelines. If the "source of truth" metrics don't keep up, analysts and stakeholders are forced to fend for themselves.
- Calcified Exploration: Exploratory analyses become core parts of someone's workflow. Someone figures out a complex query or analysis in a notebook as a prototype, and it becomes part of a report that goes to leadership weekly. The data engineering team, unaware of this workflow, never merges it into the core pipeline. As the "source of truth" metrics evolve over time, the exploratory analyses don't. And while engineered datasets are protected from quality incidents with tests, observability, and contracts, exploratory analyses are not. They're often forgotten about until they cause a problem.
Detecting Redefined Metrics
This can be a tricky problem to detect automatically, especially because, as noted above, it's not always clear which metric calculation is "right" and which is "wrong". However, there are a few strategies that can help you detect when metrics are being redefined.
- Get Close to Stakeholders: The number one signal of redefined metrics is the confused meeting where there are multiple answers to the same question. If you're close to stakeholders, you'll hear about this. If you're not, you won't. Teams that do this really well ensure a senior member of the data team is present at the executive meetings where these metrics are discussed. Even if everyone else in the meeting is a VP+, consider advocating for a data director or senior analytics engineer to attend that meeting as well.
- Async Feedback Loops: In addition to sitting in on meetings where KPIs and business metrics are discussed, create a feedback loop for stakeholders to report discrepancies. This could be as simple as a Slack channel where people can ask "hey, I'm seeing this number in the warehouse, but I'm seeing this number in the dashboard. What's up with that?" Is it annoying to constantly chase questions about hand-rolled metrics? Sure. But it's better to learn about them than to have discrepancies lead to a lack of trust in the data team. As a bonus, you'll have a steady stream of feedback about what's important to stakeholders, and you can use that to inform your future work.
- Analyze Query Logs: If you have a catalog of your key metrics, you can analyze the queries being run against your warehouse and compare them to that catalog. For example, if you know you have a metric in your semantic layer called `projected_mrr`, you can look for queries that contain `SELECT ... AS projected_mrr` (see the sketch after this list). This is a bit of a blunt instrument, but it can be a good way to get a sense of what's being used in the wild. While much has been written about how LLMs lack the structural/mathematical awareness to truly "understand" SQL, we've found that tools like GPT can be quite good at analyzing SQL in this capacity.
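Here's a minimal sketch of that query-log scan in Python, assuming you can export raw query text from your warehouse's query history and that your catalog is just a list of metric names. The metric names and example query below are hypothetical:

```python
import re

# Assumed: metric names published in your semantic layer. In a real
# setup you'd pull these from your catalog or semantic layer config.
CATALOG_METRICS = ["projected_mrr", "churn_rate", "conversion_rate"]

def flag_possible_redefinitions(query_log):
    """Return (metric, query) pairs where a warehouse query aliases
    its own expression to a cataloged metric name.

    Deliberately blunt: it only looks for `AS <metric_name>`, so every
    hit still needs review by a human (or, as noted above, an LLM).
    """
    hits = []
    for query in query_log:
        for metric in CATALOG_METRICS:
            # Match "... AS projected_mrr" regardless of casing.
            if re.search(rf"\bAS\s+{re.escape(metric)}\b", query, re.IGNORECASE):
                hits.append((metric, query))
    return hits

# Example: a hand-rolled projected_mrr that may not match the semantic layer.
log = ["SELECT SUM(amount) * 13 AS projected_mrr FROM invoices"]
for metric, query in flag_possible_redefinitions(log):
    print(f"Possible redefinition of '{metric}': {query}")
```

A real implementation would want to parse the SQL properly (with a parser rather than a regex) and exclude queries generated by the semantic layer itself, but even this crude version surfaces candidates for the stakeholder conversations described below.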
Remediation and Prevention
It's easy to say "the only correct metrics are the ones in the production dataset," but taking a hard line here blinds you and your team to opportunities to harness signals about what data the business needs.
It's not enough to ship a new metric and move on to the next thing. Great data teams recognize that it's their responsibility to ensure that the metrics they ship are well-understood and well-used.
Here are a few strategies to prevent and resolve the problem of redefined metrics:
- Education: Ensure that stakeholders and analysts understand the process for getting new metrics into the semantic layer. This process should be well-documented and well-understood; if it's not, people will likely take matters into their own hands. As appropriate, educate analysts and stakeholders on how they can request new metrics, or even how to make pull requests into your transformation layer. You'll be leveling them up and making your life easier.
- Require Transparency: Encourage and/or require that any metric presented to leadership include details about how the numbers were sourced and calculated. I've talked to at least a few CEOs who required that any chart presented to them include a summary of the calculation and a link to the SQL that generated it (a sketch of what such a record could look like follows this list). This is a great way to ensure that everyone is on the same page about what the metric means and how it was calculated. It can also deter folks from trying to hand-roll their own metrics in the first place.
- Stakeholder Discovery: If you've read previous posts, you're probably waiting for the "bring a product mindset" angle. Well, here it is. Once you find misaligned analyses, you have to get curious and work with stakeholders to get to the bottom of what's happening. Just because a query against your warehouse doesn't match a metric in your semantic layer doesn't mean the query is wrong. It could be that the business's data landscape has evolved, and now the metric in the semantic layer is the one that's wrong. You won't know whether you've got a rogue query or a metric that's fallen behind until you talk to the person or people running that query. As always, this is another great opportunity to understand the needs of your stakeholders and to build a strong feedback loop.
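To make the "Require Transparency" idea concrete, here's one lightweight sketch of a provenance record you could attach to any metric that ships in a report. The shape, field names, and URL below are hypothetical, not a standard:

```python
from dataclasses import dataclass

@dataclass
class MetricProvenance:
    """Hypothetical provenance record for a metric shown to leadership."""
    metric_name: str          # e.g. "projected_mrr"
    calculation_summary: str  # one-line, plain-English description
    sql_url: str              # link to the exact SQL that produced the number
    author: str               # who to ask when two numbers disagree

# Example: rendered as a footnote under a chart. The URL is made up.
provenance = MetricProvenance(
    metric_name="projected_mrr",
    calculation_summary="Sum of active subscription amounts, annualized",
    sql_url="https://github.com/example-org/analytics/blob/main/metrics/projected_mrr.sql",
    author="data-team@example.com",
)
print(f"{provenance.metric_name}: {provenance.calculation_summary} "
      f"(SQL: {provenance.sql_url}, owner: {provenance.author})")
```

Whether this lives in a BI tool's description field or a chart footnote matters less than the habit: every number carries its own definition.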
A semantic layer is a great start, but a semantic layer alone will not prevent the "redefined metrics" problem.
It can be easy to get mad when redefined metrics cause chaos and undermine trust in your team, but the best data teams recognize this behavior for what it is - valuable signal about what the business needs and how well the data team is doing at meeting those needs.