Five building blocks of a data-driven culture

Carl Anderson

Contributor
Carl Anderson is the author of Creating a Data-Driven Organization. He previously headed up data, analytics and data science at Warby Parker and WeWork, and is currently a member of WeWork’s Product Research Department.

How can organizations leverage data as a strategic asset? Data comes at a high price. Businesses must pay for data collection and cleansing, hosting and maintenance, salaries of data engineers, data scientists and analysts, risk of breach and so on.

The line items add up. However, if done well, a thriving data-driven organization can reap huge rewards. Controlling for other factors, Erik Brynjolfsson et al. from MIT’s Sloan School of Management found that data-driven organizations have 5-6 percent higher output and productivity than their less data-driven counterparts, as well as higher asset utilization, return on equity and market value. Other research shows that analytics pays back $13.01 for every dollar spent. Being data-driven pays!

To be data-driven requires an overarching data culture that couples a number of elements, including high-quality data, broad access, data literacy and appropriate data-driven decision-making processes. In this article, we discuss some of the key building blocks.

Single source of truth

A single source of truth is a central, controlled and “blessed” source of data from which the whole company can draw. It is the master data. When you don’t have such a source and staff can pull seemingly the same metrics from different systems, those systems will inevitably produce different numbers. Then the arguments ensue. You get into a he-said-she-said scenario, each player drawing and defending their position with their version of the “truth.” Or, more perniciously, some teams may unknowingly use stale, low-quality or otherwise incorrect data or metrics and make bad decisions, when they could have used a better source.

When you have a single source of truth, you provide superior value to the end user: the analysts and other decision makers. They’ll spend less time hunting for data across the organization and more time using it. Additionally, the data sources are more likely to be organized, documented and joined. Thus, by providing a richer context about the entities of interest, the users are better positioned to leverage the data and find actionable insights.

From the data administrator’s side, a single source of truth is preferable, as well. It is easier to document, prevent name collisions across tables, run data quality checks and ensure that the underlying IDs are consistent across the tables. It also is easier to provide flattened, easier-to-work-with views of the key relations and entities that, under the hood, may have come from different sources.

For instance, at WeWork, a global provider of co-working spaces, we provide our analytics users with a core table called the “activity stream,” a single narrow table that provides web page views, office reservations, tour bookings, payments, Zendesk tickets, key card swipes and more. The table is easy for users to work with, such as slicing and dicing different segments of our members or locations, even though the underlying data comes from many heterogeneous systems. Moreover, having this centralized, relatively holistic view of the business means that we also can build more automated tools on top of those data to look for patterns in large numbers of different segments.
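WeWork’s actual schema is not public, so the following is only a rough sketch of the idea, with made-up field names such as “member_id” and “event_type.” The point is that a narrow table with one event per row, regardless of which source system produced it, makes slicing a segment a simple filter-and-count:

```python
from collections import Counter

# Hypothetical rows in a narrow "activity stream" table: one event per row,
# whether it came from the website, payments, ticketing or key-card systems.
activity_stream = [
    {"member_id": 1, "location": "NYC", "event_type": "tour_booked"},
    {"member_id": 2, "location": "NYC", "event_type": "key_card_swipe"},
    {"member_id": 1, "location": "NYC", "event_type": "payment"},
    {"member_id": 3, "location": "LDN", "event_type": "tour_booked"},
]

def events_by_type(rows, location):
    """Slice the stream down to one location and count events by type."""
    return Counter(r["event_type"] for r in rows if r["location"] == location)

print(events_by_type(activity_stream, "NYC"))
```

Because every event shares the same handful of columns, the same two-line query works for any segment, which is also what makes it feasible to build automated pattern-finding tools on top.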

In large organizations, there are often historical reasons why data are siloed. For example, large organizations are more likely to acquire data systems through company acquisitions, thereby inheriting additional independent systems. Thus, a single source of truth can represent a large and complex investment. However, in the interim, the central data team or office can still make a big difference by providing official guideposts: listing what’s available, where it is and, where multiple sources exist, which is the best place to get it. Everyone needs to know: “if you need customer orders, use system X or database table Y,” and nowhere else.

Data dictionary

Knowing where to get the data, and providing quality data, is only one ingredient. Users need to know what the data fields and metrics mean. You need a data dictionary. This is an aspect that trips up many organizations. When you don’t have a clear list of metrics and their definitions, people make assumptions — ones that may differ from colleagues. Then the arguments ensue.

A business needs to generate a glossary with clear, unambiguous and agreed-upon definitions. This requires discussion with all the key stakeholders and business domain experts. First, you need buy-in to those official definitions; you don’t want teams going rogue with their secret version of a metric. Second, it is often not the core definition where people’s understandings differ but how to handle the edge cases. Thus, while everyone might have a common understanding of what an “orders placed” metric means, they may differ in how they want or expect to handle cancellations, split orders or fraud.

Those scenarios need to be laid out, discussed and resolved. A goal here is to collapse multiple similar metrics into a single common metric, or flesh out situations where you genuinely need to split one metric into two or more separate metrics to capture different perspectives.

For instance, at WeWork, prospective members check out our facilities by signing up for a tour. Importantly, some people may tour different locations, or come back for a second tour to show other members of the organization before signing off on their new space. While our various dashboards had a metric called “tours,” they didn’t align across teams. The process of creating a data dictionary fleshed out two different metrics:

  • “Tours completed-Volume” captures the absolute number of tours taken, which our Community team, who staff such tours, monitor.
  • “Tours completed-People” captures the unique number of people who signed up for a tour. This can then feed into a lead conversion metric, which our sales and marketing teams track.

Specific, well-chosen names and unambiguous definitions with examples are key here. It is better to err toward longer but descriptive names, such as “non_cancelled_orders” or “Tours Created To Tours Completed Conversion %,” than shorter names that users only think they understand.
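What a dictionary entry contains will vary by organization, but as an illustrative sketch (the field names and the example metric are hypothetical, not a real WeWork or Warby Parker artifact), an entry should pin down the owner, the core definition and, crucially, the edge cases:

```python
# A hypothetical data-dictionary entry. The structure and values here are
# illustrative only; real dictionaries are often wikis or metadata tools.
non_cancelled_orders = {
    "name": "non_cancelled_orders",
    "definition": (
        "Count of orders placed, excluding any order cancelled "
        "or flagged as fraud before fulfillment."
    ),
    "owner": "Sales Operations",
    "edge_cases": {
        "cancellations": "excluded",
        "split_orders": "each resulting order counts once",
        "fraud": "excluded once flagged",
    },
    "worked_example": "10 orders placed, 2 cancelled, 1 fraud -> metric = 7",
}

print(non_cancelled_orders["name"], "-", non_cancelled_orders["definition"])
```

Writing the edge cases down explicitly is what forces the stakeholder discussion described above to actually happen.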

Broad data access

Having clean, high-quality data, from a central source, and with clear metadata, is ineffective if staff can’t access it. Data-driven organizations tend to be very inclusive and provide access wherever the data can help. This doesn’t mean handing over the keys to all the data to all the staff — the CIO would never sign off on that! Instead, it means assessing the needs of individuals, not just the analysts and key decision makers, but across the whole organization, out to the front-line of operations.

For instance, at Warby Parker, a retailer of prescription glasses and sunglasses, associates on the retail shop floor have access to a dashboard that provides details on their performance, as well as that of the store as a whole. At Sprig, a food-delivery company from San Francisco, even the chef has access to an analytics platform that they use to analyze the meals that have been ordered and understand which ingredients and flavors are popular or have not fared well, and so tailor the menu.

A large Fortune 100 financial conglomerate that hires data scientists from The Data Incubator’s fellowship is able to maintain a competitive edge in hiring compared to “sexy” Silicon Valley companies like Google, Facebook and Uber, partially through granting broad access to data for its data science team. And the access doesn’t stop at data scientists — one of the products our alumni have worked on is a summary dashboard that automatically gives customer service reps a visualization of a customer’s interaction history while that customer is on the phone.

It is those front-line staff — the customer service agent dealing with an angry customer, or a warehouse worker facing a pallet of damaged product — who can leverage data immediately to determine best next steps. If suitably empowered, they are often also in the best position to resolve a situation, determine changes to workflow or handle a customer complaint.

Data-driven organizations need to foster a culture in which individuals know what data are available (a good data dictionary helps, as does seeing data used in day-to-day decision making) and feel comfortable requesting access when they have a genuine use case. Red tape should be cut: keep an appropriate approval process, oversight and systems for revoking access if necessary, but let staff gain access without too many hoops to jump through and without long delays.

Finally, with broader access, and more users of analytical tools, the organization will need to commit to providing training and support. At WeWork, while our data team are available through Slack, email and service desk tickets, we also provide weekly office hours to help users with our business intelligence tools, SQL queries and any other aspects about the data.

Data literacy

In a data-driven organization with broad data access, staff will frequently encounter reports, dashboards and analyses, and they may have a chance to analyze data themselves. To do so effectively, they must be sufficiently data literate.

Data literacy is often a multi-pronged effort. (For an excellent and accessible overview of this topic see this article by Brent Dykes.) At The Data Incubator, we engage clients whose employees span a range of skill levels, each requiring a tailored approach.

One of the most exciting areas is data science training. This covers an introduction to the more advanced and computational data mining and machine learning approaches to extract insights from data, as well as create data products such as recommendation engines and other predictive models. This tends to be focused at the top of the skills pyramid for more advanced users to up their game a notch. One of the quickest data wins for many of our clients simply comes in training people who are half-way to becoming data scientists on the other half.

For example, pharmaceutical and finance clients tend to have legacy statisticians who are well-versed in the statistical aspects of data science but weaker on the computational front. Many technology companies have an abundance of programming talent that lacks statistical rigor. Training statisticians on programming and programmers on statistics is a great “quick win” that can be extended more broadly.

For those who don’t have such skills, there are plenty of opportunities to increase data literacy across the board. Enterprises have begun to view data literacy training as necessary for everyone, and we’ve seen the demand for “introductory data science for managers” courses double in the last 12 months. The lowest and simplest level is to enhance basic skills in descriptive statistics. These are the basic ways of summarizing data — mean, percentiles, range, standard deviation, etc. — and knowing when they are or are not appropriate given the shape of the underlying data.

For instance, when data are highly skewed, as in house prices or income, the median is the appropriate metric with which to summarize the data, not the mean. Just training people to make fewer assumptions, to plot and examine the data and to use appropriate summary metrics would be a big win.
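A minimal, self-contained illustration of this point, using made-up house prices in which a single luxury sale drags the mean far above the typical home:

```python
import statistics

# Hypothetical right-skewed "house price" sample: one very expensive
# property pulls the mean well above what a typical home costs.
prices = [180_000, 200_000, 210_000, 220_000, 240_000, 260_000, 2_500_000]

mean = statistics.mean(prices)      # ~544,286: inflated by the outlier
median = statistics.median(prices)  # 220,000: the "typical" home

print(f"mean={mean:,.0f}  median={median:,.0f}")
```

Anyone reporting the mean here would overstate the typical price by more than double, which is exactly why plotting the data first matters.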

Another win can come from data visualization skills. Too often, charts are full of chart junk — unnecessary clutter and annotations that detract from the key point. Or inappropriate chart types are used, such as multiple pie charts each with a large number of segments, or a color scheme is chosen that makes the chart nearly impossible to interpret.

It is a tragedy to spend a huge amount of effort on data collection and analysis, only to fail, and lessen the data’s impact, at the finish line. Even a small amount of data visualization training goes a long way: it can greatly enhance people’s presentation skills and make insights clearer, more digestible and ultimately more likely to be used.

At the next level of complexity is inferential statistics. These are the standard, objective statistical tests used to detect, for instance, whether a trend or difference in website traffic between weeks is likely real or just random variation. The purpose here is not for a manager or customer service agent to perform these tests themselves but to make them aware of how statistics can be of use, to understand correlation versus causation and to appreciate that forecasts always come with uncertainty. For decision makers and managers, this also arms them with the skills to push back on shoddy work or on conclusions the data don’t support.
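To make the week-over-week traffic example concrete, here is a stdlib-only sketch of a two-proportion z-test (the visitor counts are invented; in practice you would use a statistics library rather than hand-rolling the math):

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two conversion rates.
    Returns (z, p_value). A pure-stdlib sketch, not a stats library."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                      # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Did this week's signup rate (520/10,000) really beat last week's (470/10,000)?
z, p = two_proportion_z_test(520, 10_000, 470, 10_000)
print(f"z={z:.2f}  p={p:.3f}")  # z≈1.63, p≈0.10: plausibly just noise
```

Here the apparent 11 percent lift fails a conventional 5 percent significance threshold — precisely the kind of “is this real?” question the training is meant to equip people to ask.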

Decision making

Data can only make an impact if it is actually incorporated in the decision-making process. An organization can have quality, timely and relevant data and skilled analysts who generate masterful reports with carefully crafted and presented insights and recommendations. However, if that report sits unopened on a desk, or unread in an inbox, or the decision maker has already made up their mind about what action to take regardless of what the data show, then it is all for naught.

HiPPO, “highest paid person’s opinion,” a term coined by Avinash Kaushik, is the antithesis of data-drivenness. You all know them. They’re the expert with decades of experience. They don’t care what the data says, especially when it disagrees with their preconceived notions, and they are going to stick to their plan because they know best. And, besides, they’re the boss. As the Financial Times explains:

HiPPOs can be deadly for businesses, because they base their decisions on ill-understood metrics at best, or on pure guesswork. With no intelligent tools to derive meaning from the full spectrum of customer interactions and evaluate the how, when, where and why behind actions, the HiPPO approach can be crippling for businesses.

Too often organizations have a prevailing culture where intuition is valued or there is a lack of accountability. In one survey, just 19 percent of respondents said that decision makers are held accountable for their decisions in their organization. It is in such habitats where HiPPOs thrive.

One way to counteract the HiPPOs is to cultivate a culture of objective experimentation, such as A/B testing. In those scenarios, whether it be a change to website design or marketing messaging, you control for as much as possible, determine the success metrics and required sample sizes, change that one thing and let the experiment run. The key here is to have a clear analysis plan and to set out the success metric and any predictions before the experiment runs. In other words, the plan prevents HiPPOs from cherry-picking results after the fact. The same is true of any pilot program.
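Determining the required sample size up front is part of that analysis plan. As a rough sketch (the baseline and target rates are invented, and this is the standard two-proportion approximation rather than any particular team’s method), the calculation looks like:

```python
from math import ceil

# Standard normal quantiles for a two-sided 5% test and 80% power.
Z_ALPHA, Z_BETA = 1.959964, 0.841621

def sample_size_per_arm(p_base, p_target):
    """Approximate per-arm sample size to detect a lift from p_base to
    p_target at alpha=0.05 (two-sided) with 80% power. Illustrative only."""
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((Z_ALPHA + Z_BETA) ** 2 * variance / (p_base - p_target) ** 2)

# E.g. to detect a lift in conversion from 10% to 12%:
n = sample_size_per_arm(0.10, 0.12)
print(n)  # 3839 visitors per variant
```

Committing to a number like this before launch is what stops anyone from quietly ending the test the moment the dashboard happens to favor their preferred variant.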

Part of the value of broad data literacy training is to allay fears from the perceived threat of big data. Data is not there to bolster (or undermine) existing decisions, but to help inform future ones. It does not threaten the manager’s job — but ignoring it might. By demystifying how data works, data science trainings can increase manager confidence in data and increase data-driven decision making in an enterprise.

Conclusion

Through work at both employers and clients, we’ve come to learn that data-driven culture does not come overnight, but is part of a multi-step process. The first requirement is a clean, single source of data from which analyses can flow. Second, data analysts and data scientists then need to agree on the data dictionary and what the data means. Next, not just data scientists but the entire organization needs to be given broad access to this data to enable the application of collective business expertise in analyzing data. With access to data must come good training to help reinforce data literacy. Finally, all those amazing data analyses must be put into the hands of data-trusting managers to affect decision making.
