Our take on data mesh garnered such a response last year that we knew the topic merited its own outlook in 2022.
According to Google Trends, “data mesh” was one of the topics that broke the internet in 2021, even more so than “data lakehouse.” It's also a topic that addresses a real pain point: We dump all sorts of data into data lakes or other silos, then we lose track of it or fail to use and govern it adequately.
After a couple of years of incubation, we now expect data meshes to draw their first serious scrutiny.
Data mesh is an idea that, depending on who you speak with, was originated by Mark Beyer at Gartner or Zhamak Dehghani at Thoughtworks. For the record, they both used the same term, and they both address the disconnect that occurs when you accumulate huge troves of data — and then try to figure out who owns it and how it should be accessed and governed. But that’s about all they have in common.
Gartner’s concept is more about patterning the organization of metadata on principles akin to those of physical mesh networks. Borrowing inspiration from Metcalfe’s Law, the idea is that as the metadata “nodes” in a data mesh proliferate, the metadata becomes more fully formed (there could be some form of AI self-learning involved). With the Gartner research stuck behind a paywall, it shouldn’t be surprising that the concept developed at Thoughtworks took over the conversation. It’s premised on self-organizing domains taking lifecycle approaches to treating data as products, with ownership of everything from the data pipelines to governance and security. In so doing, teams think more broadly about their data, beyond simply building pipelines or organizing data sets.
Data meshes address a number of valid concerns about the limitations of top-down management or ownership of data. But at present, as a concept, data meshes are not yet fully fleshed out, especially when it comes to self-service or federated governance. The going notion of data meshes is that the domains with the appropriate subject matter expertise should be the ones that own the data and manage it from cradle to grave. It’s a bottom-up approach to data management and governance that should theoretically improve accountability. The downside is that, if not properly managed, data meshes could amplify or proliferate data silos, leading to waste, duplication, and inconsistent management and governance.
We don’t believe data mesh is sufficiently defined to work cross-enterprise, but we do think that data meshes could prove effective when implemented at a more modest scale: specifically, across teams that already share a common context, whether from a history of collaboration or from shared, adjacent, or overlapping subject matter expertise. In an enterprise, we could foresee groups of data meshes emerging around focused disciplines, such as customer experience, supply chain management, product development, and so on.
Until now, the body of work published on data meshes has been generally positive, and we expect to see vendors across the data space “data mesh wash” their products in 2022. We’re talking about databases, BI, governance, ELT/data transformation, data cataloging, query federation, and information lifecycle management. Vendors will put out marketing messages to show how their offerings can support teams that are building data meshes. Yes, there will even be a virtual conference happening sooner than you think.
But keep in mind that data mesh is a process and architectural approach that delegates responsibility for specific data sets to the “domains” that have the requisite subject matter expertise. Data mesh is not a technology. Hopefully, vendors will not jump the shark and position their offerings as data mesh products.
Our sense of impending backlash stems from the numerous private messages we received in response to our LinkedIn post, which provided a teaser of what was published here. The crux of those messages was that data meshes could exacerbate the data silo issues that already exist in most enterprises. We believe that is a very valid concern.
Even if the data mesh concept were fully fleshed out and bulletproof, public scrutiny would still be a sign that the idea is being taken seriously. And so, a backlash is actually a reflection of how squarely data meshes have hit a real point of pain.
But there is also another kicker: data meshes have often been contrasted with data fabrics. Data fabrics are designed to promote access to data across logical and physical stores, which is something a data mesh needs rather than competes with, so we believe that contrasting data meshes with data fabrics is a false dichotomy.
Hold that thought.
A challenge is that the definition of data fabric is pretty hazy. Try this one from NetApp: “A data fabric is, at its heart, an integrated data architecture that’s adaptive, flexible, and secure. In many ways, a data fabric is a new strategic approach to your enterprise storage operation, one that unlocks the best of cloud, core, and edge.” Is that fuzzy enough for you? For our purposes, we’ll just state that a data fabric starts with a common metadata backplane, so when different teams describe their data products, they are all speaking from a common sheet of music.
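To make the "common metadata backplane" idea concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not a reference to any vendor's product: the `DataProduct` descriptor, the `MetadataBackplane` class, the glossary terms, and the domain names are all hypothetical. The point is only that each domain still owns its data product, while every product is described with the same shared vocabulary, so overlaps between domains become visible instead of turning into silent silos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Descriptor that every domain fills in the same way (hypothetical fields)."""
    name: str
    domain: str                # owning team, e.g. "supply_chain"
    business_terms: frozenset  # shared glossary terms this product covers


class MetadataBackplane:
    """Toy in-memory stand-in for a common metadata layer."""

    def __init__(self, glossary):
        self.glossary = set(glossary)  # agreed, enterprise-wide terms
        self.products = {}             # product name -> DataProduct

    def register(self, product):
        # Reject descriptions that use terms outside the shared vocabulary.
        unknown = product.business_terms - self.glossary
        if unknown:
            raise ValueError(f"terms not in shared glossary: {sorted(unknown)}")
        if product.name in self.products:
            raise ValueError(f"duplicate product name: {product.name}")
        self.products[product.name] = product

    def overlaps(self):
        """Return pairs of products from different domains that share a term."""
        found = []
        items = sorted(self.products.values(), key=lambda p: p.name)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                shared = a.business_terms & b.business_terms
                if a.domain != b.domain and shared:
                    found.append((a.name, b.name, sorted(shared)))
        return found


# Two domains describe their products against the same glossary.
backplane = MetadataBackplane(glossary={"customer", "order", "shipment"})
backplane.register(
    DataProduct("customer_360", "customer_experience", frozenset({"customer", "order"}))
)
backplane.register(
    DataProduct("order_tracking", "supply_chain", frozenset({"order", "shipment"}))
)
print(backplane.overlaps())
```

In this toy example, both domains publish data about orders, and the shared glossary is what lets the overlap be detected at all. A real metadata layer would of course involve far more than a Python dictionary.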
Here’s one more prediction highlighting that data meshes and data fabrics actually have synergy: We expect that common metadata backplanes will become a sleeper issue this year, responding to the need to make sense of all the data — especially as it accumulates in the cloud.
You might not need a data mesh to start building a data fabric. But if you are considering kicking off a data mesh initiative, don’t even think of getting going without some form of data fabric.
This is the second part of our Data Outlook for 2022. Part one provides our take on real-time streaming convergence, machine learning, and data management.