More musings on data gravity, platform play, and the growing role of data lakes and cloud providers in cybersecurity
Addressing a few questions about data gravity and cybersecurity: the role of cloud providers, data gravity and software security, AI, multi-cloud, and making a case for the emergence of a security data layer
Welcome to Venture in Security! Before we begin, do me a favor and make sure you hit the “Subscribe” button. Subscriptions let me know that you care and keep me motivated to write more. Thanks folks!
A brief introduction & recap of the data gravity concept
I have recently published a deep dive titled “Game of Thrones in cybersecurity: data gravity, industry consolidation, platform play, private equity, and the great cyber gold rush”. One of the topics explored in that deep dive was the concept of data gravity.
As I explained at the time, “To be able to run thought experiments and think about the future of cybersecurity, it is critical to understand the concept of data gravity. The idea was first introduced in 2010 by Dave McCrory, a software engineer who observed that as more and more data is gathered in one place, it “builds mass”. That mass attracts services and applications, and the larger the amount of data, the greater its gravitational pull, meaning the more services and applications will be attracted to it and the more quickly that will happen.
Data gravity leads to the tectonic shift in cybersecurity: security data is moving to Snowflake, BigQuery, Microsoft Azure Data Warehouse, Amazon Redshift, and the like. As the amount of data increases in size, moving it around to various applications becomes hard and costly. Snowflake, Google, Amazon, and Microsoft understand their advantage incredibly well and are taking action to fully leverage it. As it relates to cybersecurity, they typically do it in the following ways:
By offering their own security services and applications, and
By establishing marketplaces and selling security services and applications from other providers.” - Source: Venture in Security
In this piece, I will address some of the follow-up questions that came up after publishing that piece, namely:
Why would cloud companies want to compete with point solutions instead of simply integrating?
What does multi-cloud mean for data gravity and platform play?
Are data gravity and platform relevant outside of operational security?
If nothing changes in the industry, what will security of the future look like?
“Land and expand” play of cloud providers
Cloud providers and large data companies are pursuing a “land and expand” strategy: get the customer to adopt the product for one use case, and once the data is in - start offering other products and services.
Platform companies grow their ecosystems with one or more of three goals in mind:
Acquisition - make it very easy for potential customers to integrate with their existing and preferred tech stack so that they are more likely to adopt the platform.
Retention - get the customer super deep into the platform’s ecosystem so that the cost of switching to another platform is prohibitively high.
Monetization - generate revenue from platform usage.
We are still early enough in the shift to the cloud that all major cloud providers are fighting to be chosen as the number one solution (increasingly, customers are choosing diversification and multiple cloud providers, but I will cover that in a separate section). Because the fight for cloud market share is still being waged, the focus of cloud providers is predominantly on the first two - acquisition and retention. We see that platform players typically start by monetizing their data, and then move on to solving the adjacent problems that are most commonly encountered by their users and that have the potential to pull the customer much deeper into the platform’s ecosystem and increase their switching costs.
This is a tricky balancing act, as the platform:
Must have pre-built integrations with the most commonly used tools and solutions in the industry.
Must support interoperability. The platform company has to make it easy for people to keep using the tools they prefer, even if those tools are competitive and the platform company offers its own solution to the same problem.
For Google and Microsoft, picking which products to integrate into their platforms is much easier than it is for AWS. This is because the vast majority of solutions are built on or integrate with AWS, while the pool of those built on Azure or GCP (or that integrate with them) is smaller. The part about interoperability and keeping the ecosystem open is trickier because of the inherent conflict of interest. As an example, let’s look at Microsoft: while it supports different identity and access management providers, it would prefer its customers to use Microsoft Entra. The moment it pushes customers to use its own solutions is the moment they will leave, as interoperability and choice are critical to any cloud provider’s value proposition.
For as long as there is aggressive competition for cloud market share, the main area of focus for major cloud providers is horizontal expansion (acquiring new customers and getting them as deep as possible into their ecosystem). As soon as the pace of public cloud adoption slows down (and we know it will), cloud providers will be faced with the need to sustain the same revenue growth. This is when vertical expansion will start, and looking for ways to monetize the data of their existing customers will likely be the natural next step. It is during this “monetization” phase that I expect cloud providers to attempt to handle more and more security use cases in-house.
As cloud offerings become more commoditized, cloud providers might start looking at doing more security as a way to differentiate. Granted, there are other areas they could be looking at first, such as data analytics, infrastructure automation, and the like: use cases where they can add a lot of value without increasing their liability. However, as soon as one of the cloud providers starts bundling more security with its cloud offerings, the others will have to follow suit, as few customers will be willing to choose a “less secure” public cloud.
I anticipate that in the future, more and more of the base layers of security will be provided by cloud vendors, while technical security talent will be needed to provide protection beyond that base. At first, bundling security with cloud offerings will be seen as a differentiator, but over time, it is likely to turn into a baseline expectation.
As of today, protecting the cloud itself and protecting what’s in the cloud are very different problems. I am very curious to see how the “shared responsibility” for cloud security evolves over the next decade (we have already seen Google move towards the “shared fate” model). Cybersecurity will always require collaboration and interoperability, and therefore platform companies have to be very careful when designing their marketplace experiences and making acquisitions so that they don’t alienate their other partners. I have no doubt that Microsoft, Google, and Amazon will continue to strengthen their in-house security offerings and look for ways to make their security solutions more attractive to customers looking for one-stop solutions.
Another interesting question concerns marketplaces run by cloud providers that are transforming channel sales. I have previously published a dedicated deep dive about the future of the channel, which briefly mentions how cloud providers fit into this picture.
Multi-cloud and its impact on cybersecurity
One of the questions that keep coming up is how multi-cloud impacts the ability of cloud providers to offer security capabilities.
The trend of multi-cloud is still relatively new, so it’s hard to definitively say what impact it will have. However, there are several observations worth sharing:
As we watch the commoditization of the cloud, it is natural to expect that cloud providers will be forced to bundle more and more value in order to remain competitive. Security is one of the toughest problems for enterprises to tackle, so anyone who can ease the lives of security teams will immediately turn them into vocal champions. Cloud providers, armed with AI, have a unique ability to make sense of large volumes of data, so it is logical that they would use this to their advantage.
When different cloud providers start offering base levels of security as part of their services, this won’t outright make all point solutions redundant. Instead, it will raise the bar for new entrants, who will have to find ways to add value beyond basic protection. In many ways, this will be similar to the effect antivirus had on the industry: antivirus is seen as a base layer of security today, and there are only a few AV vendors left. The fact that companies are adopting multiple cloud providers does not change this trend.
There is little doubt that both cybersecurity and, most importantly, the entire IT infrastructure are going to become more and more complex. This complexity will make it easier for cloud providers to increase both the switching costs and the costs of running multiple cloud providers in parallel.
If we squint a bit, we can start to draw a parallel between two unrelated but similar trends: multi-cloud vs. single-cloud, and best-of-breed vs. best-of-suite cybersecurity strategies.
In the case of multi-cloud and best-of-breed cybersecurity products, companies pay an overhead premium: the extra cost of configuring and keeping multiple tools running. On the positive side, there is a level of redundancy and a feeling of relative safety that comes from not having to rely on a single vendor, as well as higher bargaining power in pricing negotiations, so the cost savings might be substantial. Adopting a single cloud provider, and similarly adopting a best-of-suite security platform, increases risk and vendor lock-in (“all the eggs in one basket”) and reduces the ability to negotiate prices. The benefits of this approach, on the other hand, come from simplicity and the savings of not having to maintain many vendor relationships. It is true that one cannot put an equal sign between these two problem spaces: in the case of multi-cloud, companies are choosing redundant capabilities, while security products are additive rather than redundant, whether a company takes a best-of-breed or best-of-suite approach. However, there are some similarities, particularly around customer segmentation.
Most mid-market companies will likely stay with a single cloud provider and be satisfied with the basic security offered by that provider. However, it is likely that at the high end of the market, customers will continue to hire highly technical security teams, implement best-of-breed security tooling, and operate in multi-cloud environments. Large customers generate enormous amounts of data that need unification, analytics, and security but don’t always fit in one cloud warehouse.
As IT infrastructure gets more and more complex, the cost of running two or more public clouds in addition to on-prem can become prohibitively high for most organizations. It remains to be seen how the future of multi-cloud is going to unfold, but whatever happens, I don't see it changing the fact that cloud providers have the ability to up their game and start doing more of what is known today as cybersecurity.
ChatGPT and AI: the not-so-new “new kids on the block”
Artificial intelligence has long been seen as one of the potential disruptors of the cybersecurity space, but until the release of ChatGPT, few people imagined the extent to which it could change the future of the industry.
To build accurate models, artificial intelligence engines need vast amounts of data. Most importantly, the data:
a) cannot be just generic - at least a part of it must come from the specific organization’s environment, as every organization is unique, and
b) must be representative of both good and bad behavior so that the model can learn to distinguish between the two.
Conceptually, the best models would be built through a layered process. First, look at immense amounts of data aggregated from different organizations, and build models that are fairly generic yet accurate. Then, calibrate the model for a specific organization to learn what is normal in that environment, so that the number of false positives and false negatives can be reduced to a minimum. Lastly, give security teams the ability to create their own detection logic and apply it on top of the base layers of AI-generated protection. Taking this approach could help reduce the number of false positives, as decisions about what is good and what isn’t would factor in the business context that is currently ignored when vendors push the same detection logic to all customers with no regard for what is unique about each of them.
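To make the layering a bit more concrete, here is a minimal, hypothetical sketch in Python. Everything in it (the scoring function, the event fields, the thresholds) is made up for illustration; the point is simply to show a generic model, a per-organization calibration step, and custom rules stacked on top of one another.

```python
# Minimal sketch of the layered approach described above. All names, fields,
# and thresholds are hypothetical and exist only to illustrate the idea.
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, Dict, List


@dataclass
class LayeredDetector:
    # Layer 1: a generic scoring function built from pooled, cross-organization data.
    generic_score: Callable[[Dict], float]
    # Layer 2: per-organization baseline used to calibrate the generic score.
    org_baseline_mean: float = 0.0
    org_baseline_std: float = 1.0
    # Layer 3: custom rules written by the organization's own security team.
    custom_rules: List[Callable[[Dict], bool]] = field(default_factory=list)

    def calibrate(self, org_events: List[Dict]) -> None:
        """Learn what 'normal' looks like in this specific environment."""
        scores = [self.generic_score(e) for e in org_events]
        self.org_baseline_mean = mean(scores)
        self.org_baseline_std = stdev(scores) if len(scores) > 1 else 1.0

    def is_suspicious(self, event: Dict, z_threshold: float = 3.0) -> bool:
        # Custom rules take precedence over the statistical layers.
        if any(rule(event) for rule in self.custom_rules):
            return True
        # Otherwise flag events that deviate strongly from this org's baseline.
        z = (self.generic_score(event) - self.org_baseline_mean) / self.org_baseline_std
        return z > z_threshold


# Toy usage: score logins by the number of failed attempts.
detector = LayeredDetector(generic_score=lambda e: float(e.get("failed_logins", 0)))
detector.calibrate([{"failed_logins": n} for n in (0, 1, 0, 2, 1, 0, 1)])
detector.custom_rules.append(lambda e: e.get("country") == "sanctioned-geo")
print(detector.is_suspicious({"failed_logins": 40}))  # True: far above this org's baseline
print(detector.is_suspicious({"failed_logins": 1}))   # False: normal for this org
```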
As of today, most organizations lack the technical talent required to implement AI in their environment. However, the barriers to adoption will continue to go down. Most importantly, the number of companies trying to leverage recent advances in AI in cybersecurity almost certainly means that a few years from now, most organizations will, one way or another, have AI in their security stack. What is much more ambiguous is what that evolution will look like and who will win in this game. The question I find particularly interesting is how AI and the cloud fit together, as their co-existence, mutual reinforcement, and potential merger is one of the most critical factors shaping innovation.
Any AI engine needs computing power, and the more data it processes and the more complex its algorithms are, the more capacity it needs. ChatGPT, as an example, runs on Microsoft Azure, and so do the AI engines of some of the leading cybersecurity companies. As demand for AI continues to grow, so will the need for cheap computing capacity to collect ever more data and make sense of it.
On the other hand, cloud providers (Amazon’s AWS, Microsoft’s Azure, and Google’s GCP) all offer cloud AI developer services. These tools make it easy for developers (and, increasingly, people with no background in engineering) to build their own AI models. AI is also increasingly being used to manage IT operations, a discipline called AIOps. Gartner explains that AIOps “combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination”. There is plenty of room to apply AI in cloud environments so that it can learn from the data they store, make predictions, and the like.
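For a feel of what “anomaly detection” means in the AIOps sense, here is a toy sketch that assumes scikit-learn is available; the metrics and values are invented, and a real deployment would obviously be far more involved.

```python
# Toy illustration of AIOps-style anomaly detection. Assumes scikit-learn is
# installed; the metrics and numbers below are made up for the example.
import numpy as np
from sklearn.ensemble import IsolationForest

# Pretend these are per-minute operational metrics: [cpu_percent, error_rate, logins]
normal_activity = np.random.default_rng(0).normal(
    loc=[40.0, 0.5, 100.0], scale=[5.0, 0.2, 10.0], size=(500, 3)
)

# Fit an unsupervised model on what "normal" operations look like.
model = IsolationForest(contamination=0.01, random_state=0).fit(normal_activity)

# Score new observations: -1 means anomalous, 1 means normal.
new_observations = np.array([
    [42.0, 0.4, 105.0],   # looks like business as usual
    [95.0, 8.0, 2500.0],  # CPU spike, error burst, login flood
])
print(model.predict(new_observations))  # likely output: [ 1 -1]
```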
Cloud computing and artificial intelligence appear to be increasingly merging. We are in the earliest days of the AI revolution, and it is clear that the world is not going back to where it was before, similar to how the invention of the Internet, the impact of the iPhone on the market, and the rise of the cloud cannot be reversed. It is, however, unclear who will win this game: stand-alone AI companies like OpenAI, cloud providers like Amazon and Microsoft, or someone else. When it comes to security, I think that whoever has access to the most data will have the advantage of being able to build better models, and with that, produce better outcomes. It remains to be seen how this competition is going to play out, but it is apparent that cloud providers will play an important role.
Data gravity and platform play in software security
While the concept of data gravity can be easily seen when we analyze the trends surrounding cloud providers, they are not the only ones that benefit from it. Another great example is platforms that attract a different type of data - software code.
As more and more businesses become software businesses, security now extends far beyond IT infrastructure. An important area of concern is software security. In a development organization, it’s not possible to effectively enforce a security policy that only allows users to run “approved software”, because developers run code from the internet (such as third-party libraries) all the time; it’s part of the job. It’s equally complex when it comes to the software supply chain: saying that “all packages must be signed by one of the vendors on the list” won’t work either. All software has a long list of dependencies, which in turn have their own dependencies, and so on. This great image perfectly illustrates the irony of modern digital infrastructure.
Source: explainxkcd
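To see how quickly “just sign everything on the list” falls apart, here is a tiny sketch that walks an invented dependency graph and counts everything a team implicitly trusts; the packages are made up, but the shape of the problem is not.

```python
# Toy example of how transitive dependencies pile up (the graph is invented).
from collections import deque

direct_dependencies = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["templating", "crypto-lib"],
    "http-client": ["crypto-lib", "url-parser"],
    "templating": ["string-utils"],
    "crypto-lib": ["bignum"],
    "url-parser": [], "string-utils": [], "bignum": [],
}

def transitive_dependencies(package: str) -> set:
    """Breadth-first walk that collects every package we implicitly trust."""
    seen, queue = set(), deque(direct_dependencies.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(direct_dependencies.get(dep, []))
    return seen

print(sorted(transitive_dependencies("my-app")))
# Two direct dependencies quietly turn into seven packages to vet.
```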
Naturally, there is an entire ecosystem of tooling built to secure the code. Here is a brief look at the leaders in this space in the context of data gravity and platform play.
GitHub: Microsoft’s play in software security
In 2018, Microsoft acquired GitHub, which at the time was primarily a code repository. Five years later, GitHub is much more than that, offering a suite of CI/CD tools and security solutions, among other things.
GitHub is a prime example of data gravity in action: the code attracts the other products and services that people working on the code need. The “DevSecOps with GitHub Security” diagram on Microsoft’s website shows an example of what being a part of the Microsoft ecosystem means. GitHub brings a wide variety of security tooling into one place: code scanning to identify vulnerabilities and coding errors, Dependabot alerts when a customer’s repository uses a vulnerable dependency or malware, dependency reviews, and the like. At the end of March, GitHub added an SBOM generation tool for cloud repositories. Although it isn’t materially different from what other SBOM vendors offer, having SBOM generation easily accessible within GitHub is convenient.
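As a rough illustration of what “easily accessible within GitHub” can mean in practice, here is a short sketch that pulls a repository’s generated SBOM over GitHub’s REST API. Treat the endpoint path, headers, and response shape as assumptions to verify against GitHub’s current documentation; the owner, repo, and token are placeholders.

```python
# Sketch: pulling GitHub's generated SBOM for a repository over the REST API.
# The dependency-graph SBOM endpoint path and response shape below should be
# double-checked against GitHub's current API docs; OWNER/REPO/token are placeholders.
import requests

OWNER, REPO = "my-org", "my-repo"  # placeholders
url = f"https://api.github.com/repos/{OWNER}/{REPO}/dependency-graph/sbom"
headers = {
    "Accept": "application/vnd.github+json",
    "Authorization": "Bearer <YOUR_GITHUB_TOKEN>",  # placeholder
}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()

# The SBOM is expected to come back as SPDX-style JSON.
sbom = response.json().get("sbom", {})
packages = sbom.get("packages", [])
print(f"SBOM lists {len(packages)} packages")
for pkg in packages[:5]:
    print(pkg.get("name"), pkg.get("versionInfo"))
```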
Microsoft sees itself as the one-stop shop for building a product and deploying secure code.
When talking about Microsoft, it is important to emphasize that Microsoft looks very different to developers than it does to enterprises, not just in terms of the way it does business but also in terms of the mindset that guides its actions. The company has always seen itself as a developer-first organization (remember Steve Ballmer’s famous “developers, developers, developers…” video). It’s a very smart move: by attracting developers, giving them the ability to build, and staying vendor-agnostic, Microsoft encouraged developers to produce products for its ecosystem and therefore expand the reach of its products. By capturing developer needs and continuously growing its ecosystem, Microsoft created a pipeline that feeds its revenue-generating products such as Azure.
Snyk: GitHub’s biggest competitor
Snyk, GitHub’s biggest competitor, has also been building a developer-focused platform, but unlike GitHub, its entire focus is security. Snyk started as a developer-first application security testing vendor in a space that used to be dominated by players such as Synopsys. The older generation of vendors built their tools with the view that applications were released yearly, quarterly, monthly, weekly, or in the best case, daily. This made sense before the rise of cloud, SaaS, and CI/CD, but not for newly emerged SaaS providers that commonly push many updates daily or even hourly.
Snyk came along and said, “we’ll build a modern tool for CI/CD, integrate with all the modern tools like Jenkins, CircleCI, GitHub Actions, and the like, and all of this will be geared towards companies releasing often”. This enabled the so-called “shift left”: while traditionally security scanning happened later in the process, and a member of the security team would review alerts and route them to developers to fix, the “shift left” methodology pushes it all to developers and doesn’t let them check in code that doesn’t pass the scan. This approach eliminates a lot of unnecessary labor for the security team.
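A “don’t let failing code get checked in” gate usually boils down to something very simple in the pipeline: parse the scanner’s output and return a non-zero exit code when findings cross a threshold. The sketch below assumes a hypothetical scan-results.json file and format; every real scanner has its own output.

```python
# Bare-bones CI gate: fail the pipeline step if the scanner found anything at
# or above the chosen severity. The results file name and its JSON shape are
# invented for illustration; real scanners each have their own output format.
import json
import sys

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}
FAIL_AT = SEVERITY_RANK["high"]  # block check-ins on high and critical findings

with open("scan-results.json") as f:
    findings = json.load(f)  # expected shape: [{"id": ..., "severity": ...}, ...]

blocking = [
    finding for finding in findings
    if SEVERITY_RANK.get(finding.get("severity", "low"), 0) >= FAIL_AT
]

for finding in blocking:
    print(f"BLOCKING: {finding.get('id')} ({finding.get('severity')})")

# A non-zero exit code is what makes the CI system reject the change.
sys.exit(1 if blocking else 0)
```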
To assemble a platform, Snyk has not limited itself to building products in-house and has been on a buying spree over the past few years, acquiring TopCoat Data, Manifold.co, DeepCode, Fugue, CloudSkiff, and FossID. To expand its reach, it has also bought DevSecCon, the world’s premier conference dedicated to DevSecOps.
Other players & observations about the space
This picture of the space is obviously not exhaustive, although the fight between Snyk and Microsoft’s GitHub is real. JFrog, which started as an artifact repository and now does supply chain security, CI/CD, and more, is another player, and there are many, many others; in particular, there are a ton of small companies building scanning engines. Some form of consolidation is almost inevitable, as smaller players will struggle to get enough customers and companies are looking to reduce the number of vendors they rely on. Additionally, as teams get compressed due to budget contraction, building tools in-house will get expensive, so suites of solutions have a great chance of winning the battle of best-of-breed vs. best-of-suite.
The data gravity effect and platform play are not limited to cloud providers; what we are seeing in software security is unfolding in a very analogous fashion, centered around code ownership. GitHub is trying to do it from the code and tooling side, JFrog is trying to do it from the repo side (adding security tooling around deployment), and other big players such as Snyk and Synopsys are building and buying smaller players to assemble platforms around tooling, deployment, and supply chain security.
It is important to call out that just because the data gravity effect is real, it does not automatically mean that anyone who has the data will become a successful security provider. A case in point is Atlassian, a company that once could have become a leader in application security but didn’t. Most development teams use Jira, and many have also adopted Atlassian’s Bitbucket, a source code management (SCM) tool. There was a point when it seemed reasonable to expect that Atlassian would get into the security game the way JFrog has, but it didn’t. I would guess that it has a lot of legacy products and code that are expensive to keep running, but whatever the reasons, Atlassian missed its chance to become a visible security player.
The case for a security data layer
The concept of data gravity isn’t the only factor that drives the need to reshape the way security is delivered. Let’s put it aside for a second, and try to imagine what the future of security will look like if nothing changes.
Today, an average enterprise security team has anywhere between 25 and 100 security tools. While some estimates are more credible than others, the point is that it’s a lot of products. The vast majority of these tools collect data, apply some proprietary logic, and generate their insights, often in a format different from all the others. As technology progresses and the number of attack vectors grows, more and more point solutions will be created in the coming decade, each with its own value proposition: to safeguard no-code tools, to prevent leaks of intellectual property, and so on. Taken in isolation, each of them addresses a legitimate problem, but when we bring all of that together, the picture starts to look very different. What are the chances that in 2033, an average security team will need to rely on 200 disjointed point solutions, all doing their own thing and working in isolation from other components of the security stack? Or, maybe, 300? If I were to place a bet, I would say the chances are close to zero.
I think we are witnessing the emergence of a new model where all the data comes into one place, be it Snowflake, Azure, or any of the large cloud providers. Individual security tools then plug into this data layer, address the use cases they are designed to address, and feed the data they generate back into the same place. Frank Wang puts it well in his blog: “I personally believe that the adoption of the modern data stack mentality in security might be one of the biggest cybersecurity industry disruptors in the upcoming decade. Security tools will no longer be in silos but built on top of these data warehouses and create better feedback loops for analytics.”
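Here is a toy sketch of that pattern, with SQLite standing in for Snowflake, BigQuery, or another warehouse, and with invented table names and schema: one tool reads shared events, applies its own logic, and writes its findings back where every other tool (and analyst) can see them.

```python
# Toy sketch of the shared-data-layer pattern: one tool reads events from a
# common store, applies its own logic, and writes findings back for others.
# SQLite stands in for Snowflake/BigQuery/etc.; the schema is invented.
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE security_events (source TEXT, user TEXT, action TEXT, bytes_out INTEGER);
    CREATE TABLE findings (tool TEXT, user TEXT, reason TEXT);
    INSERT INTO security_events VALUES
        ('endpoint', 'alice', 'file_upload', 1200),
        ('endpoint', 'bob',   'file_upload', 98000000),
        ('saas',     'carol', 'login',       0);
""")

# A "point solution" plugged into the shared layer: flag unusually large uploads.
rows = warehouse.execute(
    "SELECT user, bytes_out FROM security_events WHERE action = 'file_upload' AND bytes_out > ?",
    (50_000_000,),
).fetchall()

warehouse.executemany(
    "INSERT INTO findings VALUES ('exfil-detector', ?, ?)",
    [(user, f"uploaded {bytes_out} bytes") for user, bytes_out in rows],
)
warehouse.commit()

# Any other tool (or analyst) can now query the same findings table.
print(warehouse.execute("SELECT * FROM findings").fetchall())
```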
The emergence of the data layer transforms previously siloed security products into add-ons that can be explored, enabled, and used through a marketplace, in a way that makes it easy and quick to access the capabilities security teams need. A variation of the marketplace or app store model has proven itself in other industries: think of AWS, Google Cloud, Shopify, and, as of recently, ChatGPT. Most cybersecurity startups today are features (or, as Andrew Plato calls them, “froducts”) rather than comprehensive, well-defined products, and they are well suited to be add-ons on large marketplaces. Philosophically, there are many similarities between someone building a Shopify plugin to solve a small use case for a niche group of customers and a security startup building a company around a small feature of “securing X”.
Security is changing, and if there is one thing we know about the future, it’s that it will be radically different from what we see today. Although many of the ideas around data gravity, the effects it has on the industry, and the opportunities it creates no doubt feel futuristic and far-fetched, I would argue that so does the assumption that we can change nothing and continue to add tens or even hundreds of separate “products” to the security stack every few years. Changing the approach to security is therefore an inevitability.
Closing thoughts
Newton’s law of gravity states that the attraction between two masses is directly proportional to the product of their masses and inversely proportional to the square of the distance between their centers. This law maps surprisingly well onto data in the cloud. Inertia, the tendency for people and companies to stick with what they know, further limits freedom of movement over time. The more data a cloud provider accumulates, the more other data it attracts, and subsequently, the more products and services will be introduced that bring in even more data. This cycle is continuous, and it leads to an inevitable deepening of the relationship between cloud users and cloud providers, making it harder and harder for companies to switch. Most cloud decisions aren’t the result of deliberate, complex strategies; they are an accumulation of shortcuts and of companies doing what’s easiest at a specific point in time. Although many of the ideas surrounding data gravity may sound a bit futuristic, I think whoever owns and can make sense of the vast amounts of data will have the ability to shape the future of cybersecurity.
The effects of data gravity, such as data attracting products and services and subsequently attracting even more data, can be seen far outside of the cloud space. For instance, code attracts products and services that rely on it and, by doing so, shapes the future of platforms like GitHub and Snyk.
We are at the early stages of many trends - the rise of AI and ML, the growing importance of data gravity, and the evolving role of cloud providers, to name a few. While no one knows for sure how the story is going to unfold from here, we can be sure that it will be interesting. My conviction is that we will see security platforms emerging around data, a trend driven by the data gravity effect. Frank from Frankly Speaking puts it well: the next generation of 10B+ security companies will be leveraging network, code, identity, and other security-related data and building comprehensive, holistic platforms that solve big problems.