We are excited to announce that Premji Invest is leading Cohesity's Series F fundraise, which includes a definitive agreement under which Cohesity intends to combine with Veritas' data protection business, which will be carved out of Veritas, creating the world's largest data security & management vendor by any measure. This transaction is expected to close by end of 2024, valuing the combined entity at $7B of enterprise value. For fiscal year ended July 2023, the pro forma business will generate $1.6B in top-line, $1.3B of ARR, and 28% adjusted cash EBITDA margins while still growing meaningfully.
We believe the world needs a next-gen cloud data protection tool, which is exactly where Cohesity slots in. Palo Alto owns firewall, CrowdStrike owns endpoint, Zscaler owns network security and SASE, Okta owns single sign-on / authentication, but there is still an unmet need for a scaled data security vendor. The reality is, Veritas played this role in the public markets for a number of years until it was acquired by Carlyle. Since then, the public markets have not had a way to participate in this trend, namely, for a quality, pure-play vendor in data protection and management. Given the rise in ransomware attacks over the last several years, data protection is more important than ever for CISOs as the last line of defense. As such, we're incredibly excited to see Cohesity accelerate its vision of building the world's biggest data protection & management business by acquiring Veritas’ data protection business.
Cohesity has a massive AI opportunity ahead. Let's zoom out for a moment -- after closing its acquisition of Veritas’ data protection business, Cohesity will be managing 100s of exabytes of data across 1000s of customers in the Fortune 500 and Global 2000. Think about what AI can do. Secondary data can be used to conduct enterprise search, train enterprise-specific AI models, and run all types of analytics. Typically, most enterprises would never classify, index, or track the vast amounts of "dark data" that accumulate in secondary storage. With Cohesity, enterprises get a comprehensive view of their data over time (across on-prem, cloud, and edge), enabling them to index and search decades-old data, instantly, and have a time series view of the changes. We believe this will be indispensable to organizations that want AI-ready data to build next-gen enterprise models.
Cohesity exemplifies the rare types of businesses we strive to partner with in growth equity: market leaders that can compound growth meaningfully over a 5-10 year horizon. The rest of Cohesity's competitors are either subscale, SMB/midmarket-focused, offer only on-prem or cloud products, or have inferior technical capabilities. We believe the pro forma Cohesity business will be a best-in-class, enterprise-focused, n of 1 data protection and management vendor that can service all of an enterprise's data needs. We are incredibly excited about the journey ahead!
Cohesity and Veritas M&A Rationale
We believe a business combination of Cohesity and Veritas’ data protection business strengthens Cohesity's competitive positioning. The combined entity penetrates 96% of the Fortune 100 and 80% of the Global 500. It also possesses a global go-to-market footprint, including a strong partner ecosystem across all segments of CSPs, security players, VARs, SIs, and hardware OEMs. This transaction brings the best of both worlds together: Cohesity's next-gen hyperconverged scale-out architecture with Veritas' global presence and installed base with nearly all Fortune 100 companies.
In on-prem backup & security, Veritas is considered the gold standard, protecting more workloads than any other vendor (roughly the scale of Dell and Commvault combined). Cohesity's core business is a cloud-first data security business focused on protecting data as the last line of defense in the event of a cyberattack. Even if cybercriminals penetrate a company's firewall, network, endpoint, and application security systems, Cohesity prevents bad actors from compromising an enterprise's underlying data. Strategically, the combined business would be able to serve virtually all of an enterprise's data protection use cases, across on-prem and cloud/SaaS workloads, creating a one-stop shop. The future is one platform built on Cohesity.
While Veritas might be considered a legacy tool, the reality is that no one leaves Veritas: the Company is too deeply integrated into enterprises' storage infrastructure to ever be displaced. That said, we think the bigger opportunity exists in cloud over the next decade, which positions Cohesity incredibly well for growth. As such, we believe there are material strategic benefits in bringing two of the leaders in their respective categories together. Firstly, Cohesity can pioneer the future of enterprise search and data analytics, by expanding its scope of data under management to all of the on-prem workloads it does not protect today. Secondly, Cohesity can leverage Veritas' existing reach to cross-sell Cohesity data protection solutions into virtually all of the Global 500 ecosystem. Finally, the combined company will also work to deliver an integrated solution combining the best-of-breed technologies across the two companies.
Evolution of Secondary Storage Industry
The backup industry started decades ago, and secondary storage served one primary purpose: as an insurance policy for enterprises in the event of a natural disaster or a data center burning down. Vendors such as Veritas, Dell, Commvault, and IBM dominated this industry for years, providing physical storage solutions that lived within the four walls of an enterprise’s data center footprint.
Storage has a century-long history of slow and steady innovations from tape and cassettes to SSD and NAS. As the exponential growth of data marched forward, systems for managing storage ran into bottlenecks and solutions such as tiered storage, file & object storage, and deduplication popped up. The end result of this progression is a cobbled-together suite of complex solutions that are siloed across environments and use cases, leading to massive data fragmentation. As storage requirements grow and multi-cloud and VMs inject complexity, enterprises are looking for ways to wrangle and operationalize data at scale while reducing costs.
For context, data falls into two major buckets: primary storage and secondary storage. Primary storage is utilized for mission-critical applications that need to access storage frequently, and includes both volatile (RAM) and non-volatile storage media (SSDs). “Hot” data often resides in Tier 0 or Tier 1 primary storage for live analytics and workloads. In primary storage, file systems are built to utilize the storage resources that have the best I/O (read/write) performance. Secondary storage, which accounts for ~80% of enterprise data, is used to store data that can be accessed less frequently. Much of this "dark data," which includes highly confidential and sensitive documents (emails, employee records, billing data & financials, etc.), is sitting idle in storage media. This has created a ripe opportunity for next-gen data security vendors like Cohesity to manage, protect, and mine insights from the ever-growing pool of secondary data.
Given Veritas’ legacy, the Company’s strengths lie in its ability to support a number of older filesystems and protect more on-prem workloads than any of its competitors. Enterprises who purchased Veritas had two choices: they could either purchase Veritas appliances (these were physical servers, with a set amount of disk/storage capacity, and pre-installed software) or purchase the Veritas software and install it on servers provided by different vendors. The rise of cloud computing gave new entrants such as Cohesity an opportunity to enter the space on the strengths of a modern architecture that took advantage of what cloud computing offers natively. Legacy vendors such as Veritas see lower performance at scale due to architectural limitations, especially while trying to leverage cloud computing.
Cohesity came to the scene in 2013 and provided enterprises exactly what they needed: performance/scalability, security, TCO reduction, ease of use, and cloud workload protection. Cohesity’s truly web-scale platform solves the limitations of scale, positioning the company to take share from legacy data protection and backup players like Commvault, Veeam, and Veritas. This alone is a $30B TAM and known to be incredibly sticky: once storage systems are deployed, companies are slow to replace them.
Cohesity Standalone Thesis
Our thesis for investing in Cohesity's standalone business is threefold. Firstly, we believe Mohit Aron, who is the chief architect of the technology underpinning Cohesity, is a visionary entrepreneur who built a business with meaningful IP. When we first made the investment in Cohesity, the team conducted deep technical diligence and discovered that Cohesity built a proprietary file system (details in Cohesity's Architecture section below) designed specifically for secondary storage systems, enabling enterprises to quickly and reliably back up data regardless of where it sat: on-prem vs. cloud, physical servers vs. VMs, and traditional and containerized applications. In short, Cohesity's highly scalable hyperconverged infrastructure (HCI) & file system extensibility offer durable technology moats.
Secondly, Cohesity is extremely sticky: once Cohesity penetrates an enterprise account, customers almost never churn. This is evidenced by Cohesity's impressive mid-90s gross revenue retention metric and 80+ NPS. All the customers we've surveyed explain that securing backup data is mission-critical, given it is the last line of defense against cyber criminals. Cohesity's blue chip customer base includes ~42% of the Fortune 100, and the partnership with Veritas’ data protection business will only accelerate their penetration into large enterprise accounts. Cohesity is already the 6th largest player overall in secondary data under management today, managing ~1.5 EB, and the largest player among the next-gen HCI solutions.
Thirdly, we believe Cohesity is levered to massive market tailwinds:
Growth in secondary data continues to expand at unprecedented rates, which serves as a tailwind to Cohesity's pricing model, which is based on total data under management.
As ransomware attacks are becoming more widespread, CISOs are increasingly focused on protecting their data assets, making data security vendors more important than ever.
We believe there is a large opportunity in AI for Cohesity.
Cohesity's AI Moment
The vast amounts of secondary storage data Cohesity is sitting on can be used to train enterprise models. Historically, it was difficult to establish a consolidated view of all the information sitting inside an enterprise because silos of data would be sprawled across different data centers for distinct operational workflows. This "data sprawl" made it exceedingly difficult for organizations to glean insights on their underlying data. Enterprises would need to make copies of data, transport the data between data centers, and stitch everything together.
With Cohesity, secondary data can be used to conduct enterprise search, train enterprise-specific AI models, retrieve file-level information such as file-type and user access history, access time-series data on various KPIs (storage utilization, application-level and file-level data trends), perform anomaly detection and log correlation to identify potential threats, and more. Utilizing NLP, users can easily query the Cohesity engine for any type of request, and with roles-based access control, organizations can create custom data access rules for users. Additionally, Cohesity can intelligently summarize storage snapshots, which are effectively a table of contents for data -- today, this happens manually. Cohesity provides a comprehensive view of data over time, enabling enterprises to index and search decades-old data, instantly, and see different variations of it during different periods. We believe this will be indispensable to organizations who want AI-ready data to build robust models.
Cohesity's Technical Architecture & Advantages
This dovetails nicely into understanding Cohesity's architecture, which consists of three major layers: the physical layer, the SpanFS file system, and the Application Layer.
Physical Layer: Cohesity's physical layer consists of low-cost, high-performance commodity hardware. The hardware consists of nodes, each of which has its own compute (CPU and RAM) and storage (hard disk and flash storage). For those who want better performance, they can purchase nodes with faster compute and larger storage capacity. Multiple nodes form clusters, which are interconnected using Ethernet cables. This enables easy scale-out: capacity can be increased simply by adding additional nodes.
SpanFS File System: Cohesity's patented file system, SpanFS, is at the heart of the business. SpanFS is a new file system designed specifically for data management workloads, offering high read/write throughput and quick response times. Think of this as the "brain," and its singular goal is to intelligently allocate resources to the highest priority tasks at any given time, optimizing efficiency. For instance, if data ingestion is utilizing all resources, SpanFS will ensure all disks are utilized with data distributed equally. During data ingestion, the IO engine can detect random vs. sequential IO data (e.g. audio would be sequential), direct data to the appropriate storage target (HDD, SSD, and cloud), dedupe data, compress colder data further once data ingestion is complete, dynamically down-tier cold data from SSD into hard drives (less expensive, but slower read time), and so forth. Cohesity built its own metadata store to manage all the data distributed across its nodes. Cohesity also patented SnapTree, allowing customers to take unlimited snapshots (or the state of a system at a point in time), without overloading the system.
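To make the ingestion behavior described above more concrete, here is a minimal Python sketch of three of the ideas: classifying a write stream as sequential or random, deduplicating chunks by content hash, and down-tiering cold chunks from SSD to HDD. This is an illustration under stated assumptions, not SpanFS's actual implementation. The names is_sequential and TieredStore, the 0.8 threshold, the tier labels, and the routing rule are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def is_sequential(offsets, chunk_size, threshold=0.8):
    """Return True if most writes start where the previous one ended.

    `threshold` is an assumed cutoff, not a documented SpanFS parameter.
    """
    if len(offsets) < 2:
        return True
    contiguous = sum(
        1 for prev, cur in zip(offsets, offsets[1:]) if cur == prev + chunk_size
    )
    return contiguous / (len(offsets) - 1) >= threshold


@dataclass
class TieredStore:
    """Toy store: dedupes chunks by SHA-256 and tracks each chunk's tier."""
    chunks: dict = field(default_factory=dict)       # digest -> bytes
    tier: dict = field(default_factory=dict)         # digest -> "ssd" | "hdd"
    last_access: dict = field(default_factory=dict)  # digest -> logical time

    def ingest(self, data, sequential, now):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:
            # Store each unique chunk only once (deduplication).
            self.chunks[digest] = data
            # Assumption: sequential streams go to HDD, random IO to SSD.
            self.tier[digest] = "hdd" if sequential else "ssd"
        self.last_access[digest] = now
        return digest

    def down_tier(self, now, max_idle):
        """Move SSD chunks idle for longer than `max_idle` to HDD."""
        moved = []
        for digest, tier in self.tier.items():
            if tier == "ssd" and now - self.last_access[digest] > max_idle:
                self.tier[digest] = "hdd"
                moved.append(digest)
        return moved


# Usage: two identical chunks are stored once, then aged off SSD.
store = TieredStore()
first = store.ingest(b"block-A", sequential=False, now=0)
second = store.ingest(b"block-A", sequential=False, now=1)
cold = store.down_tier(now=100, max_idle=30)
```

A production system would chunk data at variable boundaries, persist its metadata in a distributed store, and weigh many more signals when choosing a tier; the sketch only shows how the pieces relate.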
Cohesity Application Layer: Cohesity offers a number of applications that sit on top of its file system, the most important of which are DataProtect, FortKnox, and DataHawk.
DataProtect: secures enterprise data across physical servers and VMs, traditional and containerized applications, databases, and SaaS workloads. Cohesity uses data encryption, MFA, RBAC (roles-based access control) to prevent bad actors from deleting data. Above all, Cohesity delivers on extremely fast recovery times (RTOs).
FortKnox: traditionally, enterprises would air gap data by storing copies of data on physical tape in an offsite environment. The downside is slow recovery times. FortKnox balances security with speed by offering a copy of data in a Cohesity-managed cloud vault, through physical separation & network isolation.
DataHawk: utilizes AI/ML to detect user and data anomalies, indicating an emerging attack, ensures recovery data is free of malware, and classifies sensitive data (like PII) in the event of an attack.
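As a rough illustration of what "detecting data anomalies" can mean in a backup context, the sketch below flags days whose change rate (the fraction of data modified since the previous backup) jumps far above its recent baseline, a pattern that mass encryption by ransomware could produce. DataHawk's actual models are not described here; this is a simple z-score heuristic, and the function name, window size, and threshold are hypothetical assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(change_rates, window=7, z_threshold=3.0):
    """Return indices whose change rate spikes above the trailing window.

    change_rates: fraction of data changed in each daily backup (0.0 to 1.0).
    window and z_threshold are illustrative defaults, not product settings.
    """
    flagged = []
    for i in range(window, len(change_rates)):
        history = change_rates[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: treat any increase over the baseline as notable.
            if change_rates[i] > mu:
                flagged.append(i)
            continue
        if (change_rates[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Usage: a sudden jump to 60% changed data stands out against ~2-3% days.
daily = [0.02, 0.03, 0.02, 0.025, 0.03, 0.02, 0.025, 0.03, 0.60, 0.03]
suspicious_days = flag_anomalies(daily)
```

A real detector would likely combine several signals (entropy of changed files, user behavior, file-type shifts) rather than rely on a single statistic.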
Cohesity’s long-term vision is to protect all enterprise data workloads & enable intelligent indexing, search, and retrieval of data at scale for enterprises that want AI-ready data. We believe we’re still in the early innings of this transition to protecting cloud workloads, so we feel confident about Cohesity’s ability to compound growth meaningfully even after an IPO. We're excited because we think the opportunity here is quite large, and Cohesity is well set up to dominate the overall category.