@stackinvesting
Peter Offringa
My background is in software engineering at the CTO level for a number of large consumer Internet properties. I apply that background to stock analysis of software infrastructure companies, bringing the perspective of a practitioner and budget owner.
Confluent (CFLT) Q2 2023 Earnings Review
After delivering favorable results in Q1, Confluent continued on largely the same trajectory in Q2. Sequential growth rates even imply stabilization around 30% annually, after almost 2 years of deceleration from Covid highs. This growth is being led by rapid adoption of Confluent Cloud, with the licensed Confluent Platform offering for on-premise deployments demonstrating surprising resiliency as well.

The other factor providing support is a sharp improvement in profitability measures, with Non-GAAP operating margin increasing by 25 points year over year from -34% to -9%. The company expects operating margin to reach break-even by Q4. Free cash flow is following a similar path. Profitability was helped by an 8% headcount reduction in January, which notably doesn’t appear to have impacted growth.

Confluent’s recent revenue outperformance has been driven by customers exceeding their commitments for Cloud service usage. While customers may be limiting the size of future contractual obligations (reflected in RPO), engineering teams are choosing to allocate more spend to Confluent as the quarter proceeds. To me, this reflects the value customers are extracting from the Confluent Cloud product, as they willingly spend more than they had budgeted.

This may reflect the larger overall theme that enterprises are scrambling to improve their data processing infrastructure in anticipation of leveraging AI to create new offerings for their customers, employees and partners. While the expected benefits from AI are still unfolding and enterprises are largely in proof of concept mode, it is generally understood that effective AI requires good data.

Quality data as an input for AI and ML processing isn’t new, but its priority has increased. This translates into more focus on consolidation, filtering, cleansing and categorization of siloed data stores. Data recency is also recognized as an advantage, which is creating a greater push towards real-time data distribution. It is this focus on reaching more data and delivering it in near real-time that is served by Confluent.

As other software infrastructure companies are experiencing rapid drops in NRR, Confluent appears to be holding up well. While they don’t report the actual values, management reported that overall NRR is still over 130% and for Cloud it exceeds 140%. These are pretty remarkable numbers in this IT environment for a product that is approaching an $800M annual run rate.

In this post, I review Confluent’s product strategy and discuss a few new announcements. Then, I will parse the Q2 results and discuss the trends that appear to be driving the recent outperformance in CFLT stock. For interested readers, I have published a couple of prior posts on Confluent, which provide more background on the product offering, competitive positioning and investment thesis.

Confluent's Product Strategy

Confluent sells a cloud-native, feature complete and fully managed service that provides development teams with all the tools needed to support real-time data streaming operations. The platform has Apache Kafka at the core, supplemented by Confluent’s additional capabilities to make Kafka easier to scale, manage, secure and extend. Some of these value-add features include visualization of data flows, data governance, role-based access control, audit logs, self-service capabilities, user interfaces and visual code generation tools.

Unlike some other forward-looking cloud infrastructure categories, like edge compute or even SASE, Confluent doesn’t have to convince enterprises of the value of data streaming. Over 75% of the Fortune 500 already use Apache Kafka at some level to accomplish this. Confluent’s task is to demonstrate that their data streaming platform, which offers many enhanced capabilities over self-managed open source Kafka, is worth the incremental cost.

While justifying an upgrade in the corporate data center represents a higher hurdle, Confluent Cloud provides enterprises with a managed solution on their hyperscaler of choice, eliminating the need to maintain a large team of operations engineers with Kafka expertise. Additionally, Confluent’s custom Kora engine delivers significantly higher performance, resiliency and storage efficiency than open source Kafka.

As part of the Investor Day presentation in June, Confluent management provided an updated view of the Confluent platform and its future revenue drivers. These have expanded beyond the core data streaming use case to encompass everything necessary to power a modern data streaming platform. The top-level value-add modules include connectors, stream processing, stream governance and stream sharing.

Confluent could build a successful business just providing a cloud migration path from open source Kafka to Confluent. This would require just the Stream component of the platform to execute. However, leadership sees an opportunity to address a broader set of use cases as enterprises upgrade their data distribution infrastructure in preparation for AI workloads, to address privacy concerns and to facilitate data exchange with partners.

Beyond the potential for expanded scope, I find it interesting that a couple of these capabilities overlap with focus areas for modern data cloud platforms (previously called cloud data warehouses/lake/lakehouses) like Snowflake and Databricks. It seems that data infrastructure providers are converging on a similar set of use cases, as they scramble to become the primary provider of the consolidated data plane for the enterprise.

The cloud data platform vendors are extending from their base as the enterprise’s central data warehouse. Confluent is expanding to address similar use cases around data processing, governance and sharing from their foundation as a data streaming platform. I think this represents a logical progression for Confluent. Architecturally, I can see the efficiency in addressing some of these use cases directly in streaming workflows. Both approaches will exist in the enterprise, but these new capabilities would represent an incremental revenue source for Confluent.

Specifically, data distribution throughout the enterprise and externally with trusted partners requires strong capabilities in governance and controlled data sharing. As Confluent positions themselves to become the “central nervous system” or data fabric for the enterprise, it makes sense to extend this publish-subscribe pattern to include systems outside of the corporate network. Governance and sharing provide new capabilities that allow Confluent to extend their reach in a safe and controlled way.

  • Stream. At the core of Confluent’s offering is Apache Kafka, as the data streaming platform (DSP). The DSP allows companies to produce and consume (using the publish-subscribe pattern) real-time streams of data at any scale with strong guarantees for delivery. Today, this offering comprises the majority of Confluent’s Cloud revenue.
  • Connect. To build a central nervous system for enterprises, a basic requirement is to connect to all systems involved. Confluent Cloud allows customers to run any Kafka connector in a cloud-native way, making them serverless, elastically scalable and fault tolerant. The Confluent team offers 120 connectors out-of-the-box to a variety of popular enterprise systems. There are also hundreds of open source connectors to less common systems available. In Q2, Confluent released their custom connectors offering, which enables customers to run any open source connector inside Confluent Cloud. This expands their reach beyond the basic set of connectors. Leadership reported that the on-premise, Confluent Platform product has approximately an 80% adoption rate for Connectors. This product is monetized.
  • Process. Data processing is a key component of any major data platform. Stream processing extends basic data processing capabilities from SQL batches to full-scale data transformations with logic applied in real-time directly to the stream. Apache Flink is emerging as the standard for stream processing. Flink is available across a broad ecosystem of programming languages and interfaces. It is widely popular in the open source community and is used by some of the most technically sophisticated companies in the world, including Apple, Capital One, Netflix, Stripe and Uber. Every stream in Confluent likely will have some application code for data processing run outside of Confluent. The opportunity for Flink is to bring that development effort and processing logic within the stream on the Confluent platform, generating more consumption revenue. Leadership previously estimated this revenue source could grow to become as large as core streaming itself over time.
  • Govern. When data moves between systems, teams or companies, governance is needed to ensure privacy and security of data. Stream Governance is Confluent’s fully managed governance suite that delivers a simple, self-service experience for customers to discover, trust and understand how data flows. Confluent has taken a freemium approach to stream governance, giving basic functionality to every customer. More recently, they have started to monetize with their Stream Governance Advanced offering. Management reports that two-thirds of their Confluent Cloud customers are using Stream Governance already. Revenue growth from Stream Governance Advanced is the highest of any product they have launched.
  • Share. Inter-company sharing is a pattern the Confluent team noticed was gaining traction in the customer base in recent years. Customers in financial services and insurance needed to integrate in order to distribute key financial data streams within a complex network of publishers and consumers. Customers in travel needed to exchange real-time data on flights between airports, airlines, bookings companies and baggage handling companies. Retailers and manufacturers had to ingest real-time streams from suppliers to manage an end-to-end view of their inventory or supply chain. Oftentimes, these companies would have teams working out complex solutions to facilitate this sharing, only to realize they were both distributing data internally through Kafka. Stream Sharing allows customers to enable cross-company data sharing within Confluent, but with strict access controls provided by Stream Governance.
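For readers less familiar with the publish-subscribe pattern that underlies the Stream component, here is a minimal in-memory sketch. It is only a toy illustration and not Kafka or any Confluent API: the `ToyBroker` class, the `orders` topic and the two consumer views are hypothetical names I made up. It shows the two properties discussed above, namely that producers and consumers are decoupled through a topic, and that a consumer joining late can still read events already in the log.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Hypothetical in-memory stand-in for a topic-based broker (not Kafka)."""

    def __init__(self) -> None:
        # One append-only event log per topic.
        self._log: DefaultDict[str, List[str]] = defaultdict(list)
        # Handlers registered per topic.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        # Replay events already in the topic log, then register for new ones.
        for event in self._log[topic]:
            handler(event)
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: str) -> None:
        # Append to the topic log and fan out to every current subscriber.
        self._log[topic].append(event)
        for handler in self._subscribers[topic]:
            handler(event)


broker = ToyBroker()
inventory_view: List[str] = []
audit_view: List[str] = []

broker.publish("orders", "order-1")  # published before anyone subscribes
broker.subscribe("orders", inventory_view.append)
broker.subscribe("orders", audit_view.append)
broker.publish("orders", "order-2")

print(inventory_view)
print(audit_view)
```

Both consumers end up with the same two events even though neither knows about the producer or about each other. A real data streaming platform adds durability, partitioning, delivery guarantees and access control on top of this basic shape.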

These new capabilities can all be monetized separately. Confluent currently lists these as Value-add Components on their Pricing page for Confluent Cloud. Connectors and Stream Governance are in GA now and have pricing available. Stream Processing moved to early access with select customers in May. Stream Sharing was introduced in May with limited capabilities, but will continue to evolve into a full release with monetization.

Additionally, these capabilities are complementary and generate network effects for Confluent. They will both increase consumption within the platform and introduce Confluent to new potential customers. I have discussed in the past how Snowflake’s Data Sharing capabilities help them attract (and retain) new customers. I think the same argument for strong network effects can be made for Confluent with their new Stream Sharing capability.

"This means extending our central nervous system vision, something that spans the company to something that spans large portions of the digital economy. By doing this, the natural network effect of streaming, where streams attract apps, which in turn attract more streams is extended beyond a single company, helping to drive the acquisition of new customers, as well as the growth within existing customers. It’s essential to understand that these five capabilities: stream, connect, govern, process, share are not only additional things to sell, they are all part of a unified platform and the success of each drives additional success in the others. The connectors make it easier to get data streams into Kafka, which accelerates not just our core Kafka business, but also opens up more data for processing in Flink, adds to the set of streams governed by Stream Governance or they’re shareable by stream sharing. Applications built with Flink drive use of connectors for data acquisition and read and write their inputs from Kafka. Governance and sharing add to the value proposition for each stream added to the DSP. Each of these capabilities strengthens the other four." Confluent Q2 2023 Earnings Call

To calculate their market opportunity (TAM), the Confluent team used a bottoms-up approach. They examined their three customer segments, calculated the target population for each and then assigned an average ARR estimate. Their spend thresholds may skew on the high side for each cohort, but they have already hit these targets with some customers. As part of their Investor Day in June, management mentioned having 3 customers with $10M+ in ARR and 9 with $5M+.

When it was calculated in 2022, this exercise generated a $60B market opportunity. Using Gartner market data, management further predicts a CAGR of 19% from 2022 to 2025. This yields a total addressable market of $100B by 2025, of which Confluent is currently less than 1% penetrated.
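As a quick sanity check on that projection, the compounding can be sketched in a few lines. The function and variable names are mine, and the only inputs are the $60B starting point, the 19% CAGR and the three-year span from 2022 to 2025 quoted above.

```python
def compound(value: float, rate: float, years: int) -> float:
    """Grow a starting value at a constant annual rate for a number of years."""
    return value * (1 + rate) ** years


tam_2022_b = 60.0   # $B, management's 2022 bottoms-up estimate
cagr = 0.19         # Gartner-based growth rate cited by management
tam_2025_b = compound(tam_2022_b, cagr, years=3)

print(f"Projected 2025 TAM: ${tam_2025_b:.1f}B")  # Projected 2025 TAM: $101.1B
```

The result lands at roughly $101B, which is consistent with the $100B figure management rounds to.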

It’s worth mentioning that Confluent perhaps has a more favorable position to capture their share of TAM than some of the other software infrastructure categories that I follow. For example, observability vendors estimate a roughly similar sized TAM, but have to split that among several publicly traded independent providers (Datadog, Dynatrace, Elastic, Splunk, etc.). Security’s TAM is a couple times larger, but has many, many more vendors.

As compared to these, Confluent is the only independent, publicly traded provider of a data streaming platform. There are a few smaller private companies, which seem to have varying degrees of traction (Redpanda, Pulsar). The hyperscalers also offer managed Kafka and forms of pubsub systems, but none is as comprehensive as the Confluent platform.

Future Opportunities

Beyond the well understood path of converting traditional Kafka customer deployments to Confluent, there is a more disruptive aspect to Confluent’s strategy. That has to do with the logical combination of real-time streaming and basic stream processing capabilities. In many cases, skipping a central repository like a data warehouse makes sense as demand for near real-time updates increases.

During the product section of Investor Day in June, management walked through a number of usage scenarios for the Confluent Platform. It struck me that few of these involved sourcing or passing data through a data warehouse. For many use cases, it is faster to distribute data directly between operational sources and destinations. The major new capability to facilitate this direct pass-through is the addition of Stream Processing (through the Immerok acquisition based on Apache Flink). With this, sophisticated data aggregations and transformations can be addressed within the data streaming platform.

Granted, data will likely end up in a central repository eventually, like a data warehouse/lake/lakehouse for permanent storage, but a lot of the pre/post processing may be addressed within Confluent. This would shift a share of compute consumption from the cloud data platform providers to data streaming platforms like Confluent.

Confluent management has estimated that revenue from stream processing could eventually equal that for core data streaming. Viewed through this lens of facilitating real-time data distribution, that argument makes sense. Additionally, other new capabilities on the Confluent platform, like stream governance and data sharing, mirror those of modern data platforms.

These use cases could peel off some cloud consumption revenue from the traditional cloud data platforms. As near real-time data distribution becomes the default, why not facilitate some portion of data processing and sharing through the streaming platform given that adequate governance is in place? I think this represents a reasonable argument for circumventing some portion of data sharing pipelines that would ordinarily be sourced from a consolidated data platform.

Q2 Earnings Results

Confluent reported Q2 earnings on August 2nd. As with the Q1 report, the post earnings response was positive with the stock popping by 16.2% the following day. Interestingly, this was the same jump experienced after the Q1 results. In both cases, the market discounted the stock significantly in the period before earnings. For Q2, it was down 21% from a recent peak just 2 weeks prior. This pre-earnings drop of about the same magnitude occurred in the two week period before Q1 earnings as well.

CFLT Stock Chart (Source: Koyfin)

Since Q2 earnings, CFLT stock has given back about 10% of its post earnings jump, but is still up about 45% YTD. Some of this coincides with a general AI stock sell-off that has occurred in August. Compared to other software infrastructure companies, Confluent’s valuation appears in line. The trailing P/S ratio is about 13.6 for a company growing revenue at 36%. Confluent remains unprofitable, however. The inflection to profitability and positive FCF margin within 6-12 months may provide further support for the valuation.

Revenue

Analysts were expecting revenue of $182.4M, representing growth of 30.9% annually and 4.6% sequentially from the Q1 result. This was at the middle of the range for $181M – $183M guided by the company. In Q2, actual revenue delivered was $189.3M, which was up 35.8% annually and 8.4% sequentially. Annualizing the sequential result reflects a 38.1% run rate, which is slightly higher than the revenue growth just delivered.

Subscription revenue grew even faster at 38.9% y/y to $176.5M and accounted for 93% of total revenue. In Q1, subscription revenue was $160.6M, meaning that subscription revenue grew sequentially by 9.9%. Annualized, that yields nearly 46% growth. Subscription revenue represents the core of the company’s growth as it is directly tied to demand for Confluent’s product offerings.
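The annualized figures in this section come from compounding one quarter's sequential growth over four quarters. Here is a small sketch of that arithmetic using only the numbers quoted above; the helper name `annualize` is my own.

```python
def annualize(sequential_growth: float) -> float:
    """Compound a quarter-over-quarter growth rate across four quarters."""
    return (1 + sequential_growth) ** 4 - 1


# Total revenue: 8.4% sequential growth in Q2.
total_annualized = annualize(0.084)

# Subscription revenue: $160.6M in Q1 to $176.5M in Q2.
sub_sequential = 176.5 / 160.6 - 1
sub_annualized = annualize(sub_sequential)

print(f"Total revenue run rate: {total_annualized:.1%}")      # 38.1%
print(f"Subscription sequential: {sub_sequential:.1%}")       # 9.9%
print(f"Subscription run rate: {sub_annualized:.1%}")         # 45.9%
```

Compounding rather than simply multiplying by four is what produces 38.1% instead of 33.6% for total revenue. The gap widens as the quarterly rate rises.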

Services includes consulting work, implementation assistance and training. These are lower margin offerings necessary to help some customers with their transition to Confluent. Subscription revenue is more indicative of demand for Confluent’s core product offering and the higher growth rate here is a benefit.

Looking back over the past 2 years, investors will note that annual revenue growth has been rapidly decelerating from over 70% in Q4 2021 to 36% most recently. However, Q4 2021 represented a peak for the year 2021, with Q1 2021 revenue growth starting at 51%. More importantly, the amount of deceleration from Q1 to Q2 2023 was the smallest over that period at 240 bps. Further, the annualized sequential growth rate for Q2 would imply we are close to the bottom.

Revenue growth for Confluent Cloud is much higher than the overall company growth rate. The contribution mix is rapidly shifting towards Confluent Cloud with that product contributing 44% of total revenue in Q2, which was up 10 points year/year from 34% of total in Q2 2022. Meanwhile, Confluent Platform’s contribution is decreasing, dipping below 50% for the first time in Q2.

Confluent Platform is growing more slowly, increasing by 16% annually in Q2 to reach $92.9M. This result exceeded the company’s expectations, however. In Q1, Confluent Platform revenue was $86.9M, also up 16% y/y. Most impressive is the 6.9% sequential growth rate in Confluent Platform, which implies that growth of Platform is actually accelerating. This was the second quarter in a row in which Confluent Platform growth outperformed management’s expectations. They attributed this to strength in regulated industries such as financial services and the public sector.

These entities are exhibiting a high demand for on-premise data streaming services, while they take longer to plan their cloud migrations. I find this interesting, and it may reflect a shift in priority for data streaming among engineering teams. One would think that a change to real-time data distribution would be saved for the cloud migration, but these organizations consider it important enough to justify ramping up within their on-premise installation. As I discussed earlier, I think a desire to prepare data infrastructure for AI is a contributing factor here.

Confluent Cloud revenue, on the other hand, grew by 78% in Q2 annually and 13.6% sequentially. Confluent added $9.9M of Cloud revenue in Q2, beating their guidance from Q1 for incremental Cloud revenue of $7.5M – $8.0M by a healthy margin of about $2.2M at the midpoint. The outperformance was driven by higher than expected consumption by some larger customers. Additionally, management noted that cloud as a percentage of new ACV bookings exceeded 50% for the seventh consecutive quarter.

The outperformance on Cloud represents an interesting signal. Customers appear to be conservative with their contractual commitments in advance (reflected in RPO), and then consume more resources (captured in revenue) within the quarter. While many software infrastructure companies are reporting customers optimizing consumption lower than expected, Confluent Cloud seems to be experiencing the opposite. Some large customers appear comfortable over-spending on Confluent Cloud.

RPO was $791.4M, up 34% y/y and 6.6% sequentially. In Q1, RPO growth was 35% y/y, so annual growth decelerated slightly, but sequential growth picked up. Current RPO, which represents RPO expected to be recognized within the next 12 months, was $514.8M in Q2. This was up 41% y/y and increased 7.9% from the $477M reported in Q1.

Management pointed out that the RPO growth rates were impacted by customers scrutinizing their budgets in the current spending environment. They again reinforced the observation that while commitments are lower, customers are actually consuming more than these commitments in the actual period usage. This provides a strong indicator of Confluent’s value to these organizations, if their leadership is willing to go over budget on Confluent consumption. With all of the attention being paid to spending trajectory, it is unlikely this overconsumption is accidental.

"Our growth rates in RPO, while healthy, were impacted by a continuation of lower average deal sizes, a result of customer scrutinizing their budgets in the current environment. Despite the budget scrutiny, we remain encouraged that customers continue to derive value from using Confluent and consume more than their commitments, which is reflected in our revenue, but not in our RPO results." Confluent Q2 2023 Earnings Call

Confluent’s mix of revenue outside of the U.S. continues to grow, reaching 40% of total revenue in Q2. International revenue grew much faster than in the U.S., logging 45% annual growth versus 30% for the domestic business. This compares to 49% growth internationally in Q1 and 32% in the U.S. Lest we be concerned about the lagging growth in the U.S., Q2 domestic revenue increased by 9.6% sequentially, versus 7.1% for the international business.

Additionally, improvement in FX (neutral to weaker U.S. Dollar) should provide a little tailwind for international growth. While Confluent does not perform a constant currency calculation (because it charges in U.S. dollars), international customers would experience higher prices as the Dollar appreciates. When the opposite effect happens, Confluent prices feel lower for international customers. Other software infrastructure providers have reported a similar effect (Datadog, Cloudflare, etc.) and anticipate a small tailwind as FX improves.

Looking forward, management expects revenue for Q3 in a range of $193.5M to $195.5M, representing growth of 28.2% annually and 2.7% sequentially. This squeaked by the analyst estimate for $193.6M. The preliminary sequential guidance of 2.7% is lower than the 4.4% sequential growth projected from Q1 to Q2. That turned out to be 8.4% sequential growth in the Q2 actual, implying that Q3 sequential growth could decelerate into the high 6% range.

For cloud revenue, they expect $92.2M, which would be up 62% y/y and contribute 47.4% of total revenue. Given that they just delivered $83.6M of Cloud revenue in Q2, this represents an increase of $8.6M or 10.3% sequentially. This compares to the Q2 estimate in a range of $7.5M – $8M, which turned out to be $9.9M of incremental growth or a beat of $2.2M. Applying the same beat to the Q3 estimate would generate $10.8M of incremental growth and 12.9% sequentially. While that is a nice sequential growth, it is a bit lower than the 13.6% sequential growth just delivered. If we annualize the 12.9% sequential growth it equals the 62% annual growth projected, implying the deceleration in Confluent Cloud annual growth may be stabilizing.
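The beat-adjusted projection above can be reproduced with a short sketch. All inputs are the figures quoted in this section; the variable names are mine. The script uses the exact guidance midpoint ($7.75M), so the beat comes out at $2.15M rather than the rounded $2.2M, and the adjusted increment is correspondingly a touch lower than $10.8M.

```python
q2_cloud = 83.6                     # $M, Cloud revenue delivered in Q2
q3_cloud_guide = 92.2               # $M, Q3 Cloud revenue guidance
q2_guide_low, q2_guide_high = 7.5, 8.0   # $M, incremental guidance given for Q2
q2_actual_increment = 9.9           # $M, incremental Cloud revenue delivered in Q2

# How much Q2 beat the midpoint of its incremental guidance.
beat = q2_actual_increment - (q2_guide_low + q2_guide_high) / 2

# What Q3 guidance implies, before and after applying the same beat.
guided_increment = q3_cloud_guide - q2_cloud
guided_sequential = guided_increment / q2_cloud
adjusted_increment = guided_increment + beat
adjusted_sequential = adjusted_increment / q2_cloud
adjusted_annualized = (1 + adjusted_sequential) ** 4 - 1

print(f"Guided:   +${guided_increment:.1f}M ({guided_sequential:.1%} sequential)")
print(f"Adjusted: +${adjusted_increment:.2f}M ({adjusted_sequential:.1%} sequential)")
print(f"Adjusted sequential growth, annualized: {adjusted_annualized:.0%}")
```

With these inputs the adjusted sequential growth works out to roughly 12.9%, which annualizes to about 62%. This is only an extrapolation from a single quarter's beat, so it should be read as a scenario rather than a forecast.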

For the full year, management raised the target revenue range by $7M to $767M-$772M for 31.4% annual growth. If Cloud represents 47% of total revenue in Q3, they expect that to increase to 48% – 50% in Q4. The slower increase in mix is attributable to continued strength in Confluent Platform growth. It should be noted that the $7M raise to annual revenue was equal to the beat on Q2 revenue. Management isn’t making any further assumptions about outperformance through the remainder of the year, given uncertainty about the macro environment.

I thought the Q2 revenue performance was strong and the forward guidance was okay. Analysts asked about the slightly softer raise to incremental Q3 Cloud revenue, given that the estimate is lower than what was delivered in Q2. I expect them to beat this target as they did with the Q2 target. Given the remaining uncertainties in the IT spending environment, I think management is being suitably conservative.

Profitability

Confluent management has brought a sharp focus to profitability improvements over the past year. This has manifested most clearly with a roughly 24 point increase in Non-GAAP operating margin year/year. Prior to Q2, they had improved operating margin from -41.0% to -23.1% in the period from Q1 2022 to Q1 2023, representing a 1790 bps increase.

For Q2, they set a target of -16%, representing another 700 bps of improvement. This would generate ($0.08) to ($0.06) in Non-GAAP EPS. The actual results were much better with -9.2% operating margin. This beat the target set in the Q1 report by a whopping 680 bps and represented a 24.3 point improvement over Q2 2022. This result flowed through to Non-GAAP EPS which reached break-even at $0.00, beating the mid-point of guidance by $0.07.

Most of the improvement in year/year operating leverage came from efficiencies in the GTM effort. As a percentage of revenue on a Non-GAAP basis, S&M consumed 49% of revenue in Q2 2023, versus 62% a year prior, decreasing by 13 points. R&D decreased less, by 4 points and G&A by 2 points.

Investors will recall that Confluent announced an 8% staff reduction in January 2023. While a layoff is generally a disruptive event, the company has maintained its projected revenue growth in the two quarters following. This is actually a pretty bullish signal, as the period following the layoff would represent a distraction for most staff members. The financial objective of Confluent’s 8% workforce reduction was to pull the path to profitability forward by a year.

Another contributor to the year/year improvement in profitability was a marked increase in gross margin. Total Non-GAAP gross margin improved 440 bps between Q2 2022 and Q2 2023, reaching 75.0% in the most recent report. For subscription revenue, the gross margin increases to 79.1%. Even on a GAAP basis, gross margin improved by 490 bps year/year. The improvement in gross margin was driven by two factors – healthy margins for Confluent Platform (which is license based) and lower cloud hosting costs on the hyperscalers through higher scale (volume discounts) and usage optimization.

It is worth noting that on a GAAP basis, the operating margin improvement was of a similar magnitude, but starting at a more negative base. GAAP operating margin increased from -84.1% in Q2 2022 to -63.1% in Q2 2023, representing an improvement of 21.0 points. GAAP net loss per share improved from ($0.42) to ($0.35). Stock based compensation expense increased by 33.8% y/y from $68.9M to $92.2M.

Looking at cash flow, FCF was -$35.2M in Q2, which represented a slight improvement of $1.7M y/y. However, as a percentage of revenue, we have -26.5% a year ago and -18.6% in Q2. This shows about 8 points of improvement. Management had indicated that FCF margins should follow the trajectory of operating margins over time. On the earnings call, they stipulated that FCF margin should reach break-even roughly on a similar timeline to operating margin. That inflection is targeted for Q4 2023, which is now just two quarters away.
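To show why a $1.7M change in absolute FCF translates into about 8 points of margin improvement, here is a rough reconstruction. The Q2 2022 revenue is not stated in this section, so the sketch derives it from the 35.8% annual growth rate; treat that derived value as an assumption rather than a reported number.

```python
q2_2023_revenue = 189.3     # $M, reported
annual_growth = 0.358       # Q2 revenue growth y/y
q2_2023_fcf = -35.2         # $M, reported
fcf_improvement = 1.7       # $M, y/y improvement in FCF

# Assumption: back out prior-year values from the growth rate and the FCF delta.
q2_2022_revenue = q2_2023_revenue / (1 + annual_growth)
q2_2022_fcf = q2_2023_fcf - fcf_improvement

margin_2023 = q2_2023_fcf / q2_2023_revenue
margin_2022 = q2_2022_fcf / q2_2022_revenue
points_improved = (margin_2023 - margin_2022) * 100

print(f"Q2 2022 FCF margin: {margin_2022:.1%}")        # -26.5%
print(f"Q2 2023 FCF margin: {margin_2023:.1%}")        # -18.6%
print(f"Improvement: {points_improved:.1f} points")    # 7.9 points
```

Nearly all of the margin improvement comes from the denominator: cash burn was almost flat while revenue grew by roughly a third.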

Looking forward, management projects -10% non-GAAP operating margin for Q3. This is below the -9.2% just achieved, but higher than the -16% target originally set for Q2. With a beat of similar size, actual Q3 operating margin could land in the low negative single digits. Management took a similar approach with Non-GAAP income per share. For Q3, they set the target for ($0.01) to $0.00, which is just below the $0.00 delivered in Q2, but higher than the ($0.07) target originally set for Q2. With a similar beat, this would go positive in the Q3 actual.

For the full year, the improvement is more noticeable. Coming out of Q1, they raised the Non-GAAP operating margin target for full year 2023 to a range of -14% to -13%. In Q2, they further increased the full year target to -10%. For income per share, they raised the target range from ($0.20) to ($0.14) in Q1 to ($0.05) to ($0.02) in Q2. These represent significant increases in my opinion, even more impressive than revenue growth in this environment.

Finally, the CFO provided long term targets for profitability. While revenue growth is maintained at 30% or higher, they expect to deliver operating margin in the 5-10% range. FCF margin would be in line with this as well. The long term trajectory would bring both operating and FCF margin to 25%+. This compares well to targets shared by other software infrastructure companies with a similar product mix.

These long term targets are slightly higher than those presented at the Investor Session in May. The long-term gross margin target previously was 72%-75% (now 75%+). The long-term operating margin target was 20%-25% and is now 25%+.

Customer Activity

Confluent demonstrated strong customer engagement in Q2, primarily with their larger spenders. The big opportunity for the company is to increase the annual commitment from their largest customers, as it doesn’t take many $1M plus customers to keep driving durable growth at their current revenue scale.

Of the reported customer metrics, Confluent experienced the highest growth in $1M+ ARR customers, which increased by 48% y/y. For Q2, they added 12 of these. That followed additions of 8, 15 and 13 in the prior three quarters. Confluent added 11 $1M+ customers between Q1 and Q2 2022, so this most recent Q2 outperformed slightly on an absolute basis.

Growth of $100k+ customers was healthy as well, with 69 added in Q2. This represented a 33% annual increase and 6.4% sequentially. Prior quarters registered additions of 60, 70 and 83, so roughly holding in the same absolute range. Q2 of 2022 added 59 customers of this size, meaning this Q2 represented a healthy jump.

The total customer counts are growing more slowly. For Q2, Confluent added 140 total customers, up 17% y/y and 3.0% sequentially. This rate is on the lower end of prior quarters, which increased by 160, 290 and 120. A year ago in Q2 2022, Confluent added 0 customers sequentially, which was a consequence of a change to the sign-up process for the Cloud product.
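As a quick way to put the Q2 additions in context, the sketch below compares each cohort's Q2 adds against the average of the prior three quarters, using only the counts cited above. The names are my own, and a simple three-quarter average is an assumed benchmark, not a metric Confluent reports.

```python
from statistics import mean

# Net customer additions cited above:
# (Q2 2023 adds, adds in each of the prior three quarters)
additions = {
    "$1M+ ARR": (12, [8, 15, 13]),
    "$100k+ ARR": (69, [60, 70, 83]),
    "Total customers": (140, [160, 290, 120]),
}

def ratio_to_trailing_average(current, prior):
    """Current-quarter adds divided by the mean of the prior quarters' adds."""
    return current / mean(prior)

ratios = {
    name: ratio_to_trailing_average(current, prior)
    for name, (current, prior) in additions.items()
}

for name, ratio in ratios.items():
    print(f"{name}: Q2 adds were {ratio:.0%} of the prior three-quarter average")
```

On these figures, the two large-customer cohorts are roughly at their recent run rate, while total customer adds are about a quarter below it.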

As investors will recall, Confluent previously required any new Cloud customer to enter credit card information and begin incurring charges for any usage. After Q1 2022, new Cloud customers could get immediate access to the service for limited usage. This is called the Basic plan and allows for $400 of credits before requiring a credit card payment.

Once they are ready to ramp up, then customers switch to the Standard plan which is paid. I suspect this has created a lag in new paid customer creation. With IT budget pressure, customers with a planned migration to Confluent Cloud are likely extending the prep phase and delaying incurring charges as long as possible. There is probably a backlog of free customers ready to switch to paid, which should support paid customer additions over the next year.

Another factor that mitigates concerns around total customer growth is that Confluent has a large pool of existing Kafka users to draw from. In addition to the estimated 75% of the Fortune 500 that have a Kafka installation, over 150,000 organizations total are estimated to be using Kafka. As 36% of the Fortune 500 are already Confluent customers, I think we can assume that Confluent has demonstrated advantages over open source Kafka. These are likely the most discerning customers and historically have received the most focus. With 145,000 (150k – 5k current paying) customers left to convert to Confluent, I think there is adequate runway remaining for total customer growth.

During the Investor Day session in June, Confluent’s Head of Sales discussed how most of the sales and marketing focus to date has been on converting the Fortune 500 and Enterprise customer segments (defined as having >$1B revenue annually) to Confluent. They see an equally large opportunity to monetize the tens of thousands of Kafka users within the Commercial / Mid-market segment (revenue <$1B annually). Over the last 3 years, they have already increased total customers in this segment by 7x and increased $100k ARR customers by 13x.

As another signal around large customer spend in Q2, management noted that the number of $5M+ ARR customers continued to grow, but didn’t provide a number. They also referenced growth in $10M+ ARR customers in the past. During the Investor Day session in June, the head of sales mentioned that Confluent has 9 customers spending $5M+ and 3 with $10M+ in ARR.

When I first looked at Confluent, I was surprised to learn that they had customers paying $5M or even $10M a year. Thinking about the importance of the solution for the largest enterprise data companies, it makes sense. As the Confluent platform expands within the enterprise to power the full data distribution backplane (evolves to the “central nervous system”), the importance of the solution rises. It doesn’t require too many $5M-$10M ARR customers to generate $800M in revenue (their current full year target).
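To put a rough number on "not too many", the sketch below divides the roughly $800M revenue target by the $5M and $10M spend levels. It treats ARR as a stand-in for recognized revenue, which is a simplification, and the variable names are illustrative.

```python
import math

revenue_target_musd = 800          # approximate full-year revenue target, in $M
arr_per_customer_musd = (5, 10)    # spend levels discussed above, in $M of ARR

# Number of customers at each spend level needed to cover the whole target
customers_needed = {
    arr: math.ceil(revenue_target_musd / arr) for arr in arr_per_customer_musd
}
print(customers_needed)  # {5: 160, 10: 80}
```

Under that simplification, somewhere between 80 and 160 customers at this spend level would account for the entire target.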

Confluent doesn’t provide exact numbers for net expansion, but confirmed that overall NRR for Q2 was still above 130%. The gross retention rate was above 90%. The most impressive metric in my mind, though, is that NRR for just the Cloud service was above 140%. This is quite impressive for a business that has a run rate over $300M. I suspect this will drop below 140% at some point, but is clearly the driver of Cloud’s high annual growth.

Confluent management provided some examples of customer expansion journeys with their largest spenders. Some customers have increased spend by orders of magnitude over a short period of time. One large international bank increased spend by 34x over less than 3 years. An electronics manufacturer grew by 192x in a similar period, but with a smaller base.

It is this opportunity for large customer spend at relatively small scale that gives me confidence in Confluent’s ability to maintain high durable growth over the next few years. Additionally, none of this factors in new products, like stream processing, stream sharing and governance, which management thinks could be as large as the core data streaming product revenue stream over time.

Investment Plan

The primary opportunity for Confluent is to expand their reach across the enterprise data infrastructure fabric to reach the status of “central nervous system”. They have a relatively straightforward glide path in convincing the large installed base of Kafka users to upgrade to Confluent as part of their cloud migration plan. A TCO reduction of 60% and a broader set of capabilities makes this an easy argument for data-intensive enterprise operations (which most are now).

On top of this, Confluent is launching several new capabilities to enable the broader data streaming platform that can each be monetized as add-ons. These include Connectors (monetized now), Stream Governance (monetized now), Stream Processing (early access) and Stream Sharing (just announced). Management has estimated that revenue from Stream Processing could equal that of the core data streaming offering over time.

This expansion motion will in turn drive larger and larger spend by enterprise customers. Confluent already enjoys 147 customers spending more than $1M in ARR. They also reported a number of $5M ARR customers and even a few with more than $10M in ARR. I expect these large customer counts to continue to grow. With a target for less than $800M in revenue this year, Confluent doesn’t require too many customers of this size to make a noticeable impact.

During their Investor Day Presentation, the CFO presented a chart showing the typical growth path for their top 20 Cloud customers. Half of these customers started on the pay-as-you-go (PAYG) plan and eventually reached $1M+ with an ARR multiple of 17x over a 3 year period. On average, they reached $1M+ within just 6 quarters.

In spite of the same demand pressures highlighted by other software infrastructure providers during Q2, Confluent delivered favorable results. Revenue growth exceeded targets and the company increased their projections for the remainder of the year. Profitability showed a strong improvement with beats on EPS and significant increases expected in operating and FCF margin through the remainder of the year. Q4 is still projected to reach operating margin break-even, with FCF margin following the same trajectory.

This performance on the top line was achieved with the 8% headcount reduction in January. To conduct an organizational change like that and maintain revenue targets is pretty impressive. As compared to several other software infrastructure providers, Confluent hasn’t lowered their annual revenue guidance for 2023, which they set back in Q4. To me, this signals that demand for Confluent’s solution remains robust and growth would be significantly higher in a normal spend environment. Having many large customers exceed their committed spend in the quarter further supports the relative uniqueness of Confluent’s performance.

For these reasons, I think Confluent has a favorable set-up going into 2024. We could see revenue growth stabilize in the 30% range and FCF margin inflect to positive. With more focus on profitability by investors, the rapid improvement in profitability measures will be well-received. I have been increasing the allocation to CFLT in my portfolio on Commonstock. It now stands at 10%. I will likely grow this to match other data infrastructure providers like Snowflake and MongoDB over time.

@stackinvesting fantastic read as always, thank you Peter!
I'm always amazed how big of a business data services are.

"It is this opportunity for large customer spend at relatively small scale that gives me confidence in Confluent’s ability to maintain high durable growth over the next few years. Additionally, none of this factors in new products, like stream processing, stream sharing and governance, which management thinks could be as large as the core data streaming product revenue stream over time." - I very much agree and would love to see it play through
Q2 2023 Hyperscaler Earnings Review
Headwinds from workload optimization may have peaked in Q2. After nearly a year of negative drag on revenue growth for the hyperscalers, it appears that cloud infrastructure customers are shifting their focus away from workload optimization and back towards the deployment of new ones. At least that is what Amazon’s CEO told us in the opening statement of their earnings release. While the hyperscalers delivered revenue growth rates that decelerated from the prior year, they are now showing an uptick sequentially from Q1.

This circumstance makes sense. After rapidly spinning up new cloud workloads during Covid, established enterprises and start-ups alike postponed the normal post-launch clean-up cycles that address right-sizing server instances, tuning queries and rationalizing data retention. In an environment of flush IT budgets and pressure to ship fast, it’s easy for engineering teams to postpone this technical debt.

As budgets contracted and digital transformation project pace slowed down in 2022, engineering teams were encouraged to shift cycles back to paying down their technical debt. With pressure to generate cost savings, a lot of that work focused on optimizing existing application workloads. Engineering teams were able to review cloud resource utilization, reset capacity allocations, tune database queries, clean up prolific logging and refactor code to run more efficiently. All normal engineering best practices.

Except that they happened all at once. As this optimization work digested the oversized backlog of Covid technical debt, it created an unusually large reduction in resource consumption over a shorter period. This translated into less revenue for the hyperscalers and software infrastructure vendors that are consumption based. A normal application workload tuning exercise can reduce resource utilization by 20-30% or more. Excessively over-provisioned capacity might even be cut by 50%, simply by downsizing server instances. The hyperscalers rightfully make this easy to execute as a consequence of elasticity, but lower resource consumption means less revenue.

While this tuning is standard practice, in the post Covid catch-up period, it created a larger than normal reduction in spend. Enterprises were increasing consumption in some areas by launching new cloud workloads and digital transformation projects (although probably at a slower rate), but the immediate drop in cost for existing workloads created a large negative headwind to revenue growth. This caused the prior cadence of sequential quarterly increases in hyperscaler spending to slow down rapidly and even go negative briefly.

The optimization catch-up cycle can only last a finite amount of time. Eventually, technical debt is worked off and Covid workloads are tuned and right-sized. The effort to capture savings has diminishing returns. As Microsoft’s CFO said in April, workloads can’t be optimized forever. With the latest earnings commentary from the hyperscalers, it appears the catch-up period is wrapping up. Hyperscaler spend by enterprises will be increasingly driven by new activity, with the negative drag created by post-launch optimization returning to its pre-Covid levels.

Where the steady state growth rate lands remains an open question. Data points during Covid were clearly inflated, as both large enterprises and rapidly growing start-ups threw money at their cloud migration and digital transformation projects. We can assume that this rate of new investment has slowed down from the peak, but is still robust. Over the past 12 months, negative consumption trends from optimization have obfuscated the post-Covid steady state. When optimization stabilizes, investors will get a clear idea of what the steady state growth rate will be.

Unfortunately, that growth rate won’t be easily comparable to the pre-Covid period as another force has injected a new variable, which is investment in AI. As enterprises consider their AI strategy, they are spinning up IT projects to harness their data to create digital services powered by new AI models. These efforts are generating demand for AI specific services from the hyperscalers. There would be some spillover into demand for generalized software infrastructure services as well.

As growth rates for the hyperscalers potentially level out and even inflect back upwards, investors can expect a similar pattern to follow for various software infrastructure providers. Logically, if enterprises are consuming more compute and storage resources on the hyperscalers to support their new AI-driven applications, then a similar increase in demand should materialize for adjacent software services, like databases, monitoring, delivery, orchestration and application security. Serving content at scale to users worldwide over the Internet still requires these things.

In this post, I will review the general trends in cloud infrastructure and how they are impacting the providers. Then, I will look at results from the Big 3 hyperscalers. I’ll wrap up by trying to project what path demand trends may take going into 2024, particularly understanding that AI will become an even larger influence over the next year.

Background

In their March 2023 quarter, Microsoft’s CFO commented that “at some point, workloads just can’t be optimized much further.” She is correct. Cloud workload optimization refers to the customer process of reviewing costs of various infrastructure services and identifying opportunities to reduce them by making better use of the resources available. Hyperscalers generally offer multiple tactics for cost reductions, like considering instance sizes, upfront commitments, access latency, software version and more.

During Covid boom times, it was common to over-provision server instances. Code was often rushed to production, with adequate bug checking, but limited performance tuning. As enterprise IT teams are now contending with budget pressures, they are trying to support the same workloads for less cost. By right-sizing server instances, refactoring code and rationalizing data retention policies, they have been reducing resource consumption and lowering the cost for each workload.

In some cases, these cost savings can reach 50% or more. As an example on AWS, simply downsizing an EC2 instance from 2xlarge to large can cut monthly costs by 75%. Committing to a one year contract rather than month-to-month can save another 20% and a 3 year commitment can reduce costs by 40%. These types of changes can be executed very quickly, usually within a quarter. As a consumption business, the negative impact on hyperscaler revenue would be immediate.
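As a rough illustration of how these savings could stack, the sketch below applies the downsizing cut first and a commitment discount second. It assumes the discounts compound multiplicatively and that the commitment discount applies to the already downsized instance. Actual AWS pricing varies by instance family, region and payment terms, and the function name is my own.

```python
def remaining_cost_fraction(*savings):
    """Fraction of the original bill left after applying each saving in turn.

    Each saving is a fraction (0.75 means a 75% reduction). Savings are
    assumed to compound multiplicatively, which is a simplification.
    """
    remaining = 1.0
    for saving in savings:
        remaining *= 1.0 - saving
    return remaining

DOWNSIZE = 0.75           # 2xlarge -> large, per the example above
ONE_YEAR_COMMIT = 0.20    # one year commitment vs month-to-month
THREE_YEAR_COMMIT = 0.40  # three year commitment vs month-to-month

combined_one_year = 1 - remaining_cost_fraction(DOWNSIZE, ONE_YEAR_COMMIT)
combined_three_year = 1 - remaining_cost_fraction(DOWNSIZE, THREE_YEAR_COMMIT)

print(f"Downsize + 1 year commit: {combined_one_year:.0%} total reduction")
print(f"Downsize + 3 year commit: {combined_three_year:.0%} total reduction")
```

Under these assumptions, a workload that is both downsized and moved to a commitment would cost roughly 80-85% less than before, which shows how quickly a few routine changes can remove consumption revenue.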

However, workload optimization has diminishing returns. The most significant changes are logically front-loaded. As each quarter passes, the revenue impact should decrease. This would explain what appears to be the current situation. Enterprises are still optimizing, but the opportunities to make the large reductions in cloud spending are passing, as they have mostly worked off their Covid backlog. Going forward, the reductions in spend on existing cloud workloads would slow down, allowing new workloads to contribute more of the growth in enterprise spend.

Optimizations started back in the second half of 2022. They continued through Q4 and into Q1 2023. As we are now receiving Q2 earnings reports from the hyperscalers, management is still referencing the effect, but the impact on revenue growth rates appears to be decreasing. This is evidenced by leveling out of annual growth rates and even sequential reacceleration of growth from Q1 to Q2.

Looking forward, management teams are hesitant to predict the end of optimization. In many ways, enterprises will always optimize workloads to some extent. The right question is whether the negative drag of optimization efforts is increasing, decreasing or remaining at steady state. I think it has reached steady state transitioning from Q1 to Q2 and will start decreasing through the remainder of 2023. Amazon, for one, has called out this inflection starting in Q2.

This means that revenue growth rates for hyperscalers can be primarily driven by the expansion of existing workloads (from more usage) and the introduction of new ones. New workloads are created by enterprises starting digital transformation projects or migrating existing applications to the cloud. This type of work has continued through the optimization period, likely with some projects postponed to accommodate one-time budget resets. I think the frequency of project delays will decrease also, as macro headwinds and interest rates stabilize.

Another factor to consider relative to hyperscaler utilization has been demand from the start-up community. As interest rates dropped to zero during Covid, VC firms poured money into start-ups offering all kinds of new digital-based services. These start-ups were encouraged to get product to market as quickly as possible, generating rapid scale-ups on hyperscaler infrastructure.

As this funding dropped in 2022 (whether measured by investment amounts or IPOs), so did the cloud infrastructure demand generated by the start-ups. In many cases, the existing start-ups went through their own optimization exercises, similarly tuning performance and right-sizing their resource consumption to match the new reality of their growth rates. Like enterprises, though, these start-ups will start to expand again, and will begin increasing their cloud infrastructure spend.

At some point, VC funds should start investing again. We are already seeing this in the AI space. AI services require other software infrastructure services beyond model training, if they are packaged as an application delivered over the Internet. As these start-ups scale up their products for broad use, they should create a new tailwind for hyperscaler and software services demand. Software infrastructure providers in areas like application security, monitoring and operational databases have also called out AI start-ups among their new customer additions.

AI Headwind or Tailwind for Broader Software Infrastructure?

Looking more broadly at software infrastructure service consumption and non-AI hyperscaler spend, one wildcard is the near term IT budget allocation between direct AI services and everything else. Most hyperscalers have rolled out AI specific offerings at this point. These supplement their existing cloud infrastructure services that allow enterprises to host applications, data processing and storage on the hyperscalers.

Enterprises and start-ups are flocking towards AI services, pursuing opportunities to launch new AI-driven products for their customers, partners and suppliers. These efforts are already showing up in hyperscaler earnings results. Microsoft reported that 1% of Azure’s annual revenue growth was due to AI spend in the prior quarter. They expect that to grow to 2% in the current quarter.

This is great news for the companies directly in the AI infrastructure value chain. The hyperscalers obviously benefit, as do the companies providing hardware to the hyperscalers that enable AI processing. These include the AI chip manufacturers, data storage and networking providers.

If IT budgets are set for 2023, it is possible that enterprises will shift some funds from digital transformation and cloud migration projects into new AI initiatives. This might result in less spend for traditional cloud infrastructure services, versus specific AI offerings. For the hyperscalers, it doesn’t matter too much, as they generate revenue from both. For vendors of services that wrap around AI-driven applications (operational databases, security, delivery, monitoring, etc.), it might.

AI services delivered as standard Internet-based applications would still consume these services, but if enterprises shift more spend towards building their AI models initially, there may be less remaining budget to fund the ongoing digital transformation and cloud migration projects. As technology leaders at enterprises prioritize their efforts based on input from corporate leadership, they may choose to delay some projects on the standard cloud migration rail in favor of investment in AI processing.

In fact, Arista Networks has a term for this. They break out hyperscaler spend on their networking gear into two buckets – AI networking and classic cloud networking. The idea is that hyperscaler infrastructure spending in the past (2022 and earlier) has been to support “classic cloud” workloads, like those enabling digital transformation and cloud migration of enterprise applications. AI networking refers to network gear specifically earmarked to enable clustering of large numbers of GPUs and storage for AI processing.

"During the past couple of years, we have enjoyed a significant increase in cloud capex to support our cloud titan customers for their ever-growing needs, tech refresh, and expanded offerings. Each customer brings a different business and mix of AI networking and classic cloud networking for their compute and storage clusters. One specific cloud titan customer has signaled a slowdown in CapEx from previously elevated levels. Therefore, we expect near-term cloud titan demand to moderate with spend favoring their AI investments. We do project, however, that we will grow in excess of 30% annually versus our prior analyst day forecast of 25% in 2023." Arista Networks, Q2 2023 Earnings Call

Arista Networks leadership signaled that their cloud titan customers appear to be considering their mix of AI networking and classic cloud networking investments. Given the large demand for AI services, they are favoring spend to support AI investment near term. Arista still expects the net benefit to be positive for their total revenue, as they raised their annual growth target to 30% from 25% for 2023.

This signals that vendors selling hardware and services directly targeted at AI processing will likely be better positioned near term, than those catering towards “classic cloud” workloads. For companies like Arista and Nvidia, this should drive increased demand. For businesses supporting “classic cloud” deployments, there may be a temporary dip in demand, as enterprise investments shift towards AI.

Like Arista, many “classic cloud” software infrastructure providers can be utilized by AI workloads as well. AI workloads require large amounts of clean and secure data, a standard application stack if they will be accessed over the Web, as well as monitoring and security. These demands should generate consumption for cloud database, data streaming, application security and monitoring, but perhaps not as much as a “classic cloud” application.

Over the long run, I think that AI will generate other cost savings in enterprises, freeing up more budget for cloud infrastructure services. As developer productivity increases, enterprises will launch applications faster, generating more demand for cloud services to host them all. These effects should drive an acceleration in demand for “classic cloud” services, offset by savings in engineering headcount. Developers will still be hired – enterprises just won’t need as many.

The independent providers of “classic cloud” infrastructure outside of the hyperscalers can benefit from demand for their services from AI-specific companies as well. Many of these companies are fast-growing AI start-ups, flush with VC investment and a mandate to grow quickly. These companies have rapidly emerged in new customer highlights for independent software infrastructure providers in monitoring (DDOG), operational databases (MDB) and application security (NET). It’s probable that a reduction in spend from enterprises is backfilled by new demand from AI customers.

What does this all mean? After the majority of one-time optimization work is complete (and we can assume that the biggest cost reductions were front-loaded), then growth in cloud spend should return to a normal level. Normal will likely reflect growth before Covid, discounting for decay from the law of large numbers. Most importantly, spend growth will no longer have a large headwind from the one-time optimization catch-up.

Another factor to consider is that enterprises will likely start to realize internal cost savings as a consequence of their AI efforts. This should free up more budget to pay for them. Microsoft’s CEO hinted at this in response to an analyst question on their latest earnings call. The analyst referenced data showing that developers are experiencing a 40%-50% productivity improvement from GitHub Copilot and wondered if that same boost would extend to other Copilot efforts in Microsoft 365, Sales and Services.

"I think what you’re also referencing is now there’s good empirical evidence and data around the GitHub Copilot and the productivity stats around it. And we’re actively working on that for M365 Copilot, also for things like the role-based ones like Sales Copilot, our Service Copilot. We see these business processes having very high productivity gains. And so, yes, over the course of the year, we will have all of that evidence. And I think at the end of the day, as Amy referenced, every CFO and CIO is also going to take a look at this. I do think for the first time — or rather, I do think people are going to look at how can they complement their OpEx spend with essentially these Copilots in order to drive more efficiency and, quite frankly, even reduce the burden and drudgery of work on their OpEx and their people and so on." Microsoft CEO, Q4 FY2023 Earnings Call

In sum, these contributors should allow the hyperscalers to enjoy consistent revenue growth again, in line with secular trends of digital transformation and cloud migration. They should additionally benefit from incremental spend on AI services, as enterprises and start-ups alike invest separately in those. The same trends would apply to the independent software infrastructure providers in observability, data processing, delivery and application security. After the hyperscaler earnings were reported, particularly Amazon’s, the stocks of these companies jumped. As independent software providers report their most recent quarterly results over the next few weeks, we should get further updates on how these trends are playing out.

Hyperscaler Results

With that background, let’s look at how the hyperscalers performed. I will focus on the cloud business for each company and the revenue component of that. Amazon, Google and Microsoft have other aspects of their businesses that aren’t material to this discussion. Additionally, their reporting of the cloud hosting component of their business is usually limited to revenue performance, with varying degrees of transparency about future expectations.

(Chart from Koyfin)

Coming into this quarter’s earnings, all three hyperscaler company stocks have enjoyed nice appreciation so far into 2023. Amazon stock had the most appreciation, up 59% YTD prior to earnings on August 3rd. Microsoft stock was up 46% YTD before earnings on July 25th. Alphabet was third, up 38% prior to earnings on the 25th as well.

The stock performance following their earnings reports was mixed, as Microsoft disappointed and lost 3.9% the following day. Alphabet performed well on the back of strong advertising spend, gaining 5.6% the day after earnings. The market had to wait a week for Amazon’s results, which turned out to be worth it. Both their primary business and AWS beat expectations, driving an 8.3% stock price jump the next day.

For their cloud infrastructure businesses, the three companies delivered revenue growth slightly above their prior guidance and street expectations. After a couple of quarters of rapid drops in growth rates, the deceleration appears to be moderating. For Amazon AWS and Google Cloud, which provide the actual revenue value, we can see that sequential growth even picked up from the prior quarter. This implies that revenue growth rates are leveling out. For Amazon, we also know that growth improved as Q2 progressed and the same trends continued into July.

Given that all three companies showed improving sequential growth, but still referenced ongoing workload optimization headwinds, my interpretation is that enterprises have already addressed the big tuning opportunities. They are now working through smaller refinements or have largely completed their deferred tech debt for Covid workloads. Technology leaders would prioritize the low hanging optimization fruit first, which would explain why the largest impact was seen earlier in the cycle.

The other factor helping improve hyperscaler growth rates is the contribution from AI spending. The hyperscalers all developed products that specifically support the desire of enterprises to begin leveraging their internal data to generate new service offerings using AI. AI specific start-ups are generally hosting on the hyperscalers as well. As these start-ups receive a surge of VC funding, they represent new spending for AI services.

Microsoft reported that about 1% of their revenue growth in the prior quarter was from AI services. They expect that to double to 2% in the September quarter. The other hyperscalers don’t break out AI spend, but their management teams provided commentary indicating that AI is driving consumption across their service offerings.

With that, let’s briefly review what each hyperscaler reported.

Microsoft

As part of their Q3 (ended March 2023) earnings report, Microsoft guided for Azure revenue growth of 26-27% in constant currency for Q4 (ending June 2023). They had just delivered 31% annual revenue growth, which was down 7 points from 38% growth in their Q2 (ended December 2022). For the Q4 quarter just reported on July 25th, they actually achieved 27% revenue growth, which was a 0.5 point beat at the midpoint. This included a contribution from AI services equal to 1 point of growth.

For the current Q1 quarter ending in September, the market was looking for 25% growth. Management guided just above that to a range of 25-26%, with 2% of revenue growth coming from AI. They expect the growth trends from last quarter to carry forward into the current quarter.

While Microsoft doesn’t report the exact revenue amount for Azure, the deceleration in annual growth appears to be slowing down. Two quarters ago, the deceleration was 700 bps from 38% to 31%, then 31% to 27% in the June quarter and potentially just 100bps from 27% to 26% in the current quarter. This isn’t completely apples-to-apples, as AI’s new contribution to the growth rate is increasing, but the trend implies the sequential growth rate is leveling out and possibly even increasing. In order for annual growth rate deceleration to slow that much, the sequential growth rate would need to be increasing.
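The arithmetic behind this can be sketched as follows. The first part simply differences the annual growth rates quoted above, using the top of the 25-26% guidance range for the current quarter. The second part uses the identity that the ratio of two consecutive year-over-year growth factors equals the ratio of this quarter's sequential growth factor to the year-ago quarter's. Since Microsoft doesn't disclose Azure revenue, the 8% year-ago sequential growth rate is a purely hypothetical placeholder, held constant so that only the effect of the shrinking deceleration shows.

```python
# Azure year-over-year growth (%), oldest to newest:
# Dec, Mar and Jun quarters, then Sep quarter guidance (top of range)
azure_yoy_pct = [38, 31, 27, 26]

deceleration_bps = [
    (prev - cur) * 100 for prev, cur in zip(azure_yoy_pct, azure_yoy_pct[1:])
]
print(deceleration_bps)  # [700, 400, 100]

def implied_sequential_growth(yoy_prev, yoy_now, year_ago_sequential):
    """Sequential growth implied by two consecutive y/y rates.

    Uses (1 + yoy_now) / (1 + yoy_prev) = (1 + q_now) / (1 + q_year_ago).
    All arguments and the return value are fractions (0.27 means 27%).
    """
    return (1 + year_ago_sequential) * (1 + yoy_now) / (1 + yoy_prev) - 1

HYPOTHETICAL_YEAR_AGO_SEQ = 0.08  # placeholder, not a reported figure

seq_jun = implied_sequential_growth(0.31, 0.27, HYPOTHETICAL_YEAR_AGO_SEQ)
seq_sep = implied_sequential_growth(0.27, 0.26, HYPOTHETICAL_YEAR_AGO_SEQ)
print(f"June quarter: {seq_jun:.1%}, September quarter: {seq_sep:.1%}")
```

With the same year-ago comparison, a smaller drop in the annual rate translates into a higher implied sequential rate. The actual result depends on the real year-ago sequential growth, which isn't public.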

Management also provided a signal about the overall size of Azure revenue. The CEO commented that Microsoft Cloud revenue was $110B over the prior 12 months and that Azure passed 50% of that for the first time. This implies that Azure’s trailing annual revenue was at least $55B, which compares to $85.4B for AWS over the last 4 quarters.

The sales pipeline for Azure is healthy. Microsoft’s CFO commented that Azure received a record number of $10M+ contracts in the prior quarter. Additionally, the average annualized value for large long-term Azure contracts was the highest ever. This was driven by customer demand for both traditional cloud services and new AI offerings.

Management highlighted the continued strength in their Azure AI offering. It is enjoying rapid customer adoption with 11,000 customers. They cited several examples of enterprises incorporating ChatGPT features into their own internal product offerings. Mercedes is using ChatGPT through Azure OpenAI to improve its in-car voice assistant for 900,000 vehicles in the United States. Financial services company Moody’s built its own internal Co-pilot to improve productivity of its 14,000 employees.

Alphabet

Alphabet bundles revenue for Google Cloud Platform (GCP) in the line item called Google Cloud. This includes their workforce application business (Google Workspace – formerly G Suite) in addition to their hyperscaler services. For Q1 (ended March 2023), Google Cloud delivered $7.454B of revenue, up 28.1% annually and 1.9% sequentially. In Q2 reported on July 25th, they increased Cloud revenue to $8.031B. This roughly matched the prior annual growth rate with 28.0%, but jumped sequentially by 7.7%. This highlights why tracking sequential growth rates currently will be more indicative directionally than annual growth rate comparisons.

I particularly liked the jump in sequential revenue growth, which is much higher than the prior two quarters. While we don’t know the exact contribution from Google Cloud Platform, management typically comments on the relative growth rate of GCP versus Google Cloud overall. During the earnings call, they again confirmed that growth of GCP was higher than the 28% annual growth rate of Google Cloud overall. This implies that the Workforce component of Google Cloud is growing more slowly.

While management highlighted the strong results for GCP, they did cite ongoing optimization of spending from enterprises. They also didn’t provide any forward guidance on GCP performance, which is standard. Management did discuss at length the strength of adoption of their AI product offerings. They claim that their cloud infrastructure is a leading platform for training and serving generative AI models, with more than 70% of Generative AI unicorns on Google Cloud, including Cohere, Jasper, Typeface and others.

GCP offers a wide variety of AI supercomputer options for customers ranging from Google TPUs and advanced Nvidia GPUs to new A3 AI supercomputers powered by Nvidia’s H100. Google AI is experiencing strong demand for their more than 80 available AI models, both open source and third party. The number of customers consuming these has grown 15x from April to June.

Examples include Priceline for trip planning, Carrefour for creation of full marketing campaigns and Capgemini to streamline hundreds of internal business processes. HSBC uses their anti-money laundering AI service to flag financial crime risk. What I like about these examples is the breadth of use cases, extending into many facets of enterprise operations. This goes far beyond simple chat agents powered by ChatGPT.

Amazon

Saving the best for last, Amazon signaled the most upbeat momentum for cloud infrastructure spend. Not only did the overall business deliver a strong beat, but AWS performed better than expected. Investors will recall that AWS grew 16% y/y in Q1 (ended March 2023). The CFO mentioned on the earnings call that the annual growth rate had dipped to 11% y/y for the month of April (first month of Q2). If this deceleration continued, the overall growth rate for Q2 might have dropped to 10% or lower.

This led analysts to expect 10% annual growth for Q2 from AWS. The actual rate was 12.2% annually, which implies revenue growth re-accelerated after April. In fact, the q/q revenue growth rate was 3.7% in Q2, recovering from the 0.1% decline in Q1. This stabilization of growth rates was supported by management commentary.

"So, again, if we rewind to our last conference call, we had seen 16% AWS revenue growth in Q1, and the growth rates had been dropping during the quarter. And what I mentioned was that April was running about 500 basis points lower than Q1. What we’ve seen in the quarter is stabilization and you see the final 12% growth. So, while that is 12%, there’s a lot of cost optimization dollars that came out and a lot of new workloads and new customers that went in. What we’re seeing in the (current) quarter is that those cost optimizations, while still going on, are moderating, and many maybe behind us in some of our large customers. And now we’re seeing more progression into new workloads, new business. So, those balanced out in Q2. We’re not going to give segment guidance for Q3. But what I would add is that we saw Q2 trends continue into July. So, generally feel the business has stabilized, and we’re looking forward to the back end of the year in the future because, as Andy said, there’s a lot of new functionality coming out with. So, optimistic and starting to see some good traction with our customers’ new volumes." Amazon CFO, Q2 2023 Earnings Call

The statement from the press release that “growth stabilized as customers started shifting from cost optimization to new workload deployment” provides a good summary for the drivers of the Q2 results relative to Q1. As I discussed in my Q1 review of hyperscaler results (and Microsoft’s CFO hinted at), enterprise workload optimization had to decrease at some point. There is only a finite amount of optimization that can be performed and the impact is usually front-loaded. This effect was largely a catch-up on technical debt that had been accumulated during the Covid spending surge.

All the while, enterprises have been continuing to deploy new workloads and other lagging companies are starting their cloud migration journey. These effects generated incremental consumption of cloud infrastructure resources, but it was being offset by the negative impact of optimization. As optimization curtails, overall revenue growth rates will be primarily driven by new deployments again.

Amazon leadership also wanted to emphasize the opportunity for AI to drive more demand within AWS. They highlighted a “slew of generative AI releases” that provide cost savings and ease of use relative to model training, LLM customization and code generation. AWS wants to democratize access to generative AI for their customers by providing access to any LLM of choice and simplifying the requirements to get started. They are also emphasizing security and privacy, lest proprietary enterprise data be leaked out through model training.

Investment Plan

While the three hyperscalers continued to reference ongoing customer workload optimization as part of their calendar Q2 earnings reports, the magnitude of the negative impact appears to be decreasing. This was evident from their revenue growth rates and management commentary. Both AWS and Google Cloud delivered sequential revenue growth rates that, when annualized, exceed their growth rates over the prior year. This implies a reacceleration of growth.
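To make the annualization comparison concrete, here is a minimal Python sketch using the q/q and y/y rates cited earlier for AWS and Google Cloud. It assumes a single quarter's sequential rate can simply be compounded over four quarters, which ignores seasonality, so treat the output as a rough gauge rather than a forecast. The helper name annualize is my own.

```python
def annualize(quarterly_growth: float) -> float:
    """Compound a quarter-over-quarter growth rate across four quarters."""
    return (1 + quarterly_growth) ** 4 - 1

# (q/q growth, y/y growth) for calendar Q2 2023, as cited in the text.
reported = {
    "AWS": (0.037, 0.122),
    "Google Cloud": (0.077, 0.280),
}

results = {}
for name, (qoq, yoy) in reported.items():
    run_rate = annualize(qoq)
    results[name] = run_rate
    print(f"{name}: {qoq:.1%} q/q annualizes to {run_rate:.1%} vs. {yoy:.1%} y/y")
```

On these inputs, AWS annualizes to roughly 15.6% against 12.2% y/y, and Google Cloud to roughly 34.5% against 28.0% y/y.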

This quarter, Amazon leadership provided the strongest signal that the impact from enterprise workload optimization is largely behind us, emphasizing that customers are shifting their focus to launching new workloads. The CFO even added that some of their largest customers have completed their Covid workload optimization catch-up work.

Investors will recall that last quarter Microsoft’s CFO expressed a similar view that the optimization surge would eventually end. More specifically, she asserted that optimization can’t continue forever, implying that customers are mostly through the adjustments that have the largest negative impact on revenue. She discussed how workload optimization had started about a year ago and that comparables will be easier as we get into the second half of 2023.

This underscores a point that I raised in my review of the hyperscaler results from Q4. Optimization of a cloud workload tends to be a one-time exercise. Once the resources dedicated to an application workload are reset to match actual utilization, there isn’t a reason to keep reducing that allocation. The magnitude of an optimization exercise can be large. Simple server instance downsizing and longer commitments can reduce workload costs by 50% or more. The impact on quarterly revenue will also be immediate, as a consequence of the hyperscaler consumption models.

Once optimization is complete, however, growth in utilization will once again be driven by increases in application usage. Enterprises will continue their cloud migration and digital transformation work, generating new consumption of infrastructure resources. While this was continuing to a lesser extent over the past year, its contribution to revenue growth was offset by a large deficit from workload optimization.

Additionally, enterprises are pursuing initiatives to incorporate AI into new offerings for their customers, employees and suppliers. This drives more hyperscaler resource consumption, both of AI services and the broader rails of software infrastructure. These efforts will go beyond core AI model generation, cascading out towards inference and all standard cloud application support services (data, storage, security, monitoring, etc.). As Microsoft’s CEO said, AI services will also “spin the Azure meter.”

Last quarter, I tried to visualize these various effects in the diagram below. While there are a lot of moving parts and certainly unknowns, I think we can make some assumptions that explain the surge in hyperscaler revenue growth during the Covid period (2020-2021) and then the marked deceleration in growth rates we have been witnessing over the last few quarters since mid-2022. If we assume this has been driven by a cycle of over-provisioning and optimization, the trends in hyperscaler growth rates make sense. Further, if optimization will taper off and AI workloads ramp up, then we can extrapolate the likely curve of revenue growth over the next year or two.

The trends I anticipated in Q1 have largely played out in Q2. We are witnessing an inflection in the blended revenue growth rates, as optimization effects are bottoming and should diminish going forward. While the growth rate of overall classic cloud infrastructure is slowly decreasing due to large numbers, spend on new AI offerings is starting to register and will likely backfill a portion of the decay (if not all of it).

"I mean, even the workloads themselves, AI is just going to be a core part of a workload in Azure versus just AI alone. In other words, if you have an application that’s using a bunch of inference, let’s say, it’s also going to have a bunch of storage, and it’s going to have a bunch of other compute beyond GPU inferencing, if you will. I think over time, obviously, I think every app is going to be an AI app. That’s, I think, the best way to think about this transformation." Microsoft CEO, Q2 FY2023 Earnings Call, January 2023

For investors in those companies that provide supporting infrastructure around the hyperscalers, like observability, data transport and management, security and delivery, we should see similar trends play out. The optimization and new workload patterns for these companies should follow the behavior of the hyperscalers.

Additionally, a new wave of AI-driven innovation won’t just benefit the vendors of the core AI inputs, like chip manufacturers and the hyperscalers. Any service hosted in the cloud and delivered over the Internet will consume the same software infrastructure resources that contributed to the last wave of Internet growth, whether Web2, mobile apps or remote work.

Further, new AI application investment won’t require incremental budget from most enterprises over time. The costs can be offset by the productivity gains for their knowledge workers. Enterprise departments will find that their employees can accomplish more with AI-driven software services and co-pilots. They will therefore require less headcount to complete the same amount of work. Payroll costs will decrease, providing savings to be invested in more sophisticated software services, whether digital assistants, workflow automation or system-to-system coordination.

Finally, the creators of software applications, namely developers, will become several times more productive. They will deliver new digital experiences faster. The result will be more applications that then consume greater cloud infrastructure resources. That additional expense will be further offset by fewer developer resources.

For me, all of this implies a guardedly optimistic outlook for the independent providers of cloud infrastructure and associated software services. We will likely see stabilization of revenue growth rates for the software infrastructure companies that will be reporting their quarterly results over the next month. This will be driven by the same moderation of cost optimization impact being experienced by the hyperscalers.

Unless the macro picture takes a significant additional step downward, I expect demand to pick back up in the second half of 2023, providing a real opportunity for reacceleration of growth going into 2024. This could provide a favorable set-up for many of the software infrastructure stocks that have been beaten down over the last year. For those willing to stomach some volatility over the next quarter or two, you might be well rewarded 6-12 months from now.
ir.aboutamazon.com
Amazon.com Announces Second Quarter Results
Amazon.com, Inc. (NASDAQ: AMZN) today announced financial results for its second quarter ended June 30, 2023. Net sales increased 11% to $134.4 billion in the second quarter, compared with $121.2 billion in second quarter 2022. Excluding the $0.3 billion unfavorable impact from year-over-year changes in foreign exchange rates throughout the quarter, net sales increased 11% compared with second quarter 2022. North America segment sales increased 11% year-over-year to $82.5 billion. International

Insights from a Pair of Data Summits
AI has crashed onto the investing stage in 2023, already driving significant stock price gains for several companies. Some, like Nvidia and Microsoft, have already projected a direct revenue benefit as part of recent earnings reports. For others, management has indicated in their commentary that they expect AI to drive demand tailwinds going forward.

Eventually, most software service and infrastructure providers should benefit from increased demand, as AI services proliferate and contribute to all areas of the economy. As many AI services are delivered through Internet-based applications, the same application guardrails of security, delivery, monitoring and operational data storage will be needed. This is in addition to the increased consumption of data services to collect, prep, distribute and process the inputs for various AI models.

AI-driven expert systems and co-pilots will increase the productivity of information workers. Enterprises will need fewer of them to accomplish the same amount of work. This will free up budget to increase spend on AI software services, similar to the efficiencies gained from the proliferation of SaaS tools over the last decade that helped internal business teams automate most aspects of their operations.

Software development teams, in particular, will experience a significant increase in output per employee. Enterprises will be able to clear their application backlogs more quickly, increasing the need for hosting infrastructure and services. At steady state, fewer developers will be needed, supporting a shift of IT budget from salaries to software.

As data is the largest ingredient to these enterprise AI development efforts, software vendors providing data processing and infrastructure services stand to benefit. AI has further elevated the value of data, incentivizing enterprise IT leadership to review and accelerate efforts to improve their data collection, processing and storage infrastructure. Every silo’ed data store is now viewed as a valuable input for fine-tuning an even more sophisticated AI model.

In the realm of big data storage, enterprises need a place to consolidate, clean and secure all of their corporate data. Given that more data makes better AI, enterprise data teams need to ensure that every source is tapped. They are scrambling to combine a modernized data stack with an AI toolkit, so that they can rapidly, efficiently and securely harness AI capabilities to launch new application services for their customers, partners and employees.

At the center of these efforts are the big data solution providers. These include legacy on-premise data warehouses, cloud-based data platforms and of course, the hyperscalers. Among these, Snowflake and Databricks are well-positioned, representing the fastest growing modern data platforms that can operate across all three of the hyperscalers. While the hyperscalers will win their share of business, enterprise data team leadership often expresses a preference for an independent data platform where feasible.

Fortunately for investors, Snowflake and Databricks held their annual user conferences recently. Perhaps it was intentional that they fell within the same week – at least they staggered the events between the first and second half. Both companies made major product and partnership announcements, leading to many comparisons and speculation about changes in relative product positioning.

The market for the combination of big data and AI processing will be enormous, with some projections in the hundreds of billions of dollars of annual spend. While the Snowflake and Databricks platforms are clearly converging in feature set scope, they still retain different approaches based on their historical user types. Such a large market will likely support multiple winners in the near term.

In this post, I will review the major themes around AI and then discuss the primary announcements from Snowflake Summit and Databricks’ Data + AI Summit. As part of this, I will try to extrapolate how each is positioning themselves relative to the major trends emerging in the AI and data processing market. Investors can use this information to position their portfolio to capitalize on AI secular trends and specifically consider the opportunity for SNOW. Hopefully, Databricks will enter the public market in the next year or two, providing another investment option.

AI as a New Secular Tailwind (Maybe the Biggest Yet)

First, let’s discuss what has changed in the last year that has brought AI to the forefront of the market’s purview and ignited a rush of investment into the space. Investors are being inundated with references to AI and are left to interpret what impact this may have on cloud infrastructure spending. They also hear discussion from entrenched software providers that they have been using ML and AI all along. So, what is new and how does this change things? Haven’t we seen this before?

If we focus just on generative AI and LLM’s, the primary change is how humans can interact with these new digital experiences and the value they can expect from them. Specifically:

Better User Interface. The method of interaction between human and machine is evolving from point, click, select on a screen (web browser or mobile app) to natural language queries and task instruction. This increases the efficiency of the interface by an order of magnitude or more. Natural language as an interface to data makes everyone a business analyst, programmer or power user, without needing to learn an obscure scripting language or complex interaction protocol.

As natural language models improve and add contextual awareness for specific business domains, humans will be able to interact with data services through conversation, rather than code. Expert systems will be built on top of the data, providing guidance to operators and employees, making everyone a specialist. This will increase efficiency, disperse expertise and speed up decision making. In many cases, actions can be automated and the human interface will shift to quality control rather than work product creation.

Better Value Extraction. The latest generation of machine learning tools is able to model much more complexity than in the past. They can represent billions of parameters across a neural network, consisting of millions of nodes and the relative weights between them. With highly powered GPUs, these neural networks can be modeled by training on large data sets in relatively short amounts of time. Combined with new architectures like transformers for deep learning, these capabilities have spawned next level AI engines that are orders of magnitude more powerful and sophisticated than even the best recommendation engines of a decade ago.

Microsoft’s CEO calls these “reasoning engines”, discussing these two contributors as part of his keynote at the recent Microsoft Inspire partner event. This provides a useful label to wrap the underlying AI complexity. It captures the idea of information processing – delivering a model that represents people, places, things and the relationships between them. These models can generate insights, make predictions and complete structured work items. They encapsulate all of the data available, whether scraped from the public Internet or loaded from enterprise systems.

With a simplified, natural language user interface sitting in front of a sophisticated reasoning engine, enterprises can improve efficiency and outcomes across a wide range of use cases. These improvements represent the two major additions that have spawned a new wave of interest in unlocking the capabilities of AI.

Better Outcomes

More efficient user interaction will drive much higher utilization of existing software services. LLMs and ChatGPT-like interfaces allow humans to interact with software applications through natural language. Rather than being bound to traditional GUIs with preset choices or requiring use of a scripted language (like SQL) to define tasks, chat interfaces allow users to engage software applications through text-based prompts. Additionally, larger machine learning models can represent more complex information sets, greatly expanding the scope of problems that can be addressed by AI training.

As an example in the consumer space, Priceline is working with Google AI to create a virtual travel concierge as the entry point for users to plan a trip. A simple text-based instruction with some rough parameters could kick off a large number of queries to multiple application data services (flight, car, hotel, entertainment, dining, etc). This replaces complex interfaces with many combinations of drop-downs, selectors, submit buttons, etc., which are then repeated for each aspect of trip planning. The user efficiency of querying for all of this in one or two sentences would result in more overall usage. Not to be outmaneuvered, other travel providers like Expedia are working on similar features.

Beyond consumer shopping and generalized knowledge aids like ChatGPT, there is an even larger opportunity within enterprises to harness machine learning to improve internal business operations. These could take the form of better customer service, predicting failures, increasing productivity or speeding up business processes. These all promise to contribute to higher sales and lower costs. As one enterprise in an industry rolls out an AI-driven service that improves their competitive position, all other players will need to follow suit.

"AI is a transformative technology that has the potential to unlock tremendous business value. According to a recent McKinsey study, AI could add up to $4.4 trillion annually to the global economy. Our focus is on enterprise AI, designed to address these opportunities and solve business problems. The list of use cases is long and includes IT operations, code generation, improved automation, customer service, augmenting HR, predictive maintenance, financial forecasting, fraud detection, compliance monitoring, security, sales, risk management, and supply chain amongst others." IBM CEO Opening Remarks, Q2 2023 Earnings Call

An emerging area of investment that delivers many of these benefits revolves around creating digital twins of common physical business services. At the Databricks Data + AI Summit, JetBlue provided a great example of this. Their senior manager of data science and analytics described how JetBlue is using the Databricks platform to enable generative AI across a number of use cases.

What struck me is the extent of their application of AI across multiple business domains. JetBlue is leveraging LLMs to create chatbots to serve information across all of their operational specializations (maintenance, operations, planning, ticketing, etc.). Additionally, they are creating digital twins for most real-world functions (customer activity, flights, airports, crews, etc) in order to run gaming scenarios to predict potential service issues as well as business opportunities.

The scope of these operations and the planned expansion provides investors with confidence that this AI trend is real. I imagine that if JetBlue is engaged in building this many AI models, then every other airline is likely pursuing a similar strategy. If they aren’t, then they risk creating a competitive disadvantage. This dynamic shifts investment in AI from a nice-to-have to must-have.

Business operations will become more efficient and lower cost, as expertise is dispersed to all employees, rather than a few highly paid, time-constrained specialists. This will have implications across many industries, whether health care (everyone is a doctor), legal, product design, finance, transportation, supply chain management, etc. Enterprises will be able to produce more output with fewer people, driving profitability. Savings from less staff can be invested into more AI enablement.

I think that these factors will drive a large increase in consumption of new AI-enabled digital experiences. Existing software applications will be re-imagined and redesigned to make use of the improvements in interaction, efficiency and effectiveness. While public Internet consumers experience the most visible benefits, we will likely see a much larger investment and catalyst from internal business applications. The creation of these expert systems for use within enterprises will drive a whole new level of business investment and productivity improvement.

This process may resemble the scramble to launch new mobile applications in the early 2010s, but likely at an even larger scale. Mobile apps increased usage of software infrastructure because humans could access those applications from anywhere. Instead of interacting for an hour a day while seated at their computer, users could engage over many hours from anywhere. Additionally, new hand gestures (touch / swipe) made the interface more efficient.

Yet, mobile apps didn’t make the base software applications more effective. Most mobile apps involved reproducing a similar experience to that exposed in a web browser. With AI-enabled applications, though, we get both benefits. The interface is more efficient due to natural language and the effectiveness of the application will be much greater as a consequence of more powerful reasoning engines. Combined, these two factors should generate more application usage (more interaction, more data scope, more processing).

If employees are more productive, then enterprises will need fewer of them. This reduction of department headcount will free up budget to pay for the software that drives this productivity. Whether it is $20/month for ChatGPT or $30/month for Microsoft’s new 365 Copilot, a $100k annual all-in cost per corporate information worker (salary, benefits, space, etc.) will pay for a lot of software.
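As a back-of-the-envelope check on that claim, the Python sketch below computes how many annual software seats a single $100k all-in worker cost would fund at the two price points mentioned, and what share of that worker's cost one seat represents. It takes the figures from the paragraph above at face value and assumes productivity gains convert directly into payroll savings, which is a simplification. The function names are illustrative.

```python
ALL_IN_COST_PER_WORKER = 100_000  # USD per year, figure from the text

# USD per seat per month, as quoted in the text.
MONTHLY_SEAT_PRICE = {"ChatGPT": 20, "Microsoft 365 Copilot": 30}

def seats_funded(annual_budget: float, monthly_price: float) -> float:
    """Number of annual software seats a given budget can pay for."""
    return annual_budget / (monthly_price * 12)

def breakeven_gain(monthly_price: float, annual_worker_cost: float) -> float:
    """Seat cost as a share of one worker's annual all-in cost.

    Under the simplifying assumption above, this is the productivity
    improvement at which a seat pays for itself.
    """
    return (monthly_price * 12) / annual_worker_cost

summary = {}
for product, price in MONTHLY_SEAT_PRICE.items():
    seats = seats_funded(ALL_IN_COST_PER_WORKER, price)
    gain = breakeven_gain(price, ALL_IN_COST_PER_WORKER)
    summary[product] = (seats, gain)
    print(f"{product}: one worker's cost funds {seats:.0f} seats; "
          f"break-even productivity gain {gain:.2%}")
```

At $30 per month, one seat costs $360 per year, or 0.36% of the $100k figure, so one avoided hire would cover roughly 278 seats.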

A recent study from consultancy McKinsey reported that AI has the potential to improve productivity in a number of common information worker functions in large enterprises. Some examples include:

  • Sales productivity increased by 3% to 5%.
  • Marketing productivity increased by 5% to 15%.
  • Companies saved 10% to 15% on R&D costs.
  • Software engineering productivity increased by 20% to 45%.
  • Customer service productivity increased by 30% to 45%.

As these productivity enhancing AI services ramp up, I think the rate of hiring will slow down within the Global 2000 for traditional knowledge workers. Obviously, enterprises will be careful with this messaging, as they don’t want to fuel the “AI is replacing jobs” narrative, but I think the writing is on the wall. In some cases, executives aren’t avoiding it. IBM previously announced that they are pausing hiring for information worker roles that they think could be replaced by AI at some point. This is projected to impact about 7,800 workers over several years.

When fewer high cost information workers are required to accomplish the same output, that savings can offset the cost of additional software automation. IBM’s CEO intends to invest more in AI and automation to address these corporate functions. That implies a shift of more corporate budget to IT and associated software services.

As part of Accenture’s latest earnings call, they cited an internal survey in which executives at customer companies were asked about their plans for AI. The results indicated that 97% of executives expect generative AI to be transformative to their industry and that 67% of organizations are planning to increase their spending on technology in general, with prioritization for investments in data and AI.

"And so while it is early days, we see generative AI as a key piece of the digital core and a big catalyst for even bigger and bolder total enterprise reinvention going forward. In fact, in a survey of global executives that we completed just last week, 97% of executives said Gen AI will be transformative to their company and industry and 67% of organizations are planning to increase their level of spending in technology, prioritizing investments in data and AI." Accenture Q3 2023 Earnings Call, June 2023

Themes and Likely Beneficiaries

With that background, let’s explore how the rush to pursue AI strategies may change the prioritization of enterprise IT spend and what capabilities become more important. In some cases, AI is accelerating the need for some services (real-time data) or even reversing the direction of some trends (processed data versus raw). Additionally, while themes around governance have always been important, AI injects new considerations into data privacy and even data sharing.

In the Snowflake Summit keynote, their CEO said that in order to have an AI strategy, a customer has to have a data strategy. He is referring to the fact that most enterprises are sitting on a treasure trove of data, unique to their industry. When OpenAI rapidly rolled out iterations of ChatGPT, many technology analysts thought those generalized capabilities might disrupt a number of traditional industries. This was because ChatGPT’s intelligence was easy to extrapolate to solve all kinds of problems.

However, what has become clear is that a large language model is only as effective as its training data set. With the public Internet available for training, it is very difficult for ChatGPT to discover unique insights with industry-specific context in business segments like manufacturing, retail, health care, supply chain, transportation, finance and more.

ChatGPT and other generative AI tools have served an important purpose, though. They provide a very visible example of what is possible with generative AI and the application of new large language models. Every C-level executive can extrapolate those capabilities to envision uses within their business. Their personal experience with ChatGPT (or their kids’) develops a new frame of reference.

Pre-trained models, like those from OpenAI, serve as a valuable starting point for industry solutions. It’s estimated that 95% of data processing is saved by bootstrapping an AI service with a pre-trained foundation model. The last 5% represents the critical and unique contribution to transform the custom model into a working solution applicable for a whole bevy of industry specific use cases. This last 5% of data comes from enterprises.

After extensive pre-training, the final steps of inference and fine-tuning can be accomplished with a targeted data set. This highlights the need for enterprise data to be clean, structured and recent. It is what Snowflake’s CEO was referring to. In order for enterprises to realize all the AI value from their proprietary data sets, they need to update the infrastructure that collects, processes and stores that data.

Databricks’ CEO expressed the opportunity for enterprises just as succinctly. Enterprises will create competitive advantage by harnessing their proprietary data sets to build their own unique AI models. These will differentiate them not just from other players in their industry, but also from the first wave of popular AI services built from content on the public Internet.

The use cases for enterprises will be specific to their business operations, offering huge opportunities to automate decision making, improve customer service and empower employees. These services can also be shared within their industry’s information value chain, made available to key partners, suppliers and service providers. These use cases aren’t as interesting to public Internet users, but stand to drive enormous productivity gains for businesses. I think the market for these internal AI enterprise services will be much, much larger than what the public has experienced thus far through popular chat agents like ChatGPT.

And this is the core of the opportunity for a number of AI-adjacent data providers. These companies can ride the coattails of the rush to capitalize on new AI-driven capabilities and the desire to unlock whole new services, insights and automation within many industries. As enterprises invest in AI, data infrastructure providers can both facilitate direct access to AI models and help improve the end-to-end pipeline of data that will feed them. This makes the whole AI investment surge a potential tailwind for data processing and storage companies.

Databricks and Snowflake stand at the center of this, offering extensive platforms of capabilities with a large set of engaged enterprise customers. Both are evolving their platforms quickly to position themselves as the ideal platform for the convergence of data and AI. Let’s review some of the data themes elevated by AI and examine how these two data providers are positioning their platforms. This will be viewed through the lens of what they prioritized for announcement at their respective user conferences at the end of June.

Get as Much High Quality Data as Possible

With high processing costs to train extensive AI models, the data storage industry is returning to its bias towards succinct and clean data to load into AI models. This increases the importance of high quality data sources, favoring structured models like those found in a data warehouse. While raw, unstructured storage formats, like those in data lakes, still exist, they are often viewed as a preliminary state in advance of AI processing.

This represents a reversal from the industry’s general trend over the last few years, where data vendors were trying to accommodate vast data lakes supporting the ability to query across multiple data types. The emergence of AI is pushing the industry back towards an appreciation for clean, formatted, well structured data sources. Generalized LLMs are powerful, but expensive. Custom models can leverage curated data to fine-tune pre-trained models for less money.

A shift towards pre-processed data as an input for AI inference favors those solution providers that maintain clean data structures with tight controls over data access. Performance is important to contain costs, as data sets expand. Granular governance over data sources ensures proprietary enterprise data isn’t leaked into public models or exposed to partners. A platform architecture that favors centralization of data would support the simplest way to ensure security, control and performance.

Given these trends, Snowflake is doubling down on the idea that the Data Cloud represents the best place for enterprises to consolidate all of their data. By migrating all enterprise data onto Snowflake’s platform, customers can ensure they have full control over it. This also minimizes latency associated with pulling data across a remote network connection from a distant source for processing. Governance is easy to enforce if the data is all in a vendor-specific format.

Where customers need to manage data outside of the Snowflake platform, they can leverage Iceberg Tables as Snowflake’s universal open format. At Summit, Snowflake announced the extension of this open format to imbue governance concepts into the management of the data. Customers will be able to designate whether their Iceberg Tables inherit governance from Snowflake (managed) or allow another engine to handle access controls (unmanaged). Native and External Tables are being consolidated as part of this.

The important distinction that Snowflake makes on their approach is that this solution introduces no loss of performance. Customers who choose to store data in open formats will get comparable performance to data stored in internal Snowflake tables. This performance guarantee represents the trade-off between supporting multiple open formats (like Databricks) and consolidating on a single format (like Snowflake Iceberg Tables).

Databricks Moves Towards Controlled Openness through Unity Catalog

Databricks, on the other hand, is taking a more open approach to data quality and management of input sources. This starts with accommodating multiple data formats. Where several data management providers (hyperscalers, Snowflake) are trying to select a single data format for storage, Databricks is pivoting towards making the issue a moot point with their introduction of Delta Lake 3.0.

With Delta Lake 3.0, Databricks essentially created an abstraction layer over all three data formats (Delta Tables, Hudi and Iceberg) and called it UniForm. This manages the metadata associated with each format and then translates all three formats out to Parquet so that the data can be operated upon without worrying about the underlying storage format.

This is an interesting, and arguably strategic, move by Databricks to sit above the fray and become the connective layer between all these data sources. For the other players in the ecosystem, it likely increases their determination to make their format the standard. Or, they will acquiesce and gravitate towards an approach similar to Databricks’ universal format. All of that abstraction would impact performance, though, so there is an advantage to having a single format for raw processing speed.

This commitment to openness by Databricks is being manifested in other ways as well. They are hanging governance and data source auditing on their Unity Catalog, which has been expanded to support more data types and tracking functions. Unity Catalog provides a central view of all resources and workflows within a customer’s Databricks instance. Besides just managing permissions and access control, it has deep auditing and lineage capabilities, allowing users to track the source of data through history.

During the Data + AI Summit, Databricks announced three major additions to the Unity Catalog. These extend its reach to more data sources, expand governance capabilities and allow for broader monitoring.

Enterprise-wide reach. The reality in most enterprises is that data is scattered across multiple silos. While the ideal from Snowflake’s perspective is that data teams consolidate all of this data onto a single platform through a series of migrations, that will take time and teams need to harness this disparate data now.

To address this need, Databricks introduced Lakehouse Federation into public preview. This supports a data middleware that can connect to multiple data sources (including competing ones) for discovery and querying. It can set and enforce data access controls on each data source down to a granular level. To improve performance, it can perform query optimization and cache frequently accessed data.

In the future, it may even push data access control policies back down to the data sources. This would be a very interesting extension, providing a central control plane for governance across all data sources.

Governance for AI. Beyond working with structured tables in data sources, AI requires interaction with other types of artifacts. Examples include unstructured files, models and feature definitions. Databricks has extended Unity Catalog to work across these new AI structures, allowing data teams to manage access all in one tool. This also injects lineage across data and these AI artifacts, so that teams can keep track of all dependencies. These capabilities are part of Unity Catalog for AI, which is available in public preview.

AI for Governance. AI can be applied to governance to make it easier to manage a large-scale data pipeline. Databricks is making new capabilities available as part of Lakehouse Monitoring that perform quality profiling, anomaly and drift detection, and data classification across both data tables and AI models. The output of this monitoring can be exposed in a variety of formats, including tables, visualizations, alerts and policies.

As a useful example, Lakehouse Monitoring can be trained to look for PII within data tables and then generate an alert when it is identified. This capability is in public preview for select customers to use.
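To make the PII example concrete, here is a minimal Python sketch of a rule-based scan over table rows that emits alerts. This is not how Lakehouse Monitoring works internally, which Databricks did not detail. The regex patterns, the `scan_rows` function and the sample rows are all illustrative assumptions.

```python
import re

# Illustrative patterns only (assumptions); a production classifier would be far more robust.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_rows(rows):
    """Return one alert per (row index, column, PII type) match."""
    alerts = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only string cells are scanned in this sketch
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    alerts.append({"row": i, "column": column, "type": pii_type})
    return alerts

# Hypothetical sample data standing in for a data table.
sample_rows = [
    {"id": 1, "note": "Order shipped on time"},
    {"id": 2, "note": "Contact jane.doe@example.com for refund"},
    {"id": 3, "note": "SSN on file: 123-45-6789"},
]

alerts = scan_rows(sample_rows)
for alert in alerts:
    print(alert)
```

In a real deployment the alerts would feed a dashboard or notification channel rather than standard output.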

With these expanded capabilities, Unity Catalog underscores the emerging core strategy for Databricks. They intend to become the connective tissue across all data sets within the enterprise, allowing users to securely access, share and process all data, no matter where it is located (even if on Snowflake). This isn’t being done in a loosey-goosey way either, with strict governance controls spanning all sources.

This elevates Databricks to a status of spanning all clouds and data providers (sometimes referred to as a supercloud function). This posture aligns well with their positioning around openness and open source, as opposed to a closed system. Over this architecture, they are allowing enterprises to build AI models and support generative AI services. Given that AI performs better with access to more data, this positioning of supporting an open ecosystem plays well.

Snowflake has a similar broad vision, which is to make all of an enterprise’s data available in one system, allowing data teams to easily build and serve AI models. Their strong preference is that all data is stored in the Snowflake platform. Where Snowflake needs to connect to external data sources, they make that possible through Iceberg Tables, but the bias is towards bringing the data onto the platform eventually. All of Snowflake’s supporting services around deep governance, collaboration and fast processing just work better this way. This is why Snowflake is often referred to as a “closed” system.

Closed systems aren’t inherently at a disadvantage, though. They are generally easier to manage (fewer interfaces to maintain), perform better (fewer layers of abstraction) and have a simpler infrastructure footprint. Security and privacy are easier to control and therefore less likely to fail. Historically, closed systems have had more commercial success in software infrastructure than open ones.

However, Databricks’ bias towards openness and connectivity allows new customers to harness their disparate data sources faster. Lakehouse Federation enables them to just plug in all their siloed data sources quickly. They don’t necessarily give up governance, as Unity Catalog enforces access controls. Performance will likely be slower than having all the data in one place, but customers don’t need to wait on a migration to start fine-tuning their AI models.

Bring AI Compute to the Data

After enterprises have ensured they have maximized the high quality data available to feed their AI models, they need to actually run them. A major trend driven by the growing consolidated data sets is the ability to bring the AI model processing closer to the data itself. There are a couple of justifications for this. First, the pure gravity of the data set increases the cost to move it around. Second, the requirements of privacy and security are more easily accomplished if the data remains within a controlled data storage environment.

These drivers have led the big data storage vendors to expand their capabilities to deliver AI runtimes within the same environment as the data. Those providers have historically already invested extensively in data governance capabilities to manage access to approved data sets at a very granular level. By expanding their capabilities to provision an application runtime adjacent to the data, these providers can extend the same data governance controls to the applications themselves.

With AI workloads, this requirement is just as important. A model trained on proprietary data can reveal sensitive data. Even internally, there may be different employee cohorts with varying levels of access. Enterprises will likely fine-tune hundreds of smaller custom AI models to support compartmentalization of data between users. Systems of governance can be extended to the AI models and the interfaces used to access them within the same environmental boundary already managed by the data provider.

The alternative is to copy proprietary data outside of the system of record to train a model or perform fine-tuning. This copying is not just costly, but also dilutes the security model for that data. User access permissions would need to be duplicated for each application that has access to each custom AI model, creating a lot of overhead for the security team.

Data providers have been working hard to address these issues by bringing a development environment and runtime into the core data platform. At their annual Summit user conference in June, Snowflake announced a number of new capabilities to accomplish this. These go beyond existing modules, like Snowpark, that support application code written in specific languages (Java, Python, Scala). The new capabilities span partnerships, products and extensions of previous announcements, greatly increasing the extensibility of the platform and flexibility for developers.

The most encompassing product development is the introduction of Snowpark Container Services. Containers provide developers with a reproducible runtime environment in which to execute their packaged code. Snowpark Container Services allows data engineering teams to either build their own applications from scratch or import ready-made containers from partners. Partner contributions are served as Native Apps and distributed through the Snowflake Marketplace. For portability, Snowflake’s Containers are packaged and distributed as Docker containers.

Containers built and packaged by developers can be written in any programming language (C, Java, Node.js, Python, etc.). The container itself can be executed on either traditional CPUs or GPUs. This broad flexibility goes beyond what had been available through Snowpark previously, which offered more limited language support through structured frameworks like UDFs (User Defined Functions) and Stored Procedures. Snowflake effectively short-circuited the long road of incrementally adding more languages to Snowpark.

With Snowpark Container Services, developers have the ability to create any application that they could on a standard hyperscaler platform. These can still be data science or engineering specific programs, like executing machine learning Python libraries for training or processing scripts for transforming and loading data. Developers can even layer on rich front-end user interfaces with popular JavaScript frameworks like React. Applications can be deployed as scheduled jobs, triggered service functions or long-running services with a UI.

Snowpark Container Services provides customers with multiple benefits. It simplifies the overhead for developers and data science teams by outsourcing the configuration and management of the hosting environment to Snowflake. Further, containers can access customer data directly within the Snowflake platform, bringing existing governance controls along.

This management of the data is an important distinction. One could argue that Container Services just mirrors what the hyperscalers already offer. The difference is that Container Services can only access the customer’s data through the existing Snowflake governance layer, allowing Snowflake to guarantee access controls and privacy restrictions. Regular containers on the hyperscalers don’t come with this built-in governance.

Further, data teams can leverage Snowflake’s new relationship with Nvidia to unlock a number of AI services and processing power within Snowpark Container Services. This gives developers access to Nvidia’s NeMo framework for extending third party large language models (LLMs), as well as Nvidia’s own internally developed models. Snowflake enterprise customers can use their Snowflake data to create custom LLMs for advanced generative AI services that are specific to their industry domain. These might include chatbots, recommenders or summarization tools.

The collaboration also brings NVIDIA AI Enterprise to Snowpark Container Services, along with support for NVIDIA accelerated computing. NVIDIA AI Enterprise includes over 100 frameworks, pre-trained models and development tools like PyTorch for training, NVIDIA RAPIDS for data science and NVIDIA Triton Inference Server for production AI deployments.

The big advantage of this relationship is that enterprise customers can make use of Nvidia AI services without having to move their proprietary data outside of the fully secured and governed Snowflake platform.

“Data is essential to creating generative AI applications that understand the complex operations and unique voice of every company,” said Jensen Huang, founder and CEO, NVIDIA. “Together, NVIDIA and Snowflake will create an AI factory that helps enterprises turn their own valuable data into custom generative AI models to power groundbreaking new applications — right from the cloud platform that they use to run their businesses.” Snowflake Press Release, June 2023

In addition to Nvidia, Snowflake secured partnerships with a number of leaders in the AI space and related data analytics providers. For example, customers can run Hex’s industry-leading Notebooks for analytics and data science. They can tap into popular AI platforms and ML features from Alteryx, Dataiku and SAS to run more advanced AI and ML processing. Other launch partners include AI21 Labs, Amplitude, CARTO, H2O.ai, Kumo AI, Pinecone, RelationalAI and Weights & Biases. All of these partners are delivering their products and services within Snowpark Container Services.

Snowpark Container Services is in private preview. It enhances the rapid adoption that Snowpark has already achieved. In his opening remarks, Snowflake’s CEO shared that in Q1 more than 800 customers used Snowpark for the first time. About 30% of all customers are now using Snowpark on at least a weekly basis, up from 20% in the prior quarter. Consumption of Snowpark has increased nearly 70% q/q, after being released just 6 months ago.

Snowflake’s strategy with these marquee partnerships like Nvidia is interesting. Understanding that they are not an AI company and unlikely to compete on AI capabilities, they are sticking to their core competencies and leveraging partnerships to offer their customers the best-of-breed AI capabilities within the Snowflake environment. This contrasts somewhat with Databricks’ approach, which is to develop or own (through the MosaicML acquisition) AI models that they can share with customers.

As an example of the potential benefits of Snowflake’s approach to deliver best-of-breed AI processing through partnerships, Nvidia claims that their RAPIDS processing architecture can generate significant performance improvements over other pipelines for machine learning data prep and model training. RAPIDS is a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs. RAPIDS leverages Nvidia’s years of development in graphics, machine learning, deep learning and high-performance computing (HPC) to deliver high performance. This capability is available to Snowflake customers directly adjacent to their data within the secure Data Cloud.

Snowflake Streamlit Delivers a Pre-packaged User Interface

Streamlit provides an important front-end for AI powered experiences on Snowflake. It is an open-source Python library for app development that is natively integrated into Snowflake for secure deployment onto the platform. Customers can use Python to create interactive applications from data and ML models. Besides supporting custom code in Python, the tool provides a visual interface for selecting UI components and configuring them in a preview screen. Deployment is then initiated with a single click.

Streamlit has been in private preview since last year with a lot of customer interest. Now, it is being leveraged as a ready-made front end to bootstrap AI applications. As of the Summit conference, Snowflake leadership shared that over 6,000 Streamlit powered apps with generative AI or ML models behind them have already been built on the Snowflake platform. At the conference, leadership committed that Streamlit will go into public preview within a few weeks.

Snowflake’s vision is the same as it has been, as they look towards new opportunities from generative AI and LLMs. Enterprises on Snowflake have already invested significant cycles in organizing their data, creating roles, setting access policies and establishing overall governance. Snowflake wants to honor all of that investment and allow customers to capitalize on the full value of generative AI without having to move data off of the platform.

Customers can leverage either Snowflake’s internal first party models or import third party models from any source. They can perform full model training, or just fine-tune the models with their custom data. This can all be performed securely within the Snowflake platform without exposing their proprietary data to outside parties.

With that said, Snowflake is developing some enhanced AI capabilities internally. The product team introduced several new capabilities to improve Snowflake’s AI processing support on the platform. The first is Document AI, which allows customers to extract data from unstructured documents. These can even be an image of the document and the service will use OCR to translate it into text.

Once the text is extracted, it can be stored in a large language model and queried by users. Document AI will be useful for customers where documents constitute a large part of their data workloads, like businesses in healthcare. The content of these documents can be extracted and utilized as part of model training in the same way as structured data within the Snowflake platform. Document AI is the manifestation of the Applica acquisition announced last year. The capability is now available in private preview.

Snowflake also announced ML-Powered Functions in public preview. These extend machine learning capabilities to a broader audience, specifically analysts who traditionally operate in SQL. This allows analysts to create standard functions using SQL, but access three different machine learning techniques to enhance the response.

The three ML frameworks available in the first release are Forecasting, Anomaly Detection and Contribution Explorer (what conditions caused a problem). The business benefit for the customer is that it empowers the business analyst to be more self-reliant to address common machine learning investigations themselves. For Snowflake, these types of queries would drive more consumption.
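For readers less familiar with what an anomaly detection function does, here is a minimal Python sketch using a simple z-score rule. Snowflake’s ML-Powered Functions are invoked from SQL and their internal algorithms are not described here, so the `find_anomalies` function, the threshold of 3 standard deviations and the sample order counts are all assumptions for illustration.

```python
from statistics import mean, stdev

def find_anomalies(history, new_values, threshold=3.0):
    """Flag new values that sit more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the history
    if sigma == 0:
        return []  # no variation in history, so a z-score is undefined
    return [v for v in new_values if abs(v - mu) / sigma > threshold]

# Hypothetical daily order counts for the past week, then three new days to check.
daily_orders = [100, 102, 98, 101, 99, 103, 97]
flagged = find_anomalies(daily_orders, [101, 140, 96])
print(flagged)
```

The appeal of packaging this behind a SQL function is that the analyst never has to choose the statistic or threshold themselves.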

Snowflake also provided an update on Unistore. It is still in private preview, taking longer than expected to be ready for public release. As a show of confidence, five of the private preview customers are using Unistore in production, in spite of Snowflake’s caveats. They are targeting public preview near the end of 2023.

This is a bit disappointing, as Unistore brings the promise of consolidating transactional and analytical workloads onto a common platform. The value proposition remains the same and customers appear engaged so far. I look forward to seeing the actual use cases that enterprises adopt once this data storage engine is publicly available. At the very least, it could power read-heavy applications that are data rich.

Snowflake is moving more quickly with the Native App Framework, announced at Summit last year. This allows third-parties to build, distribute and monetize apps natively in the Data Cloud. The big advantage of these is that they run next to a customer’s data, keeping that data private, even from the app developer.

The Native App Framework was in private preview and has been promoted to public preview on AWS. As part of that, Snowflake announced that 25 partners had already produced about 40 apps that are available in the Marketplace. The number is relatively low because the Snowflake team is performing thorough quality control on the Apps (like Apple’s App Store), where they test applications for security, performance and reliability before allowing the App to be listed. Snowflake leadership claims that some of these provider Apps, like DTCC and Bloomberg, are bringing new customers to Snowflake.

To allow customers access to more robust ML models, Snowflake made two other product announcements. First, they introduced two new libraries in public preview. The first supports feature engineering, which enables users to prepare data to be fed into an AI model. The other supports actual model training within Snowflake. Fidelity was an early private preview customer of these libraries for their internal use cases.

A final major feature was the announcement of a Model Registry for Snowpark. This allows customers to manage their growing repository of ML models to help support ML Ops. With the Model Registry, customers can discover, publish and share models with a governed view of the model artifacts and associated metadata. Registered models can be easily deployed for inference. Customers can expose the models and results for internal developers and SQL users to consume.

Databricks Adds Adjacent Compute as Well

Not to be outdone, Databricks introduced a number of new capabilities at their Data + AI Summit which add compute capabilities in close proximity to the data. In many ways, Databricks has already been there, being built on top of the Spark engine. If Snowflake’s CEO says that “in order to have an AI strategy, you have to have a data strategy”, Databricks makes a similar call to action of “data needs to be at the center of your AI strategy”.

The Databricks platform supports building end-to-end AI applications through three main steps. These encompass what they call Lakehouse AI. It starts with the data. The user has to collect the right data and prepare it in the optimal format for machine learning. Next, they need to identify the appropriate models and tune them. Finally, users can make model output available to end consumers through applications, with monitoring of performance and strict governance.

Databricks’ Unity Catalog supports all of these steps, delivering data source discovery, access controls, governance, lineage, auditing and monitoring. This AI workflow should be performed inside of the data platform. That way, proprietary data used to build models isn’t leaked out into the public space. Managing the data is the hardest part of these three steps – collecting large data sets, running them through models and using data to measure the effectiveness of the AI solution. Keeping all these steps in one platform allows for a common user access, governance, data and UI model.

To make each of these capabilities more AI-centric, Databricks announced a number of new capabilities at the Summit. Over the last year, they have upgraded the capabilities of the Lakehouse AI platform to work better with new generative AI practices and artifacts.

To support more robust preparation of data for AI processing, Databricks management introduced two new technologies to be released in upcoming months.

  • Native Vector Search. This applies to text-based documents or unstructured data stored in Databricks, like internal business process documentation. Vector search can parse this text into tokens and build an index of the relationship between them. Vector Search exposes an endpoint for the AI model to query.
  • Support for Online Feature Serving. Allows the data model to include contextual transactional data to customize the response for the user querying the model. This is critical to enabling enterprises to add their proprietary context and customer data to AI-powered applications.
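As a rough illustration of the vector search idea, here is a toy Python sketch that indexes a few documents and returns the closest match for a query. It uses bag-of-words counts and cosine similarity as a stand-in. Production systems typically use learned embeddings and approximate nearest neighbor indexes, and nothing here reflects Databricks’ actual implementation. The documents and function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term count vector (an assumption for illustration)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical internal business process documents.
documents = [
    "how to submit an expense report",
    "vacation policy and time off requests",
    "resetting your vpn password",
]
index = [(doc, embed(doc)) for doc in documents]

def search(query, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

print(search("reset vpn password"))
```

The `search` function plays the role of the endpoint an AI model would query to pull relevant internal context into its response.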

Combined, these new capabilities allow LLMs to craft a relevant response to the application query. The other half of the flow is of course the AI models themselves. The AI model is the most powerful part of the application and Databricks wants to provide developers with many choices for their models. Databricks has updated the Lakehouse AI stack to help application developers find, tune and customize those models. As a starting point, they have built in support for a number of the most popular proprietary models available as a service from third parties, including OpenAI, Bard, MosaicML, Cohere and Anthropic.

Many customers are also utilizing open source models. To help customers manage them, Databricks announced their open source Model Library. They identified all the best-of-breed open source models, packaged them up and made them available inside of Databricks. They also optimized access to accommodate high performance requirements.

Databricks will also be developing ways for users to take these open source models and customize them. This allows customers to bring additional data into the model, perhaps after it has been exercised in production for a period, to further enhance the model’s effectiveness. This could include responses that were scored positively as an example. They will also support tuning, latency and cost optimization trade-offs.

Once models are in production, they need to be evaluated. The user wants to constantly score the model for accuracy, bias, effectiveness and other metrics. To support this, Databricks released support for MLflow evaluation. Users can measure the responses from multiple models and A/B test them. This might be useful when comparing the performance of a SaaS model to a tuned OSS model.
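The comparison described above can be sketched in a few lines of Python. This does not use the MLflow API. It is a plain illustration of scoring two models against the same evaluation set, where the two model functions, the prompts and the exact-match metric are all hypothetical stand-ins.

```python
def evaluate(model, dataset):
    """Fraction of prompts where the model output exactly matches the expected answer."""
    correct = sum(1 for prompt, expected in dataset if model(prompt) == expected)
    return correct / len(dataset)

# Hypothetical stand-ins for a hosted SaaS model and a tuned open source model.
def saas_model(prompt):
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "unknown")

def tuned_oss_model(prompt):
    answers = {"capital of France?": "Paris", "2 + 2?": "4", "refund window?": "30 days"}
    return answers.get(prompt, "unknown")

# Hypothetical evaluation set mixing general and company-specific questions.
eval_set = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("refund window?", "30 days"),
    ("warranty length?", "1 year"),
]

scores = {
    "saas": evaluate(saas_model, eval_set),
    "tuned_oss": evaluate(tuned_oss_model, eval_set),
}
winner = max(scores, key=scores.get)
print(scores, winner)
```

In practice the metric would be richer than exact match (accuracy, bias and cost were all mentioned), but the structure of the comparison is the same.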

To support outside model access, Databricks introduced the MLflow AI Gateway. This enables enterprises to consolidate and access all of their third-party AI use cases through a single gateway. This works across the major SaaS AI models, keeping credentials and logs in one place. It also manages shared credentials and enforces rate limiting where needed.
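To show what a gateway of this kind does conceptually, here is a small Python sketch that keeps provider credentials and a request log in one place and applies a sliding-window rate limit. It is not the MLflow AI Gateway’s implementation or API. The `Gateway` class, its parameters and the provider name are assumptions made for illustration, and the clock is injectable only so the example runs deterministically.

```python
import time

class Gateway:
    """Toy gateway: one place for provider credentials, request logs and rate limits."""

    def __init__(self, credentials, max_requests, per_seconds, clock=time.monotonic):
        self._credentials = credentials   # provider name -> API key, never exposed to callers
        self._max = max_requests
        self._window = per_seconds
        self._clock = clock
        self._timestamps = []             # times of recently accepted requests
        self.log = []                     # (provider, prompt, accepted) tuples

    def route(self, provider, prompt):
        """Return True if the request is accepted, False if it is rate limited."""
        if provider not in self._credentials:
            raise KeyError(f"no credentials configured for {provider}")
        now = self._clock()
        # Keep only the requests still inside the sliding window.
        self._timestamps = [t for t in self._timestamps if now - t < self._window]
        accepted = len(self._timestamps) < self._max
        if accepted:
            self._timestamps.append(now)  # a real gateway would forward the call here
        self.log.append((provider, prompt, accepted))
        return accepted

# Deterministic demo with a fake clock: 2 requests allowed per 60 seconds.
fake_time = [0.0]
gateway = Gateway({"provider_a": "key-a"}, max_requests=2, per_seconds=60,
                  clock=lambda: fake_time[0])
results = [gateway.route("provider_a", f"prompt {i}") for i in range(3)]
fake_time[0] = 61.0                       # move past the window
results.append(gateway.route("provider_a", "prompt 3"))
print(results)
```

Because every call passes through `route`, the log gives the security team a single audit trail across all third-party models.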

The last step is going to production. This requires continuously running infrastructure to serve the AI application. Databricks has offered an ML Inference product for two years, which has been growing rapidly. At the Data + AI Summit, they added GPU support for inference, along with tuning and optimization for the most popular LLM families. This will deliver even higher performance at serving time.

Finally, as the application is being served, it should be monitored. Monitoring for ML models is a data problem. Customers should log and analyze every response given to end users. The model’s performance should be correlated to whatever business metrics are important. An example might be a successful customer service outcome without requiring escalation to a human.

These metrics can be collected and incorporated into a monitoring dashboard that users can track. To help customers with this, Databricks introduced Lakehouse Monitoring. This allows users to identify relevant performance data to collect, aggregate it into charts and graphs and then display that information in a historical context within a dashboard.
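The "monitoring is a data problem" point can be sketched with a small log of responses tied to a business outcome. The records, field names and model versions below are invented for illustration and have nothing to do with the Lakehouse Monitoring interface.

```python
from collections import defaultdict

# Hypothetical response log: one record per answer given to an end user.
# `escalated` marks whether a human had to take over (the business metric).
response_log = [
    {"day": "2023-07-01", "model": "v1", "escalated": True},
    {"day": "2023-07-01", "model": "v1", "escalated": False},
    {"day": "2023-07-02", "model": "v2", "escalated": False},
    {"day": "2023-07-02", "model": "v2", "escalated": False},
    {"day": "2023-07-02", "model": "v2", "escalated": True},
    {"day": "2023-07-02", "model": "v2", "escalated": False},
]


def resolution_rate_by_day(log):
    """Share of responses per day that were resolved without escalation."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for record in log:
        totals[record["day"]] += 1
        if not record["escalated"]:
            resolved[record["day"]] += 1
    return {day: resolved[day] / totals[day] for day in sorted(totals)}


daily_rates = resolution_rate_by_day(response_log)
print(daily_rates)
```

A series like `daily_rates` is the kind of historical metric that could be charted on a dashboard and compared across model versions.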

Secure Collaboration

AI raises the usefulness of secure data sharing as well. Data quality for a custom model would be further enhanced if an enterprise could increase the scope of data they have available to fine-tune it. While each enterprise will closely guard their proprietary data, there is an argument for a few companies to form a strategic alliance within their industry category to deliver a better solution than the rest of their competitors. This type of collaboration could be facilitated by secure data sharing.

Snowflake has long supported data collaboration through a number of capabilities. They started with secure data sharing, which allows one Snowflake customer to make a data set available to another customer in a protected location on the Snowflake platform. In this case, the recipient is not provided a copy of the data. To make a data share more actionable, Snowflake later introduced Clean Rooms, which allow the recipient to run data processing against the shared data and only store the result. This enables the sharing party to keep the majority of the data hidden from the recipient.

Because Snowflake has offered data sharing capabilities for a long time, the feature enjoys high penetration among customers. It is particularly entrenched within Snowflake’s large customers. At Snowflake’s Investor Day, held during Summit, leadership revealed that 70% of their $1M+ ARR customers are using data sharing.

They even provided an example of data collaboration by technology service provider Fiserv, which spans multiple data sharing partners. What is notable is the cascading chain of data sharing and processing, extending out to multiple partners and customers. The benefit for Snowflake and other data providers with data sharing is that the capability is very sticky. In Snowflake’s case, the participants need a Snowflake account in order to use data sharing. This network effect both encourages new customers to join Snowflake and also discourages them from leaving.

Recall that Snowflake has invested heavily in the past in the creation of industry specific solutions. These represent an ecosystem of participants in a particular industry segment, who can interact through systems of shared data and special product offerings curated for that particular vertical. As each of these verticals tries to apply AI to their domain, Snowflake will be well-positioned to offer domain-specific capabilities. They can feed foundation models to create unique offerings that possess specific industry context with the latest data.

Snowflake can help ecosystem participants securely assemble larger data sets through controlled sharing. They can extend foundation models with domain specific context and offer them to ecosystem participants. While individual enterprises will closely guard their proprietary data, they will realize that collaborating with a couple of other participants in the same sector might result in an even better AI-driven offering. We should see the emergence of tightly controlled enterprise alliances that revolve around data sharing to create better AI models for their industry that disrupt other participants. Snowflake’s sophisticated data sharing capabilities will become a huge advantage here.

While providers in the Snowflake Marketplace have been focusing on selling curated data sets, AI models provide a whole new layer of services that Snowflake can offer through the Marketplace. As we saw with the Data Cloud Metrics in the Q1 earnings presentation, sequential growth of Marketplace offerings slowed down to 3% Q/Q growth. I’m not surprised, as there are likely only so many generic demographic, weather, financial, etc. datasets that can be sold. However, rich, contextually-aware AI models distributed through the Marketplace could provide a whole new growth vector for vendors.

Databricks Ramps up Collaboration Features

Acknowledging the opportunity in the collaboration space, Databricks significantly enhanced their capabilities around data sharing, clean rooms, marketplace and apps as part of the Data + AI Summit. These enhancements followed a theme similar to that of Unity Catalog: making the Databricks platform open to both customers and non-customers.

For secure data sharing, Databricks introduced their open Delta Sharing service about 2 years ago. This works across any platform. The Delta Sharing protocol allows the provider to determine what data to share and who is allowed to access it. The consumer need only implement a Delta Sharing client and subscribe to the source published by the provider. With minimal configuration, the consumer will get access to the shared data in their client application.

Delta Sharing has become a popular capability, with 6,000+ active data consumers (not necessarily 1:1 with customers) and over 300 PB of data shared per day. On top of Delta Sharing and Unity Catalog are built the three primary features of Databricks’ Collaboration platform. These include the Marketplace, Lakehouse Apps and Clean Rooms.

The Marketplace had been in preview mode for several months, with hundreds of listings. At the Data + AI Summit, Databricks moved the Marketplace to General Availability. Providers can share data sets and notebooks through the public Marketplace, or use a private exchange to share data securely with select organizations.

Coming soon, customers will be able to discover and share AI models through the Marketplace. The user can search for a pre-trained AI model that matches their use case (Databricks demo’ed a medical text summarization model), read about the model and even submit some sample queries against it. If the user indicates they want to subscribe to the model, the provider then provisions the models and a notebook into the user’s workspace. The user can run the model directly within the notebook or invoke it through a model serving endpoint.

Models available in the marketplace will span both open source and proprietary models. AI Model sharing will be very useful for customers with access to proprietary data, who want to fine-tune an existing model for their use case. On the other hand, some customers may not have access to sufficient data or want to short-circuit this process for some use cases, where the investment in customizing their own model isn’t warranted.

To accommodate this, customers could access a full-blown application that provides the desired functionality with a simple interface. Databricks introduced Lakehouse Apps to address this case. This provides a new way to build, deploy and manage applications for the Databricks platform.

Developers will be able to create Apps using any language and then run them on the Databricks platform within a customer’s instance. The proximity of the application to the customer’s data on Databricks provides a secure environment, without risk of data leakage. It also avoids lengthy security or privacy reviews, as the application never moves data outside of Databricks. The incentive for developers is to gain access to Databricks’ 10,000+ customers. Early App developer partners include Retool and Posit.

When a customer wishes to subscribe to an App, they can search the Marketplace for a provider that matches their target use case. Then, they can install the App and designate the data set to which it will be allowed access. Access permissions are controlled in Unity Catalog and can map to all of a customer’s data assets (whether on Databricks or not).

Finally, Databricks announced Clean Rooms as part of the user conference. This addresses the use case where multiple data collaborators want to share data, but only after data processing scripts have been run to create a subset of data appropriate for sharing. This limited sharing might involve a use case between one or more retailers and an advertiser. The retailers don’t want to expose all of their consumer data and want to reveal details only for the intersection of consumers on a particular media site.

Clean Rooms provide the ability to run data processing jobs in a secure environment on Databricks and only share the output between parties. Those jobs can be written in any language and run as a controlled workload in a secure runtime on the Databricks platform. Privacy is maintained as the output is only made available to the participants, without exposing the source data (only Databricks can read that).
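The retailer and advertiser case can be sketched as a job that sees both parties' identifiers but releases only an aggregate. This is a toy illustration under assumed names and data, not how Databricks Clean Rooms are implemented; in a real clean room the job would run in an environment neither party controls.

```python
import hashlib


def hashed(ids):
    """Hash identifiers so raw values are not compared directly."""
    return {hashlib.sha256(i.encode("utf-8")).hexdigest() for i in ids}


def clean_room_overlap(retailer_ids, advertiser_ids):
    """Runs inside the (hypothetical) secure environment; returns aggregates only."""
    overlap = hashed(retailer_ids) & hashed(advertiser_ids)
    return {"overlap_count": len(overlap)}


# Made-up identifiers for each party. Neither list is returned to the other side.
retailer = ["ann@example.com", "bob@example.com", "cy@example.com"]
advertiser = ["bob@example.com", "cy@example.com", "dee@example.com"]

shared_output = clean_room_overlap(retailer, advertiser)
print(shared_output)
```

Only `shared_output` leaves the function, which mirrors the idea of sharing the job output without exposing the source data.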

In the future, Databricks plans to add more capabilities to the Collaboration suite. These additions include transaction support in the marketplace and code approval workflows for Clean Rooms. For those following Snowflake’s history, these new features are very familiar. While Databricks is following Snowflake’s strategy with collaboration features, they significantly round out the Databricks platform’s feature set, bringing it closer to being on par with Snowflake’s capabilities in this area.

Real-time Data Enhances AI Performance

As enterprises realize that AI efficacy is improved with high quality, proprietary data, interest in making sure that data is recent is increasing as well. This provides a catalyst for upgrading data infrastructure stacks to include data streaming, stream processing and real-time data ingest. Both Snowflake and Databricks announced new capabilities to handle the ingestion of real-time data streams. As investors will recall, MongoDB announced a new module for stream processing at their MongoDB.local user conference in June as well.

For a clearer view of how modern data infrastructure providers could benefit from the increased use of newer foundational models by enterprises, we can refer to a diagram provided by Confluent at their Investor Day. With traditional machine learning processes, the primary focus by enterprises was on performing custom training and feature engineering. These models were created by loading large enterprise data sets through a batch function from a data lake or data warehouse. Once the base enterprise model was established, inference would customize results for each request. Inference generated some data access, but not at the same volume as the original model construction.

With Generative AI, LLMs and other foundation models, a third party often provides the generic model pre-trained with public data. The enterprise then applies much heavier inference to inject its contextual data into the generic model. This inference can be applied in real-time for every user interaction. Given the scope of functions and data addressed from a chat-based interface, the volume of data accessed in real-time to deliver a customized response for each user (based on their history) could actually be much larger and broader in scope than what was required to build a standard ML model under the prior method.

To get a sense for the demand associated with leveraging real-time data streams to better inform AI services, Databricks commented on growth of streaming jobs that feed Delta Live Tables on customer instances. Over 50% of customers are now using real-time data streaming. Weekly streaming jobs have grown by 177% over the last 12 months.

The Databricks CEO described usage of this feature as being “on fire”. He said that while a lot of people are excited by the potential for generative AI, they aren’t paying attention to “how much momentum streaming applications now have.”

Snowflake has also been supporting streaming data ingestion for several years through Snowpipe Streaming. This supports standard ingestion frameworks and integrates with popular streaming sources. The improvement that the Snowflake team has been focusing on is reducing the latency of the streaming load. At this point, data landed in Snowflake can be accessed within a few seconds, versus minutes previously.

Snowpipe Streaming is supported by Dynamic Tables, which allow users to perform data transformations as data is being piped into Snowflake. Data can be joined and aggregated across multiple sources into one Dynamic Table. As those data sources update, the Dynamic Table will refresh to reflect the latest results.
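The behavior described above (a derived table that joins and aggregates several sources, then reflects new data after a refresh) can be mimicked in a few lines. This is a toy stand-in with invented table and column names. It recomputes everything on each refresh, whereas the real feature is declared in SQL and refreshed incrementally by the platform.

```python
class DerivedTable:
    """Toy derived table: total order amount per region, joined across two sources."""

    def __init__(self, orders, customers):
        self.orders = orders        # list of {"customer_id": ..., "amount": ...}
        self.customers = customers  # dict of customer_id -> region
        self.rows = {}
        self.refresh()

    def refresh(self):
        totals = {}
        for order in self.orders:
            region = self.customers.get(order["customer_id"])
            if region is None:
                continue  # inner join: skip orders without a matching customer
            totals[region] = totals.get(region, 0) + order["amount"]
        self.rows = totals


orders = [
    {"customer_id": 1, "amount": 100},
    {"customer_id": 2, "amount": 50},
]
customers = {1: "EMEA", 2: "APAC"}

table = DerivedTable(orders, customers)
before = dict(table.rows)

# A new order streams into the source; the derived table picks it up on refresh.
orders.append({"customer_id": 1, "amount": 25})
table.refresh()
after = dict(table.rows)
print(before, after)
```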

Databricks and Snowflake are using Kafka as the source for their real-time streaming data ingestion. If they are noting such significant growth in usage of data streaming capabilities, that likely implies strong demand for streaming services from providers like Confluent (CFLT). If investors are searching for another company that might be an indirect beneficiary from the rush to leverage AI, CFLT is worth a look.

Investment Plan

Software and data service providers would like to power as many of the steps in the AI value chain as possible. Foundation models are becoming ever more available through open source, the public domain and commercial offerings with API interfaces. The generic steps of serving the data inputs (structured and unstructured), training, adaptation and inference could be powered by a single platform. This platform would provide the foundation for an ever-increasing number of domain specific, AI-enhanced tasks that are incorporated into enterprise application functions.

This all implies that cloud-based data storage and processing engines like Snowflake and Databricks would be very useful for adding user-specific context to any pre-trained model. As this data is often requested in near real-time, overall consumption of storage and compute resources would logically increase for the customer. This increased demand should drive more revenue for data and AI platform providers.

New AI application investment doesn’t need to require incremental budget from most enterprises. New costs can be offset by the productivity gains for knowledge workers. Enterprise departments will find that their employees can accomplish more with AI-driven software services and assistants. They will therefore require less headcount to complete the same amount of work. Payroll costs will decrease, providing savings to be invested in more sophisticated software services, whether digital co-pilots, workflow automation or system-to-system coordination.

As evidenced by the announcements at their respective user conferences in late June, both Snowflake and Databricks are pivoting to address the enormous opportunity presented by the rush to incorporate AI into nearly every business and consumer process. They both added substantial capabilities to their respective platforms. I think Databricks is progressing more quickly, but Snowflake already has a large customer base and generates more revenue.

Both companies are pursuing a huge market opportunity, allowing for multiple winners, at least in the near term. As part of the Financial update during Snowflake’s Investor Day, they sized the market at $290B by 2027, or about 3-4 years from now. This applies to both Snowflake and Databricks, as well as the hyperscalers. Between those two, we know that current annual revenue is below $5B and even Snowflake has only projected their revenue to $10B by 2028.
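To put those figures side by side, the quick calculation below expresses the revenue numbers as a share of the $290B estimate. It uses only the figures cited above. Comparing a 2028 revenue projection against a 2027 market size is a rough approximation on my part.

```python
market_2027 = 290.0   # $B, market size cited from Snowflake's Investor Day
combined_now = 5.0    # $B, upper bound on current combined annual revenue
snow_2028 = 10.0      # $B, Snowflake's revenue projection for 2028

share_now = combined_now / market_2027
share_snow_2028 = snow_2028 / market_2027

print(f"Combined today: under {share_now:.1%} of the 2027 market")
print(f"Snowflake 2028 projection: about {share_snow_2028:.1%}")
```

Both shares land in the low single digits, which is the basis for saying there is a lot of room to grow.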

That leaves a lot of market share for Snowflake and Databricks to grow into, even with the hyperscalers positioning for their portions of spend. Both companies are rapidly converging in terms of platform capabilities, but still appeal to slightly different customer segments. I think this will allow them both to continue growing substantially from here, much like the hyperscalers coexisted during the surge in cloud migration.

Of course, the hyperscalers themselves are pursuing their own slices of this market. I purposely leave out comparisons of their product positioning partially for brevity, but primarily because many enterprises still prefer a neutral solution for their data storage and analytics. Many of the customer testimonials at both Snowflake and Databricks Summits emphasized the value of using an independent platform to avoid lock-in with one of the hyperscalers. This bias towards a hyperscaler-neutral data platform may not be a requirement for all enterprises, but will likely represent the preference for most.

While Databricks is a private company, investors have SNOW for consideration in the public market. I think Snowflake still has tremendous potential, particularly with the probable demand tailwind that generative AI introduces. We will get another update from SNOW’s Q2 earnings report in a couple of months. At some point, possibly in 2024, we might have Databricks to consider for investment as well.

MongoDB Q1 FY2024 Earnings Review
After guiding for a sequential 4% drop in revenue for Q1, MongoDB delivered a strong beat. More importantly, their preliminary estimate for Q2 revenue would achieve a reacceleration of annual growth if they outperform at the same level as Q1. The revenue projection for Q2 even leapfrogged past the analyst consensus for Q3. While the market expected some conservatism, this level of outperformance caught investors by surprise, with the stock surging 28% the next day.

Equally impressive were improvements in profitability. In the past, MongoDB has been discounted for poor operating leverage. The transition to 2023 has brought record levels of operating income and FCF, closing the gap with peers in the software infrastructure space. This also led to a significant beat and raise on EPS, which we don’t often see with high growth SaaS companies.

Even customer activity notched records. Both total customer additions and those with spend over $100k in ARR represented all-time quarterly highs. Of new customers, over 200 companies are categorized in the burgeoning AI industry, providing another catalyst as these start-ups are landing new capital at levels on par with the Covid beneficiaries of 2020-2021.

MongoDB has emerged as a hot stock once again, with its valuation multiple now pressing up against the top of its peer group. The stock has more than doubled in 2023 and reached its 52-week high recently. MongoDB is well-positioned to capitalize on tailwinds from AI, as enterprises revamp their data infrastructure to deliver new insights and services from their proprietary data sets. With new product announcements at MongoDB.local in June, MongoDB is further supporting the case for consolidation of application workloads onto its developer data platform.

In this post, I review the Q1 results and try to project the trajectory for the rest of 2023. I also loop back on MongoDB’s product strategy and update their positioning with all the major announcements from MongoDB’s recent annual user conference. This event brought new capabilities that should enable the next phase of MongoDB’s growth.

MongoDB Product Strategy

MongoDB’s product vision is to deliver a reliable, scalable and easy to use data platform to serve as the transactional store for modern software applications. This hinges on their flexible document-oriented data model, which aligns to how developers work with data within applications. They have extended the platform to address multiple data storage workloads from the same interface, allowing engineering teams to reduce the number of database point solutions that must be supported. This consolidation lowers vendor costs, reduces infrastructure management overhead and simplifies application architecture.

Long ago, the MongoDB team recognized that the document model could be extended to support the query and storage patterns of several other popular data schemas. To abstract the differences, MongoDB built a Unified Query API which facilitates the translation of different data models into the core document store. Adjacent data types can be easily modeled in a document form and then queried through the API using the same access patterns. These extensions address most “document adjacent” data models like key-value, time series, graph, geospatial, search and even denormalized relational.
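As a simplified picture of "document adjacent" models, the sketch below stores a key-value entry and a time series as plain documents and queries both with the same access pattern. The `find` helper and the sample data are illustrative only and are not MongoDB's Unified Query API.

```python
def find(collection, **criteria):
    """Return documents whose fields match every key/value in `criteria`."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Key-value data modeled as documents: the key is the document id.
key_value = [{"_id": "session:42", "value": "logged_in"}]

# Time series data modeled as documents: one document per measurement.
time_series = [
    {"sensor": "s1", "ts": 1, "temp": 20.5},
    {"sensor": "s1", "ts": 2, "temp": 21.0},
    {"sensor": "s2", "ts": 1, "temp": 18.0},
]

session = find(key_value, _id="session:42")
s1_readings = find(time_series, sensor="s1")
print(session, s1_readings)
```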

MongoDB remains the most popular non-relational database solution on the market. With their expansion to address other document-adjacent data types, MongoDB is delivering a broader multi-model data platform. The platform has also been extended to support specific application delivery patterns like mobile, in-app analytics, charting and search.

As the data platform’s applicability expands, it becomes suitable as the backing data store for many applications within the same enterprise. Most modern enterprises allow developers the freedom to choose their technology stack from a preset selection of options. MongoDB’s go-to-market with large enterprises involves getting anointed as one of these sanctioned solutions. After that, the broad popularity, ease of use and familiarity of MongoDB with developers drives expansion to serve new application workloads and support legacy database refreshes.

As developers were empowered by enterprise engineering teams to make tool selection, the result has been a sprawl of point database solutions to address multiple data types and workloads. When enterprise IT budgets are flush, this additional overhead is tolerated. When budgets retract, however, engineering leaders will look for ways to reduce expenses. Consolidation of vendors naturally provides savings through economies of scale.

The MongoDB value proposition is to provide a single platform to replace most database types for modern application development. As software applications are increasingly designed (or refactored) around individual microservices, a single data model can be provisioned for each. MongoDB can address most non-relational models from the same platform. Even relational data models that tolerate some denormalization can be represented by the document model.

The advantages of a single data platform revolve around cost savings through volume discounts, fewer vendor agreements, higher developer productivity and reduced DevOps overhead. MongoDB delivers all of these benefits. Additionally, MongoDB clusters can be located on and share data across all three hyperscalers (GCP, AWS and Azure). This cross-cloud capability is appealing as enterprise development teams seek to avoid lock-in with one hyperscaler.

MongoDB continues to add support for new data types. With each, they expand the addressability of a customer’s application footprint. As an enterprise can have multiple database vendors within their infrastructure, MongoDB measures their success with a customer by the number of application workloads they address over time. Growth in workloads feeds their net ARR expansion rate, which remains above 120%.

Through the addition of new data types and workloads, the MongoDB sales team can make the argument to customers that the platform can be leveraged for many types of applications. Following the sprawl of point solutions for every data workload over the past 5 years, this consolidation strategy has a number of targets. These extend to dedicated database solutions for time series, graph, key-value, search and basic analytics.

The net benefit is significant for engineering teams. This consolidation of database engines onto a single platform reduces costs (single vendor economies of scale), simplifies access (single programming interface) and lowers maintenance overhead (back-ups, configuration, etc.). Where MongoDB’s solution represents an adequate replacement for a point database (time series, key-value, search, graph, geospatial, etc.), an engineering team can consider the migration of an application workload to it. Additionally, new applications suited for a non-relational data store can be spun up on the MongoDB platform from the beginning.

Recent Product Announcements

Looking forward, MongoDB’s product strategy is to continue to improve the capabilities of existing data workloads and to add support for new ones. This theme underscored their recent MongoDB.local user conference in June 2023. The team unveiled a number of new capabilities and programs, which should drive further growth.

With the rapid rise of AI as a central theme for many enterprises, MongoDB demonstrated they have been keeping up. The platform has even emerged as a preferred component of the modern AI data stack, with many AI companies represented among recent customer additions. Fortunately, the MongoDB product team anticipated the opportunity to serve AI workloads back in 2022, when they began work on their first major product announcement.

Vector Search Workload

As I discussed, MongoDB’s product vision is to enable development teams to utilize one data store for all of their application workloads. To realize this strategy, they have been adding new processing engines to the platform to support different data structures. This provides developers with a single interface to query. For DevOps teams, they have less infrastructure to manage.

To demonstrate their continued momentum in addressing as many transactional workloads as possible, MongoDB announced the addition of support for vector search in MongoDB Atlas. Vector search represents a key capability in the interaction with AI models. As models are trained, data is represented as vectors in N-dimensional space. In the simplest terms, vectors consist of a large set of numbers, unique to each object being modeled by the AI system.

Objects in this context can be anything relevant for the AI model within the business space being represented. General examples include text strings, images, videos, audio files, etc. They can also be any physical or logical construct with parameters, like automobiles, clothing, buildings, medical conditions, insurance coverage and more. In each case, the vector represents the “fingerprint” for each object within that AI model. The relationship between objects is then calculated by the distance between vectors across their entirety, or by combinations of dimensions.

To query the model, vector search retrieves data based on the expressed relationship desired (often nearest neighbor). These results are used to feed different machine learning operations, like similarity, recommenders, personalization and Q&A. The vector space also provides long term memory for LLMs trained on an enterprise’s proprietary data.
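A nearest neighbor lookup can be sketched in a few lines. The example below ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors and object names are made up (real embeddings have hundreds or thousands of dimensions), and the exhaustive scan here is only for illustration; production systems typically rely on approximate indexes instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings for three objects.
index = {
    "sedan": [0.9, 0.1, 0.0],
    "pickup truck": [0.8, 0.3, 0.1],
    "raincoat": [0.0, 0.2, 0.9],
}

query = [1.0, 0.2, 0.0]
top = nearest(query, index, k=2)
print(top)
```

The two vehicles rank ahead of the raincoat because their vectors point in nearly the same direction as the query.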

In a typical configuration, an enterprise would feed their proprietary data into an AI model to create embeddings, which are represented as vectors. Those vectors can then be stored in a vector database (like Chroma, Pinecone, etc.). Separately, metadata to describe each of the objects in the vector database would be stored in a transactional database (relational, NoSQL, etc.) close by.

If the enterprise is already a MongoDB customer, they could store the object metadata in an Atlas document store for easy retrieval. The platform could also provide some or all of the source data, depending upon how extensive MongoDB’s penetration is. With the addition of vector search, the MongoDB team has added an important capability that really extends their reach into the typical AI workflow. The enterprise can now use MongoDB for all data storage and retrieval steps in delivering an AI-enabled experience, allowing the customer to realize the benefits of a single data store. Additionally, as MongoDB is available on all three hyperscalers, customers can move data around seamlessly, making use of the best (and cheapest) location to run their AI models.

Vector Search is integrated with two popular open source frameworks, LangChain and LlamaIndex, which provide tools for accessing and managing common LLMs for a variety of applications (such as those from Anthropic, Hugging Face, OpenAI, etc.). Anticipating that vector search would become an increasingly important workload, the MongoDB team began a private preview of the capability six months ago with select partners and customers. As a result of that testing and iteration, Vector Search is now being offered in public preview.

With over 200 customers identified as AI companies reported in Q1 and “thousands” referenced during the Investor Session, MongoDB is rapidly making inroads with the AI community. This could become a large tailwind of additional consumption, beyond the gain from the resumption of secular growth from digital transformation and cloud migration. Of all the publicly traded software infrastructure providers I cover (besides maybe Snowflake), MongoDB appears the best positioned to capitalize on the surge in demand for AI data processing.

Stream Processing

Stream processing involves taking action on data in a real-time stream, generally in order to query and manipulate the data while it is in-flight. Examples of actions are creating a materialized view, aggregating, filtering, cleaning and alerting on the data. Stream processing is usually performed in coordination with a data streaming platform like Apache Kafka or Confluent Cloud.
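The actions listed above can be illustrated with a small function that cleans, aggregates and alerts on events as they arrive. The event fields, window size and threshold are assumptions chosen for the example. A real deployment would consume from a streaming platform rather than a Python list.

```python
def process_stream(events, window_seconds=60, alert_threshold=100.0):
    """Clean, aggregate and alert on events one at a time, in arrival order."""
    windows = {}  # window start time -> running total (a simple materialized view)
    alerts = []
    for event in events:
        if event.get("amount") is None:
            continue  # cleaning: drop malformed events
        window_start = (event["ts"] // window_seconds) * window_seconds
        windows[window_start] = windows.get(window_start, 0.0) + event["amount"]
        if event["amount"] > alert_threshold:
            alerts.append(event)  # alerting: flag unusually large amounts
    return windows, alerts


# Hypothetical in-flight events, with timestamps in seconds.
events = [
    {"ts": 5, "amount": 20.0},
    {"ts": 30, "amount": None},
    {"ts": 59, "amount": 150.0},
    {"ts": 61, "amount": 10.0},
]

windows, alerts = process_stream(events)
print(windows, alerts)
```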

MongoDB’s addition of a stream processing capability represents an interesting move. First, it recognizes the increased importance of real-time data as an input to newer AI models and the logical next step in data-driven applications. As LLMs are being applied to different use cases, like customer service, the AI agent interface will only be effective if it is provided access to the latest customer data. This can’t be a snapshot from the prior day. This requirement elevates the value of real-time data distribution, which is easily facilitated by data streaming platforms like Confluent.

While it is somewhat competitive with Confluent, I think MongoDB’s investment in a stream processing capability validates Confluent’s strategy to incorporate Apache Flink (through its Immerok acquisition) into the Confluent platform. Additionally, I think the primary use of the Atlas Stream Processing offering will be to handle data that will ultimately be stored in Atlas. Providing a materialized view and/or prep functions in advance of storage would be a useful addition.

MongoDB Atlas Stream Processing is available to customers in private preview. For MongoDB, this provides yet another workload that would be relevant for customers building an AI system.

Isolated Search Workloads

Atlas Search has become increasingly popular with customers. As investors will recall, MongoDB launched Atlas Search back in 2020 to provide customers with a means to address common application search use cases (full text, faceted, geospatial) from the same data store. Prior to this, customers would have to stand up a separate search cluster (usually Elasticsearch or Solr) to deliver search functionality for their application. This represented another system to maintain, as well as the implementation of a data synchronization job to keep the search index updated.

As most popular search implementations are built on Apache Lucene, the MongoDB team simply integrated Lucene into their data platform. With Atlas Search, MongoDB supports the primary search types from the same interface as the core database. This has the benefit of eliminating the overhead of maintaining a separate search cluster and data synchronization process. Those are all managed by MongoDB Atlas behind the scenes.

Atlas Search has become so popular with customers that they are migrating increasingly critical search workloads to MongoDB. This has created new scalability challenges, where the search workload might consume more resources than the database. In order to address this in an efficient way for the customer, MongoDB added Dedicated Search Nodes. This allows customers to scale up their search workload separate from the operational database. They get better observability, control and cost efficiency for heavy search usage. Dedicated Search Nodes is now available in preview mode.

Relational Migrator to GA

About a year ago, MongoDB announced the Relational Migrator as a controlled access tool to be used by their Field Engineering teams to help customers plan their migration off of a relational database onto MongoDB. When it was first released, MongoDB planned to use feedback from their internal teams and customers to further shape the product, with an expectation that it would be ready for general availability in 2023. As part of the MongoDB.local conference, the team brought Relational Migrator to GA, meaning that customers can freely use the tool to plan and execute their migrations.

As background, MongoDB’s largest set of entrenched workloads to target for migrations are relational databases. Before MongoDB and other alternate data stores were available, relational databases were the only option for application development. As legacy applications are upgraded and new applications are planned, MongoDB’s data platform is a suitable choice for many database implementations that previously defaulted to relational. Application re-architectures, particularly a refactoring into microservices, also drive this motion.

A service-oriented architecture makes it much easier to keep some workloads on relational, and move the rest to a non-relational schema. While MongoDB wouldn’t be appropriate to replace every relational implementation, it could be applied to many of them. In the past, engineering teams would shoehorn all data models into relational. Some concepts, like user profile data, are much better suited to a document model, where new fields can be easily appended and look-ups are by a single primary key.
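To make the user profile point concrete, here is a minimal sketch in plain Python. The `profiles` dictionary and the `upsert_profile` helper are hypothetical stand-ins for a document collection, not MongoDB's API. They only illustrate how a new field can be appended without a schema change and how the look-up is a single fetch by primary key.

```python
# Hypothetical in-memory stand-in for a document collection, keyed by primary key.
profiles = {}

def upsert_profile(user_id, **fields):
    """Create the profile document if needed, then merge in the given fields."""
    doc = profiles.setdefault(user_id, {"_id": user_id})
    doc.update(fields)
    return doc

# Initial profile with a couple of fields.
upsert_profile("u1", name="Ada", email="ada@example.com")

# A new, nested attribute is simply appended to the document.
# In a relational model this would typically mean a new column or a new table.
upsert_profile("u1", preferences={"theme": "dark", "newsletter": True})

# The look-up is a single fetch by primary key.
print(profiles["u1"])
```

In a relational schema, the same change would usually require altering a table or joining to a separate preferences table at read time.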

Migrating off of an entrenched relational database is a difficult exercise. It requires changes to the data model, the application code and movement of the data. In the past, engineering teams had to create their own tooling to support these migrations. MongoDB sales support and professional services would assist in these projects by providing basic scripts to automate some of the work. A flexible, UI-driven tool was needed as much of the effort can be redundant across implementations.

This was the genesis for the Relational Migrator. The goal of this product is to provide engineering teams with the tools to easily connect to a relational database, analyze its table structure, map that to the document model and then manage the data migration. It also provides support in rewriting application code to access data through the MongoDB connector, as a replacement for the SQL code that queried the relational database.
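The table-to-document mapping step can be sketched roughly as follows. This is not how Relational Migrator is implemented. The table rows, field names and the `to_documents` helper are all made up for illustration, and the sketch only shows the general idea of embedding child rows inside their parent record.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of two relational tables.
customers = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "Globex"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 250.0},
    {"id": 11, "customer_id": 1, "total": 75.5},
    {"id": 12, "customer_id": 2, "total": 310.0},
]

def to_documents(parents, children, foreign_key, embed_as):
    """Embed child rows inside their parent row, producing one document per parent."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key, since the nesting now expresses the relationship.
        row = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(row)

    documents = []
    for parent in parents:
        doc = {"_id": parent["id"]}
        doc.update({k: v for k, v in parent.items() if k != "id"})
        doc[embed_as] = grouped.get(parent["id"], [])
        documents.append(doc)
    return documents

documents = to_documents(customers, orders, "customer_id", "orders")
for doc in documents:
    print(doc)
```

Whether to embed child rows or keep them as separate documents with references is a modeling decision that depends on access patterns, which is the kind of choice a planning tool helps teams work through.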

The Relational Migrator is now available for a number of popular cloud and on-premise relational databases. Data can be ported to either MongoDB Atlas or Enterprise Advanced. Further, customers can opt for either a one-time migration of the data (if downtime is acceptable) or a continuous synchronization process (for minimal disruption).

While the majority of MongoDB’s growth is driven by workloads from new applications or microservices, they are smart to facilitate a direct migration off of a relational database for the instances where the customer wishes to further consolidate their data storage solutions onto a smaller set of providers. Relational Migrator will facilitate a steady stream of additional workloads from the refactoring of older applications to take advantage of newer data models.

Atlas for Industries

Similar to Snowflake’s ecosystem of industry specific solutions, MongoDB announced Atlas for Industries to provide customers with industry-specific expertise, programs, partnerships and integrated solutions. Within each industry, MongoDB will develop unique capabilities and achieve relevant certifications that make their offering more valuable for customers within that category. Additionally, each industry selected already has a critical mass of customers using the MongoDB platform, allowing the MongoDB professional services team to share best practices and security considerations.

Combined with MongoDB’s foray into supporting operations for AI, industry participants can benefit from MongoDB’s experience as they establish their own AI strategies. Because data is the critical differentiator for enterprises in building competitive advantage around AI offerings, MongoDB’s visibility into what is working effectively across their customer base will become a valuable input for industry-specific solutions.

As part of the announcement, MongoDB will establish financial services as the first industry specific offering. This leverages MongoDB’s deep experience servicing customers in the finance industry. Currently, 19 of the 20 largest banks in North America use the MongoDB platform within their infrastructure. This provides MongoDB with the insight to layer on new capabilities that meet requirements unique to a particular industry segment.
As an example of a value-add capability for financial services, MongoDB achieved the Amazon Web Services Financial Services Competency. This involved demonstrating MongoDB’s strong capabilities around security, reliability and system management for usage specific to financial operations. These were validated by the AWS team through direct testing.

MongoDB plans to expand Atlas for Industries programs to other industry categories over the course of the next year. They have programs on deck for manufacturing and automotive, insurance, healthcare, retail, telecommunications and the public sector. Interestingly, these have a lot of overlap with Snowflake’s Industry Solutions. For MongoDB, I think Atlas for Industries will provide another competitive advantage in attracting enterprise customers in targeted categories, providing a further catalyst for growth.

Hyperscalers

MongoDB enjoys strong relationships with all three hyperscalers. While the hyperscalers offer competitive database products, they have also all struck productive co-selling relationships with MongoDB. These incentivize their sales teams to work closely with their counterparts at MongoDB to land enterprise customers on that hyperscaler with MongoDB as the primary database.

In March 2022, AWS took their relationship to the next level when they announced an expanded collaboration with MongoDB. The agreement with AWS builds on the prior multi-year relationship between the two companies, aimed at driving customer adoption of MongoDB Atlas on AWS.

In an effort to further improve the customer experience, both companies have agreed to collaborate across sales, customer support, solution architecture, marketing and other areas to make MongoDB Atlas a better experience for developers on AWS globally. This includes increased workload migration incentives and enhanced tools to help customers move from legacy technologies in on-premises data centers to MongoDB Atlas on AWS. Finally, the partnership will support MongoDB’s expansion into more AWS Regions across the globe and the US Public Sector with FedRAMP authorization.

During the Investor Session at MongoDB.local, MongoDB’s CEO conducted a Partner Spotlight discussion with the Managing Director of the AWS Marketplace. The AWS leader highlighted many of the benefits of working closely with MongoDB. AWS smartly recognizes that they can strike a win-win relationship in this case. First, they can keep the enterprise customer happy by allowing their developers to remain on MongoDB. At the same time, AWS can sell them many other adjacent services for application hosting, storage and networking. Second, AWS still generates revenue from MongoDB’s usage of the underlying compute and storage, as Atlas is hosted on AWS.

For investors who have followed the MongoDB story for a while, you will recall that this mutually beneficial collaboration wasn’t always the case. In 2019, AWS announced DocumentDB, which represented a hosted version of MongoDB. This directly competed with MongoDB. Many analysts predicted the product would be a MongoDB killer.

Fast forward to today and DocumentDB did not kill MongoDB. The product still exists, but has not broadened its penetration beyond a few AWS customers. According to DB-Engines, AWS DocumentDB’s popularity ranking is way down at position 151, while MongoDB is ranked at position 5. Even expanding the comparison of scores on DB-Engines to include more modern solutions from the hyperscalers, the proprietary hyperscaler products are scored far below MongoDB.

Additionally, those relative positions haven’t changed much over time. Amazon DynamoDB and Azure Cosmos DB are still far below MongoDB in terms of overall developer popularity. In fact, none of the newer, proprietary database solutions (those not derived from open source or less than 10 years old) from the hyperscalers have even broken into the Top 10.

I think at this point, we can dispense with the thesis that newer specialized database solutions from the hyperscalers will dislodge MongoDB. The data just doesn’t bear this out. As an aside, the same pattern applies to Snowflake, relative to hyperscaler data warehouse solutions like Redshift or BigQuery. The point is that in the database category, the independent providers tend to do a better job at building a solution that appeals broadly to developers across all cloud providers.

This is likely due to a combination of factors. They include a more complete feature set, a desire to avoid lock-in and more focused developer relations. For MongoDB and Snowflake, the ability to work across all three hyperscalers with the same interface represents a big advantage in my view. Every global enterprise engineering team has to consider the opportunity cost of maintaining all their data on a single hyperscaler. They must also preserve the option to support another hyperscaler without having to rewrite their application. With MongoDB, the API is the same across all providers.

Google Cloud Deal

Another catalyst for fostering a closer relationship between MongoDB and the hyperscalers is AI. During the user conference, MongoDB announced an expanded partnership with Google Cloud that enables developers to use state-of-the-art AI foundation models from Google to build new classes of generative AI applications.

This partnership delivered a couple of new capabilities. First, developers can use MongoDB Atlas Vector Search with Google’s Vertex AI platform to build applications with AI-powered capabilities for personalized and engaging end-user experiences. Vertex AI will provide an API to generate text embeddings from customer data stored in MongoDB Atlas, combined with PaLM text models to create advanced functionality. Examples of AI-powered capabilities that developers can derive from this combination include semantic search, classification, outlier detection, AI-powered chatbots and text summarization.
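For readers less familiar with how embeddings enable semantic search, here is a toy sketch in plain Python. The three-dimensional vectors are invented for illustration. A real system would obtain embeddings from a model API and use an indexed vector search rather than the brute-force ranking shown here, and neither the Vertex AI nor the Atlas Vector Search API is represented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for vectors stored alongside each document.
doc_embeddings = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "returning an item": [0.8, 0.2, 0.1],
}

def semantic_search(query_vector, embeddings, k=2):
    """Rank documents by similarity to the query vector and return the top k names."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embedding of a query like "how do I get my money back".
query = [0.9, 0.05, 0.0]
results = semantic_search(query, doc_embeddings)
print(results)
```

The query shares no keywords with "refund policy", yet that document ranks first because the vectors point in nearly the same direction. That is the behavior that distinguishes semantic search from keyword matching.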

The other benefit coming out of the new partnership will be access to dedicated professional services teams that can help rapidly prototype applications by providing expertise on data schema and indexing design, query structuring and fine-tuning AI models. Developers can also tune models to further improve the performance of the model for specific tasks. Google Cloud and MongoDB are working closely together to make these experiences even more seamless with Google’s Generative AI capabilities built right into MongoDB Atlas.

When applications are ready for production, the MongoDB and Google Cloud professional services teams can optimize applications for performance and will support future feature development based on customer feedback.

I think this relationship further underscores the unique position that MongoDB occupies in the AI value chain, further solidified with the introduction of Vector Search. As Google is pushing their AI capabilities, this partnership with MongoDB provides developers with a reason to utilize Google Vertex AI for their AI workloads.

Q1 Earnings Results

Now that we have reviewed MongoDB’s product strategy and recent announcements, let’s look at how these translate into the most recent earnings results. After a couple of mediocre quarterly reports, MongoDB’s Q1 earnings results were particularly well-received by the market. Prior to the report on June 1st, MDB stock had spent most of 2023 in the low $200 range. It even dipped to the $140’s briefly in November 2022.

The day after Q1 earnings, MDB stock shot up 28%. It recently reached a new 52 week high, after passing the $380 mark hit back in August 2022. It is still below the historical ATH of $590 reached back in November 2021. Looking forward, the key question for investors will be if MongoDB can reaccelerate its revenue growth from the 29% annual increase in Q1 and outperform their current full year estimate for 19% growth.

Beyond a return to durable elevated revenue growth, a rapid improvement in profitability measures is providing further support for the stock. Over the last year, MongoDB surged past Non-GAAP operating margin break-even to deliver 12% op margin in Q1. FCF margin was even better, hitting a record of 14%. During the Investor Session, the CFO set the long-term operating margin target over 20%.

The market seems to be anticipating improvements in revenue growth and operating leverage, with the P/S ratio kicking above 20 recently. MongoDB’s P/S ratio has been above 20 in the past, but revenue growth was well over 30% annually then. Accelerating revenue growth with record operating and FCF margin may provide enough momentum to push MDB back towards its ATH price over the next year.

With that, let’s review the results from Q1. Combined with the product development commentary from the recent Investor Session at MongoDB.local, we can try to discern some signals about the growth story looking forward to 2024.

Revenue

As part of MongoDB’s Q4 report in March, leadership forecast underwhelming preliminary revenue guidance for Q1 and the full year. At the time, it wasn’t clear if leadership was being conservative, or anticipated a more significant slowdown than what had already been experienced in the prior year (calendar year 2022). Because of the full year estimate for slower growth, analysts set a low bar for Q2 and Q3 revenue targets. This is what allowed MongoDB to really outperform in Q1.

MongoDB delivered $368.3M in revenue during the first quarter, which was up 29.0% annually and 1.9% sequentially over Q4. This beat the analyst estimate for $347.0M by a healthy margin, which would have represented 21.6% annual growth and -4.0% sequentially. MongoDB also surpassed the midpoint of their preliminary Q1 guidance range of $344M – $348M by $22.3M, which works out to roughly 6 points of sequential growth.
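The arithmetic behind these comparisons can be sketched as follows. All inputs come from the paragraph above. The prior quarter's revenue is not quoted directly, so it is backed out from the reported 1.9% sequential growth, which makes it an approximation.

```python
# Figures in $M, taken from the text above.
q1_actual = 368.3
analyst_estimate = 347.0
guide_low, guide_high = 344.0, 348.0
reported_seq_growth = 0.019  # 1.9% sequential growth, as reported

def growth(current, base):
    """Fractional growth of current over base."""
    return current / base - 1.0

# Assumption: derive Q4 revenue from the rounded sequential growth rate.
implied_q4 = q1_actual / (1.0 + reported_seq_growth)

guide_midpoint = (guide_low + guide_high) / 2.0
beat_vs_guide = q1_actual - guide_midpoint

# Sequential growth the analyst estimate would have implied.
estimate_seq_growth = growth(analyst_estimate, implied_q4)

# Gap between actual and estimate-implied sequential growth, in percentage points.
spread_points = (reported_seq_growth - estimate_seq_growth) * 100.0

print(f"Beat vs guidance midpoint: ${beat_vs_guide:.1f}M")
print(f"Estimate-implied sequential growth: {estimate_seq_growth * 100:.1f}%")
print(f"Spread vs analyst estimate: {spread_points:.1f} points")
```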

In fact, Q1 revenue even leapfrogged the analyst estimate for Q2 revenue, which was for $360.8M or 18.8% annual growth. Due to the outperformance in Q1, MongoDB leadership set the initial Q2 guide for a range of $388M – $392M, representing 28.4% annual growth and 5.9% sequentially. If we assume the same level of beat for Q2 actual revenue (+$22.3M), MongoDB could deliver $412.3M for annual growth of 35.8% and sequential growth of almost 12%.
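Here is the same-size-beat scenario worked through as a quick sketch. The year-ago Q2 revenue is not stated in the text, so it is inferred from the guide midpoint and its quoted 28.4% growth. Because that input is rounded, the annual growth result may differ from the figure above by about a tenth of a point.

```python
# Figures in $M, taken from the text above.
q1_actual = 368.3
q2_guide_low, q2_guide_high = 388.0, 392.0
guide_annual_growth = 0.284  # annual growth implied by the guide, as quoted
assumed_beat = 22.3          # same dollar beat as Q1 (a scenario, not a forecast)

q2_guide_midpoint = (q2_guide_low + q2_guide_high) / 2.0

# Assumption: back out year-ago Q2 revenue from the rounded growth rate.
implied_year_ago_q2 = q2_guide_midpoint / (1.0 + guide_annual_growth)

projected_q2 = q2_guide_midpoint + assumed_beat
projected_annual_growth = projected_q2 / implied_year_ago_q2 - 1.0
projected_seq_growth = projected_q2 / q1_actual - 1.0

print(f"Projected Q2 revenue: ${projected_q2:.1f}M")
print(f"Projected annual growth: {projected_annual_growth * 100:.1f}%")
print(f"Projected sequential growth: {projected_seq_growth * 100:.1f}%")
```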

This would represent a nice reacceleration of revenue growth over the Q1 rate. Given that annual growth decelerated in every quarter of the prior year and sequential growth never broke 10%, this would represent a favorable inflection. I think this turning point in the growth story is what reignited the performance of MDB stock.

For the full year, the company had set guidance for revenue in the range of $1.480B – $1.510B in the Q4 report, representing annual growth of 16.4%. After the Q1 beat, they raised this by $37M at the midpoint to $1.522B – $1.542B for 19.3% annual growth. The magnitude of the raise exceeded the $22.3M beat in Q1.

Looking to the second half of the year, the full year revenue guide currently implies flat sequential growth for Q3 and Q4. On an annual basis, revenue growth would drop below 20%. That assumption is based on adding the Q1 actual to the Q2 estimate (which is $758.3M) and subtracting from the midpoint of full year guidance of $1.532B, resulting in just $774M of revenue for the second half of FY2024. Dividing by 2 gives $387M a quarter, or less than the midpoint of the current Q2 guide.
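The second-half math described above can be checked with a few lines. All inputs are the guidance figures already quoted, and the even split across Q3 and Q4 is the same simplification used in the text.

```python
# Figures in $M, taken from the text above.
q1_actual = 368.3
q2_guide_midpoint = (388.0 + 392.0) / 2.0
full_year_guide_midpoint = (1522.0 + 1542.0) / 2.0

# First half = Q1 actual plus the Q2 guide midpoint.
first_half = q1_actual + q2_guide_midpoint

# Whatever remains of the full year guide is implied for Q3 and Q4 combined.
second_half = full_year_guide_midpoint - first_half

# Simplification: split the second half evenly across the two quarters.
implied_per_quarter = second_half / 2.0

print(f"First half: ${first_half:.1f}M")
print(f"Implied second half: ${second_half:.1f}M")
print(f"Implied per quarter: ${implied_per_quarter:.1f}M")
print(f"Below Q2 guide midpoint: {implied_per_quarter < q2_guide_midpoint}")
```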

I think the full year revenue projection is conservative and will be raised as MongoDB beats each subsequent quarterly estimate. Given their momentum in customer growth, I think it is unlikely they will revert to 0% sequential revenue increases.

MongoDB’s cloud offering, Atlas, grew by 40% y/y and contributed 65% of total revenue. In Q4, Atlas grew by 50% and also contributed 65% of revenue. By applying the percent of revenue for Atlas to total revenue, we can get an approximate value for Atlas revenue in each quarter. Atlas was responsible for about $239.4M of revenue in Q1, versus $234.8M in Q4, for 1.9% sequential growth. That represents a pretty substantial deceleration from the 11.7% sequential growth in Q4. However, management pointed out that there are fewer days in Q1. Because Atlas is a consumption business, this would impact revenue generation by about 2-3%.
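The approximation described above is simple enough to sketch. The Atlas share is a rounded percentage, so the dollar values are estimates rather than reported figures, and the sequential growth that falls out of them is only accurate to within a few tenths of a point.

```python
# Figures taken from the text above; revenue in $M.
q1_total_revenue = 368.3
atlas_share_q1 = 0.65        # rounded share of total revenue
atlas_q4_estimate = 234.8    # Q4 Atlas estimate quoted in the text

# Approximate Atlas revenue by applying its share to total revenue.
atlas_q1_estimate = atlas_share_q1 * q1_total_revenue

# Sequential growth between the two estimates.
atlas_seq_growth = atlas_q1_estimate / atlas_q4_estimate - 1.0

print(f"Estimated Q1 Atlas revenue: ${atlas_q1_estimate:.1f}M")
print(f"Estimated sequential growth: {atlas_seq_growth * 100:.1f}%")
```

Since the same 65% share is applied to both quarters, the estimated Atlas growth ends up tracking total revenue growth almost exactly. The true Atlas figure could differ depending on how the share rounds in each quarter.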

During the Investor Session, MongoDB’s CFO displayed an interesting slide that provided some insight into the change in y/y growth rates for Atlas. He was able to show how the macro headwinds that started in Q2 FY2023 (calendar 2022) translated into slower weekly sequential growth in Atlas consumption.

The slide illustrates the clear drop-off in growth rates starting Q2 last year and the pick-up in Q1. As we look forward to the next 12 months, MongoDB can potentially reaccelerate sequential growth. This would be driven by increased consumption of Atlas by the combination of new customers, expanded workloads within existing customers and the natural growth of existing application usage.

Profitability

While the outperformance on revenue growth was refreshing, MongoDB’s improved profitability delivered another positive catalyst. In an environment where investor expectations have shifted towards a balanced mix of revenue growth and margin improvement, MongoDB responded well. They are demonstrating the ability to drive meaningful gross profit and free cash flow from their platform.

In Q1, profitability measures beat expectations and hit new records in several meaningful areas. Non-GAAP gross margin reached 76.0% in Q1, which is 100 bps above Q1 from a year ago. Gross profit grew by 30.6% year/year, which is about 160 bps faster than overall revenue growth. Improvements in gross margin continue to be driven by efficiencies in Atlas operations.
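A quick sketch shows why gross profit grows faster than revenue when gross margin expands. It uses only the rounded figures quoted above, so the result lands near, but not exactly on, the reported 30.6% and 160 bps.

```python
# Figures taken from the text above.
revenue_growth = 0.29        # 29.0% annual revenue growth
gross_margin_now = 0.76      # 76.0% Non-GAAP gross margin in Q1
margin_change_bps = 100      # improvement over the year-ago quarter

gross_margin_prior = gross_margin_now - margin_change_bps / 10_000

# Gross profit = revenue * gross margin, so its growth compounds both effects.
gross_profit_growth = (1.0 + revenue_growth) * gross_margin_now / gross_margin_prior - 1.0

# How much faster gross profit grew than revenue, in basis points.
spread_bps = (gross_profit_growth - revenue_growth) * 10_000

print(f"Approximate gross profit growth: {gross_profit_growth * 100:.1f}%")
print(f"Approximate spread over revenue growth: {spread_bps:.0f} bps")
```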

As a cloud-based service, Atlas will have lower gross margins than the pure software package Enterprise Advanced. Yet, in spite of Atlas’ contribution to revenue continuing to increase, MongoDB has been able to methodically inch up gross margin. During the Investor Session, the CFO shared a long term target for gross margin of 70% or more.

The outperformance on revenue growth, gross margin and improved operating leverage are all combining to allow MongoDB to deliver strong Non-GAAP operating income. In Q1, MongoDB generated record operating income of $43.7M, representing an operating margin of 11.9%. This beat the company’s own guided range from Q4 for $10M to $13M by about 4x. This compares to $17.5M in Non-GAAP operating income a year ago and $37.2M in Q4.

The increase in operating margin follows a line of gradual improvement since the IPO. The lack of profitability had been a major investor complaint for a while, particularly as some software peers achieved positive operating margin much sooner than MongoDB.

On a per share basis, the Non-GAAP operating income outperformance drove a huge beat in EPS over analyst estimates. MongoDB delivered $0.56 of Non-GAAP EPS, versus the $0.19 consensus. It’s not often that we see beats of that magnitude on EPS with software companies.

Shifting to cash flow, the same story emerges. Q1 represented a record for FCF generation. MongoDB delivered $51.8M of FCF, for a FCF margin of 14.0%. A year ago, FCF was $8.4M for a 2.9% FCF margin. In Q4, it was $23.8M for a 6.6% FCF margin. FCF in Q1 doubled over Q4 and was a 6x improvement from a year ago. The high FCF margin allowed MongoDB to remain over the Rule of 40 for the quarter.
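For reference, the Rule of 40 check and the FCF comparisons can be sketched as below. This uses the common definition of annual revenue growth plus FCF margin. Some investors substitute operating margin, so treat the choice of profitability measure as an assumption.

```python
# Figures in $M, taken from the text in this section.
q1_revenue = 368.3
q1_fcf = 51.8
q4_fcf = 23.8
year_ago_fcf = 8.4
revenue_growth_pct = 29.0    # annual revenue growth in Q1

fcf_margin_pct = q1_fcf / q1_revenue * 100.0

# Rule of 40 (assumed definition): growth rate plus FCF margin, both in percent.
rule_of_40_score = revenue_growth_pct + fcf_margin_pct

fcf_multiple_vs_q4 = q1_fcf / q4_fcf
fcf_multiple_vs_year_ago = q1_fcf / year_ago_fcf

print(f"FCF margin: {fcf_margin_pct:.1f}%")
print(f"Rule of 40 score: {rule_of_40_score:.1f}")
print(f"FCF vs Q4: {fcf_multiple_vs_q4:.1f}x")
print(f"FCF vs a year ago: {fcf_multiple_vs_year_ago:.1f}x")
```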

Looking forward, MongoDB expects the strong profitability performance to continue. For Q2, they are estimating Non-GAAP income from operations in the range of $36M – $39M. While below the $43.7M just delivered in Q1, it is significantly above the original estimate set by the company for Q1 of $10M – $13M. On an EPS basis, MongoDB leadership estimates Non-GAAP EPS in a range of $0.43 – $0.46, which beat the analyst estimate for $0.14 substantially.

For the full year, MongoDB leadership raised profitability estimates as well. For Non-GAAP operating income, they increased the range from $69M – $84M issued in Q4 to $110M – $125M in Q1. This translates into a Non-GAAP EPS range of $1.42 – $1.56, compared to analyst estimate for $1.04, and the company’s prior guidance of $0.96 – $1.10.

The majority of the gains in operating leverage has been generated by improvements in spend efficiency. Non-GAAP operating expense as a percentage of revenue has decreased from 113% in FY2018 to 70% in FY2023. Over the long term, the CFO shared an operating margin target of 20%+ during the Investor Session. Based on MongoDB’s current momentum, this should be easy to hit in the next couple of years.

Customer Activity

MongoDB carried the good news from their Q1 report into customer activity as well. The number of total customer additions hit a record in Q1 with 2,300 new customers joining the platform. This exceeded counts for at least the past 2 years, which even included the Covid-fueled IT spending surge in 2021. It also far exceeded the 1,700 customers added in Q4, which represented a concerning dip.

Of the customer additions, MongoDB leadership also cited over 200 customers that identify as AI companies. This perceived demand tailwind is driving a lot of the excitement around the stock. Considering the product announcements in AI made during the MongoDB.local event, like vector database support, this interest isn’t surprising. Nor is it necessarily new, as MongoDB leadership first mentioned Hugging Face as a new customer back in Q3 (admittedly, I missed the significance of this one, as I was ramping up on the AI story).

"The shift to AI will favor modern platforms that offer a rich and sophisticated set of capabilities delivered in a performant and scalable way. We are observing an emerging trend where customers are increasingly choosing Atlas as a platform to build and run new AI applications. For example, in Q1, more than 200 of the new Atlas customers were AI or ML companies, while start-ups like Hugging Face, Tekion, One AI, and Nuro are examples of companies using MongoDB to help deliver the next wave of AI-powered applications to their customers." MongoDB Q1 FY2024 Earnings Call

In addition to a surge in total customer additions, MongoDB hit a new record on the number of sequential adds in $100k+ customers. This increased by 110 in Q1 to reach 1,761. This growth helped keep MongoDB’s net ARR expansion rate above 120%. MongoDB doesn’t report the actual value, but it is good to see the rate hasn’t dipped below this threshold with the slower overall growth.

Direct Sales customer additions dipped in Q1 to only represent 300 net adds over Q4, but management shared that the majority of the Direct Sales customers were in the Enterprise segment (their largest). As investors will recall, the Direct Sales metric represents those customers who get assigned a sales rep (versus self-serve), so variations in this metric could have other causes.

During the Investor Session, MongoDB leadership displayed a slide with customer logos segmented by industry. An important aspect of the MongoDB investment thesis is their penetration with many mainstream businesses. For example, some of their largest customers are in financial services, with examples like Wells Fargo and Morgan Stanley. They also enjoy broad penetration in other non-tech categories like healthcare, manufacturing, media and retail.

Intuitively, we would assume these companies (particularly financial services) would gravitate towards a relational model. Yet, they are embracing MongoDB’s non-relational workload support for a number of different applications. These customers appreciate the benefits of MongoDB’s high performance, versatility and presence across all cloud providers. This also highlights the fact that MongoDB isn’t embraced just by progressive developers at tech start-ups, but also engineering teams at established Fortune 100 enterprises.

In fact, during the Investor Session, MongoDB’s CFO shared a slide listing penetration among the Fortune 100, Fortune 500 and Global 2000. These numbers are higher than I expected, with over 60% of the Fortune 100 using MongoDB in some capacity. Even though MongoDB has a foothold in these large companies, there is still plenty of growth remaining. The CFO followed this slide with data showing that MongoDB’s share of total database spend for the Fortune 100 and Fortune 500 is below 2%.

The difference is explained by the number of workloads that these large enterprises use MongoDB to address. As discussed earlier, the application workload is MongoDB’s currency in large customer spending growth. MongoDB is often brought into the organization to provide the data store for a single application workload by a development team. Then, over time, the broader engineering organization becomes comfortable with the inherent advantages of MongoDB and begins applying it to adjacent applications.

This drives the growth of customer spend within each enterprise across two dimensions. First, as the customer’s business grows, the use of each application increases, generating more consumption of the MongoDB service backing it. Second, the addition of new application workloads starts new growth curves that stack on top of each other.

MongoDB’s CFO provided an example of this during the Investor Session, where a particular customer iterated through 5 application workloads. As time progressed, the ARR associated with each established workload increased from more utilization. Then, each subsequent workload stacked a new ARR curve on top of the prior ones.
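The stacking effect can be illustrated with a toy model. Every number below (starting ARR, growth rate, launch cadence) is hypothetical and chosen only to show the shape of the curves. None of it comes from the CFO's slide.

```python
def workload_arr(start_quarter, initial_arr, quarterly_growth, quarter):
    """ARR for one workload: zero before launch, then compounding with usage."""
    if quarter < start_quarter:
        return 0.0
    return initial_arr * (1.0 + quarterly_growth) ** (quarter - start_quarter)

# Hypothetical customer: 5 workloads, one launched every 2 quarters,
# each starting at 100 (in $K) and growing 10% per quarter.
workloads = [(start, 100.0, 0.10) for start in range(0, 10, 2)]

def total_arr(quarter):
    """Total customer ARR is the sum of all workload curves at that quarter."""
    return sum(workload_arr(s, arr, g, quarter) for s, arr, g in workloads)

for quarter in range(0, 12, 2):
    print(f"Quarter {quarter}: total ARR ${total_arr(quarter):.1f}K")
```

Even with a modest growth rate per workload, total ARR rises faster than any single curve because each new workload adds a fresh base that then compounds on its own.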

Having managed a large-scale hosting infrastructure consisting of many microservices for a major consumer Internet property in the past, I can see the appeal of consolidating many data models onto a single platform. While it may not address every edge case, MongoDB can handle most requirements for document, key value, time series, wide column and search workloads. Additionally, some relational data models are fairly flat in practice (meaning denormalized with some data duplication for performance), which can easily be repurposed into a document model.

The benefits of a single platform like MongoDB for consolidating most (not all) workloads far outweigh any gaps in functionality. These include a single interface, consistency across cloud providers, lower cost (due to fewer vendors) and less DevOps overhead. Atlas makes this argument even more compelling, as the infrastructure management is outsourced to the experts at MongoDB.

In the Q1 earnings call, management reported a “record” number of new workload captures within existing customers. This supplemented the all-time-high number of total customer additions, emphasizing that MongoDB’s future revenue growth should be durable. They will continue to increase revenue as existing large customers apply MongoDB to more workloads within their organization. This motion is supported by MongoDB’s continued product expansion to address new data models.

Looking forward, future elevated growth will be maintained as new customers backfill any eventual saturation within large customers. As long as MongoDB maintains healthy net ARR growth with existing customers and continues to add new customers at a robust clip, then overall revenue growth should continue in the 30%+ range. New AI workloads provide another tailwind to prop up growth over the long term.

Investment Plan

In my review of MongoDB’s Q4 results, I was guardedly optimistic, predicting that the low preliminary full year guide was conservative. I cited a few factors that demonstrated MongoDB’s ongoing momentum and raised the potential benefit from AI.

In the Q1 report, MongoDB showed that these tailwinds are indeed in place. Performance surpassed expectations on almost every level, surprising investors with its momentum. The market rewarded the stock with a 28% bump the next day. At current prices, MDB is up more than 100% in 2023 and is sitting near its 52 week high, with the P/S ratio hovering around 20. This valuation has brought it above most peers and is now in line with the most highly valued stocks, like NET and SNOW.

I agree that the stock is pricey at this valuation, but also consider MongoDB a company I want to own. It is becoming increasingly clear that investment in modern data infrastructure will be a priority for enterprises going forward. Not only does this represent a continuation of previous digital transformation secular trends, but it is also a requirement for any company that wants to leverage AI to improve their service offerings.

As part of Accenture’s latest earnings call, they cited an internal survey in which executives at customer companies were asked about their plans for AI. The results indicated that 97% of executives expect generative AI to be transformative to their industry and that 67% of organizations are planning to increase their spending on technology in general, with prioritization for investments in data and AI.

"And so while it is early days, we see generative AI as a key piece of the digital core and a big catalyst for even bigger and bolder total enterprise reinvention going forward. In fact, in a survey of global executives that we completed just last week, 97% of executives said Gen AI will be transformative to their company and industry and 67% of organizations are planning to increase their level of spending in technology, prioritizing investments in data and AI." Accenture Q3 2023 Earnings Call

Given the critical position of data as a driver of AI, I have increased my portfolio allocations to data infrastructure providers (CFLT, SNOW, MDB and even DDOG and NET). MDB now occupies equal standing with my other large positions. As data is becoming the competitive differentiator for realization of AI ambitions, providers of data infrastructure are well-positioned. Of these, I think MongoDB has significant optionality. They will benefit from increased consumption driven by the confluence of greater developer productivity, more AI-driven data processing demand and the ongoing trends of digital transformation and cloud migration.
Seeking Alpha
Accenture Plc (ACN) Q3 2023 Earnings Call Transcript
Accenture plc (NYSE:ACN) Q3 2023 Earnings Call Transcript June 22, 2023 08:00 AM ET

Thanks for this excellent write up, especially the parts explaining how MongoDB is positioned to take advantage of AI trends. They also have great sales management and sales teams and are well positioned for future growth.
Snowflake Q1 FY2024 Earnings Review $SNOW
Snowflake’s consumption business continues to feel pressure as large enterprise customers look for ways to optimize usage. While it seemed that management had a handle on forecasting the impact of this effect when they lowered guidance back in Q4, they still underestimated the trend. With the Q1 report, they once again brought down the full year revenue target.

This had the expected effect of torpedoing the stock the day after earnings, with a post-earnings drop of 16.5%. After that, a strange thing happened. The stock began appreciating again and surpassed its pre-earnings price two weeks later. As with other software infrastructure companies, the market’s perception that demand for AI services will drive incremental usage is propping up the stock. Snowflake management added to this momentum with a few key announcements, including a preview of their upcoming Snowflake Summit conference, which will include a fireside chat with Nvidia’s CEO. This all implies that Snowflake is positioning themselves to benefit from increased AI workload demand.

In the near term, Snowflake is being directly impacted by their consumption model, which magnifies changes in customer behavior as customers look for ways to reduce costs. Looking forward, the market is anticipating the moderation of these optimization headwinds, as enterprises work through the low-hanging fruit of cost savings. At that point, Snowflake’s growth should return to its prior cadence, driven by new cloud migration workloads and the expansion of existing ones.

Given that enterprise benefit from AI hinges on access to a consolidated, clean and secure data set, Snowflake is well-positioned to serve as a primary data source. Their positioning is further solidified as the same environment could be leveraged to run jobs that enhance the AI models. This applies to LLMs and other foundation models, as well as more traditional types of machine learning output like recommenders. Snowpark, Streamlit and other extensions that make the environment more programmable started this process. New acquisitions are bolstering the platform’s capabilities as well. Investors are looking towards announcements at the upcoming Summit user conference for more insight into Snowflake’s planned AI capabilities.

In this post, I review the Q1 results within the context of Snowflake’s current business. I also look forward towards Snowflake’s product development roadmap as they position the platform to address the next wave of data processing use cases to support AI and ML. We will learn more from the Snowflake Summit conference the week of June 26th.

Snowflake Product Roadmap

I have written extensively about Snowflake’s product vision in the past. They aspire to become the single cloud-based data store for all of an enterprise’s data. This involves combining application and analytical workloads into one Data Cloud. While this sounds far-fetched, it is actually how data storage worked from the beginning. Analytics and transactional workloads were separated in the 1990s for performance reasons, when OLAP became its own data processing branch separate from OLTP. Given that compute and data storage capacity were expensive in fixed on-premise data centers, this separation of concerns made sense.

Elastic cloud-based infrastructure introduced the opportunity to consolidate the two functions again. To be clear, Snowflake is not inventing a new technology to efficiently store and query data to serve analytics and transactional workloads in the same process space. Rather, they are recognizing that the cloud allows them to present the same entry point for data analysts and developers. Data is still duplicated in memory and optimized for the type of query pattern the workload requires. On cloud infrastructure, Snowflake can balance these workloads efficiently, providing customers with simplicity and reduction of infrastructure management overhead.

This model provides a strong foundation for a single data store serving multiple processing engines. Looking forward, I think we can expect further platform expansion with the introduction of more workloads. The Snowflake platform has the ability to efficiently store data of all types (structured, semi-structured, unstructured). Their one-to-many access pattern is enabled by the Snowgrid technology layer. Snowgrid works across multiple cloud providers to consolidate the underlying data, manage governance policies, ensure continuity and facilitate sharing with partners.

Snowgrid encompasses many of the competitive advantages that Snowflake’s platform enjoys over similar capabilities from the hyperscalers and independents like Databricks. Those platforms address many of the same core storage/access technologies around a data warehouse/lakehouse model, but are light on the capabilities that facilitate doing anything with the data external to the organization. These include governance, data sharing, native applications and third-party data feeds through a complete marketplace. Additionally, seamlessly spanning data on multiple cloud providers is a fundamental requirement for any global enterprise.

After getting the basic platform down, the next phase of Snowflake’s evolution was grounded in building industry ecosystems around it. If enterprises can consolidate all of their data into a single cloud-based platform, then business application development and data sharing with partners can be greatly simplified and better secured. While Snowflake continues to improve their capabilities in enabling core analytics and data science workloads, future disruption revolves around two primary initiatives:

  • Data Collaboration. Enabling secure data sharing between companies, with granular governance and frictionless management layered in. This effort was started in 2018 and has been the catalyst for Snowflake’s growth in data sharing and enabling industry ecosystems for customers. By providing customers with seamless mechanisms to distribute data securely to their industry partners, Snowflake is building strong network effects. These can’t be easily duplicated by competitors who are either on a single cloud (hyperscalers) or offer rudimentary solutions to data sharing that still create copies (Databricks). Strictly governed data sharing without copies will be even more critical as enterprises seek to enhance LLMs and foundation models with their proprietary data.
  • Native App Development. Allow developers to build applications directly over a customer’s data set within the Snowflake environment. This represents the next driver of Snowflake’s growth. The rationale is simple. It is more expensive and less secure for an enterprise application provider to maintain their own copy of an enterprise’s data in their cloud hosting environment. Rather than trusting your CRM (or other SaaS application variant) provider to keep your data secure and pass on any cost efficiencies, why not allow CRM app developers to host their solutions within Snowflake’s environment on top of your enterprise data housed in one location? This is the crux of Snowflake’s strategy to “disrupt” application development. While early, the value proposition makes sense.

Both of these growth strategies provide the benefit of eliminating copying of data to another destination. For data sharing, two companies can exchange data without requiring complex APIs or rudimentary file transfer processes. More importantly, the scope of the data can be limited to just what is needed with a fixed duration. The recipient can’t “keep” a copy of the data after the partnership ends. The same benefit applies to customer data for a CRM app, employee data for HRM and every other SaaS enterprise app derivative.

While data sharing is often ignored by analysts and investors, it continues to surface as one of Snowflake’s stickiest features. The percent of customers with active data sharing relationships continues to increase. Utilization is even higher for large $1M+ customers. Surprisingly, there are still many reasons for enterprises to exchange data with partners and customers, which are often handled through inefficient processes like FTP and APIs.

With a renewed focus on maintaining control over an enterprise’s proprietary data as an input to AI model training/inference, strong governance of data is even more important. Snowflake has built granular controls into their data sharing methodology. Most importantly, data is not shared by making a copy, unlike some competitive solutions. Access to data for any partner can be immediately revoked without having to request that the partner “delete their copies”.
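
To make the "share without copying" idea concrete, here is a toy Python model. It is only a sketch of the concept, not Snowflake's implementation or API: the `SharedTable` class, its methods and the sample data are all hypothetical names I made up for illustration. The point it shows is that when every read goes through the provider's grant table, revoking a grant ends access immediately and there is no stored extract to chase down.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTable:
    """Toy model: the provider owns the rows, consumers only hold a grant."""

    rows: list[dict]
    # Hypothetical grant table: consumer name -> columns that consumer may read.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, consumer: str, columns: set[str]) -> None:
        """Limit the share to just the columns the partner needs."""
        self.grants[consumer] = set(columns)

    def revoke(self, consumer: str) -> None:
        """End the relationship; the next read by this consumer fails."""
        self.grants.pop(consumer, None)

    def read(self, consumer: str) -> list[dict]:
        """Serve a query against the provider's rows, projected to granted columns."""
        if consumer not in self.grants:
            raise PermissionError(f"{consumer} has no active grant")
        allowed = self.grants[consumer]
        return [{k: v for k, v in row.items() if k in allowed} for row in self.rows]


if __name__ == "__main__":
    orders = SharedTable(rows=[{"sku": "A1", "units": 3, "margin": 0.42}])
    orders.grant("partner_co", {"sku", "units"})
    print(orders.read("partner_co"))  # margin column is never exposed
    orders.revoke("partner_co")
    try:
        orders.read("partner_co")
    except PermissionError as err:
        print(err)
```

A file transfer or API export would behave differently: once the partner has the extract, revocation on the provider side changes nothing.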

One Data Cloud

If Snowflake can realize their vision for a single cloud-based enterprise data store (the Data Cloud), they will unlock an enormous market opportunity. To size the opportunity, Snowflake leadership identifies a set of workloads that the data platform can address currently. Those represent the serviceable opportunity with Snowflake’s existing product set and go-to-market focus.

They size the market at $248B for this year, while projecting revenue representing just over 1% of that. The core of the market still encompasses Snowflake’s foundational workloads in analytics, data science and data sharing. They are slowly adding new workload targets, like security analytics, which they estimate as a $10B opportunity. The reasoning for this addition is straightforward – enterprises and third-party app developers can build security scanning solutions on top of Snowflake’s data cloud, taking advantage of the large scale data processing platform that Snowflake has already built.

This new workload in cybersecurity (with more workloads coming) is supported by Snowflake’s Powered By program. For cybersecurity, they already have several partners including Securonix, Hunters, Panther Labs and Lacework. The benefit to Snowflake with workloads like cybersecurity is twofold. First, these application builders generate revenue for Snowflake through their consumption of compute and storage resources. During their Investor Day in 2022, leadership revealed that 9% of their $1M+ customers were in the Powered By program. Second, having these capabilities available to enterprise customers provides one more reason to consolidate their data footprint onto Snowflake.

Given its growth, I would even speculate that in the future, revenue from Powered By approaches revenue from regular customer use of the Snowflake platform. This is because Powered By program participants are building their entire business on the Snowflake platform. We have already seen several sizable security vendors take this approach. During Snowday in late 2022, the SVP of Product shared that the 4 fastest growing companies from $1M to $100M in ARR are built on top of Snowflake. This could become a significant revenue driver if we consider that a typical SaaS vendor might spend 10-20% of revenue on their software infrastructure. Not all of that would go to Snowflake, but a good portion of their $10M-$20M+ in annual IT spend could.
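
The arithmetic behind that $10M-$20M range can be sketched in a few lines of Python. The 10-20% infrastructure share and the $100M ARR figure come from the paragraph above. The `snowflake_share` parameter is purely hypothetical (I have no data on how much of a vendor's infrastructure budget actually lands with Snowflake), so treat the second calculation as a what-if.

```python
def infra_spend_range(arr_usd: float, low_share: float = 0.10,
                      high_share: float = 0.20) -> tuple[float, float]:
    """Annual infrastructure spend implied by a 10-20% share of revenue."""
    return arr_usd * low_share, arr_usd * high_share


def platform_revenue_range(spend_range: tuple[float, float],
                           snowflake_share: float) -> tuple[float, float]:
    """Portion of that spend going to one platform.

    snowflake_share is a hypothetical input, not a reported figure.
    """
    low, high = spend_range
    return low * snowflake_share, high * snowflake_share


if __name__ == "__main__":
    spend = infra_spend_range(100_000_000)  # a vendor at $100M in ARR
    print(f"Infrastructure spend: ${spend[0] / 1e6:.0f}M to ${spend[1] / 1e6:.0f}M")

    # What-if: half of the infrastructure budget goes to the data platform.
    what_if = platform_revenue_range(spend, snowflake_share=0.5)
    print(f"At a 50% share: ${what_if[0] / 1e6:.0f}M to ${what_if[1] / 1e6:.0f}M")
```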

While a $248B TAM is one of the largest in software, Snowflake leadership isn’t capping it there. They project a bigger market opportunity if they realize the full vision of the Data Cloud. The rapidly evolving opportunity to enable AI workloads hasn’t been fully sized yet. Some of this would be covered under the Data Science and ML category, but likely would grow from the $51B estimate currently. The Snowflake Summit conference and Investor Day on June 27th will provide more guidance here.

In fact, this year’s Summit conference appears set to revolve around AI and the potential for Snowflake to provide a critical foundation for enterprises to harness it. The agenda is chock-full of AI content, including a who’s-who of luminaries from the space. The conference kicks off with a fireside chat between Snowflake CEO Frank Slootman and NVIDIA Founder and CEO Jensen Huang on Generative AI’s Impact on the Enterprise.

Here is a list of other highlights, copied from the press release:

  • A Thursday keynote panel featuring Andrew Ng, Landing AI Founder & CEO; Ali Dalloul, Microsoft VP Azure AI; Jonathan Cohen, NVIDIA VP of Applied Research; and moderated by Snowflake SVP of Product Christian Kleinerman.
  • Keynote presentations from Frank Slootman; Snowflake Co-Founder and President of Products, Benoit Dageville; and Snowflake SVP of Product, Christian Kleinerman, unveiling the next wave of Snowflake’s product innovations including bringing 3rd party LLMs to your data, delivering 1st party LLMs as services, and creating LLM-enhanced product experiences.
  • New details around how Snowflake’s recent acquisition of Neeva will enable AI-driven search and conversational experiences in enterprises.
  • Technical deep dives into the latest Data Cloud advancements with generative AI and LLMs, Apache Iceberg, security & privacy, programmability, application development, clean rooms, streaming and much more.
  • Dozens of partner-led sessions about leveraging generative AI within an organization’s tech stack to drive long-term business impact.
  • 100+ Data Cloud Ecosystem announcements to support all aspects of an organization’s AI/ML strategies from new applications to technology integrations to services and more.

Beyond the AI opportunity, Snowflake has long recognized their advantage to move beyond the core data platform to build a robust set of data services and native applications on top of the customer’s data, keeping everything in one place. This has the benefits of lower cost, better controls and a simpler system architecture. Customers are gravitating towards these advantages, recognizing that Snowflake’s reach across all hyperscalers gives them optionality.

To track their progress in building an ecosystem of data sharing and native applications, Snowflake leadership regularly publishes a set of “Data Cloud Metrics”. These give investors a sense for their progress in data sharing, the Marketplace and the Powered by program.

To capture data sharing activity, Snowflake reports a measure called “stable edges”. Snowflake leadership sets a high bar for considering a data sharing relationship between two companies as actively being used. In order to be considered a stable edge, the two parties must consume 40 or more credits of Snowflake usage each day over a 6 week period for the data sharing relationship. I like this measure, as it separates empty collaboration agreements from actual value creation.

In Q1, 25% of total customers had at least one stable edge. This is up from 23% in Q4 and 20% a year ago. If we apply these percentages to total customer counts in the period, we get the chart below. While total customers grew by about 29% y/y in Q1, the number of customers with at least one stable edge grew by 61%.

To me, that growth represents an important signal for the value-add of data sharing. If we assume that new customers take at least one year to get around to setting up a stable edge, then roughly 32% of customers over a year old have a stable edge in place (customers with a stable edge in Q1 FY2024 / total customer count in Q1 FY2023).
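
For readers who want to check the math, here is a minimal Python sketch of both calculations. The inputs (29% customer growth, 25% and 20% stable edge penetration) are the figures quoted above; the function names are my own. The second function bakes in the same simplifying assumption as the text, namely that every customer with a stable edge today was already a customer a year ago.

```python
def stable_edge_customer_growth(customer_growth: float, share_now: float,
                                share_year_ago: float) -> float:
    """Y/Y growth in customers with at least one stable edge.

    count_now / count_year_ago = (1 + customer_growth) * share_now / share_year_ago
    """
    return (1 + customer_growth) * (share_now / share_year_ago) - 1


def share_of_year_old_customers(customer_growth: float, share_now: float) -> float:
    """Stable-edge customers today divided by the customer count a year ago.

    Assumes (as the text does) that new customers need a year to set up an edge.
    """
    return share_now * (1 + customer_growth)


if __name__ == "__main__":
    growth = stable_edge_customer_growth(0.29, share_now=0.25, share_year_ago=0.20)
    print(f"Stable edge customer growth: {growth:.0%}")

    mature = share_of_year_old_customers(0.29, share_now=0.25)
    print(f"Share of year-old customers with a stable edge: {mature:.0%}")
```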

Facilitating these data sharing relationships represents a competitive advantage for Snowflake. They increase customer retention, generate network effects to attract new customers and drive incremental utilization as shared data sets are filtered, cleansed and combined with other third party data. This network of data sharing relationships elevates Snowflake’s value proposition for customers onto a higher plane beyond focusing on tooling for analytics and ML/AI workloads within a single company.

The other area experiencing high interest is the Snowflake Powered By program. This represents companies that have decided to build their data-driven product or service on top of Snowflake’s platform, which they then sell to their customers. For Q1 FY2023, Snowflake announced there were 425 Powered by Snowflake partners, representing about 49% growth over the prior quarter’s count of 285. For Q2, Powered By participation took another large jump forward, increasing by about 39% q/q to reach 590 registrants. In Q3, Snowflake reported another 20% q/q growth, hitting 709 registrations by quarter’s end. In Q4, they reported 16% sequential growth to reach 822.

Finally, in Q1 FY2024, sequential growth accelerated to 18%, reaching 971 participants. This represents growth of 2.3x over the past year. As part of Investor Day in June 2022, leadership revealed that 9% of their $1M+ customers were in the Powered By program. Snowflake ended Q1 with 373 $1M+ customers. If that 9% share still holds, roughly 34 Powered By participants are generating more than $1M in annual product revenue.
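
The growth rates can be recomputed directly from the registration counts quoted above. This is a small Python sketch; the fiscal-year labels on each quarter are my inference from the sequence of reports, and applying the June 2022 figure of 9% to the current count of 373 $1M+ customers is an assumption that the share has not changed.

```python
# Powered by Snowflake registrations as quoted in the text.
# Fiscal-year labels are inferred from the order of the quarters.
POWERED_BY_COUNTS = [
    ("Q4 FY2022", 285),
    ("Q1 FY2023", 425),
    ("Q2 FY2023", 590),
    ("Q3 FY2023", 709),
    ("Q4 FY2023", 822),
    ("Q1 FY2024", 971),
]


def sequential_growth(counts: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Quarter-over-quarter growth for every quarter after the first."""
    return [
        (label, current / previous - 1)
        for (_, previous), (label, current) in zip(counts, counts[1:])
    ]


def year_over_year_multiple(counts: list[tuple[str, int]]) -> float:
    """Latest count divided by the count four quarters earlier."""
    return counts[-1][1] / counts[-5][1]


if __name__ == "__main__":
    for label, growth in sequential_growth(POWERED_BY_COUNTS):
        print(f"{label}: {growth:.0%} q/q")
    print(f"Y/Y multiple: {year_over_year_multiple(POWERED_BY_COUNTS):.1f}x")

    # Assumes the 9% share disclosed in June 2022 still applies today.
    print(f"Implied $1M+ Powered By participants: {0.09 * 373:.1f}")
```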

Transitioning to AI Workloads

The future direction of large data storage and processing will be to generate ever more sophisticated intelligence, efficiency and automation from it. These represent the next steps of the evolution into AI. While the ability to apply machine learning to create self-improving algorithms and harvest richer insights from analytics has existed for a while, the new capabilities introduced by LLMs and chat-based interfaces have inspired a new push. Enterprises are scrambling to find ways to utilize the new capabilities emerging from generative AI and LLMs, even while specific use cases outside of ChatGPT interfaces are still developing.

At a high level, I think enterprises will harness these new capabilities in several ways, generating incremental usage of data stores like Snowflake, along with other companies supporting a modern data infrastructure (Confluent, MongoDB, etc.).

More activity from better user interfaces. While this seems simple on the surface, I think that more efficient user interaction will drive much higher utilization of existing software services. LLMs and ChatGPT like interfaces allow humans to interact with software applications through an interface that is based on natural language. Rather than being bound to traditional GUIs with preset choices or requiring use of a scripted language (like SQL) to define tasks, chat interfaces allow users to engage software applications through text-based prompts.

As an example, Snowflake itself built a new natural language interface internally on top of the traditional executive dashboards used by the senior leadership team. This unlocks data for a whole new set of users. A natural language interface allows any employee to instantly become a power user. This should result in many more queries to the underlying analytics engine, driving more consumption.

As another example on the consumer side, Priceline is working with Google AI to create a virtual travel concierge as the entry point for users to plan a trip. A simple text-based instruction with some rough parameters could kick off a large number of queries to multiple application data services (flight, car, hotel, entertainment, dining, etc). This replaces complex interfaces with many combinations of drop-downs, selectors, submit buttons, etc., which are then repeated for each aspect of trip planning. The efficiency of querying for all of this in one or two sentences would result in more overall usage. Not to be outmaneuvered, other travel providers like Expedia are working on similar features.

More activity from more applications. Developer co-pilots derived from training LLMs on software code allow programmers to be more productive. Anecdotally, some developers have reported productivity increases of 2-3x. Even a more conservative increase of 30-50% for a software development team would be significant. The outcome for enterprises would be the creation of more software applications (they all have a long backlog of digital transformation projects). These would require hosting, monitoring, security and data. The data generated by these new applications would flow through the same pipes and be consolidated into a central store for further processing.

More activity from enterprises running foundation models over their internal data. One major challenge with public LLMs is control of an enterprise’s proprietary data. Enterprises don’t want employees feeding public services with their private data, because that data may become part of the model once it is shared. There will be a large opportunity to enable use of LLMs with enterprise-specific data in a way that maintains governance and security.

More activity from the application of foundation models to specific domains. This will likely become the largest driver of future growth for Snowflake. Recall that Snowflake has invested heavily in the past in the creation of industry verticals. These represent an ecosystem of participants in a particular industry segment, who can interact through systems of shared data and special product offerings curated for that particular vertical. As each of these verticals tries to apply AI to their domain, Snowflake will be well-positioned to offer domain-specific capabilities. They can feed foundation models to create unique offerings that possess specific industry context with the latest data.

Snowflake can help ecosystem participants securely assemble larger data sets through controlled sharing. They can extend foundation models with domain specific context and offer them to ecosystem participants. While individual enterprises will closely guard their proprietary data, they will realize that collaborating with a couple of other participants in the same sector might result in an even better AI-driven offering. We should see the emergence of tightly controlled enterprise alliances that revolve around data sharing to create better AI models for their industry that disrupt other participants. Snowflake’s sophisticated data sharing capabilities (again without making copies) will become a huge advantage here.

Opens up a new set of services to offer in the Snowflake Marketplace. While providers in the Snowflake Marketplace have been focusing on selling curated data sets, AI models provide a whole new layer of services that Snowflake can offer through the Marketplace. As we saw with Q1’s Data Cloud Metrics, sequential growth of Marketplace offerings is slowing down. I’m not surprised, as there are likely only so many generic demographic, weather, financial, etc. datasets that can be sold. Sophisticated, contextually-aware AI models distributed through the Marketplace could provide a new growth vector for vendors.

For all of these benefits, you see the core outcome summarized as “more activity”. That is why AI could represent a new catalyst for data service providers at all layers of the stack, like Confluent for pipelines, MongoDB for the transactional data and of course Snowflake for the consolidated data cloud.

Software and data service providers would like to power as many of the steps in the AI value chain as possible. Foundation models are becoming ever more available through open source, the public domain and commercial offerings with API interfaces. The generic steps of providing the data inputs (structured and unstructured), training, adaptation and inference could be powered by a single platform. This platform would provide the foundation for an ever-increasing number of domain specific, AI-enhanced tasks that are incorporated into enterprise application functions. For Snowflake, they could enable several of these steps, if not power the whole platform at some point.

For a clearer view of how modern data infrastructure providers could benefit from the increased use of newer foundation models by enterprises, we can refer to a diagram provided by Confluent at their Investor Day. With traditional machine learning processes, the primary focus by enterprises was on performing custom training and feature engineering. These models were created by loading large enterprise data sets through a batch function from a data lake or data warehouse (or lakehouse). Once the base enterprise model was established, inference would customize results for each request. Inference generated some data access, but not at the same volume as the original model construction.

With Generative AI, LLMs and other foundation models, a third party often provides the generic model pre-trained with public data. The enterprise then applies much heavier inference to inject its contextual data into the generic model. This inference can be applied in real-time for every user interaction. Given the scope of functions and data addressed from a chat-based interface, the volume of data accessed in real-time to deliver a customized response for each user (based on their history) could actually be much larger and broader in scope than what was required to build a standard ML model under the prior method.

This all implies that a cloud-based data store like Snowflake that houses all of an enterprise’s data would be very useful for adding user-specific context to any pre-trained model. As this data is often requested in near real-time, overall consumption of Snowflake storage and compute resources would logically increase for the customer. This also highlights the need for Unistore, Snowflake’s solution to support transactional workloads.

Going back to the examples of travel companies like Priceline and Expedia updating the user interface with a natural language prompt, the model needs access to user data in near real-time in order to address all possible queries. For example, a customer who wants to check the status of their flights or make a change to some aspect of their trip would require that the model be aware (through inference) of their specific history. A pre-trained model with data loaded once in a batch manner would not be as useful in this context.
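
The difference between batch training and per-request context can be sketched in Python. This is an illustration under heavy assumptions, not how Priceline, Expedia or Snowflake implement anything: the in-memory `TRIPS` dictionary stands in for a governed enterprise data store, the flight number and user ID are invented, and the call to an actual language model is deliberately left out. What it shows is that every user interaction triggers a fresh lookup against the data store before the pre-trained model sees the question.

```python
from datetime import date

# Hypothetical stand-in for the enterprise data store; all values are made up.
TRIPS = {
    "user-123": [
        {"flight": "XX100", "date": date(2023, 7, 4), "status": "on time"},
    ],
}


def fetch_user_context(user_id: str) -> list[dict]:
    """Look up the user's current records at request time (one query per interaction)."""
    return TRIPS.get(user_id, [])


def build_prompt(user_id: str, question: str) -> str:
    """Combine fresh user context with the question for a generic pre-trained model."""
    records = fetch_user_context(user_id)
    lines = [
        f"- flight {r['flight']} on {r['date'].isoformat()}: {r['status']}"
        for r in records
    ]
    context = "\n".join(lines) or "- (no trips on file)"
    return (
        "Known trips for this customer:\n"
        f"{context}\n\n"
        f"Customer question: {question}"
    )


if __name__ == "__main__":
    # The resulting text would be sent to whichever model the enterprise uses.
    print(build_prompt("user-123", "Is my flight still on time?"))
```

If the flight status changes in the data store, the very next prompt reflects it, which a model trained once on a batch extract could not do.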

At Snowflake’s upcoming Summit conference, we will learn more about their plans to address AI use cases. The highlight of the event is a fireside chat between Snowflake CEO Frank Slootman and NVIDIA Founder and CEO Jensen Huang on “Generative AI’s Impact on the Enterprise.” This will likely outline the large opportunity for Snowflake to leverage their existing enterprise relationships to drive growth from the rush to deliver new generative AI digital experiences.

In the near term, their Snowpark development environment is experiencing increased usage. In his opening remarks, Snowflake’s CEO shared that in Q1 more than 800 customers used Snowpark for the first time. About 30% of all customers are now using Snowpark on at least a weekly basis, up from 20% in the prior quarter. Consumption of Snowpark has increased nearly 70% q/q.

Recent acquisitions, like Applica, are enabling new AI capabilities for Snowflake. As part of Q1 earnings, they also announced the acquisition of Neeva, which brings search capabilities layered over generative AI that allow users to query data more efficiently. Snowflake plans to infuse their core search capabilities across the Data Cloud. Neeva brings more human capital with a strong search background as well.

"Applica’s language model solves a real business challenge, understanding unstructured data. Users can turn documents such as invoices or legal contracts into structured properties. These documents are now a reference table for analytics, data science, and AI, something that is quite challenging in today’s environment. Streamlit is the framework of choice for data scientists to create applications and experiences for AI and ML. Over 1,500 LLM-powered Streamlit apps have already been built. GPT Lab is one example. GPT lab offers pre-trained AI assistants that can be shared across users.
We announced our intent to acquire Neeva, a next-generation search technology powered by language models. Engaging with data through natural language is becoming popular with advancements in AI. This will enable Snowflake users and application developers to build rich search-enabled and conversational experiences. We believe Neeva will increase our opportunity to allow nontechnical users to extract value from their data." Snowflake Q1 FY2024 Earnings Call

The key for Snowflake will be to extend their reach beyond being the data source for these new AI-driven experiences to providing the environment to also perform a larger share of the overall AI workflow. We will hear more about all these plans at Summit.

Competitive Positioning

I have written extensively in the past about Snowflake’s competitive advantages. While they cooperate with the hyperscalers to various degrees, they also compete with hyperscaler ambitions to control all data for an enterprise within their environment. While the hyperscaler solutions overlap with Snowflake’s in many ways, the primary competitive differentiation for Snowflake comes from the benefits of being independent from the hyperscalers.

Being a neutral third party allows Snowflake to focus on the features built around the core data processing engine that generate network effects for customers. These include the most robust data sharing capabilities and comprehensive industry-specific ecosystems. Additionally, an extensive data marketplace and, soon, a library of native apps make Snowflake the center for an enterprise to secure all their data.

The hyperscalers still make money from consumption of Snowflake resources by customers. This is because Snowflake is hosted on each of the hyperscalers. Usage of compute and storage generates revenue for the underlying infrastructure provider. This provides an additional incentive for the hyperscalers to support Snowflake’s business – it’s easy money without the overhead of supporting a product.

On the other hand, the hyperscalers recognize that they can make more money by owning the entire customer software stack. As enterprise gravity is heaviest at the data layer, the hyperscalers compete hardest with independent providers for control of access to large enterprise data sets. This posturing relative to Snowflake has traditionally been strongest with Google Cloud Platform, followed by Microsoft Azure. AWS has taken the opposite approach, actively collaborating with Snowflake to win more enterprise business overall.

As Snowflake is rapidly evolving their platform to address new use cases for AI, the hyperscalers are as well. Microsoft made some recent announcements, including the introduction of Microsoft Fabric as a new end-to-end data and analytics platform. Google is rolling out new ways to connect applications and share data across cloud providers. Yet, these efforts still represent either repackaging of existing services or rely on server infrastructure to be maintained by the customer or another partner.

More importantly, at a high level, they don’t address the risks of relying on making copies of data between partners or the implied lock-in. As an independent provider, Snowflake’s solution works seamlessly across all hyperscalers, without requiring extra hardware to be provisioned to proxy requests between providers. For data sharing between partners, there are no copies to track down if the relationship ends. The creation of shared AI models between industry partners could be facilitated by Snowflake’s Clean Rooms, exposing the minimal amount of proprietary data. Made available through a Native App within the Snowflake platform, that data wouldn’t even need to leave the environment.

Despite Microsoft Azure and GCP historically maintaining an arm’s length relationship with Snowflake, management noted on the Q1 earnings call that Snowflake’s adoption rate on Azure had been improving.

"Yeah, you know, just Microsoft relationship has been growing faster than the other two cloud platforms that we support. It’s been very clear from the top Microsoft that they’re viewing Azure as a platform, not as a sort of a single integrated proprietary Microsoft stack. And they’ve said over and over that we’re about choice, we’re about innovation. And, yes, we will compete with Microsoft from day one, and that will — and we’ve been very successful in that regard for a whole bunch of different reasons.

But people keep on coming, and that’s — and we expect that. And I think that’s sort of a net benefit for the world at large as they get better and better, you know, products and they get, you know, more choice. The good news is that I think the relationship is relatively mature, meaning that, you know, when there is friction or people who are not following the rules, we have good established processes for addressing and resolving that. And that’s incredibly important, right, as we sort of get out of the juvenile state, where things are dysfunctional at the field level. So, I have no reason to believe that that will not continue, you know, in that manner. So, I think Azure will continue to grow and grow faster than the other platforms." Snowflake Q1 FY2024 Earnings Call

Based on the CEO’s comments, it appears that Microsoft is backing away a bit from their “all in one” product strategy and opening up to collaboration with third party providers. This is similar to the transition that AWS went through over the past 5 years, where they previously deliberately competed with independent providers, even hosting open source offerings as their own product (MongoDB, Elasticsearch, Kafka, etc).

Since then, AWS realized the incremental value of offering their enterprise customers the option to use third party software services on their platform. This resulted in a better outcome than disenfranchising them by pushing an internal product. AWS appreciated that third party solutions built on their infrastructure still generated revenue from the underlying storage, compute and network services. It appears that Microsoft Azure may be arriving at a similar conclusion.

Databricks

Snowflake’s other large independent competitor, Databricks, isn’t standing still either. They too have a user conference scheduled for the end of June (perhaps deliberately). Given the two companies have major conferences during the same week in two different locations, I imagine competition for speakers was fierce. Investors can compare the featured speakers from Databricks and Snowflake and draw their own conclusions. Snowflake landed Jensen, but Databricks has Satya and Eric Schmidt.

Databricks has been moving forward in pursuing the AI opportunity as well. Back in April, they announced the release of Dolly 2.0, which they claimed was the “first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.” With the release, they delivered a dataset with 15,000 human-generated prompt and response pairs designed for instruction tuning of large language models.
These are meant to provide customers with a demonstration of how the Databricks platform can incorporate LLMs for enterprise-specific tasks, representing a starting point for summarization and content generation. They expect users to use this example to bootstrap even more powerful language models. I am sure we will learn much more at the Data + AI Summit in a week.

On the financial side, investors received an update on Databricks’ growth trajectory from an interview with their CEO by Bloomberg on June 13th. In it, the CEO revealed that Databricks reached $1B in annual revenue for their fiscal year ended in January 2023, which grew by 60% over the prior year. He further claimed that makes them “the fastest-growing software company according to our records.”

I was a little perplexed by this comment. Snowflake ended their most recent fiscal year in January 2023 as well. They reported over $2B in revenue for the prior 12 months, growing by 69% annually. So, Snowflake finished the same fiscal period with 2x the revenue and a higher growth rate. Also, consider that at Databricks’ scale, Snowflake was growing revenue even more quickly. They ended FY2022 with $1.22B in revenue, increasing by 106% annually.

Granted, the annual growth rate for Snowflake is decreasing rapidly this year, but we also don’t know how Databricks’ quarterly performance has been progressing in this challenging macro environment. I highlight these comparisons because investor sentiment associated with Databricks would imply a much higher growth rate, given the perception of how disruptive their offering should be for Snowflake. I would have estimated 100% or more annual growth at that scale. In the Bloomberg interview, Databricks’ CEO did highlight that their data warehouse product passed $100M in ARR in April.

Databricks’ growth will be something to watch, along with their continued competitive jockeying with Snowflake. However, given Snowflake’s larger scale and significant FCF generation, they can afford to funnel more money towards R&D and S&M than Databricks. At some point, that should overcome any start-up disruptive advantage that Databricks may have enjoyed.

Q1 Earnings Results

Following Snowflake’s earnings report on May 24th, the stock dropped 16.5%. This was a reaction to weak forward revenue guidance that missed expectations both for Q2 and the full year. However, over the following three weeks, the stock regained that drop, reaching a recent peak near $191 on June 15th, which was about 7.8% higher than the pre-earnings price of $177.14. In fact, this represents the high for 2023 and is close to the 52 week high.

Looking at the prior 12 months, we see that SNOW stock has traded in a range between about $120 and a peak of just about $200 around September of 2022. Its historical ATH price is over $400, hit back in November of 2021. It also peaked at a P/S ratio over 100 at that point, but with a much higher growth rate. Looking forward, if the macro environment improves, interest rates moderate and Snowflake’s revenue growth picks up a bit, we may revisit that price over the next couple of years.

As I will discuss, an investment in Snowflake is based on two assumptions. The first assumes that revenue growth will stabilize at some point and potentially re-accelerate slightly. Revenue growth rates have been decreasing quickly over the past 12 months and could drop into the low 30% range through 2023. This has been driven by headwinds from customer usage optimization and delays in ramping up new workloads. Because Snowflake operates on a consumption model, the impact on revenue of these dynamics is immediate.

The most acute problem has been when large customers make changes that lower their overall consumption, causing spend to move backwards. This is clearly a big impact on revenue growth, where the customer isn’t just slowing down spend, but is actually making it sequentially smaller. For a high growth company, this effect is a killer, as it quickly counteracts any benefit from other customers that are expanding their usage.

As an example, Snowflake leadership cited a few large customers who reduced the amount of data being retained, including at least one that cut back from 5 years of historical data to 3 years. This is a common, but aggressive, cost reduction tactic. The trade-off is less visibility for historical queries, which is likely acceptable for a temporary period. Over time, the data set can be built back up. For Snowflake, this immediately reduces revenue from storage of that data and also makes large data queries run faster. Both effects result in incrementally less usage, which means less revenue from resource consumption.

At some point, though, these negative headwinds will abate. Enterprises can only optimize usage to a point. In the prior example, while reducing historical data retention from 5 years to 3 years represented a reasonable trade-off, cutting retention further from 3 years to 1 year is very unlikely. When these negative growth trends dissipate, enterprises will return to their normal expansion cadence and may start embracing recent product additions. At the same time, new customers will be starting their cloud migrations. These could combine to re-accelerate growth. This would likely manifest as a large step-up in sequential quarterly increases.

The second assumption behind the Snowflake investment thesis is that the company will eventually reach its product vision to become the central repository for all of an enterprise customer’s data. This represents a huge TAM and could propel Snowflake’s annual revenue well beyond their $10B target. Snowflake management has already set their TAM at $248B, which I suspect will expand again as they add new AI-focused product offerings.

With that set-up, let’s look at the specifics of their Q1 financial performance and then wrap up with some take-aways for the investment plan.

Revenue

Snowflake’s sequential revenue growth slowed down substantially starting in Q4 and continuing into Q1. Over the prior year, from Q3 FY2022 through Q3 FY2023 (ending October 2022), sequential revenue growth was consistently over 10%, even approaching 20% in some quarters. In Q4, product revenue growth dropped out of hypergrowth, to hit 6.1% sequentially. For Q1, product revenue sequential growth ticked up slightly to 6.3%. Annualized, this would yield about 28% growth. While this represents a huge slowdown, it is fairly in line with other software infrastructure companies in this environment.
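For readers who want to check the annualized figure, here is a minimal sketch of the compounding behind it. It assumes the sequential rate simply repeats for four quarters, which is my simplification and not a forecast. The function name is my own.

```python
def annualize_sequential_growth(quarterly_growth: float) -> float:
    """Compound a quarter-over-quarter growth rate over four quarters."""
    return (1.0 + quarterly_growth) ** 4 - 1.0


# Sequential product revenue growth rates cited above.
q4_sequential = 0.061
q1_sequential = 0.063

q4_annualized = annualize_sequential_growth(q4_sequential)
q1_annualized = annualize_sequential_growth(q1_sequential)

print(f"Q4 run rate: {q4_annualized:.1%}")
print(f"Q1 run rate: {q1_annualized:.1%}")
```

The Q1 rate compounds to a little under 28%, which is where the "about 28%" figure comes from.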

The problem has been the trajectory of the descent and SNOW’s valuation. A drop from the 10%+ range to 6% sequential growth is obviously large. It does appear that sequential growth may have stabilized, given this is the second quarter with sequential product revenue growth above 6%. The Q2 guide even implies a little acceleration, if we assume a couple point beat over the 5.5% preliminary estimate. This could take Q2 sequential product revenue growth up to the 8-9% range, which is necessary to hit the revised full year guidance.

While the improvement in sequential revenue growth is nice to see, a P/S ratio of 26 represents one of the higher valuations in software infrastructure. This has a lot of growth baked into it, with the assumption that Snowflake will become a much, much larger company in the future. While the future is bright, the current valuation multiple doesn’t align well with the revenue growth rate. The trailing FCF multiple of about 100 provides some support. Annualizing Q1’s FCF run rate brings the P/FCF ratio down to 53.

Snowflake delivered $623.6M of total revenue in Q1, which was up 47.7% annually and 5.9% sequentially. This beat the analyst estimate for $608.7M, which would have been 44.1% growth. Product revenue was $590.1M, up slightly more at 49.6% annually and 6.3% sequentially. In Q4, Snowflake guided for product revenue in a range of $568M – $573M for 44%-45% annual growth and 2.7% sequentially at the midpoint. Leadership doesn’t guide to total revenue, but analysts calculate it from guidance for product revenue.

Looking forward to Q2, Snowflake guided for product revenue in a range of $620M-$625M, representing annual growth of 33%-34% and 5.5% sequentially. Analysts were looking for total revenue growth of 38.0%, with product revenue estimated at $646.3M for 38.6% annual growth. Snowflake obviously came in short of this.

Even with the underperformance relative to analyst estimates, we could see a slight acceleration in sequential growth for actual Q2 results. With the 5.5% guide, if we assume a similar beat of 3.4% sequentially from Q1, then Q2 sequential growth could tick up to 8.8% (from 6.3% in Q1).
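The beat-and-guide arithmetic can be sketched as follows, using only the guidance ranges and the Q1 actual quoted above. The helper name is mine, and taking the midpoint of each range is an assumption. Depending on whether the Q1 beat is added to the guide or compounded on top of it, and on rounding, the result lands at roughly 9%, in the same neighborhood as my estimate above.

```python
def pct_change(new: float, old: float) -> float:
    """Fractional change from old to new."""
    return new / old - 1.0


# Product revenue figures in $M, taken from the text above.
q1_guide_mid = (568.0 + 573.0) / 2   # midpoint of the Q1 guide issued in Q4
q1_actual = 590.1
q2_guide_mid = (620.0 + 625.0) / 2   # midpoint of the Q2 guide

# How far Q1 actual results exceeded the guidance midpoint.
q1_beat = pct_change(q1_actual, q1_guide_mid)

# Sequential growth implied by the Q2 guide alone.
q2_guided_seq = pct_change(q2_guide_mid, q1_actual)

# Two ways to apply a same-sized beat to Q2.
q2_seq_additive = q2_guided_seq + q1_beat
q2_seq_compounded = pct_change(q2_guide_mid * (1.0 + q1_beat), q1_actual)

print(f"Q1 beat vs guide midpoint: {q1_beat:.1%}")
print(f"Q2 guided sequential growth: {q2_guided_seq:.1%}")
print(f"Q2 with same beat (additive): {q2_seq_additive:.1%}")
print(f"Q2 with same beat (compounded): {q2_seq_compounded:.1%}")
```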

The small beat in Q1 and lower guide for Q2 caused Snowflake leadership to lower the full year FY2024 guidance. This is the second time Snowflake has reduced annual guidance. As investors will recall, leadership shared a preliminary, optimistic full year target for 47% growth in product revenue as part of the Q3 earnings call. Then, in Q4, they lowered that to 39.5% growth. For Q1, they lowered it yet again to a target for $2.6B, which would represent growth of 34% annually.

Snowflake leadership attributed the need for reduction to a slowdown in customer consumption starting in mid-April and continuing through May. They are now assuming this level of consumption does not improve through the year. Hopefully, this represents a conservative baseline, which Snowflake can beat. However, two successive reductions in full year guidance leave analysts and investors with little confidence.

As an aside, management might consider dispensing with the full year guide at this point. Given the variability of their consumption model, it seems that projecting revenue up to 12 months out just results in disappointment. The hyperscalers and other large software providers generally project forward results for the next quarter. Another large consumption business, Twilio, eliminated their full year guide a couple years ago for the same reason.

Shifting to other growth metrics, RPO was $3.41B at the end of Q1, which is up 30.6% annually. However, it was down 6.9% sequentially from Q4’s ending value of $3.66B. Current RPO was 57% of total RPO, which is $1.94B. That is up 40.6% from a year ago, where CRPO made up 53% of total RPO and was $1.38B.
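As a rough cross-check of these RPO figures, the sketch below recomputes them from the rounded values in the paragraph above. Because the inputs are rounded to two decimals, the outputs can differ from the quoted percentages by a few tenths of a point. The variable names are my own.

```python
# RPO figures in $B, as quoted above (rounded).
total_rpo_q1 = 3.41
total_rpo_q4 = 3.66
crpo_share_q1 = 0.57      # current RPO as a share of total RPO
crpo_year_ago = 1.38

# Current RPO implied by the 57% share.
crpo_q1 = total_rpo_q1 * crpo_share_q1

# Sequential change in total RPO and annual change in current RPO.
rpo_sequential = total_rpo_q1 / total_rpo_q4 - 1.0
crpo_annual = crpo_q1 / crpo_year_ago - 1.0

print(f"Current RPO: ${crpo_q1:.2f}B")
print(f"Total RPO sequential change: {rpo_sequential:.1%}")
print(f"Current RPO annual change: {crpo_annual:.1%}")
```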

On a sequential basis, CRPO fared a little better, decreasing by 3.5% from Q4’s value. Snowflake does experience some seasonality on RPO generation. A year ago, in the transition from Q4 to Q1, total RPO decreased by 1.4% sequentially and current RPO remained about the same. To have both total RPO and current RPO decrease by that much sequentially from Q4 implies that customers are further reducing forward commitments. While in the past customers would front-load larger commitments, in the current environment, they prefer to consume existing commitments until they have to extend their contract obligation.

Profitability

Snowflake’s profitability followed revenue growth, with a bright spot around FCF generation. Non-GAAP gross margin reached 77% in Q1, up from 75% in Q4. This was attributed to improved efficiency and volume discounts from the hyperscalers.

Non-GAAP operating income was $32.6M, representing an operating margin of 5.2%. Management had projected break-even operating margin coming out of Q4, so this represents a nice beat. Management attributed the continued improvement in operating margin to economies of scale and leveraging large customer relationships. It’s generally much cheaper in terms of sales resources to expand an existing customer versus landing a new one.

Adjusted free cash flow was $287M, which increased 58% y/y and represents an adjusted FCF margin of 46%. Actual FCF was $283M for an FCF margin of 45%. Snowflake leadership commented that FCF in Q1 benefited from some advance payments. The adjusted FCF margin was 37% in Q4 and 43% in the year ago quarter. Q1 delivered the highest FCF margin in over a year. On a Rule of 40 basis using FCF, Snowflake would score over 90.
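The margin and Rule of 40 math can be sketched as below, using the Q1 totals quoted in this post. I am treating the Rule of 40 score as annual revenue growth plus adjusted FCF margin, both in percentage points, which is one common convention and an assumption on my part. The function names are illustrative.

```python
def margin_pct(amount: float, revenue: float) -> float:
    """Express an amount as a percentage of revenue."""
    return amount / revenue * 100.0


def rule_of_40(growth_pct: float, margin_pct_value: float) -> float:
    """Rule of 40 score: growth rate plus profitability margin, in points."""
    return growth_pct + margin_pct_value


# Q1 figures in $M, from the text above.
total_revenue = 623.6
revenue_growth_pct = 47.7     # annual total revenue growth
operating_income = 32.6       # Non-GAAP
adjusted_fcf = 287.0
fcf = 283.0

operating_margin = margin_pct(operating_income, total_revenue)
adjusted_fcf_margin = margin_pct(adjusted_fcf, total_revenue)
fcf_margin = margin_pct(fcf, total_revenue)
score = rule_of_40(revenue_growth_pct, adjusted_fcf_margin)

print(f"Operating margin: {operating_margin:.1f}%")
print(f"Adjusted FCF margin: {adjusted_fcf_margin:.1f}%")
print(f"FCF margin: {fcf_margin:.1f}%")
print(f"Rule of 40 score: {score:.1f}")
```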

Looking forward, the Q2 operating margin target was set at 2%. This is lower than the 5% just delivered in Q1, but higher than the prior quarter’s preliminary estimate. If we assume the same-sized beat as Q1, then Q2 actual operating margin could reach 7%.

For the full year, management made a couple of small adjustments to the profitability estimates set in the Q4 guidance. They maintained product gross margin at 76%. After setting preliminary guidance for Non-GAAP operating margin at 6% in Q4, they lowered the full year target to 5% in Q1. However, they raised the full year adjusted free cash flow margin target from 25% to 26%. These all represent minor adjustments.

While some software infrastructure peers announced layoffs this year, Snowflake has continued to invest in headcount. This is in spite of their slowing revenue growth rates. I think this reflects management’s ongoing confidence in the growth potential.

In Q1, they added 426 total employees, representing 7.2% growth sequentially and 38.4% annually. The department receiving the largest number of new hires was R&D, with 234 headcount added in the quarter, up 17.0% sequentially. The second largest department recipient was S&M with 128 additions for 4.7% sequential growth.

On the earnings call, management confirmed these trends. Looking forward, they will continue to prioritize hiring in product and engineering. For S&M, they will invest where they experience sufficient ROI. Overall, they have slowed the hiring plan for the year and expect to add about 1,000 employees in FY2024, which includes contributions from acquisitions. On the earnings call, management highlighted the acquisition of Neeva. That will add 40 employees to the company in FY2024, which are primarily in the R&D group.

Snowflake’s CFO also provided an update on the share repurchase program announced in the Q4 report. Snowflake continues to have a strong cash position with $5B in cash, cash equivalents, and short-term and long-term investments. They used approximately $192M to repurchase approximately 1.4M shares to date at an average price of $136. They plan to continue to repurchase shares opportunistically with FCF, up to the $2.0B authorized.

Customer Activity

Revenue growth pressure reflected clearly in Q1 customer activity for Snowflake. While a few software infrastructure peers saw sustained growth in customer additions and large customer counts, Snowflake experienced a bit of contraction.

Snowflake added 317 customers in Q1 to reach 8,167 total. This was up 4.0% sequentially and 28.9% annually. Q1 represented the slowest growth rate in total customer additions over the past 3 years. In the prior quarter, Snowflake added 536 customers for sequential growth of 7.4%. However, there is seasonality to the Q1 customer addition rate, which historically decreases from Q4 to Q1. A year ago, the sequential growth rate from Q4 to Q1 dropped from 9.9% to 6.1%, adding only 361 new customers in Q1 FY2023. Still, adding fewer customers this quarter than the year ago number represents a more pronounced slowdown.

Growth of $1M product revenue customers performed better. Snowflake added 43 of these large customers in Q1, which matched the number of additions from Q4. Further, this number significantly exceeded the 22 added a year ago in Q1 FY2023.

While growth in $1M+ customers is steady on an absolute basis, the sequential growth rates have been ticking down. This is impacting the dollar-based net revenue retention rate (DBNRR rate), which dropped another 7 percentage points in Q1 to reach 151%. Although an absolute DBNRR rate of 151% is best-in-class, it has fallen by 23 percentage points over the past year.

If maintained, this high DBNRR rate would provide a lot of support for elevated revenue growth. On the surface, it reflects an increase in spend by existing customers of 51% on a y/y basis. With revenue growth projected to be 34% for the full year of FY2024, we can expect the DBNRR rate to decrease further from here. The rate of descent will likely slow down and could bottom within the next year.
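To make the retention math concrete, here is a simplified sketch of how a dollar-based net retention rate is calculated. The customer names and dollar amounts are hypothetical, chosen only so that the result matches the 151% headline. Snowflake's reported figure follows its own cohort definition, which this sketch does not attempt to reproduce.

```python
def net_revenue_retention(prior: dict[str, float], current: dict[str, float]) -> float:
    """Current-period revenue from the prior-period cohort, divided by
    that cohort's prior-period revenue. Customers added after the prior
    period are excluded; churned customers count as zero."""
    prior_total = sum(prior.values())
    current_total = sum(current.get(customer, 0.0) for customer in prior)
    return current_total / prior_total


# Hypothetical product revenue by customer, in $K.
year_ago = {"customer_a": 100.0, "customer_b": 200.0, "customer_c": 50.0}
latest = {
    "customer_a": 180.0,
    "customer_b": 290.0,
    "customer_c": 58.5,
    "customer_d": 75.0,   # new customer, ignored by the retention calculation
}

dbnrr = net_revenue_retention(year_ago, latest)
print(f"DBNRR: {dbnrr:.0%}")
```

A value of 1.51 means the same set of customers spent 51% more than a year earlier, before counting any revenue from new customers.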

Investment Plan

Following the accelerated investment in cloud infrastructure during Covid and the current macro pressures on IT budgets, software service providers continue to experience headwinds from enterprises looking to constrain their spending. This has translated into the theme of optimization, where any opportunity to reduce cost is considered, even if that temporarily lowers the effectiveness of the solution. This mode has impacted all spending categories to some extent. Even security companies have reported contract delays and portioning out commitments.

Snowflake has not escaped this gravity. This pressure started in the second half of 2022. In Q1, they continued to experience headwinds from customer optimization, where enterprises are still looking for mechanisms to lower costs. Reducing data retention was the latest trick, applying a double whammy of cutting storage revenue and reducing compute consumption as queries run faster across less data.

However, these optimization tweaks can only go so far and will come to an end at some point. Further, as interest rate increases stabilize, enterprises will likely loosen budget constraints a little. These changes would return growth to the “normal” tailwinds of digital transformation and cloud migration. I don’t expect a large acceleration from prior spending levels, just the removal of these ongoing negative adjustments. Added to a normalization could be a new, secular tailwind from AI, which could drive up Snowflake consumption in a few ways.

First, new natural language interfaces would expand the user base and make complex queries easier to create. Where the Snowflake user audience was previously limited to analysts familiar with a query language, a natural language prompt allows them to retrieve large amounts of information through a simple question (or series of iterative questions). In fact, I imagine these queries will be so easy to run that Snowflake’s existing alerts for large consumption jobs will become even more critical.

Second, increased productivity from AI-powered developer co-pilots will result in more enterprise application releases. That will drive up IT budget allocation for hosting, monitoring, security and data storage. As development becomes more efficient, we might even see a shift of IT budget from developer salaries towards increased investment in infrastructure. This trend could increase the overall investment for cloud hosting and data services from enterprises.

Finally, every enterprise has the opportunity to customize publicly available foundation models with their own proprietary data to create contextual competitive advantage. This makes the consolidation, cleansing and organization of all their data even more of a priority. Snowflake is well-positioned to become the data source for these AI workloads. Additionally, their investment in data sharing, native apps, governance and industry ecosystems places Snowflake as a unique enabler of strategic AI model collaborations between enterprises.

We will find out more about Snowflake’s product aspirations at their upcoming Summit conference the week of June 26th, which may feature some big announcements including something brewing with Nvidia. Competitors, of course, are not standing still and are announcing their own new capabilities to harness the opportunity stemming from AI. Similar to the rush of investment towards cloud infrastructure over the past decade, we may see a repeat effect where multiple providers in each category benefit.

While the valuation remains high, I think Snowflake maintains the ability to generate durable revenue growth with improving cash flow margins as they march towards their $10B product revenue target. The market appears to be aligned with this view, at least so far, as SNOW stock now exceeds its pre-earnings price. It is still well below the ATH peak of $400 from late 2021.

As I already have a large allocation of my portfolio in SNOW and the stock price has recovered from the post-earnings dip, I don’t plan to add to my holdings. I will continue to monitor Snowflake’s progress, particularly as they position the company to capitalize on new demand from AI workloads. The Summit user conference may provide another catalyst for growth.
investors.snowflake.com
Snowflake Reports Financial Results for the First Quarter of Fiscal 2024
Product revenue of $590.1 million in the first quarter, representing 50% year-over-year growth 373 customers with trailing 12-month product revenue greater than $1 million Net revenue retention rate of 151% 590 Forbes Global 2000 customers Remaining performance obligations of $3.4 billion, representing 31% year-over-year growth Snowflake (NYSE: SNOW), the Data Cloud company, today announced financial results for its first quarter of fiscal 2024, ended April 30, 2023. Snowflake Q1 FY24 Infographi

My $MDB trades
Performing some portfolio rebalancing. I wanted to add to my position in MDB, after their blow-out results from Q1. Ultimately, I would like MDB to be as large a position as NET, SNOW and DDOG, as I think the company will benefit from similar tailwinds as enterprise IT spend recovers and AI drives new use cases and workloads. AI also has the potential to increase the productivity of developers, resulting in more applications. These applications will require a database, monitoring, security and delivery, which should increase consumption of those services broadly.

Specific to MongoDB's outstanding Q1 report, the highlights in my view were:
  • Potential revenue growth acceleration going into Q2 (estimated growth rate already same as prior quarter - similar beat would represent upside).
  • Record total customer additions
  • Significant number of new customers are AI stand-alone companies (200 of them, including marquee names). These companies are benefiting from rapid growth and seemingly unlimited funding (similar to 2021).
  • Huge improvement in profitability measures. FCF margin hit 14%. Large beats on Non-GAAP EPS.

The only problem is that the stock is getting pricey, but probably still has some upside over the long term. I will continue to add on dips where possible.

@heyrico06/15/2023
love to see record total customer additions with revenue expansion (unsure of their net revenue retention, but assuming it's pretty good?)
My $CFLT trades
Performing some rebalancing of the portfolio after recent quarterly results. My allocation to Confluent was lower than other positions. I wanted to increase that by rotating some funds out of S. I have written extensively about Confluent and think they have a lot of potential to capitalize on the demand for data to fuel new AI-driven capabilities. They are the only stand-alone commercialized data streaming provider, piggy-backing on the broad penetration of Apache Kafka. Their Cloud product is growing faster (89% in the last quarter) than overall revenue and will contribute almost 50% of total revenue by end of this year. Additionally, they are progressing to bring the Apache Flink product offering (from Immerok acquisition) to GA later this year. That should ultimately deliver an addressable market that could be as large as core data streaming.

The stock has appreciated significantly this month, so I am not adding too much yet. There was also some fall-out from their Investor Day, in which analysts concluded that management lowered their long term revenue guidance. The CFO simply committed to a minimum target for 30% growth over the mid-term, versus >30% at time of IPO. In the Q&A session, the CFO clarified that they could exceed the 30% growth rate, with AI, FedRAMP and any macro improvement being catalysts. In this environment, I have no issue with some conservatism in a long term guide.

I still haven't bought $CFLT but continue to keep an eye on it. Given valuation here I would look for a dip before nibbling on shares.

My $S trades
Performing some rebalancing of the portfolio after recent quarterly results. SentinelOne delivered a disappointing quarter, highlighted by some accounting issues in their ARR calculation. The investment thesis is still intact for this company, but it may take longer to realize. Their use of AI/ML to automate more aspects of security monitoring and response results in a low TCO. But, in this spending environment, CIOs are looking for consolidation opportunities to lower costs, either choosing a broader platform offering from PANW or settling for bundled solutions from MSFT. This posture will change as macro improves and SentinelOne expands their offerings. Additionally, security spend will remain resilient.

In the meantime, I took advantage of the price recovery to sell some of the position and re-allocate it to MDB and CFLT. Given the demand for data to fuel new AI-driven capabilities, I think these two companies (and SNOW) are well positioned.

Datadog Q1 2023 Earnings Review $DDOG
Datadog stock surged 45% over the month of May, following their earnings report on May 4th. The results aligned with the common theme of “better than expected”, shared with several other software companies reporting results recently. This outperformance appears to have set a baseline across the software sector, with upward momentum building as more companies report results. A new tailwind has been excitement around the potential for AI to drive an incremental demand cycle for software and security infrastructure.

While AI holds promise, it will require several quarters or even years to play out. In the meantime, enterprise IT spend moderation, workload optimization and deal scrutiny have blunted the continuing secular trends of digital transformation and cloud migration. The market is eagerly trying to anticipate when optimization headwinds might abate, which could drive a re-acceleration of growth. AI’s impact on software infrastructure, if it materializes, would be through more consumption of supporting services as additional applications and digital experiences are brought online.

Datadog is navigating these same trade-offs. Over the last few quarters, their results have been impacted by the slowdown in cloud migration, workload optimization and even spend reduction in products with variable consumption like log retention. To account for these factors, management set 2023 revenue guidance conservatively, projecting just 24% annual growth this year, down from 63% in 2022.

While this represents a huge deceleration in growth, the market is looking for signs that revenue growth in the 20% range may represent the bottom. That explains why a slight beat to earnings estimates is generating an outsized reaction. Datadog stock jumped over 14% the day after the earnings report. One side effect of the revenue growth slowdown has been an increase in profitability. Datadog, and other software companies, began moderating staffing and other operational costs in anticipation of a slowdown. These reductions, compounded by revenue outperformance, are driving higher operating margins.

In at least one positive sign around the demand environment, software companies are still reporting “record customer pipelines”, with new customer additions roughly tracking with prior quarters, albeit on the lower end. The challenge has been in extracting larger contracts from those existing customers in the near term.

In Datadog’s case, their most important business metric, in my opinion, has been resilient. That metric is the growth in customers adopting multiple Datadog products. As the Datadog team keeps expanding the platform offering into new areas like security and developer experience, it’s critical that customers continue to add these product subscriptions to their contracts. If they weren’t, then Datadog’s outsized growth potential would be significantly limited. For Q1 at least, growth in customers subscribing to 2 or more, 4 or more and 6 or more products progressed almost linearly. Further, management shared anecdotes of customer contract renewals with subscriptions of 10 or more products, topping out at 14 for a large FinTech company in a 7-figure upsell.

In this post, I review Datadog’s product strategy and competitive advantages, along with how they are positioned to benefit from broader application growth from AI. I will parse the Q1 results and discuss the relevant trends that are likely driving the surge in stock price over the past month.

Datadog's Product Strategy

At a high level, Datadog’s product strategy can be boiled down to bringing as many “observability” experiences into the same interface as possible. I put observability in quotes because the term generically refers to instrumenting any system or process to improve its visibility and performance. This has been applied to software infrastructure monitoring, but could as easily expand into any aspect of business operations. We have been conditioned to apply the context of infrastructure metrics, logs and traces because the software industry co-opted the term.

Datadog’s primary purpose is to break down organizational silos by encouraging disparate teams to view their business through a common interface. Datadog’s founders came from two departments that had traditionally been at odds – software development and operations. By getting these two teams to work from a common dataset and single “pane of glass”, Datadog removed significant friction from the organization, resulting in faster troubleshooting, easier decision making and better performance.

This grounding in organizational cohesion is important to consider as investors evaluate Datadog and their product roadmap. At founding, their goal was not to combine the three pillars of software observability (logs, traces and metrics) into one tool. That was the means, not the end. Rather, it was to help the software development and operations teams collaborate to meet their business goals by instrumenting their digital operations within a single platform.

This approach has several benefits:

  • Value of Single Experience. Starting with infrastructure, APM and log management, Datadog proved that DevOps teams value having all types of monitoring in one consolidated view. This was disruptive at the time, as most providers focused on just one (Splunk for logs, New Relic for APM, etc.). Since then, Datadog has rapidly expanded into many other important aspects of instrumenting and securing modern digital experiences. This expansion has pulled in other potentially siloed teams, like security, product and even finance. All these users benefit from making decisions based on data in a common tool.
  • Consolidation Argument. With pressure on IT budgets, enterprises can reduce cost by replacing several point solutions and DIY projects with the Datadog platform. When each organizational silo requisitions its own toolset, redundancy quickly explodes. Inefficiency increases as multiple teams try to reconcile different interpretations of data stemming from having separate tools. Datadog’s goal of breaking down organizational silos gets everyone on the same page and has the side effect of reducing tool redundancy.
  • Focus Leads to Depth of Capabilities in Each Segment. In addition to incorporating new modules into a unified experience, Datadog is continuing to improve each existing module. After introducing APM in 2017, Datadog progressed up the Gartner MQ over several years, moving from Challenger to Leader, passing legacy providers in the process. They are applying this to each segment they occupy. Discerning buyers appreciate the difference. When minutes of downtime can cost tens or hundreds of thousands of dollars, why would a DevOps team skimp on their monitoring tools?

Some investors and analysts worry that Datadog is too expensive. Yet, enterprises are willing to pay a lot of money for these benefits, as they save the organization time and reduce costs. Some Datadog customers spend more than $10M annually across multiple product modules. On the surface, paying $10M+ a year seems more expensive than spinning up a few open source tools. However, no open source project can address all of the solutions that Datadog offers, forcing teams back onto disparate and often conflicting datasets and interfaces. The total cost of ownership in staff to run all of those tools quickly reaches the expense of paying Datadog to do it.

This collapsing of observability into one interface continues as Datadog charts its product development plan going forward. After getting developers and operations onto the same page, the next logical step was to address security use cases. Datadog scopes security to the same applications and cloud workloads that its other tools monitor. Since the Datadog agent is already collecting relevant data, security use cases for cloud workloads are a natural extension. The key is that addressing these facets of application and workload security brings the security team members into the same conversation as the developers and operations personnel.

Datadog’s foundation in observability provides them with competitive advantages as they enter the security space. They are selectively applying these in the security segments that have a clear parallel. Fundamentally, the large volumes of data they collect in metrics (Infrastructure), traces (APM) and logs (Log Management) can be uniquely applied to improve the effectiveness of their Cloud Security, Application Security and SIEM products.

Datadog is also being thoughtful about where it chooses to compete in the security space. For example, they don’t intend to address endpoints (like employee devices) outside of cloud infrastructure. That makes sense, as it drifts beyond their competitive moat. Because of this, Datadog wouldn’t be relevant as the full-featured security platform for a Global 2000 company with a large employee base that has a smaller application footprint. On the other hand, a digital native that derives the majority of its revenue from online operations (Airbnb, Uber, Doordash, etc.) would find the combined security and observability platform appealing. New stand-alone AI companies fall into this buyer category as well.

I think it is this focus on the relevant aspects of security within the right organization types that has led to Datadog’s success so far. In the Q1 earnings report, the CEO announced that their security products have been adopted by over 5,000 customers. That represents pretty rapid progress as the bulk of their cloud security offering was introduced during 2021-2022. Management has signaled that there is much more to come in security and I expect this to grow into a major contributor to revenue in the future.

Competitive Advantages

Datadog’s competitive advantage lies in their rapid and highly efficient product development process. They build and release new product offerings faster than any of their competitors. Their speed of module development, roll-out and iterative improvement is only accelerating.

What is truly unique about Datadog is the efficiency of their sales motion. For the full year of 2022, approximately 75% of incremental revenue over the prior year was attributable to growth from existing customers. This is a consequence of Datadog’s robust land and expand model. Not only do customers increase their utilization of product module subscriptions as their businesses grow, they add new modules at a fairly linear rate. And, Datadog is continuously creating new relevant product modules for them to consider in the future. I like to call this flywheel for Datadog “land and expand… and expand.”

Investors sometimes get hung up on Datadog’s ability to enter adjacent product categories and gain customer traction. They worry about Datadog’s relevance in each and their ability to compete. They compare feature sets or point out how an open source tool could address some monitoring or security use case more cheaply. Yet, they forget the overarching goal – to reduce friction and speed decision making by getting all teams onto the same toolset. This efficiency of a consolidated view trumps any savings from implementing a point solution or open source project.

From experience, I can tell you that this cohesion is a big benefit. Organizations can waste excruciating cycles trying to reconcile the interpretation of operational or security data generated by two different toolsets. Additionally, even if the dataset is shared, having multiple user interfaces to track is highly inefficient, as users toggle from one screen to another. With Datadog, all data and visualizations are in one interface, allowing users to drill in, out and across any indicator seamlessly.

More directly, if there were a problem with Datadog’s product expansion strategy, we would see it in the numbers. Specifically, Datadog management regularly reports the percentage of customers who adopt two or more, four or more and six or more product modules. These percentages continue to progress in an almost linear fashion. If the Datadog product team were selecting irrelevant categories or getting beat out by open source or competitive offerings, these expansion rates would slow down.

Datadog currently lists 20 products for sale on its Pricing page. This is up from 10 at the beginning of 2021. Yet, over that period, the product adoption expansion rates for 2+, 4+ and 6+ subscriptions have not changed. This continued over 2022 and into Q1 2023. Anecdotally, management even mentions customers with more than 10 products. On the Q1 earnings call, the CEO highlighted a large FinTech company that now subscribes to 14 Datadog products and spends 8-figures in ARR (>$10M).

Given their mission to break down silos and create organizational efficiencies by delivering a single platform for managing digital operations, Datadog has plenty of adjacent product offerings to address. The product team clearly has the insight to identify and execute on adjacencies, as evidenced by ever expanding product adoption rates. Yet, Datadog has another competitive advantage that allows them to continue to build and improve product offerings at a rapid clip. This has to do with their capital allocations to R&D and S&M.

As a consequence of their highly efficient sales motion, Datadog can afford to spend significantly more on Research and Development than Sales and Marketing. In Q1, R&D spend was 1.58x larger than S&M on a GAAP basis (1.24x on a Non-GAAP basis). Within Datadog’s competitive set, they are the only company that can do this. Dynatrace spends less than half as much on R&D as S&M (0.49x on a GAAP basis) to achieve a lower revenue growth rate. Splunk is a little better, allocating $0.58 to R&D for every $1.00 spent on S&M. Elastic spends $0.66 on R&D for every dollar in S&M.

The same circumstance generally applies to other companies in software infrastructure. For example, Crowdstrike employs a similar expansion motion of selling additional product module subscriptions to existing customers. Yet, they spent $281.1M on S&M and $179.0M on R&D (0.64x R&D to S&M). Datadog’s favorable ratio also benefits them on an absolute basis, with $229.5M spent on R&D in Q1. This has almost caught up to Splunk’s $236.9M total allocation to R&D in their most recent quarter, even though Splunk’s total revenue was more than 1.5x larger.
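The ratios quoted above are simple divisions of quarterly R&D spend by S&M spend. As a rough sketch, the Crowdstrike figure can be reproduced directly from the two dollar amounts in the text. Datadog's S&M spend is not quoted, so the last lines back it out from the stated R&D figure and the 1.58x GAAP ratio; that result is derived under the assumption that the $229.5M is the GAAP number, and is not a reported value.

```python
# R&D-to-S&M spending ratio, using the quarterly figures quoted in the text ($M).
def rd_to_sm_ratio(rd_spend: float, sm_spend: float) -> float:
    """Dollars of R&D spent for every dollar of S&M."""
    return rd_spend / sm_spend

crowdstrike_ratio = rd_to_sm_ratio(rd_spend=179.0, sm_spend=281.1)
print(f"Crowdstrike R&D/S&M: {crowdstrike_ratio:.2f}x")

# Assumption: $229.5M is Datadog's GAAP R&D, so the 1.58x GAAP ratio applies.
# The S&M value below is implied by those two numbers, not taken from a filing.
datadog_rd = 229.5
implied_datadog_sm = datadog_rd / 1.58
print(f"Implied Datadog GAAP S&M: ${implied_datadog_sm:.1f}M")
```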

This is all to say that Datadog appears to have developed a very efficient GTM model that supports both high revenue growth and elevated investment in R&D. The latter is critical to continue expanding Datadog’s addressable market, supporting a long runway of growth. It also provides the foundation for their value proposition in consolidation of tools onto a single platform, allowing all teams within an organization to share a common view of the business.

Going forward, a lot of product development effort will be invested in rounding out capabilities across DevSecOps, with security and developer tooling getting the majority of attention. On the Q1 earnings call, the CEO mentioned that Datadog now has over 5,000 customers using their cloud security products. I would expect a similar trajectory for developer experience. We will likely see another product grouping introduced over the next year, as the last one was launched in 2021.

AI Tailwinds

AI is the buzzword of the day currently. It seems that every SaaS provider is trying to work an AI story into their narrative to gain some of the momentum in this space. On their earnings call and subsequent analyst events, Datadog highlighted a couple of ways they would benefit from the rush to harness AI. Some of these themes were not isolated to just Datadog, however, as competitors like Dynatrace and peers like MongoDB made similar arguments.

More applications to monitor. Multiple software infrastructure companies point to the expectation that generative AI will make developers more productive. This will drive an increase in the number of applications created. All enterprises have a long backlog of software projects to work through. If these can be built faster, then there will be more applications to be hosted, secured and monitored. Additionally, if developer resources are more efficient, then the cost of development will decrease, allowing more IT budget to shift to cloud infrastructure.

"From a market perspective, over the long term, we believe AI will significantly expand our opportunity in observability and beyond. We think massive improvements in developer productivity will allow individuals to write more applications and to do so faster than ever before. And as with past productivity increases, we think this will further shift value from writing code to observing, managing, fixing, and securing live applications." Datadog Q1 2023 Earnings Call

New AI companies as customers. Datadog and other software infrastructure providers experienced a surge in demand during Covid-19 from the many new companies created to address problems specific to the pandemic. These included services like food delivery, telehealth and remote fitness. All of these companies were flooded with VC investment and encouraged to grow quickly to pursue a large audience.

A similar flood of investment is flowing into new AI companies. While their core AI software stack may be different, the digital experiences delivered over the Internet to end users still need to be secured, accelerated and monitored. Datadog can provide observability for these AI services. On the earnings call, Datadog’s CEO kicked off his typical customer wins segment by highlighting an expansion deal with a leading AI company that brings ARR into the 8-figures (> $10M).

There are probably more. As another example, MongoDB called out 200 new customers in Q1 that were “AI or ML companies.” This helped contribute to their record total customer additions for the quarter. Cloudflare has cited OpenAI as a large customer, recently surpassing $1M in annual spend. These additions are all incremental to the ongoing secular trend of enterprise digital transformation and cloud migration. Twelve months ago, these companies didn’t exist or had negligible spend.

AI Specific Products. In May, Datadog announced a new integration that monitors OpenAI API usage patterns, costs and performance for various OpenAI models, including GPT-4. Datadog’s observability capabilities simplify the process of data collection through tracing libraries so that customers can easily and quickly start monitoring their OpenAI usage. Developers can now gain visibility into ChatGPT usage, latency and token consumption in order to optimize the performance, end user experience and cost of their AI applications. Products tailored to AI service monitoring and contextual use cases provide more opportunities to monetize the growth of AI.

Q1 Earnings Results

Datadog reported Q1 earnings on May 4th, before markets opened. While the results weren’t a blowout, they were better than expected. DDOG stock gained 14.5% that day. Since then, the stock has continued appreciating, recently hitting a new high for all of 2023 over $98. The stock is up about 34% for the year. To be fair, prior to earnings, DDOG stock was trading at a nearly 3 year low in the $60s. We have to go back to early 2020 for DDOG to trade in this range.

While Datadog stock is up nicely from these lows, it is still far below its trading range for 2021. Like other software infrastructure stocks, DDOG peaked in November 2021 at $196. Of course, its P/S ratio was also extreme, pushing above 60 at that point. Currently, Datadog’s P/S ratio has dropped to a bit more reasonable value of 17, but is much higher than the 11 ratio it touched in late April. The big question for investors will be if Datadog can continue appreciating over the next couple of years, potentially getting back to that ATH price from 2021.

Revenue

Datadog’s Q1 revenue performance represented a reasonable beat and small raise. I suspect the market was expecting a worse result. After growing by 7.5% sequentially in Q4, Datadog’s Q1 revenue was up only 2.6% or by $12.3M over the prior quarter. There is a little seasonality at play here, as Q1 sequential growth has been lower than Q4 for the prior two years. Nonetheless, that level of sequential growth represents a new low for Datadog.

Q1 revenue was $481.7M, which was up 32.7% annually. This beat the company’s prior guidance for $466M – $470M, which would have delivered no sequential growth from Q4. Ironically, the actual Q1 result was about what analysts had estimated prior to the Q4 results, where Datadog initially set Q1 and FY2023 guidance below analyst expectations.

It is noteworthy that Datadog’s Q1 result includes a $5M impact from the service outage experienced in March. This was a fairly unusual event, where Datadog services were unavailable for nearly a day. If a day is 1/90th of the quarter, then a $5M impact makes sense. If it weren’t for this $5M hit, Datadog would have delivered $486.7M in revenue for Q1, up 34.1% annually and a slightly better 3.7% sequentially.
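The outage adjustment can be checked with a few lines of arithmetic. This is only a sketch: the Q4 and year-ago revenue bases are not quoted directly, so they are backed out here from the $12.3M sequential increase and the 32.7% annual growth rate, and the 90-day quarter is the same simplification used in the text.

```python
# All dollar values in $M, taken from the figures quoted in the text.
reported_q1 = 481.7          # reported Q1 revenue
sequential_increase = 12.3   # stated increase over Q4
outage_impact = 5.0          # stated revenue impact of the March outage
annual_growth = 0.327        # stated y/y growth for reported Q1 revenue
DAYS_IN_QUARTER = 90         # simplification used in the text

# Derived bases (not quoted directly in the text).
prior_q4 = reported_q1 - sequential_increase
year_ago_q1 = reported_q1 / (1 + annual_growth)

# Revenue and growth rates had the outage not occurred.
adjusted_q1 = reported_q1 + outage_impact
adjusted_sequential = adjusted_q1 / prior_q4 - 1
adjusted_annual = adjusted_q1 / year_ago_q1 - 1

# Average revenue per day, for comparison against the $5M impact.
average_daily_revenue = reported_q1 / DAYS_IN_QUARTER

print(f"Adjusted Q1 revenue: ${adjusted_q1:.1f}M")
print(f"Adjusted sequential growth: {adjusted_sequential:.1%}")
print(f"Adjusted annual growth: {adjusted_annual:.1%}")
print(f"Average daily revenue: ${average_daily_revenue:.2f}M")
```

An average day of revenue works out to a bit over $5M, which is why a roughly day-long outage lines up with the stated impact.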

For Q2, Datadog leadership set preliminary revenue guidance in a range of $498M – $502M for growth of 23.2% annually and 3.8% sequentially at the midpoint. This hit the analyst estimate for $502M at the top end of the range. Some analysts called out the slight sequential acceleration implied. With a preliminary guide for 3.8% growth over Q1, and a similar beat as for Q1, Datadog’s sequential revenue growth would approach 7%. This is needed to reach (and hopefully beat) the full revenue target.
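To see where the "approach 7%" figure comes from, the sketch below applies the same percentage beat Datadog delivered against its Q1 guidance midpoint to the Q2 guidance midpoint. Treating the beat as a percentage of the midpoint (rather than a fixed dollar amount) is my assumption about how to read "a similar beat".

```python
# Guidance ranges and actuals in $M, from the text.
q1_guide_low, q1_guide_high = 466.0, 470.0
q1_actual = 481.7
q2_guide_low, q2_guide_high = 498.0, 502.0

q1_guide_mid = (q1_guide_low + q1_guide_high) / 2
q2_guide_mid = (q2_guide_low + q2_guide_high) / 2

# Sequential growth implied by the Q2 guidance midpoint alone.
guided_sequential = q2_guide_mid / q1_actual - 1

# Assumption: "a similar beat" means the same percentage beat over the midpoint.
q1_beat_pct = q1_actual / q1_guide_mid - 1
q2_implied_actual = q2_guide_mid * (1 + q1_beat_pct)
implied_sequential = q2_implied_actual / q1_actual - 1

print(f"Guided sequential growth: {guided_sequential:.1%}")
print(f"Q1 beat over midpoint: {q1_beat_pct:.1%}")
print(f"Implied Q2 revenue: ${q2_implied_actual:.1f}M")
print(f"Implied sequential growth: {implied_sequential:.1%}")
```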

For 2023, management raised the revenue target by $10M to a range of $2.08B – $2.10B, representing annual growth of 24.8% at the midpoint. This beat the analyst estimate for $2.085B by $5M. While it was good to see a full year revenue raise, the $10M bump was less than the approximately $13M beat on the Q1 revenue guide. Management appears to be staying conservative by not passing the entire Q1 beat through to the full year guide.

Summarizing the revenue performance, Datadog management highlighted the ongoing pressure from workload optimization and budget scrutiny. They did point out that Q1 usage growth was better than in Q4, but still below levels during the rest of 2022. In the Q&A session, leadership also said that April trends were roughly in line. During subsequent analyst events in May, they didn’t flag any further deterioration (or marked improvement).

"Overall, we experienced business conditions that were similar to the previous several quarters. In Q1, usage growth from existing customers came in roughly as expected. We saw existing customer usage growth in Q1 improved from the levels we saw in Q4 but remain a bit lower than the levels we experienced in Q2 and Q3. And as in recent quarters, we continue to see customers optimize their cloud spend, particularly those further along in their cloud migration and hosting a larger portion of their infrastructure in the cloud." Datadog Q1 2023 Earnings Call

Management is not anticipating a macro recovery or an end to workload optimization in setting their full year guidance. I think they are being sufficiently conservative and hope that optimization headwinds ease in the second half of 2023, allowing growth to return to more organic rates. These would be driven by the continued pace of cloud migration and digital transformation, versus having to counteract negative adjustments as customers find new ways to reduce spend.

On the earnings call, management made a few other points relative to growth:

  • New logo acquisition and bookings in Q1 were solid, keeping in mind that Q1 is a seasonally slower quarter.
  • Total ARR exceeded $2B for the first time. The APM suite and log management products together exceeded $1B in ARR. This demonstrates the expansion of Datadog’s business well beyond their first infrastructure monitoring product and successful execution on the broader observability platform.
  • Continued to make steady progress in cloud security with growth in ARR and in customers. Announced more than 5,000 customers using cloud security products.
  • Observed the slowest growth in the consumer discretionary vertical, particularly in e-commerce and food delivery.
  • Faster year-over-year growth in international than in North America.

Billings and RPO provide some supporting signals, but management cautions against relying on them too heavily. Billings were $511M in Q1, up 15% annually. However, adjusting for a large upfront bill in Q1 2022, billings growth would have been in the low 30% range year/year.

Total RPO reached $1.14B in Q1, up 33% annually. This increased by $80M or 7.5% over the total RPO value of $1.06B at the end of Q4 (which was up 30% y/y). Current RPO growth was in the “high 20%” range. So, total RPO grew faster in Q1 than Q4, while current RPO growth slowed a little. The large sequential growth in total RPO likely reflected some of the record bookings mentioned earlier on the call.

The CEO made an interesting reference at the end of his prepared remarks, underscoring the long term growth view and how Datadog plans to keep expanding into it.

"Now, switching gears. Let me speak to our longer-term outlook. Overall, we continue to see no change to the multiyear trend toward digital transformation and cloud migration. We do continue to see customers optimizing their cloud usage, and visibility remains limited as to when this optimization cycle will end, but we firmly believe it will. As before, we remain confident that we will continue to deliver value to more customers in their digital transformation and cloud migration journeys. And it is increasingly clear with each wave of technical innovation that every company in every industry and every geographic region has to take advantage of the cloud, microservices, container, generative AI, and more. By relentlessly broadening the Datadog platform, we will continue to help our customers save on costs, execute with better engineering efficiency, drive competitive differentiation and deliver value to their own customers." Datadog Q1 2023 Earnings Call

This highlights the investment thesis for Datadog. While their current financial performance is lagging, the market is likely anticipating improvement later in 2023 and early 2024. This anticipation hinges on a few assumptions:

  • At some point, the optimization of existing cloud workloads by large enterprises will taper off. It is not possible to optimize forever, and optimization generally involves one time adjustments to resource consumption to utilize services more efficiently. Once the wave of optimization ends, the headwind it places on revenue growth will dissipate.
  • As Datadog continues to add new products and customers increase the number of product module subscriptions, Datadog will capture a larger share of customer spend. This incremental capture will compound the growth re-acceleration when enterprises end their optimization and return to a normal spending posture.
  • As I discussed in the product strategy section, AI should further drive more spend on cloud infrastructure. New AI-infused applications will still require monitoring. Additionally, increased developer productivity should result in many more applications being created. With fewer developers producing more applications, IT budget can shift from developer salaries to infrastructure services to manage all the applications being produced.

These factors could generate a favorable set up for Datadog in the second half of 2023 and going into 2024. On an annual basis, revenue comparisons will get easier. Additionally, sequential revenue growth rates should start to accelerate. We may not get back to the spending heyday of 2021 (unless AI really blows things up), but should see sustained revenue growth move back into a healthy 30% range for the near future.

Profitability

Coming out of Q4, Datadog management guided for Q1 Non-GAAP operating income of $68M – $72M for about 15% operating margin at the midpoint of revenue guidance. Datadog actually delivered $86.4M for an operating margin of 18%. This drove $0.28 of EPS, which beat the analyst estimate for $0.24. Operating income was up sequentially from Q4’s value of $83.1M, which was also an operating margin of 18%.

This outperformance on operating income was aided by continued moderation in spending. Q1 non-GAAP operating expense grew by 45% y/y, which was down 900 bps from the prior quarter. Datadog management said they are continuing to increase headcount, particularly in R&D and GTM functions, but at a slower pace than previously. They expect OpEx to grow by 30% for the full year, which implies that Q4 will reach the low 20% range.

This expense management helped free cash flow surge in Q1. Free cash flow was up substantially, hitting $116.3M in Q1 for a FCF margin of 24%. In Q4, FCF was $96.4M for a margin of 20.5%. On a Rule of 40 basis using FCF, Datadog logged a 57 in Q1. This is down from Q4’s 64, with the revenue deceleration contributing to most of the gap.
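As a quick check on the Rule of 40 figure, the score here is simply annual revenue growth plus FCF margin, both in percentage points. The sketch below recomputes the Q1 value from the revenue, FCF and growth figures quoted in the text.

```python
def rule_of_40(revenue_growth_pct: float, fcf: float, revenue: float) -> float:
    """Annual revenue growth (%) plus free cash flow margin (%)."""
    fcf_margin_pct = fcf / revenue * 100
    return revenue_growth_pct + fcf_margin_pct

# Q1 figures from the text: 32.7% y/y growth, $116.3M FCF on $481.7M revenue.
q1_fcf_margin_pct = 116.3 / 481.7 * 100
q1_score = rule_of_40(revenue_growth_pct=32.7, fcf=116.3, revenue=481.7)

print(f"Q1 FCF margin: {q1_fcf_margin_pct:.1f}%")
print(f"Q1 Rule of 40 score: {q1_score:.1f}")
```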

Looking forward, Datadog management expects to deliver Non-GAAP operating income of $82M – $86M in Q2, representing an operating margin of roughly 17% at the midpoint. Given the outperformance from Q1, the actual value could hit $100M for an operating margin reaching 20% at the midpoint of revenue guidance. For the full year of 2023, management raised the expected range of operating income by $40M from $300M-$320M to $340M-$360M. At the midpoint, this would represent an operating margin of 16.7%.
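The operating margins cited in this section all come from dividing an operating income figure by a revenue figure, usually at the midpoint of a guidance range. A minimal sketch of that arithmetic, using only the ranges and actuals quoted in the text:

```python
def midpoint(low: float, high: float) -> float:
    return (low + high) / 2

def margin_pct(operating_income: float, revenue: float) -> float:
    return operating_income / revenue * 100

# All values in $M, from the guidance ranges and actuals quoted in the text.
q1_guided_margin = margin_pct(midpoint(68, 72), midpoint(466, 470))
q1_actual_margin = margin_pct(86.4, 481.7)
q2_guided_margin = margin_pct(midpoint(82, 86), midpoint(498, 502))
q2_upside_margin = margin_pct(100, midpoint(498, 502))
fy_guided_margin = margin_pct(midpoint(340, 360), midpoint(2080, 2100))

print(f"Q1 guided:  {q1_guided_margin:.1f}%")
print(f"Q1 actual:  {q1_actual_margin:.1f}%")
print(f"Q2 guided:  {q2_guided_margin:.1f}%")
print(f"Q2 at $100M: {q2_upside_margin:.1f}%")
print(f"FY2023 guided: {fy_guided_margin:.1f}%")
```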

For Q2, this translated into a Non-GAAP EPS guidance range of $0.27 – $0.29. That beat the analyst estimate for $0.26. For the full year of 2023, management raised the EPS target from $1.02 – $1.09 set in Q4 to a new range of $1.13 – $1.20. This represents a raise of $0.11 or about 10%. While we tend to focus on the beat/raise cadence for revenue growth, it’s nice to see increases to earnings forecasts as well. This outperformance on the profitability side probably helped bolster the post-earnings market response.

Customer Activity

Customer activity continued to reflect pressure on IT spending, but also showed a nice uptick in product module adoption. Datadog’s total customer count surged in Q1 to 25,500, up an amazing 2,300 q/q. However, 1,400 of those new customers came from the Cloudcraft acquisition. This means that Datadog added just 900 customers organically for growth of 3.9% sequentially and 21.7% annually.
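Separating the acquired customers from the organic additions is straightforward subtraction. The sketch below uses only the customer counts quoted in the text; the Q4 base is derived from the Q1 total and the stated sequential increase.

```python
# Customer counts from the text.
q1_total_customers = 25_500
total_added_qoq = 2_300
cloudcraft_customers = 1_400   # customers inherited from the acquisition

# Derived: customer count at the end of Q4.
q4_total_customers = q1_total_customers - total_added_qoq

# Organic additions exclude the acquired customers.
organic_added = total_added_qoq - cloudcraft_customers
organic_sequential_growth = organic_added / q4_total_customers

print(f"Q4 customers: {q4_total_customers:,}")
print(f"Organic additions: {organic_added:,}")
print(f"Organic sequential growth: {organic_sequential_growth:.1%}")
```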

This falls below the 1,000 total customer additions that Datadog logged for the past two quarters. We have to go all the way back to Q2 2020 for total customer growth to drop below the 1,000 mark. With that said, an incremental 1,400 new customers from Cloudcraft to target for cross-sell is a good thing and perhaps some small percentage of those might have been net new additions as well outside of the acquisition.

Shifting to large customer additions, Datadog ended the quarter with 2,910 customers with ARR over $100k. This represented an increase of just 130 over Q4 for 4.7% sequential growth. Like with total customers, we have to go back to 2020 to see a quarterly increase this low.

The calculation of this metric could explain the magnitude of the drop. The threshold of $100k ARR is determined by annualizing the monthly recurring revenue for each customer in the final month of the quarter (March). MRR is determined by aggregating “revenue from committed contractual amounts, additional usage, usage from subscriptions for a committed contractual amount of usage that is delivered as used, and monthly subscriptions.” This means that any reduction in usage (or lack of additional usage) for just that month would impact the total large customer count. Given trends with ongoing customer optimization, this sudden drop doesn’t surprise me. I suspect that a number of customers are hovering around the $100k threshold and either dropped out this quarter, or didn’t make the final step up.
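To illustrate why this metric is sensitive to a single month of usage, here is a minimal sketch of the threshold test as described: annualize each customer's final-month MRR and count those above $100k. The customer names and MRR values are entirely hypothetical, chosen to show one customer sitting just above the line.

```python
ARR_THRESHOLD = 100_000  # dollars

def annualized_arr(final_month_mrr: float) -> float:
    """Annualize the monthly recurring revenue from the quarter's final month."""
    return final_month_mrr * 12

def count_large_customers(final_month_mrr_by_customer: dict[str, float]) -> int:
    """Count customers whose annualized final-month MRR exceeds the threshold."""
    return sum(
        1
        for mrr in final_month_mrr_by_customer.values()
        if annualized_arr(mrr) > ARR_THRESHOLD
    )

# Hypothetical customers (illustrative values only, not Datadog data).
march_mrr = {"customer_a": 8_500.0, "customer_b": 12_000.0, "customer_c": 6_000.0}
large_before = count_large_customers(march_mrr)

# customer_a trims usage by $300 in March alone and falls below the line.
march_mrr_after_optimization = {**march_mrr, "customer_a": 8_200.0}
large_after = count_large_customers(march_mrr_after_optimization)

print(f"Large customers before optimization: {large_before}")
print(f"Large customers after optimization: {large_after}")
```

In this toy case, a $300 reduction in one month's bill moves a customer from $102k to $98.4k annualized, removing them from the large customer count even though they remain on the platform.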

Datadog’s 12-month dollar-based net retention rate (NRR) remained above 130% for the quarter, as customers increased usage and subscribed to more Datadog products. At the same time, we don’t know the exact value, but can assume NRR has been decreasing over time. We got a hint that this is the case, as management said that they expect NRR to drop below 130% in Q2. This would be the first quarter ever in which that happens.

Fortunately, Datadog’s dollar-based gross retention rate remained stable in the mid to high 90% range. This implies that customers are not churning off the platform or switching to competitive solutions at an accelerating rate. If gross retention is fixed, then a drop in NRR would be caused by a slowdown in customer spend expansion.

As I discussed in the product strategy section, a big part of Datadog’s growth thesis revolves around their ability to continue adding adjacent product offerings to their platform of services. These products appeal to the same audience with the underlying theme of bringing disparate teams together onto a single dataset and view of their digital operations. These offerings started with traditional observability functions like infrastructure monitoring, APM and log management.

From there, Datadog branched out into other aspects of software infrastructure, then digital experience monitoring, security and most recently developer tooling. Each of these product offerings is sold separately, allowing customers to construct a product subscription that works best for them. Datadog has long eschewed a packaged subscription for all products or other monetization model under the belief that their customers prefer to purchase Datadog offerings on an a la carte basis. All products with pricing are listed on the Datadog web site.

To track the success of this land and expand motion, Datadog provides metrics on the percentage of customers who subscribe to multiple product modules. These continue to grow at a nearly linear clip. For Q1, management again reported the percentage of customers subscribing to 2 or more, 4 or more and 6 or more products.

Interestingly, the sequential growth in the percentages of customers for each segment (2+, 4+, 6+) actually slowed down for the first time this quarter. Previously, these percentages had been increasing fairly steadily, with gains of 1 percentage point for the 2+ segment and 2-3 points for the 4+ and 6+ product module segments. In Q1 2023, the 2+ segment percentage remained the same at 81%. The 4+ and 6+ segments only increased by 1 point sequentially.

At first, I thought this represented cause for concern, as a reflection that expansion of product module adoption was slowing down quickly. Datadog’s revenue growth over the past year has been constrained by large customers trying to optimize their spend on existing module subscriptions, like reducing consumption of variable services such as the duration of log retention. In spite of this, they would continue adding new module subscriptions at a linear rate. Until Q1, or so I thought.

In reality, the large jump in total customers in Q1, due to the 1,400 inherited from Cloudcraft, skewed the percentages. If we calculate the total customers in each segment, we find that the sequential additions of customers with 2+, 4+ and 6+ product module subscriptions actually accelerated nicely over Q4.
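The skew is a denominator effect: a flat percentage applied to a much larger customer base still means more customers in absolute terms. The sketch below works through the 2+ segment using the 81% share and the customer totals from the text. Because the reported percentages are rounded, the counts are approximate. The final calculation assumes, purely for illustration, that none of the 1,400 Cloudcraft customers subscribe to two or more products.

```python
# Totals and the 2+ product share from the text (share is a rounded percentage).
q4_total_customers = 23_200
q1_total_customers = 25_500
share_2plus_pct = 81           # same reported value in Q4 and Q1
cloudcraft_customers = 1_400

# Approximate customer counts in the 2+ segment (integer arithmetic).
q4_count_2plus = share_2plus_pct * q4_total_customers // 100
q1_count_2plus = share_2plus_pct * q1_total_customers // 100
added_2plus = q1_count_2plus - q4_count_2plus

# Assumption for illustration: no Cloudcraft customer uses 2+ products, so
# all 2+ customers sit in the remaining base.
base_ex_cloudcraft = q1_total_customers - cloudcraft_customers
share_ex_cloudcraft = q1_count_2plus / base_ex_cloudcraft

print(f"Q4 customers with 2+ products: {q4_count_2plus:,}")
print(f"Q1 customers with 2+ products: {q1_count_2plus:,}")
print(f"Sequential additions: {added_2plus:,}")
print(f"2+ share excluding Cloudcraft (assumed): {share_ex_cloudcraft:.1%}")
```

Under that assumption, the 2+ share of the pre-acquisition base would have kept climbing even though the headline percentage stayed flat.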

This demonstrates that Datadog’s product development roadmap is right on track. With 20 product modules with individual pricing listed on their Pricing page (up from 10 in January 2021), customers have a lot of options to choose from. In addition to landing new customers and growing usage of products already under subscription, Datadog’s NRR is driven by customers adding new module subscriptions.

As long as Datadog keeps building new product modules that are relevant to customers and replace some existing tool (commercial or open source), Datadog will have a continued path for revenue growth. Seeing the count of customers subscribing to 2+, 4+, 6+ and maybe one day 8+ or more product modules increase in this linear fashion gives investors confidence about the durability of Datadog’s revenue growth.

Investment Plan

Overall, Datadog delivered Q1 results that were better than expected. While pressure from enterprise budget scrutiny and cloud workload optimization continues, we may be approaching the peak of those headwinds. As the year progresses, moderation of workload optimization would allow software infrastructure providers like Datadog to resume growth rates driven by ongoing digital transformation and cloud migration initiatives.

Additionally, Datadog continues their strong land and expand… and expand motion. They add new customers, increase usage of existing module subscriptions and maintain new module adoption at healthy rates given the macro environment. After doubling the number of products listed for sale from 10 in January 2021 to 20 now, Datadog’s customers have not slowed down their adoption of new module subscriptions. The growth in total customers with 2+, 4+ and 6+ product subscriptions ticked up in Q1. Management even reported one $10M+ ARR customer with 14 product subscriptions.

While growth has moderated, profitability measures are tracking well. Datadog raised operating income targets for the full year and is delivering healthy margins. This is being driven by appropriate cost management, including a tempering of headcount growth. In Q1, Datadog still logged a 57 on the Rule of 40 measure using FCF margin.

Looking forward, I think Datadog can maintain strong revenue growth with favorable margins. While they started in software infrastructure observability, they are rapidly pursuing adjacent product categories, including solid traction in security and developer experience. Given that observability broadly applies to the instrumentation of any business process, I suspect that Datadog will announce some interesting new product categories over the next few years.

These factors all provide a favorable set-up going into 2024. While the stock has appreciated nicely in May, I don’t consider it grossly overvalued at this point. I plan to maintain my position as a “hold”. If the price were to drop significantly, I would add to my position. For new investors, you could look for opportunities to buy DDOG on temporary pullbacks.

@stackinvesting Great update, Peter! Thanks for sharing. Do you think there is a limit at all on the number of product categories Datadog $DDOG and similar type of companies can add to their offerings within the next 5 years?
Confluent Q1 2023 Earnings Review $CFLT
Following their Q1 earnings report on May 3rd, Confluent stock jumped by 16% the next day. Since then, CFLT has continued appreciating, recently passing its previous high for 2023. While the report itself was pretty good (but not outstanding), the market appears to be anticipating more growth to come. Perceived AI tailwinds are likely at play. In order to capitalize on the potential advantages from advanced insights and new proprietary AI models, enterprises need access to all their data in one place. It should be filtered, consistent and recent. As the leading independent provider of data streaming, and soon stream processing capabilities, Confluent is well positioned to address this demand.

Fortunately, Confluent doesn’t have to convince most enterprises of the value of real-time data streaming. Over 75% of the Fortune 500 already use Apache Kafka at some level to accomplish this. Confluent’s task is to demonstrate that their data streaming platform, which offers many enhanced capabilities over self-managed open source Kafka, is worth the incremental cost. While this may have been a more difficult sell in the corporate data center, Confluent Cloud provides enterprises with a managed solution on their hyperscaler of choice, eliminating the need to maintain a large team of operations engineers with Kafka expertise.

Additionally, stream processing tools allow data engineers to filter, transform and aggregate data in flight, front-loading processing before it arrives at its destination. The most popular open source solution for stream processing is Apache Flink. With their acquisition of Immerok in January, Confluent is now integrating a managed Flink stream processing toolset into their core Cloud platform. Management expects that revenue from stream processing could eventually match that of core data streaming, effectively doubling their TAM.

While their Q1 results continued to reflect the pressure of elongated deal cycles and enhanced scrutiny, Confluent managed to deliver subscription revenue growth over 40% y/y. For the remainder of 2023, they maintained revenue targets at conservative levels. Profitability measures dipped in Q1 due to a couple of one-time charges, but are tracking towards break-even operating margin in Q4. That will represent another 2000 bps of annual improvement. Long term, the Confluent management team sees FCF margin reaching 25%.

Revenue growth is primarily being driven by expansion of Confluent’s largest customers, with continued increases in $1M+ commitments and even anecdotal references to $5M-$10M of annual spend. Confluent’s NRR rate remains above 130% overall, with NRR for just Cloud well above that. Confluent needs to keep the new customer pipeline flowing, as focus has shifted to Kafka migrations and workload expansions within existing customers.

Confluent’s revenue target for this year represents about 1% of their calculated $100B addressable market, leaving plenty of room for future expansion.

In this post, I review Confluent’s product strategy and discuss several new announcements. Most exciting is the early access program for Confluent’s stream processing offering. Then, I will parse the Q1 results and discuss the trends that appear to be driving the recent outperformance in CFLT stock. For interested readers, I have published a couple of prior posts on Confluent, which provide more background on the investment thesis.

Confluent’s Product Strategy

As enterprises consider how to leverage new AI models to generate better business insights, they are realizing they have access to mountains of proprietary data internally. Competitive advantage in the effectiveness of AI is conferred to the organization with access to the richest dataset. For companies in specialized fields, like healthcare, communications, transportation, finance, media and others, their proprietary data sets can perform better than AI models loaded with data generally available on the Internet (if that even exists in some cases).

In order to harness this data, the first step for enterprises is to ensure it is readily available in a central store. Besides providing an argument for a modern data warehouse, centralizing data requires establishing feeds from all the hundreds of systems and applications that produce it. Ideally, data flows are as up-to-date as possible, with filtering, transformation and aggregation applied in flight. This system of real-time data pipelines can be served by a data streaming platform, with stream processing support built into it.

Fortunately, this use case has been addressed for many years by Apache Kafka for data streaming and Apache Flink for stream processing. These solutions have been adopted by large enterprises, with Kafka already enjoying greater than 75% adoption by the Fortune 500. Flink is a popular solution with enterprise developers as well, and complements Kafka nicely.

Confluent sells a cloud-native, feature complete and fully managed service that provides development teams with all the tools needed to support real-time data streaming operations. The platform has Apache Kafka at the core, supplemented by Confluent’s additional capabilities to make Kafka easier to scale, manage, secure and extend. Some of these value-add features include visualization of data flows, data governance, role-based access control, audit logs, self-service capabilities, user interfaces and visual code generation tools.

In order to deepen their offerings for stream processing, Confluent announced their intent to acquire Immerok back in January. Immerok was founded in 2022 and includes the leading contributors to Apache Flink. Immerok’s primary product is a cloud-native version of Apache Flink. As a fully managed, serverless offering, Immerok Cloud provides customers with an easy way to run Apache Flink without setting up their own infrastructure.

The Confluent team has been rapidly working to integrate a managed service for Flink into the Confluent Cloud offering. They recently announced an early access program for Confluent’s new managed services with Apache Flink. When discussing the acquisition, Confluent management predicted that the revenue stream from the Apache Flink managed service could approach that for core Kafka over time.

All of these features are designed to simplify the effort of streaming data from multiple sources to multiple destinations within an enterprise’s data infrastructure. Further, stream processing allows data engineering teams to act on the data while it is in flight, preventing delays in applying transformations on arrival.

As more producers and consumers are introduced into an organization’s data infrastructure, flows cannot be managed through traditional one-to-one, batched data pipelines. A many-to-many system based on the publish-subscribe model becomes necessary.
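
To illustrate why one-to-one pipelines stop scaling, here is a minimal in-memory sketch of the publish-subscribe pattern in Python. This is not Kafka's or Confluent's API. The `Broker` class, the "payments" topic and the two link-counting helpers are hypothetical names chosen for illustration, and the link counts assume every producer feeds every consumer in the point-to-point case.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: producers publish to a topic, consumers subscribe to it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable that receives every message published to the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Fan the message out to all subscribers; return how many received it.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


def point_to_point_links(producers, consumers):
    # Assumption: every producer needs its own pipeline to every consumer.
    return producers * consumers


def pubsub_links(producers, consumers):
    # With a broker in the middle, each system keeps a single connection.
    return producers + consumers


broker = Broker()
fraud_inbox, analytics_inbox = [], []
broker.subscribe("payments", fraud_inbox.append)
broker.subscribe("payments", analytics_inbox.append)

# The producer publishes once and does not need to know who consumes the data.
delivered = broker.publish("payments", {"id": 1, "amount": 42.0})
```

Under these assumptions, 20 producers and 30 consumers would need 600 direct pipelines but only 50 broker connections, which is the scaling argument behind the many-to-many model.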

For organizations that rely on a continuous stream of data to function, Confluent provides the underlying data infrastructure. Use cases extend beyond obvious applications like in financial markets and AdTech. As enterprises realize the benefits of optimizing their business operations and performance through data mining and machine learning, then broader and faster distribution of data in near real-time yields incremental benefits. Confluent is leveraged by leaders in multiple industries to address an expanding set of use cases like fraud analysis, transportation/logistics, healthcare, supply chain management and telecommunications.

At a high level, the investment thesis for Confluent’s growth revolves around the reality that Kafka usage is already widespread among enterprises. If you agree that data represents a critical input to competitive advantage in AI models, then Confluent occupies an important component of a modern data infrastructure. During their Investor Session at the annual Current conference in October 2022, leadership shared that more than 100k organizations worldwide currently use Apache Kafka. This includes over 75% of the Fortune 500.

Of the Fortune 500, most of the largest companies in each industry utilize Kafka for data streaming. Kafka was developed at LinkedIn and open sourced in 2011. It graduated from the Apache incubator program in October 2012 and has been broadly available since then, allowing it to grow into the most popular data streaming platform. As a data platform, Kafka is sticky. Once an enterprise builds data infrastructure on top of it, the cost of switching to another solution is high. This helps explain why adoption has remained broad. Additionally, its timing was favorable, as the 2010s were when most enterprises updated their data infrastructure to enable this type of distribution for new Internet and mobile-delivered applications.

Since Kafka usage is prolific, Confluent’s challenge and opportunity is to convert this installed base to the commercial version of Kafka that they provide. To justify the upgrade, the Confluent team has built additional, proprietary functionality around the core Kafka distribution. These address tricky problems in scalability, security, management and ease of integration with data producers and consumers.

These capabilities are only available to customers who subscribe to Confluent’s distribution. If the customer is self-hosted, they download and install the enhanced Confluent package onto their servers. If using Confluent Cloud, management is seamless as the Confluent team handles all operations for the customer. Of all open source software infrastructure options, I can tell you that Kafka is one of the most difficult to manage, requiring dedicated engineers well-versed in the nuances of running large data infrastructure at scale.

Besides Confluent, the hyperscalers offer hosted Kafka services. AWS has Amazon Managed Streaming for Apache Kafka (MSK), which is a fully managed, highly available Kafka implementation. AWS also offers a separate proprietary product called Kinesis, which is a data streaming and processing engine. This is not based on Kafka, but leverages many of the same concepts around the publish-subscribe pattern. Azure and GCP offer data streaming solutions, but also have strong partnerships with Confluent for managed Kafka.

Compared with the hyperscalers, Confluent Cloud distinguishes itself by offering more add-on services and broader availability than any of the cloud providers’ solutions. The hyperscaler products are usually just cloud-hosted versions of Kafka with a few extensions and integrations into their internal services. For customers with a multi-cloud deployment, being able to access the same Kafka installation across different cloud providers and on-premise systems represents a big advantage. This cross-cloud capability would be limited for any of the individual hyperscaler offerings, as the breadth of working connectors would always be lacking (even if they claim otherwise).

To attract these Kafka users to the Confluent offering, the product team emphasizes three categories of advantage:

  • Cloud Native. Kafka has been redesigned to run in the cloud. The Confluent Cloud solution comes with tools for DevOps, serverless configuration, unlimited storage capacity and high availability.
  • Complete. Confluent has added features in a number of areas that provide supplemental capabilities not available in open source Kafka. These are designed to appeal to developers, allowing them to reliably and securely build next generation applications more quickly. These include over 120 connectors to other popular systems, stream governance, enterprise security, monitoring, ksqlDB and stream processing. As mentioned, Apache Flink will soon be generally available to further enhance stream processing capabilities in the managed service.
  • Everywhere. Confluent Cloud is available on AWS, Azure and GCP. Confluent Platform is also available for customers that self-host their infrastructure. Cluster linking provides a bridge to connect instances across different cloud providers or to physical data centers. For any large enterprise with infrastructure that spans multiple cloud providers and their own self-hosted systems, Confluent provides the broadest reach of integrations. It would be risky for an organization to rely just on a hyperscaler solution for data streaming in the event they require working connectors to other cloud services.

For Confluent Cloud, the advantage to customers is that all aspects of managing the Kafka instance and adjacent services are handled by Confluent. With self-managed installations or Kafka cloud hosting from other providers, the IT organization is left to manage most or some of the functions in the Kafka installation.

With Confluent Cloud, the customer outsources all aspects of managing the Kafka instance to the Confluent team. These include partitioning, scaling, software patches and monitoring. Provisioning, configuring and running a Kafka cluster at high data volumes can be very complex. Having the engineering team that wrote the software managing one’s infrastructure is a big help. Resources that were dedicated to running the Kafka installation can be redeployed onto projects that create real competitive advantage for enterprises.

Due to this inherent advantage, the hyperscalers have been increasingly partnering with Confluent. AWS signed an expanded agreement in January 2022. Azure soon followed in April 2022. GCP and Confluent have a long-standing relationship. As with other independent software companies, the hyperscalers are discovering the advantages of partnering with these providers versus trying to compete with them. For Confluent, they enjoy the benefit of healthy and productive relationships with all three major hyperscalers (versus gravitating towards one or two).

To calculate the market opportunity, the Confluent team used a bottoms-up approach. They looked at their three customer segments, calculated the target population for each and then assigned an average ARR target. Admittedly, their budget estimates are on the high side, but they do have existing customers with these spending levels.

This exercise generated a $60B market opportunity in 2022. Using Gartner market data, they further predict a CAGR of 19% from 2022 to 2025. This yields a target addressable market of $100B by 2025, of which Confluent is currently less than 1% penetrated.
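
The compounding behind the $100B figure can be checked with a short Python sketch. It uses only numbers quoted in this post ($60B base in 2022, 19% CAGR, three years to 2025, and the $760M-$765M revenue target for 2023). The function and variable names are my own, and the result is a rough sanity check rather than Confluent's actual model.

```python
def project_market(base_billion, cagr, years):
    """Compound a starting market size at a constant annual growth rate."""
    return base_billion * (1 + cagr) ** years


# $60B in 2022 growing at 19% per year for 3 years (2022 -> 2025).
tam_2025 = project_market(60.0, 0.19, 3)

# Midpoint of the $760M-$765M revenue target for 2023, expressed in billions.
revenue_target_billion = (0.760 + 0.765) / 2

# Share of the projected 2025 market that the 2023 target would represent.
penetration = revenue_target_billion / tam_2025

print(f"Projected 2025 TAM: ${tam_2025:.1f}B")
print(f"2023 revenue target as share of TAM: {penetration:.2%}")
```

The projection lands at roughly $101B, and the 2023 revenue target works out to about 0.75% of it, consistent with the "less than 1% penetrated" framing.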

To reach these spending levels, Confluent employs a typical “land and expand” motion. Customers start by experimenting with a single data flow. They proceed to add more data flows that are disconnected. Often, the initial cases are non-critical or can tolerate some latency. Over time, expectations for reliability and speed increase. Eventually, Confluent is handling all real-time data flows and becomes the high availability backbone for the organization’s data operations.

Recent Product Announcements

In mid-May, Confluent made a couple of significant product announcements. The most exciting development for investors is an early access program for Confluent’s new managed service for Apache Flink. This product offering makes advanced stream processing capabilities available over Confluent’s core data streaming platform. Stream processing enables users to filter, join and aggregate data in real-time, while it is being streamed. It allows downstream data consumers to make use of the data immediately, versus needing to perform these data processing functions first. For high throughput and low latency data distribution systems, in-flight processing is necessary to scale.

These capabilities were gained by Confluent through their acquisition of Immerok in January. Confluent leadership estimates that the revenue contribution for stream processing could eventually reach that of their core data streaming products. Leadership had previously indicated that a managed Flink service would be available by end of 2023. This early access program represents the first step towards the GA product, allowing select Confluent Cloud customers to try the service and help shape the product roadmap. Given the potential to generate a new revenue stream in 2024, I am happy to see Confluent take this first step.
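
To make the filter, join and aggregate operations concrete, here is a minimal sketch using plain Python generators. It is not Flink or Confluent code. The event fields (`customer_id`, `amount`), the reference table and the threshold are all hypothetical. The point is only that each record is processed as it passes through, so the downstream consumer receives data that is already cleaned and enriched.

```python
from collections import defaultdict


def filter_events(events, min_amount):
    """Filter: drop events below a threshold as they pass through."""
    for event in events:
        if event["amount"] >= min_amount:
            yield event


def enrich(events, regions_by_customer):
    """Join: attach reference data to each event, keyed by customer id."""
    for event in events:
        region = regions_by_customer.get(event["customer_id"], "unknown")
        yield {**event, "region": region}


def running_totals(events):
    """Aggregate: emit an updated per-region total after every event."""
    totals = defaultdict(float)
    for event in events:
        totals[event["region"]] += event["amount"]
        yield event["region"], totals[event["region"]]


# Hypothetical sample data standing in for a live stream.
orders = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 5.0},
    {"customer_id": "c1", "amount": 80.0},
    {"customer_id": "c3", "amount": 50.0},
]
regions_by_customer = {"c1": "EMEA", "c2": "US"}

# Stages are chained lazily, so each order flows through all three in turn.
pipeline = running_totals(enrich(filter_events(iter(orders), 10.0), regions_by_customer))
results = list(pipeline)
```

A real stream processor adds what this sketch leaves out, such as windowing, fault tolerance, state management and horizontal scaling, which is where a managed service earns its keep.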

Other new product announcements include:

  • Data Quality Rules. As organizations scale their use of real-time data streaming, it is critical that data contracts between upstream and downstream components can be enforced. Data quality rules provide a method to ensure data compatibility and integrity. Part of Confluent’s Stream Governance module, data quality rules support validation of individual fields, mechanisms to resolve incompatibilities and migration rules for transformation of message formats.
  • Custom Connectors. Some organizations maintain their own custom applications and systems that they want to connect to data streaming. Previously, configuring and managing connectors from Kafka to these custom data sources created a lot of manual overhead for the infrastructure team. With Custom Connector support, Confluent has made this process more automated. An organization can upload their custom connector into Confluent Cloud, where it can be accessed and managed like any other connector. The team will also have access to logs and metrics in order to monitor the performance of their connector.
  • Stream Sharing. This capability provides organizations with tools to share data across teams internally and externally, leveraging the same data streaming platform. This use case makes a lot of sense for Confluent to address, as the data is already being formatted and streamed for real-time distribution. In order to facilitate data sharing use cases, Confluent improved controls for authenticated sharing, access management and layered encryption. They also extended data quality and compatibility rules to data sharing pipelines.
  • Kora Engine. The Confluent team has invested heavily to build their own processing engine for Apache Kafka, customized for the cloud. Called Kora, it supports multi-tenancy and serverless abstractions, decoupled networking-storage-compute layers, automated operations and high availability globally. With the Kora engine, Confluent Cloud customers can scale with 30x faster performance, leverage unlimited retention and protect data with a 99.99% SLA.

These capabilities should help organizations speed up their transition to real-time data streaming. In parallel to these announcements, Confluent released the results of a survey on data streaming trends. Conducted by a third-party, the survey queried 2,250 IT leaders who are using data streaming in their organizations. Respondents spanned seven countries and represented mid-size to large enterprises across all major industries. The goal was to understand how data streaming is being adopted, future plans and any obstacles to use.

Investors can read the full report without registration. Here are some of the primary take-aways that support the Confluent investment case:

  • Most Users of Data Streaming are Still Early on the Maturity Scale. Confluent established 5 levels of maturity and assigned each of the 2,250 respondents, all of whom use data streaming in some capacity, to one of them. The majority of users are categorized as Level 2 or 3, indicating that there is still ample room for expansion in these organizations as they mature their usage.

  • Prioritization of Data Streaming is High. Of respondents, 44% ranked data streaming as one of their top strategic priorities and another 45% described it as important.

  • Favorable ROI. As organizations establish more mature data streaming practices, they report higher levels of return on investment (ROI) from their efforts. Level 4 and 5 users experience an ROI of 5x or more on average. Even less mature organizations see significant ROI, with Level 2 and 3 organizations averaging 2-5x ROI.

While the survey is scoped to a controlled audience of data streaming users, I find the results encouraging. The primary take-away is that Confluent can use this data to demonstrate to new customers the value in expanding their usage up the maturity curve.

Increasing the maturity of customer adoption of data streaming correlates to revenue expansion. In the past, Confluent management has provided many examples of customers with significant elasticity in their spending levels. At their Current conference in October 2022, leadership highlighted several companies. These examples include multiples in spend of 8x to 31x over a relatively short period of time (5-6 years for the largest increases).

If we look at one example with the Fortune 50 Bank, Confluent leadership laid out the rapid expansion progression over 4 years to $10M in annual spend. They even think the customer has a line of sight to $20M in ARR. This customer illustrates the progression through the 5 levels of maturity in data streaming, where the organization moved from level 1 to level 4. During this period, their spend increased by a factor of 10 in the transition from level 2 to level 4, and they are now a $10M+ customer.

Confluent leadership thinks this customer could reach $20M+ in spend as they progress to level 5 maturity. To have a single customer approaching $20M in annual spend is significant. That one customer would represent over 3% of total revenue for 2022. With 133 customers now above $1M in spend and a doubling of $5M+ customers in FY2022, we can see why Confluent’s focus on growing their largest customers makes sense. Based on the survey data, Confluent sales can make a compelling case to the customer’s engineering leadership that they will gain a high return on their investment as they continue to expand use of Confluent solutions.

Q1 Earnings Results

Confluent reported Q1 earnings on May 3rd. The market liked the results, as the stock shot up 16.2% from its 3 month low under $20 to just over $23. For the week following earnings, CFLT stock stayed in the same post earnings range of $22-$23. Recently, it has surged above $29 and is near its 2023 high. CFLT’s 52 week peak is above $34 and the all-time high from November 2021 is $93.60.

I include the stock price chart, so that investors can see how far CFLT has dropped since its peak in 2021. Granted, several other software infrastructure stocks exhibit a similar pattern, but Confluent’s drop is more pronounced. It has traded in the same range for about a year now, yet continued to grow revenue and improve profitability over that period.

The big change since 2021 has been in the valuation multiple. At its peak price over $90, the P/S ratio approached 60. Currently, the stock’s price has dropped to about $29 (down nearly 70%) and the P/S ratio stands near 13 (roughly an 80% drop). We could argue that the valuation is reasonable at this point and that further stock appreciation would be driven by the company’s growth going forward, rather than by changes in the multiple.
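
The gap between the price decline and the multiple decline can be checked with a few lines of Python. The inputs are the approximate figures quoted above ($93.60 all-time high, about $29 now, P/S of roughly 60 and 13), so the outputs are rough. The helper name is my own, and the implied sales growth ignores changes in share count beyond what the per-share ratio captures.

```python
def pct_decline(peak, current):
    """Fractional drop from a peak value to the current value."""
    return 1 - current / peak


peak_price, current_price = 93.60, 29.0
peak_ps, current_ps = 60.0, 13.0  # approximate P/S ratios from the text

price_drop = pct_decline(peak_price, current_price)
multiple_drop = pct_decline(peak_ps, current_ps)

# Price = P/S x sales per share, so price / (P/S) recovers sales per share.
# The ratio of the two periods shows how much the underlying sales grew.
implied_sales_growth = (current_price / current_ps) / (peak_price / peak_ps) - 1

print(f"Price decline:    {price_drop:.0%}")
print(f"Multiple decline: {multiple_drop:.0%}")
print(f"Implied growth in sales per share: {implied_sales_growth:.0%}")
```

With these inputs the price is down about 69% and the multiple about 78%. The difference is explained by sales per share growing on the order of 40% over the period, which is why the multiple compressed more than the price fell.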

Revenue

Confluent delivered $174.3M in total revenue for Q1, up 38.2% annually. Analysts were looking for $167.4M or 32.7% growth. The company had set guidance for a range of $166M – $168M coming out of the Q4 report. On a sequential basis, Q1 revenue was up $5.6M or 3.3% over the Q4 revenue result of $168.7M.

Going back a year to Q1 2022, Confluent delivered sequential revenue growth of $6.2M for a q/q growth rate of 5.2%. The transition from Q4 to Q1 typically results in some sequential deceleration due to seasonality. In Q1 2023, we saw about the same gross dollar amount of sequential growth, but 190 bps lower on a percentage basis.
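
For readers who want to reproduce the sequential comparison, here is a small Python sketch. The revenue figures are the ones quoted above. The Q1 2022 growth rate of 5.2% is taken directly from the text rather than recomputed, and the function name is my own.

```python
def sequential_growth(previous, current):
    """Return the absolute and fractional change between two quarters."""
    delta = current - previous
    return delta, delta / previous


# Q4 2022 -> Q1 2023 total revenue, in $M.
delta_musd, growth = sequential_growth(168.7, 174.3)

q1_2023_growth_pct = round(growth * 100, 1)
q1_2022_growth_pct = 5.2  # as reported in the text

# Difference between the two growth rates, expressed in basis points.
gap_bps = round((q1_2022_growth_pct - q1_2023_growth_pct) * 100)

print(f"Sequential add: ${delta_musd:.1f}M ({q1_2023_growth_pct}%)")
print(f"Deceleration vs. Q1 2022: {gap_bps} bps")
```

The same function can be applied to any pair of quarters, such as the Cloud revenue or RPO figures later in the post.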

Subscription revenue grew at a faster pace than total revenue, up 40.9% y/y to $160.6M. Subscriptions account for 92% of total revenue, with the other 8% made up by services. Services includes consulting work, implementation assistance and training. These are lower margin offerings necessary to help some customers with their transition to Confluent. Subscription revenue is more indicative of demand for Confluent’s core product offering and the higher growth rate here is a benefit.

RPO provides another performance indicator for future demand. In Q1, total RPO was $742.6M, up 34.7% y/y. Total RPO for Q1 increased slightly over Q4’s RPO of $740.7M, which was up 48.0% y/y. Current RPO, which represents those performance obligations that are expected to be recognized as revenue in the next 12 months, tells a better story. It was $477M in Q1, representing about 64% of total RPO. This was up 44% y/y and accelerated from Q4’s $456.2M or 43% annual growth. Sequentially, current RPO grew by 4.6%, outpacing the sequential revenue and total RPO growth.

On the earnings call, management referred to the same levels of budget scrutiny and elongated sales cycles that they had highlighted in Q4. In March, they observed some pressure on consumption trends in the Cloud business during the second half of the month, primarily isolated to the financial vertical. This would line up with the failure of SVB earlier in the month. Management shared that demand recovered in April (and remained stable into May, based on a recent analyst conference). For forward guidance, though, management is being conservative and is not assuming any improvement in budget scrutiny trends or the overall macro environment for the remainder of the year.

Looking forward, they project revenue for Q2 in a range of $181M – $183M, which at the midpoint would represent growth of 30.5% annually and 4.4% sequentially, or $7.7M over the Q1 actual. The midpoint beat the analyst target of $181.3M by $0.7M. While this represents a small beat, the 4.4% sequential raise over Q1 is pretty favorable. Confluent beat the midpoint of their Q1 guidance by $7.3M or 4.4%. If they outperform Q2 guidance by a similar margin, the next bar of Q3’s analyst revenue estimate of $196.4M looks achievable.
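
The guidance arithmetic can be laid out in a short Python sketch. All inputs come from the figures above. The last two lines are a purely hypothetical what-if that assumes Q2 beats guidance by the same percentage as Q1. That is my own illustration, not a company forecast.

```python
def midpoint(low, high):
    """Midpoint of a guidance range."""
    return (low + high) / 2


q1_actual = 174.3                      # $M
q1_guide_mid = midpoint(166.0, 168.0)  # guidance set coming out of Q4
q2_guide_mid = midpoint(181.0, 183.0)
q2_analyst = 181.3
q3_analyst = 196.4

# Sequential growth implied by the Q2 guidance midpoint.
q2_seq_add = q2_guide_mid - q1_actual
q2_seq_pct = q2_seq_add / q1_actual

# How far the Q2 guide sits above the analyst target.
guide_vs_analyst = q2_guide_mid - q2_analyst

# Size of the Q1 beat versus the guidance midpoint.
q1_beat = q1_actual - q1_guide_mid
q1_beat_pct = q1_beat / q1_guide_mid

# Hypothetical scenario: Q2 beats its midpoint by the same percentage as Q1.
q2_if_same_beat = q2_guide_mid * (1 + q1_beat_pct)
q3_growth_needed = q3_analyst / q2_if_same_beat - 1
```

Under that hypothetical, Q2 revenue would land near $190M, and reaching the Q3 analyst estimate would require sequential growth of about 3.4%, close to what Q1 delivered.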

For the full year, Confluent management reiterated their 2023 revenue target for $760M-$765M that had been set in Q4. They could have raised this by the $7M of outperformance to their Q1 revenue estimate. However, management chose to leave it in place, in order to be conservative for the remainder of the year.

"Our point of view on the full year remains unchanged from last quarter. The demand environment remains healthy, even though it’s a tough macro out there. So, we’re reaffirming our guide for the full year, growing revenue at 30%, and plan to achieve breakeven on a non-GAAP operating margin basis in Q4.

We did not flow through the amount of the overperformance we had in the top line this quarter to the full year guide, which is really a byproduct of the macro environment and factors I’ve called out before, which we’re trying to prudently take into consideration while formulating guidance. You asked for some puts and takes. We expect cloud to continue its growth momentum with the highest NRR and an increase in sequential revenue add every quarter for the remainder of the year. And then the CRPO growth that I pointed out before, it continues to be robust." Confluent Q1 2023 Earnings Call

Like some other software infrastructure providers based on an “open core” model (MDB, ESTC), Confluent initially generated all of its revenue from licensing of their packaged software product. Customers could host this on their infrastructure and manage the operations themselves (self-managed). Several years ago, similar to peers, they realized that customers would find value in a fully-managed cloud-hosted offering and launched Confluent Cloud. It is currently available on all three hyperscalers.

Confluent Cloud has demonstrated rapid growth, outpacing the company’s revenue performance and increasing its share of total revenue over time. In Q1, Confluent Cloud revenue grew by 89% y/y and now makes up 42% of total revenue. Confluent Cloud contributed 41% of revenue last quarter. A year ago, it accounted for 31% and just 18% two years ago. The majority of new revenue is being generated by Confluent Cloud, as it contributed greater than 50% of total new ACV bookings for the sixth consecutive quarter.

Sequentially, Confluent Cloud added $5.3M in revenue in Q1. Recall that total revenue was up $5.6M from Q4, implying that Cloud is responsible for the large majority of sequential growth. On a percentage basis, Q1’s sequential growth isolated to Cloud was 7.7%. This also beat management’s expectation coming out of Q4 for $5.0M in sequential Cloud revenue. A year ago, in Q1 2022, Confluent added $5.1M in sequential Cloud revenue. The Q1 2023 sequential increase of $5.3M was therefore higher in absolute dollars, but lower on a percentage basis.

Looking forward in 2023, management expects Confluent Cloud sequential revenue growth to increase each quarter, similar to the pattern exhibited in 2022. For Q2, they are estimating that Confluent Cloud sequential revenue growth is in the range of $7.5M – $8M, or 10.5% over the $73.6M delivered in Q1. This target was recently reiterated at the JP Morgan TMC Conference. That would bring Confluent Cloud to represent about 44.7% of total revenue. For the full year, management expects Cloud to reach a contribution of 48%-50% in Q4.
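
The Cloud mix projection for Q2 follows from a few lines of arithmetic, sketched here in Python. It uses the midpoints of the guided ranges quoted above. The variable names are my own.

```python
q1_total = 174.3             # Q1 total revenue, $M
q1_cloud = 73.6              # Q1 Confluent Cloud revenue, $M
q2_total_mid = (181.0 + 183.0) / 2
q2_cloud_add_mid = (7.5 + 8.0) / 2   # guided sequential Cloud add, $M

# Cloud share of total revenue in Q1.
q1_cloud_share = q1_cloud / q1_total

# Projected Q2 Cloud revenue, its sequential growth and its share of the total.
q2_cloud = q1_cloud + q2_cloud_add_mid
q2_cloud_growth = q2_cloud_add_mid / q1_cloud
q2_cloud_share = q2_cloud / q2_total_mid

print(f"Q1 Cloud share: {q1_cloud_share:.1%}")
print(f"Q2 Cloud revenue: ${q2_cloud:.2f}M (+{q2_cloud_growth:.1%} q/q)")
print(f"Q2 Cloud share: {q2_cloud_share:.1%}")
```

At the midpoints, Cloud revenue would reach about $81.4M in Q2, moving its share of total revenue from roughly 42% to 44.7%.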

Another growth driver for Confluent is rapid expansion of the international market. In Q1, revenue from the U.S. grew 32% y/y and now represents 60% of total revenue. This is down from 63% a year ago. Revenue from outside the U.S. grew much faster at 49% y/y and contributes 40% of total revenue. This share has increased from 37% a year ago. The faster growth internationally should help pull up overall revenue growth.

Additionally, improvement in FX (neutral to weaker U.S. Dollar) should provide a little tailwind for international growth. While Confluent does not perform a constant currency calculation (because it charges in U.S. dollars), international customers would experience higher prices as the Dollar appreciates. When the opposite effect happens, Confluent prices feel lower for international customers. Other software infrastructure providers have reported a similar effect (Datadog, Cloudflare, etc.) and anticipate a small tailwind as FX improves.

Profitability

As investors will recall, Confluent announced an 8% staff reduction on January 26th. While a layoff is generally a disruptive event, the company appeared to maintain its growth trajectory through Q1. This is actually a pretty bullish signal, as the period around the layoff would represent a distraction for most staff members. Management did indicate that quota carrying sales capacity and key areas of R&D were being preserved.

The objective positive financial benefit of Confluent’s 8% workforce reduction is a full year improvement in the path to profitability. Leadership expects to exit Q4 of 2023 with breakeven Non-GAAP operating margin. Previously, they had signaled that event would happen in 2024. Additionally, they expect FCF margin to follow a similar trajectory, in spite of the large step down in FCF margin in Q1, associated with the layoff.

Break-even in Q4 2023 would be a significant improvement from the -23% Non-GAAP operating margin delivered in Q1. This improvement of roughly 20 points in operating margin is in line with the 18-point y/y improvement relative to Q1 2022, which had -41% Non-GAAP operating margin. Guidance for Q2 2023 is for -16% Non-GAAP operating margin, representing an improvement of 700 bps from Q1.

Besides the adjustment to staffing levels, gross margin improvements are helping the drive towards profitability. In Q1, Confluent’s Non-GAAP gross margin increased to 72.2%, up 250 bps y/y. Isolated to just subscription revenue, Non-GAAP gross margin is higher at 77.5% in Q1, up 200 bps over the prior year. The difference is explained by the low gross margin of Services revenue.

As Confluent Cloud makes up a larger percentage of total revenue, it will pull down gross margin. At the same time, Confluent Cloud realizes volume discounts as it scales and the team is continually optimizing performance. Over the long term, Confluent management has targeted Non-GAAP gross margin in a range of 72%-75%. The midpoint of that range is about 130 bps above the current level of 72.2%.

Moving down the income statement, Non-GAAP operating loss was $(40.3M) for an operating margin of -23.1% in Q1. This was down slightly from Q4’s -21.5%, but much improved from -41.0% in Q1 2022. The improvement year/year can be attributed to reductions in total OpEx as a percent of revenue. Contributions from R&D, S&M and G&A all decreased from Q1 2022. Most notable was the 1320 bps improvement in S&M as a percent of revenue.

Looking forward, management projects a Non-GAAP operating margin of -16% in Q2. This would be a 700 bps improvement over Q1. For the full year, they increased the operating margin target to a range of (14%) – (13%), which is up 100 bps from the (15%) – (14%) range set in Q4. They also reiterated the expectation that Q4 2023 reaches break-even Non-GAAP operating margin within the quarter.

Similarly, Non-GAAP Net Loss Per Share was $(0.09) in Q1, which beat the analyst target of $(0.14) by $0.05. Coming out of Q4, the company had set a range of $(0.15)-$(0.13). Looking forward to Q2, management expects net loss per share of $(0.08)-$(0.06). Analysts had been looking for $(0.10), so this guidance represents a raise. For the full year, Confluent also increased their EPS target to a range of $(0.20) to $(0.14), which is a large increase from their Q4 guidance of $(0.28) to $(0.22).

Free cash flow margin didn’t show the same level of improvement in Q1, dropping 120 bps y/y to -47.5% from -46.3% a year ago. In Q4, FCF margin was -18.3%, having improved by almost 400 bps from -22.3% in Q4 2021. The large sequential drop in Q1 was expected, however, and discussed on the Q4 earnings call. FCF in Q1 was negatively impacted by charges related to the staff restructuring, the Immerok acquisition, corporate bonus payout and the employee stock purchase plan.

The good news with cash flow is that management expects FCF margin to follow the same path as operating margin. This means we can expect it to proceed towards break-even through 2023. Additionally, management provided mid-term and long-term profitability targets. In the near to mid-term, while revenue growth remains above 30%, management expects to continue investing for high growth. Non-GAAP operating margin should reach 5% and FCF margin can hit 10%. Over a longer period, management thinks operating margin can trend towards a range of 20%-25% with FCF margin exceeding 25%.

Customer Activity

Confluent’s customer activity showed momentum, but at slower levels than in the past, as sales cycles lengthened and finance teams scrutinized expansion deals. Total customer growth had been impacted early in 2022 by a change to the sign-up process for Confluent Cloud to remove the requirement to enter a credit card. Now, new customers can access Cloud services to prepare for a migration, before incurring charges. Following the change, the paying customer count froze between Q1 and Q2 2022 and then marginally increased in Q3. In Q4, customer growth jumped to 290 in the quarter, versus 120 in Q3.

In Q1 2023, Confluent added 160 total paying customers. This is better than the 120 added in Q3, but below the 290 new customers in Q4. Total customer growth was 3.5% on a sequential basis and 14% on an annual basis. This growth in total customers has slowed down substantially and needs to be watched. A company of Confluent’s size should be increasing total customers at a higher rate, more like 20-30% year/year. This is important to provide a funnel for growth into future large customers. It’s likely that new customers are delaying the transition from free to paid to defer costs. The new sign-up process for Cloud makes this easier to drag out, as payment starts with meaningful utilization.

Confluent defines a large customer as generating more than $100k in ARR. In Q1, the company increased their large customer count by 60 sequentially or 5.9% q/q, and 34% year/year. In Q4, Confluent added 70 customers of this size, with 83 in Q3 and 59 in Q2. In the transition from Q4 2021 to Q1 2022, Confluent added 69 large customers sequentially. This means that the large customer growth of 60 in the most recent quarter falls to the lower end of the historical range. These large customers contributed more than 85% of total revenue in the quarter.

Growth of $1M+ customers is proceeding at a faster rate, which is likely the source of Confluent’s sustained revenue growth. In Q1, they added 8 more $1M+ ARR customers, growing 6.3% sequentially and 53% annually. The prior three quarters in 2022 registered higher sequential growth, in the low teens, while the transition from Q4 2021 to Q1 2022 generated no new $1M+ customers. Taking the annual cycle into account, the increase in Q1 2023 was healthy.

Management has also anecdotally pointed to strong growth in $5M+ and $10M+ customers. I think these mega customers are driving a large portion of the overall revenue growth. I like that Confluent is demonstrating high elasticity of spend to push to these levels. Their 2023 revenue target of $760M becomes easier to reach with growth in $5M – $10M customers. Their impact on total revenue can be 50-100x greater than $100k+ ARR customers.

Confluent measures the increase in spend from existing customers as their Dollar-based Net Retention Rate (NRR). This metric makes clear the source of Confluent’s continued revenue growth, in spite of the slowdown in total customer activity. Starting in 2023, Confluent management introduced a new definition for NRR. While changing the calculation of a key financial metric usually invites skepticism about the motivation, in this case I think it makes sense.

In the past, Confluent measured NRR expansion based on the contracted spend amount. This was applied to both Confluent Platform (license subscription) and Confluent Cloud (contracted commitment). For Platform, this is fine. For Cloud, though, revenue is realized based on customer consumption. The change this year is to utilize actual consumption on Confluent Cloud for the NRR calculation. This represents a more realistic measure of customer activity, as it is directly tied to revenue. It also aligns with the methodology used by peers for NRR, including Snowflake, Datadog and MongoDB.

For Q1, total NRR (the combination of Confluent Platform and Cloud) was above 130%. In their Investor Presentation, Confluent provided historical data for both the old and new method of NRR calculation. Under the new method, NRR was over 130% for the past year (they don’t provide exact values). Using the old method, NRR was over 130% for most of 2022, dropping to “just under” 130% in Q4 and then “just under” 125% in Q1.

The CFO discussed this on the earnings call. The difference can be explained by the fact that some customers are exceeding their committed contract spend on Confluent Cloud in their actual consumption of the service. This likely reflects their hesitancy to commit to large contracts in the current environment, but also the reality around the need to continue heavy use of Confluent Cloud resources.

We will want to watch the trajectory of this new NRR measure closely. It could take one of two paths. Either customers start limiting their consumption to match what they had contracted (which would lower NRR) or they acknowledge the consumption and increase their contracts (which would maintain NRR). In either case, the current level of NRR over 130% is pretty robust in this spend environment.

Confluent also shares that NRR isolated to just the Cloud product is even greater than the 130% for total NRR. This would be expected given Cloud’s higher revenue growth rate. As Cloud makes up a greater portion of total revenue, this higher NRR for just Cloud may help prop up overall growth rates.

Finally, Confluent management shared that gross retention was over 90%. Where NRR accounts for all changes in customer spend (expansion, contraction, attrition), gross retention removes expansion, reflecting only the year/year impact of attrition and contraction of spend. In this context, gross retention over 90% is pretty good. It also implies that expansion on its own, before attrition and contraction are netted out, would be greater than 130%.
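The relationship between NRR and gross retention can be illustrated with a small sketch. The cohort below is entirely hypothetical (Confluent does not publish customer-level data), and the formulas follow a common formulation of these metrics rather than Confluent’s exact definitions, which are not spelled out here.

```python
def retention_metrics(prior_arr, current_arr):
    """Compute NRR, gross retention and expansion-only retention for a cohort.

    prior_arr: ARR per customer one year ago
    current_arr: ARR today for those same customers (0 if they churned)
    """
    base = sum(prior_arr.values())
    # NRR: all spend changes (expansion, contraction, attrition) are included
    nrr = sum(current_arr[c] for c in prior_arr) / base
    # Gross retention: each customer is capped at prior spend, so expansion is excluded
    gross = sum(min(current_arr[c], prior_arr[c]) for c in prior_arr) / base
    # Expansion only: each customer is floored at prior spend, so losses are excluded
    expansion_only = sum(max(current_arr[c], prior_arr[c]) for c in prior_arr) / base
    return nrr, gross, expansion_only


# Hypothetical cohort, ARR in $ thousands (illustrative values only)
prior_arr = {"a": 100, "b": 200, "c": 50, "d": 150}
current_arr = {"a": 170, "b": 300, "c": 20, "d": 175}

nrr, gross, expansion_only = retention_metrics(prior_arr, current_arr)
print(f"NRR {nrr:.0%}, gross retention {gross:.0%}, expansion only {expansion_only:.0%}")
```

In this made-up cohort, NRR is 133% and gross retention is 94%, which means expansion from growing customers, taken alone, is 139%. The gap between NRR and the expansion-only figure is exactly the spend lost to contraction and attrition.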

Combining the NRR metrics with those around customer growth, it becomes clear that Confluent is driving a lot of their revenue from expansion of existing large customers. This is the ideal case, as helping existing customers expand their use of Confluent Cloud is easier than hunting for new customers. At the same time, this growth from existing customers will eventually taper off. I would like to see Confluent increase the rate of new customer growth. For comparison, both MongoDB and Datadog have maintained total customer growth above 20% annually in the most recent quarter.

Investment Plan

Confluent is pursuing a large addressable market that is anchored in the broad adoption of Apache Kafka across enterprises. The expected return on investment from improved data infrastructure is bolstered by the case for enterprises to leverage their proprietary data sets to create new AI models. These make an argument for raising the priority of data streaming projects in C-level punch lists, as evidenced by Confluent’s recent survey data.

As Confluent continues to add more capabilities to their packaged solution around core Apache Kafka, organizations will feel an increased pull to migrate from self-managed Kafka to Confluent’s offering. Confluent Cloud makes this TCO argument more compelling, as the Confluent team adds infrastructure management to the mix. Enterprises often need to maintain their own sizable teams of operations engineers to configure and manage their large Kafka installations. This can all be outsourced to Confluent, allowing those resources to be shifted to real value creation.

The addition of stream processing capabilities through a managed service for Flink will increase Confluent’s market opportunity. Management estimates that stream processing revenue could eventually approach that of core data streaming. Confluent plans to have a generally available Flink managed service ready by end of 2023 and recently announced an early access program for select customers.

In Q1, Confluent delivered favorable results in spite of the same demand pressures highlighted by other software infrastructure providers. Revenue growth exceeded targets in Q1 and the company maintained their projections for the remainder of the year. Profitability showed a nice improvement with beats on EPS and significant improvement expected in operating and FCF margin through the remainder of the year. Q4 is still expected to reach operating margin break-even, with FCF margin following the same trajectory. The long-term financial model was updated to include a projection for FCF margin to reach 25% at scale.

Customer activity in Q1 was mixed. Total customer additions are lagging and need to increase into the 20% range to keep the pipeline full. Customer spend expansion, on the other hand, is healthy with NRR exceeding 130%. Granted, Confluent realized a bump in NRR by shifting measurement to a consumption basis, but this is how peers measure spend expansion. Isolated to Cloud, NRR is even higher than the 130% threshold.

As I am wrapping up this post, Snowflake just reported their latest results. While they beat on Q1 product revenue targets by about 500 bps of annual growth, they once again lowered their full year product revenue target from 39.5% annual growth to 34%. They attributed the cause to continued impact from customer optimization. Specifically, the CFO called out a slowdown in consumption starting in early April (Easter on April 9th) through May. Some of their largest customers also reduced data retention from 5 years to 3 years.

Providing a check on this trend, Confluent’s CEO was interviewed at the JP Morgan Global Technology, Media and Communications conference the day before, on May 23rd. The analyst asked about customer demand trends in May, noting that in Confluent’s Q1 report on May 3rd, management had indicated a dip in consumption trends in March that had since recovered in April. The CEO replied that they “feel pretty good about what (guidance they) gave in the earnings call.” He further reiterated the sequential Confluent Cloud revenue growth target of between $7.5M and $8.0M.

All of this provides a positive backdrop for Confluent going forward. Since earnings, the stock has appreciated from just below $20 to reach its 2023 high of nearly $30 recently, approaching a 50% gain. Relative to peers growing in the 30% range with improving profitability, CFLT appears reasonably priced. They enjoy a favorable competitive position in a growing addressable market, with AI activity likely providing another tailwind.

Since the last earnings report, I have increased my allocation to CFLT slightly and will continue to monitor progress through the year. Given that CFLT is still far below their ATH of $93.60 and the demand backdrop (in spite of macro) remains strong, I will likely keep gradually increasing the portion of my portfolio that CFLT occupies over time.