By: Cameron Tierney, Research Analyst

@cameron_tierney


Over the last decade, Amazon, Google, and Microsoft blazed a trail as hyperscale cloud service providers, resulting in enormous growth—and stock price outperformance. Of course, the share price appreciation of the big three U.S.-based hyperscalers is not solely due to cloud outperformance, but it’s undeniably been a large factor in their overall success.

And yet—times have changed.

    Cloud revenue growth is now showing signs of maturing and slowing: AWS’s 2022 revenue growth dipped below 30% for the first time since Amazon started breaking out the unit in 2013, and in 2023 it slowed to 13%. Microsoft doesn’t cleanly isolate Azure’s revenue each quarter, but its Intelligent Cloud segment (which includes Azure) posted declining growth over the last several quarters. Google’s Cloud Platform segment exhibits a similar trend. Growth in these hyperscale public clouds has fallen off particularly sharply since the middle of 2022.

    Source: Bloomberg

    There are plenty of reasons for the trend depicted above. Beyond the “law of large numbers,” which naturally compresses growth rates as businesses scale, the stalling growth in mid-2022 coincided with the sharpest interest rate increases by the Federal Reserve in decades. Companies pulled back on spending – shuttering some projects and optimizing other workloads to be less resource-intensive. Executives at all three large hyperscalers have said on recent earnings calls that customers are continuing to optimize their cloud spending. In other words, cost cutting continues for many companies.
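    The “law of large numbers” point can be made concrete with simple arithmetic. The sketch below uses purely hypothetical figures (they are not taken from any company’s filings), and the function name is ours: a business that adds the same absolute amount of revenue every year sees its percentage growth rate fall as the base gets bigger.

```python
def growth_rates(starting_revenue, added_per_year, years):
    """Year-over-year growth rates when a fixed dollar amount is added each year."""
    rates = []
    revenue = starting_revenue
    for _ in range(years):
        new_revenue = revenue + added_per_year
        rates.append((new_revenue - revenue) / revenue)
        revenue = new_revenue
    return rates


# Hypothetical example: start at $20B of revenue and add $10B every year.
rates = growth_rates(20.0, 10.0, 4)
print([f"{r:.0%}" for r in rates])  # ['50%', '33%', '25%', '20%']
```

    The dollars added never shrink in this toy example, yet the growth rate more than halves. A slowing percentage alone therefore says little about whether absolute demand is weakening.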

    These trends raise a worthwhile question for long-term investors: Is cloud growth dying? And if so, what does that mean for the future earnings potential of these businesses? And will the future returns of hyperscalers underperform passive market benchmarks?

    We can’t predict the future with certainty, but our analysis suggests cloud growth is far from dead. We see a few key factors driving cloud profits for years:

    1. The AI gold rush is real, and hyperscaler clouds are like Levi Strauss or shovel salesmen in the gold rush – they stand to benefit just as much as the gold prospectors.
    2. A combination of open software tools, a renaissance in chip design, and overall data center optimization will lower the cost structure of hyperscalers, improving profitability over time. 

    The AI sandbox needs sand

    Despite the idea of artificial intelligence floating around Silicon Valley for decades, it feels like we’re just beginning to crack what’s possible with AI applications. Generative AI apps like ChatGPT and Midjourney are likely the tip of the iceberg compared to what will be commonplace a decade from now. 2024 in AI is like 1994 on the Internet.

    No one can say for sure how AI apps will develop in the future, but we do know AI will require new approaches to computing.

    For instance, code no longer runs on one computer (or CPU) like traditional applications. Instead, AI training is distributed simultaneously across thousands of GPUs—or other specialized chips. This shift requires new hardware, new software to optimize those new chips, new networking and memory solutions, and much more.
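    To illustrate what “distributed across thousands of GPUs” means in practice, here is a minimal sketch of data-parallel training, one common way of spreading training across devices. It is a toy simulation in plain Python under our own assumptions: the “workers” are just list slices, the model is a single weight, and the averaging step stands in for the gradient exchange that real systems perform over high-speed networking. All names are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One training step: every worker computes a gradient, then gradients are averaged."""
    # On real hardware these gradient computations run at the same time, one per device.
    grads = [local_gradient(w, shard) for shard in shards]
    # Stand-in for the cross-device averaging (often called an all-reduce).
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Toy data generated from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))  # converges toward the true weight of 3
```

    Even in this toy version, every step requires all workers to exchange results before anyone can proceed. At the scale of thousands of chips, that exchange is why networking and memory become as important as the processors themselves.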

    It’s a total rearchitecting of the data center.

    Yet, the developer core of AI requires a simple value proposition: developers need user-friendly access to cutting-edge hardware at competitive prices. We believe hyperscalers are best positioned to deliver on this value proposition.

    The most advanced AI models are trained on petabytes of data and require petaflops (or even exaflops!) of processing power to train. Model inference is more lightweight, but it is still high-performance computing. All that high-performance computing requires a complex web of computing hardware – or infrastructure.

    Think of infrastructure as the most fundamental level of cloud services: a customer rents access to server resources (storage, processing power, and networking) and can ratchet its use up or down as needed. The next level up is platform services, where cloud providers offer software tools on top of their infrastructure to develop, test, and host apps. This serves to reduce the overhead of infrastructure management and make it more user-friendly for developers. The additional value to customers also means higher margins for cloud providers on platform services, as is well-documented by industry observers and analysts.

    According to the hyperscalers, AI platform services like AWS SageMaker, Microsoft’s Azure ML, and GCP’s Vertex AI have garnered tens of thousands of customers. They enable AI engineers to build their models faster and with less overhead than bare-metal infrastructure instances.

    Above is an illustration from a Google ML research paper from 2015. It depicts the overhead AI models require, in addition to the machine learning code itself. This is why AI platform services are so popular. Instead of using these services, an AI developer could rent or buy raw GPUs, but they’d be responsible for properly linking the storage components and optimizing them.

    At the petabyte scale, that is extremely unruly. It also requires specialized talent. So, we anticipate that AI platform services will enable the next generation of AI applications for the bulk of cloud customers.

    To that end, the big three hyperscalers already offer a multitude of AI platform services. However, even the third-party cloud services in this space—including MosaicML, DataRobot, C3.ai and others—offer their services on top of and/or to complement hyperscalers.

    So, no matter which AI platform service a developer chooses to use, the big three hyperscalers stand to benefit from a secular tailwind as more AI apps enter development.

    Adoption of Open-Source AI tools could improve Hyperscaler margins

    For the most advanced AI models to get trained and infer at the scale they do today, state-of-the-art hardware must be optimized by specialized software.

    In 2015, Google open-sourced its proprietary AI software toolkit, TensorFlow. The company also designed a custom AI chip called a tensor processing unit, or TPU, best optimized by the TensorFlow toolkit. This was Google’s attempt at controlling the nascent AI hardware market at that time. The company continues to invest in its TPU program today: its most powerful iteration was introduced in December, and many of Google’s first-party services like YouTube heavily utilize swaths of TPUs every day.

    But today, Nvidia hardware is the most common choice for both AI training and inference. The company commands 95% market share in training, according to New Street Research. Nvidia’s CUDA software toolkit (whose origins date back to 2007) has remained relevant as the best toolkit to optimize Nvidia GPUs for high-performance computing. For example, an AI engineer might primarily develop a model with TensorFlow, but some sections of code will still need to compile to CUDA to run efficiently on GPU hardware.

    Throughout our research, we’ve found this is especially true when optimizing distributed computing across several GPUs, which affects all modern AI models.

    For the layman, these software tools are like choosing different routes on a road trip: depending on your choice, your experience will differ vastly. For most road trips, people tend to select the smoothest, least congested route to get to their destination. Similarly, the marketplace for AI software tools is coalescing around the easiest-to-use library of tools.

    TensorFlow and CUDA are great at optimizing specific hardware – but they lack generalization and a slick UI for debugging, according to some developers. This is why, increasingly, the most cutting-edge AI models use PyTorch, an open-source AI software library developed primarily by Meta. Ten times as many AI research papers cite PyTorch as cite TensorFlow, according to a prominent AI industry website.

    Our research has shown this gap is inverted and narrower in commercial applications, meaning TensorFlow is still more widely used in actual model deployment. However, as more research makes its way into deployments, that balance could shift in favor of PyTorch. 

    PyTorch vs. TensorFlow research citations over time

    AI software library preferences among developers are important because they can indicate which hardware will be most in demand. PyTorch in particular has leaned into flexibility. Within the PyTorch stack, TorchDynamo, the Triton compiler, and PyTorch XLA make it easier for AI models to be deployed on various hardware options.

    Ultimately, the software library battle itself is less important than the overall trend – AI models are getting easier to optimize regardless of the underlying hardware. The availability of high-end Nvidia chips remains a significant bottleneck to AI training today, but the future may feature a more diverse set of hardware options.

    The big three hyperscalers each have custom AI-training chip programs, and as Nvidia’s software stranglehold on the industry loosens, more AI workloads are poised to shift to custom hyperscaler silicon. As Amazon CEO Andy Jassy put it in his May 2023 annual shareholder letter, AWS’s vertically integrated AI chip Trainium is 140% faster and 70% lower cost than comparable GPU-based instances. In other words, the value proposition is there for customers. I already mentioned how Google is doubling down on its custom AI chips – TPUs. Microsoft is slower to the custom AI chip game, but reports indicate its entry into the arena may not be far off.

    Table compiled by The Information detailing the hyperscalers’ custom AI and server chips

    For the hyperscalers, the less they must rely on third-party chip suppliers like Nvidia, the better for their cost structures.

    If the trends in open-source AI software are anything to go by, we believe more AI cloud customers will opt for vertically integrated hyperscaler AI chips, improving their operating margins as AI apps proliferate.

    Electrons and money: How Hyperscalers optimize server farms

    Data centers are massive constructs meant to precisely channel electrons through semiconductor hardware.

    The cost structure of a data center is primarily influenced by (1) the price of electricity and (2) how efficiently the data center is architected to do useful calculations.

    In many mature cloud product categories (think infrastructure services and mature platform services), product differentiation across the major vendors is marginal. In these categories, hyperscalers primarily compete on cost.

    Of course, economies of scale are in play in the electricity market: prices for a single-family household in the U.S. are higher than for an industrial-scale electricity user in the same area, according to the EIA. CSPs are industrial-scale users of electricity compared to individual on-premise data centers, and thus have a lower energy cost structure.

    AWS and the other hyperscalers are also investing in renewable energy projects to power their data centers. The cost of electricity as sourced from renewables continues to come down and it does not rely on an extractive input subject to market volatility (natural gas, coal, etc.).  

    According to a BloombergNEF report, Amazon and Microsoft were two of the largest commercial buyers of renewable energy in 2021. Google was on the list as well, albeit quite a bit lower in overall power consumption. As hyperscalers seek access to cheaper and cheaper electricity, the value proposition of the cloud strengthens over on-premise data centers, which are hindered by less scale and higher energy costs.

    The second major ingredient in the operating cost structure of a data center is energy efficiency, or how much output a unit of electricity produces. Hyperscalers invest many resources in data center engineering efforts to optimize for large-scale virtualized computing. Companies without deca-billion-dollar cloud businesses to support lack the incentive and expertise to invest in the same way. Hyperscalers build highly specialized and talented teams to engineer top-performing data center architectures.
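    The two ingredients, electricity price and energy efficiency, multiply together, which a short calculation makes clear. This is a rough sketch with hypothetical numbers chosen only for illustration (the prices, the efficiency figures, and the function name are our assumptions, not measured data): the energy cost of a unit of computing work is the price per kilowatt-hour times the kilowatt-hours that unit of work consumes.

```python
def energy_cost_per_unit_of_work(price_per_kwh, kwh_per_unit_of_work):
    """Energy cost of one unit of useful computing work, in dollars."""
    return price_per_kwh * kwh_per_unit_of_work


# Hypothetical on-premise data center: pricier power, less efficient facility.
on_prem = energy_cost_per_unit_of_work(price_per_kwh=0.12, kwh_per_unit_of_work=1.0)

# Hypothetical hyperscaler: cheaper power and 40% less energy per unit of work.
hyperscaler = energy_cost_per_unit_of_work(price_per_kwh=0.07, kwh_per_unit_of_work=0.6)

savings = 1 - hyperscaler / on_prem
print(f"{savings:.0%}")  # 65%
```

    In this illustration, a roughly 42% cheaper electricity price and a 40% efficiency gain compound into about 65% lower energy cost per unit of work, more than either advantage delivers alone.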

    This differentiated investment in data center engineering goes all the way down to the individual chip level. AWS offers customers its in-house Graviton ARM-based CPUs that are 40% more energy efficient than Intel x86 CPUs, according to Amazon CEO Andy Jassy on the company’s Q2 2023 earnings call. There are programs like this across Microsoft and Google as well, and they will likely continue to innovate in these areas to improve their offerings.

    AWS also rearchitected its data centers to integrate a custom ‘Nitro’ networking chip. Instead of reserving a portion of server CPUs for networking compute, Amazon’s data center engineers realized it is more efficient to place a discrete networking chip between the CPUs available for rent. This change resulted in more server space dedicated to inventory and a fundamentally better cost structure for Amazon. The hyperscalers also offer in-house AI chips, as I discussed a bit in the previous section.

    General-purpose virtualized CPU instance costs have fallen over time. Compare this to GPU instances, which are more nascent. It is imperative to cost-optimize in more mature cloud categories in order to remain competitive in the marketplace.

    Hyperscalers have an advantage not only in the procurement of cheap electricity, but also in the procurement of large-scale semiconductor volumes. According to our analysis, whether it is homegrown silicon or off-the-shelf processors, being one of the largest consumers of chips in the world is beneficial in pricing negotiations and, in the case of GPU shortages, in getting ahead in line.

    In our view, hyperscalers are well-positioned to continue driving down the costs of virtualized computing – across several vectors. Trends in renewable energy, investments in custom silicon and data center architectures culminate in an improved cost structure for the public cloud. We think this will perpetuate improving operating margins over time.

    Data from historical AMZN SEC filings

    Final Thoughts

    At a high level, we think the public cloud still has plenty of room to grow earnings over the next decade—thanks to its improving cost structure and the AI platform shift.

    Web companies that came of age in the last decade—like Netflix, Uber and Airbnb—all benefited from the cost improvements and proliferation of the public cloud.

    All-star customers like these have a symbiotic relationship with hyperscalers – as they grow, so does the cloud’s business. If migrations are slowing and existing customers are optimizing their spend, organic growth from novel applications may be required for cloud to continue its blitzing growth.

    Today, it looks like the AI platform shift in computing will create a boom for new applications to blossom, and we think hyperscalers are likely best positioned to capture that growth in the coming decade.


    **Disclosures:**
    This has been prepared for information purposes only. This information is confidential and for the use of the intended recipients only. It may not be reproduced, redistributed, or copied in whole or in part for any purpose without the prior written consent of Nightview Capital.
    The opinions expressed herein are those of Arne Alsin and Nightview Capital and are subject to change without notice. The opinions referenced are as of the date of publication, may be modified due to changes in the market or economic conditions, and may not necessarily come to pass. Forward looking statements cannot be guaranteed. This is not an offer to sell, or a solicitation of an offer to purchase any fund managed by Nightview Capital. This is not a recommendation to buy, sell, or hold any particular security. There is no assurance that any securities discussed herein will remain in an account’s portfolio at the time you receive this report or that securities sold have not been repurchased. It should not be assumed that any of the securities transactions, holdings or sectors discussed were or will be profitable, or that the investment recommendations or decisions Nightview Capital makes in the future will be profitable or equal the performance of the securities discussed herein. There is no assurance that any securities, sectors or industries discussed herein will be included in or excluded from an account’s portfolio. Nightview Capital reserves the right to modify its current investment strategies and techniques based on changing market dynamics or client needs. Recommendations made in the last 12 months are available upon request.
    Nightview Capital Management, LLC (Nightview Capital) is an independent investment adviser registered under the Investment Advisers Act of 1940, as amended. Registration does not imply a certain level of skill or training. More information about Nightview Capital including our investment strategies and objectives can be found in our ADV Part 2, which is available upon request. WRC-20-09